Tasks¶
List of ParlAI tasks defined in the file task_list.py.
All tasks
ChitChat tasks
Cloze tasks
Debug tasks
Dodeca tasks
Entailment tasks
Goal tasks
Grounded tasks
LIGHT tasks
MT tasks
Math tasks
MovieDD tasks
MultiPartyConvo tasks
NLI tasks
Negotiation tasks
Personalization tasks
QA tasks
Reasoning tasks
TOD tasks
Visual tasks
all tasks
common ground tasks
decanlp tasks
engaging tasks
improv tasks
improve tasks
open-ended tasks
All Tasks¶
ChitChat Tasks¶
Blended Skill Talk¶
Usage: --task blended_skill_talk
Links: code
A dataset of 7k conversations explicitly designed to exhibit multiple conversation modes: displaying personality, having empathy, and demonstrating knowledge.
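Every entry on this page can be previewed the same way: pass its --task string to ParlAI's display_data script, either on the command line (parlai display_data -t blended_skill_talk) or from Python. A minimal sketch using this first entry:

```python
# Minimal sketch: preview a few examples from a task by passing its
# --task string to the display_data script.
from parlai.scripts.display_data import DisplayData

DisplayData.main(task='blended_skill_talk', num_examples=5, datatype='train')
```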
Cmu Document Grounded Conversations¶
Usage: --task cmu_dog
A document grounded dataset for text conversations, where the documents are Wikipedia articles about popular movies. Consists of 4112 conversations with an average of 21.43 turns per conversation.
Cornell Movie¶
Usage: --task cornell_movie
Fictional conversations extracted from raw movie scripts.
Dialogue Nli¶
Usage: --task dialogue_nli
Dialogue NLI is a dataset that addresses the issue of consistency in dialogue models.
Dstc7 Subtrack 1 - Ubuntu¶
Usage: --task dstc7
DSTC7 is a competition which provided a dataset of dialogs very similar to the Ubuntu dataset. In particular, subtrack 1 consists of predicting the next utterance.
Movie Dialog Reddit¶
Usage: --task moviedialog:Task:4
Dialogs discussing Movies from Reddit (the Movies SubReddit).
Open Subtitles¶
Usage: --task opensubtitles
Links: version 2018 website, version 2009 website, related work (arXiv), code
Dataset of dialogs from movie scripts.
Ubuntu¶
Usage: --task ubuntu
Dialogs between an Ubuntu user and an expert trying to fix an issue. We use the V2 version, which cleaned the data to some extent.
Convai2¶
Usage: --task convai2
A chit-chat dataset based on PersonaChat for a NIPS 2018 competition.
Convai Chitchat¶
Usage: --task convai_chitchat
Human-bot dialogues containing free discussions of randomly chosen paragraphs from SQuAD.
Persona-Chat¶
Usage: --task personachat
A chit-chat dataset where paired Turkers are given assigned personas and chat to try to get to know each other.
Taskmaster-1-2019¶
Usage: --task taskmaster
A chit-chat dataset by GoogleAI providing high-quality, goal-oriented conversations. The dataset hopes to provoke interest in written vs. spoken language. Both datasets consist of two-person dialogs. Spoken: created using a Wizard of Oz methodology. Written: created by crowdsourced workers who were asked to write the full conversation themselves, playing the roles of both the user and the assistant.
Msr End-To-End¶
Usage: --task msr_e2e
MSR-E2E is a dataset of human-human conversations in which one human plays the role of an Agent and the other plays the role of a User. Data is collected from Amazon Mechanical Turk.
Twitter¶
Usage: --task twitter
Twitter data found on GitHub. No train/valid/test split was provided, so 10k examples for valid and 10k for test were chosen at random.
Convai2 Wild Evaluation¶
Usage: --task convai2_wild_evaluation
Dataset collected during the wild evaluation of ConvAI2 participants' bots. A 60% train, 20% valid, and 20% test split was chosen at random from the whole dataset.
Image Chat¶
Usage: --task image_chat
Links: website, website2, code
202k dialogues and 401k utterances over 202k images from the YFCC100m dataset, using 215 possible personality traits.
:::{admonition,note} Notes
If you have already downloaded the images, please specify with the --yfcc-path
flag, as the image download script takes a very long time to run
:::
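If the images are already on disk, the flag from the note above can be passed as a keyword argument; a sketch (the directory path is a placeholder):

```python
# Sketch: point the Image Chat teacher at pre-downloaded YFCC100m images
# via the --yfcc-path flag mentioned in the note above.
from parlai.scripts.display_data import DisplayData

DisplayData.main(task='image_chat', yfcc_path='/path/to/yfcc_images')
```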
Image Chat Generation¶
Usage: --task image_chat:Generation
Links: code
Image Chat task to train generative models.
Wizard Of Wikipedia¶
Usage: --task wizard_of_wikipedia
A dataset with conversations directly grounded with knowledge retrieved from Wikipedia. Contains 201k utterances from 22k dialogues spanning over 1300 diverse topics, split into train, test, and valid sets. The test and valid sets are split into two sets each: one with overlapping topics with the train set, and one with unseen topics.
:::{admonition,note} Notes
To access the different valid/test splits (unseen/seen), specify the corresponding split (random_split
for seen, topic_split
for unseen) after the last colon in the task. E.g. wizard_of_wikipedia:WizardDialogKnowledgeTeacher:random_split
:::
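For example, to view the seen (random_split) validation data, the split name from the note goes after the teacher name; a sketch:

```python
# Sketch: choose the seen (random_split) or unseen (topic_split) evaluation
# data by appending the split after the teacher name, per the note above.
from parlai.scripts.display_data import DisplayData

DisplayData.main(
    task='wizard_of_wikipedia:WizardDialogKnowledgeTeacher:random_split',
    datatype='valid',
)
```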
Wizard Of Wikipedia Generator¶
Usage: --task wizard_of_wikipedia:Generator
Links: code
Wizard of Wikipedia task to train generative models
Daily Dialog¶
Usage: --task dailydialog
A dataset of chitchat dialogues with strong annotations for topic, emotion and utterance act. This version contains both sides of every conversation, and uses the official train/valid/test splits from the original authors.
Empathetic Dialogues¶
Usage: --task empathetic_dialogues
A dataset of 25k conversations grounded in emotional situations to facilitate training and evaluating dialogue systems. The dataset has been released under the CC BY-NC license.
:::{admonition,note} Notes
EmpatheticDialoguesTeacher returns examples like so:
[text]: context line (previous utterance by ‘speaker’)
[labels]: label line (current utterance by ‘listener’)
with additional task specific fields:
[situation]: a 1-3 sentence description of the situation that the conversation is based on
[emotion]: one of 32 emotion words
Other optional fields:
[prepend_ctx]: fasttext prediction on context line - or None
[prepend_cand]: fasttext prediction on label line (candidate) - or None
[deepmoji_ctx]: vector encoding from deepmoji penultimate layer - or None
[deepmoji_cand]: vector encoding from deepmoji penultimate layer for label line (candidate) - or None
:::
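A minimal sketch of reading these fields off the teacher's messages, assuming the standard RepeatLabelAgent/world loop from the ParlAI tutorials:

```python
# Sketch: iterate over EmpatheticDialogues examples and read the
# task-specific fields listed above ([situation], [emotion]).
from parlai.core.params import ParlaiParser
from parlai.core.worlds import create_task
from parlai.agents.repeat_label.repeat_label import RepeatLabelAgent

parser = ParlaiParser(add_parlai_args=True, add_model_args=True)
opt = parser.parse_args(['--task', 'empathetic_dialogues'])
agent = RepeatLabelAgent(opt)
world = create_task(opt, agent)

for _ in range(5):
    world.parley()
    teacher_act = world.get_acts()[0]  # the teacher's message
    print(teacher_act.get('situation'), '|', teacher_act.get('emotion'))
```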
Image Grounded Conversations¶
Usage: --task igc
A dataset of (image, context, question, answer) tuples, comprised of eventful images taken from Bing, Flickr, and COCO.
Holl-E¶
Usage: --task holl_e
Sequence of utterances and responses with background knowledge about movies. From the Holl-E dataset.
Redial¶
Usage: --task redial
Annotated dataset of dialogues where users recommend movies to each other.
Style-Controlled Generation¶
Usage: --task style_gen
Links: code
Dialogue datasets (BlendedSkillTalk, ConvAI2, EmpatheticDialogues, and Wizard of Wikipedia) labeled with personalities taken from the Image-Chat dataset. Used for the style-controlled generation project.
Dialogue Contradiction Detection (Decode)¶
Usage: --task decode
Task for detecting whether the last utterance contradicts the previous dialogue history.
Wizard Of Internet¶
Usage: --task wizard_of_internet
Links: code
A dataset with conversations directly grounded with knowledge retrieved from the internet. One of the participants has access to internet search. The other side has an assigned persona that provides the topic of the conversation. Contains 93.7k utterances from 9.6k conversations, split into train, test, and valid sets.
Multisessionchat¶
Usage: --task msc
Links: code
A multi-session human-human chit-chat dataset consisting of sessions 2-5 following up on PersonaChat. It contains 5k full conversations from session 2 to session 5 (session 1 being PersonaChat).
Xpersona¶
Usage: --task xpersona
XPersona is an extension of ConvAI2 with six more languages: Chinese, French, Indonesian, Italian, Korean, and Japanese.
Cloze Tasks¶
Booktest¶
Usage: --task booktest
Sentence completion given a few sentences as context from a book. A larger version of CBT.
Children’S Book Test (Cbt)¶
Usage: --task cbt
Sentence completion given a few sentences as context from a children’s book.
Qa Cnn¶
Usage: --task qacnn
Cloze dataset based on a missing (anonymized) entity phrase from a CNN article.
Debug Tasks¶
Dodeca Tasks¶
Cornell Movie¶
Usage: --task cornell_movie
Fictional conversations extracted from raw movie scripts.
Light-Dialogue¶
Usage: --task light_dialog
LIGHT is a text adventure game in which actions and dialogue are collected. The source data is collected from crowdworkers playing the game.
Ubuntu¶
Usage: --task ubuntu
Dialogs between an Ubuntu user and an expert trying to fix an issue. We use the V2 version, which cleaned the data to some extent.
Convai2¶
Usage: --task convai2
A chit-chat dataset based on PersonaChat for a NIPS 2018 competition.
Twitter¶
Usage: --task twitter
Twitter data found on GitHub. No train/valid/test split was provided, so 10k examples for valid and 10k for test were chosen at random.
Image Chat Generation¶
Usage: --task image_chat:Generation
Links: code
Image Chat task to train generative models.
Wizard Of Wikipedia Generator¶
Usage: --task wizard_of_wikipedia:Generator
Links: code
Wizard of Wikipedia task to train generative models
Daily Dialog¶
Usage: --task dailydialog
A dataset of chitchat dialogues with strong annotations for topic, emotion and utterance act. This version contains both sides of every conversation, and uses the official train/valid/test splits from the original authors.
Empathetic Dialogues¶
Usage: --task empathetic_dialogues
A dataset of 25k conversations grounded in emotional situations to facilitate training and evaluating dialogue systems. The dataset has been released under the CC BY-NC license.
:::{admonition,note} Notes
EmpatheticDialoguesTeacher returns examples like so:
[text]: context line (previous utterance by ‘speaker’)
[labels]: label line (current utterance by ‘listener’)
with additional task specific fields:
[situation]: a 1-3 sentence description of the situation that the conversation is based on
[emotion]: one of 32 emotion words
Other optional fields:
[prepend_ctx]: fasttext prediction on context line - or None
[prepend_cand]: fasttext prediction on label line (candidate) - or None
[deepmoji_ctx]: vector encoding from deepmoji penultimate layer - or None
[deepmoji_cand]: vector encoding from deepmoji penultimate layer for label line (candidate) - or None
:::
Entailment Tasks¶
Multinli¶
Usage: --task multinli
A dataset designed for use in the development and evaluation of machine learning models for sentence understanding. Each example contains a premise and a hypothesis. The model has to predict whether the premise and hypothesis entail, contradict, or are neutral to each other.
The Stanford Natural Language Inference (Snli) Corpus¶
Usage: --task snli
The SNLI corpus (version 1.0) is a collection of 570k human-written English sentence pairs manually labeled for balanced classification with the labels entailment, contradiction, and neutral, supporting the task of natural language inference (NLI), also known as recognizing textual entailment (RTE).
Adversarial Natural Language Inference (Anli) Corpus¶
Usage: --task anli
The ANLI corpus (version 1.0) is a new large-scale NLI benchmark dataset, collected via an iterative, adversarial human-and-model-in-the-loop procedure, with the labels entailment, contradiction, and neutral. A total of three rounds of data are collected that progressively increase in difficulty and complexity.
Natural Language Inference (Nli) Corpus¶
Usage: --task nli
Links: code
A collection of 3 popular Natural Language Inference(NLI) benchmark tasks: ANLI v0.1, MultiNLI 1.0, SNLI 1.0.
Dialogue Contradiction Detection (Decode)¶
Usage: --task decode
Task for detecting whether the last utterance contradicts the previous dialogue history.
Goal Tasks¶
Coached Conversational Preference Elicitation¶
Usage: --task ccpe
A dataset consisting of 502 dialogs with 12,000 annotated utterances between a user and an assistant discussing movie preferences in natural language. It was collected using a Wizard-of-Oz methodology between two paid crowd-workers, where one worker plays the role of an ‘assistant’, while the other plays the role of a ‘user’.
Dialog Based Language Learning: Babi Task¶
Usage: --task dbll_babi
Short dialogs based on the bAbI tasks, but in the form of a question from a teacher, the answer from the student, and finally a comment on the answer from the teacher. The aim is to find learning models that use the comments to improve.
:::{admonition,note} Notes
Tasks can be accessed with a format like: 'parlai display_data -t dbll_babi:task:2_p0.5', which specifies task 2 and a policy with 0.5 answers correct; see the paper for more details of the tasks.
:::
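The same selector works through the Python script interface; a sketch of the exact command quoted in the note:

```python
# Sketch: task 2 of dbll_babi with a policy that answers correctly
# with probability 0.5, per the note above.
from parlai.scripts.display_data import DisplayData

DisplayData.main(task='dbll_babi:task:2_p0.5')
```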
Dialog Based Language Learning: Wikimovies Task¶
Usage: --task dbll_movie
Short dialogs based on WikiMovies, but in the form of a question from a teacher, the answer from the student, and finally a comment on the answer from the teacher. The aim is to find learning models that use the comments to improve.
Dialog Babi+¶
Usage: --task dialog_babi_plus
bAbI+ is an extension of the bAbI Task 1 dialogues with everyday incremental dialogue phenomena (hesitations, restarts, and corrections) which model the disfluencies and communication problems in everyday spoken interaction in real-world environments.
Mutualfriends¶
Usage: --task mutualfriends
Task where two agents must discover which friend of theirs is mutual based on the friends' attributes.
Movie Dialog Qa Recommendations¶
Usage: --task moviedialog:Task:3
Dialogs discussing questions about movies as well as recommendations.
Personalized Dialog Full Set¶
Usage: --task personalized_dialog:AllFull
Simulated dataset of restaurant booking focused on personalization based on user profiles.
Personalized Dialog Small Set¶
Usage: --task personalized_dialog:AllSmall
Simulated dataset of restaurant booking focused on personalization based on user profiles.
Task N’ Talk¶
Usage: --task taskntalk
Dataset of synthetic shapes described by attributes, for agents to play a cooperative QA game.
Scan¶
Usage: --task scan
SCAN is a set of simple language-driven navigation tasks for studying compositional learning and zero-shot generalization. The SCAN tasks were inspired by the CommAI environment, which is the origin of the acronym (Simplified versions of the CommAI Navigation tasks).
Multiwoz 2.0¶
Usage: --task multiwoz_v20
A fully labeled collection of human-written conversations spanning over multiple domains and topics.
Multiwoz 2.1¶
Usage: --task multiwoz_v21
A fully labeled collection of human-written conversations spanning over multiple domains and topics.
Multiwoz 2.2¶
Usage: --task multiwoz_v22
A fully labeled collection of human-written conversations spanning over multiple domains and topics. Schemas are included.
Onecommon¶
Usage: --task onecommon
A collaborative referring task which requires advanced skills of common grounding under continuous and partially-observable context. This code also includes reference-resolution annotation.
Airdialogue¶
Usage: --task airdialogue
Task for goal-oriented dialogue using airplane booking conversations between agents and customers.
Redial¶
Usage: --task redial
Annotated dataset of dialogues where users recommend movies to each other.
Googlesgd¶
Usage: --task google_sgd
Links: code
The Schema-Guided Dialogue (SGD) dataset consists of over 20k annotated multi-domain, task-oriented conversations between a human and a virtual assistant.
Googlesgd Simulation Splits¶
Usage: --task google_sgd_simulation_splits
Links: code
Custom processing of the Google SGD dataset into in-domain and out-of-domain splits for use in zero- and few-shot learning with other task-oriented data.
Taskmaster2¶
Usage: --task taskmaster2
Links: code
The second version of TaskMaster, containing Wizard-of-Oz dialogues for task-oriented dialogue in 7 domains.
Tickettalk (Taskmaster3)¶
Usage: --task taskmaster3
Links: code
Taskmaster3 is a dataset of movie ticket dialogues collected in a self-chat manner. To induce conversational variety, crowd workers were asked to generate conversations given dozens of different instructions of different levels of specificity, some purposefully including conversational errors.
Grounded Tasks¶
Cmu Document Grounded Conversations¶
Usage: --task cmu_dog
A document grounded dataset for text conversations, where the documents are Wikipedia articles about popular movies. Consists of 4112 conversations with an average of 21.43 turns per conversation.
Light-Dialogue¶
Usage: --task light_dialog
LIGHT is a text adventure game in which actions and dialogue are collected. The source data is collected from crowdworkers playing the game.
LIGHT Tasks¶
MT Tasks¶
Math Tasks¶
Asdiv¶
Usage: --task asdiv
A diverse corpus for evaluating and developing English math word problem solvers.
MovieDD Tasks¶
Movie Dialog Qa¶
Usage: --task moviedialog:Task:1
Closed-domain QA dataset asking templated questions about movies, answerable from Wikipedia, similar to WikiMovies.
Movie Dialog Qa Recommendations¶
Usage: --task moviedialog:Task:3
Dialogs discussing questions about movies as well as recommendations.
Movie Dialog Recommendations¶
Usage: --task moviedialog:Task:2
Questions asking for movie recommendations.
MultiPartyConvo Tasks¶
NLI Tasks¶
Dialogue Nli¶
Usage: --task dialogue_nli
Dialogue NLI is a dataset that addresses the issue of consistency in dialogue models.
Adversarial Natural Language Inference (Anli) Corpus¶
Usage: --task anli
The ANLI corpus (version 1.0) is a new large-scale NLI benchmark dataset, collected via an iterative, adversarial human-and-model-in-the-loop procedure, with the labels entailment, contradiction, and neutral. A total of three rounds of data are collected that progressively increase in difficulty and complexity.
Negotiation Tasks¶
Deal Or No Deal¶
Usage: --task dealnodeal
End-to-end negotiation task which requires two agents to agree on how to divide a set of items, with each agent assigning different values to each item.
Personalization Tasks¶
Personalized Dialog Full Set¶
Usage: --task personalized_dialog:AllFull
Simulated dataset of restaurant booking focused on personalization based on user profiles.
QA Tasks¶
Amazonqa¶
Usage: --task amazon_qa
This dataset contains Question and Answer data from Amazon, totaling around 1.4 million answered questions.
Aqua¶
Usage: --task aqua
Dataset containing algebraic word problems with rationales for their answers.
Babi 1K¶
Usage: --task babi:All1k
20 synthetic tasks that each test a unique aspect of text and reasoning, and hence test different capabilities of learning models.
:::{admonition,note} Notes
You can access just one of the bAbI tasks with e.g. 'babi:Task1k:3' for task 3.
:::
Babi 10K¶
Usage: --task babi:All10k
20 synthetic tasks that each test a unique aspect of text and reasoning, and hence test different capabilities of learning models.
:::{admonition,note} Notes
You can access just one of the bAbI tasks with e.g. 'babi:Task10k:3' for task 3.
:::
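A sketch combining both notes above, selecting a single bAbI task in either data regime:

```python
# Sketch: select a single bAbI task (here task 3) in either data regime,
# per the notes in the two entries above.
from parlai.scripts.display_data import DisplayData

DisplayData.main(task='babi:Task1k:3')   # 1k training examples per task
DisplayData.main(task='babi:Task10k:3')  # 10k training examples per task
```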
Conversational Question Answering Challenge¶
Usage: --task coqa
CoQA is a large-scale dataset for building Conversational Question Answering systems. The goal of the CoQA challenge is to measure the ability of machines to understand a text passage and answer a series of interconnected questions that appear in a conversation. CoQA is pronounced as coca.
Hotpotqa¶
Usage: --task hotpotqa
HotpotQA is a dataset for multi-hop question answering. The overall setting is that given some context paragraphs (e.g., a few paragraphs, or the entire Web) and a question, a QA system answers the question by extracting a span of text from the context. It is necessary to perform multi-hop reasoning to correctly answer the question.
Movie Dialog Qa¶
Usage: --task moviedialog:Task:1
Closed-domain QA dataset asking templated questions about movies, answerable from Wikipedia, similar to WikiMovies.
Movie Dialog Recommendations¶
Usage: --task moviedialog:Task:2
Questions asking for movie recommendations.
Mturk Wikimovies¶
Usage: --task mturkwikimovies
Closed-domain QA dataset asking MTurk-derived questions about movies, answerable from Wikipedia.
Narrativeqa¶
Usage: --task narrative_qa
A dataset and set of tasks in which the reader must answer questions about stories by reading entire books or movie scripts.
:::{admonition,note} Notes
You can access the summaries-only task for NarrativeQA by using the task 'narrative_qa:summaries'. By default, only stories are provided.
:::
Natural Questions¶
Usage: --task natural_questions
An open-domain question answering dataset. Each example contains a real question that people searched for in Google, the content of a Wikipedia article that was amongst the top 5 search results for that query, and its annotations. The annotations have the options of a long answer that is selected from spans of major content entities in the Wikipedia article (e.g., paragraphs, tables), a short answer that is selected from one or more short spans of words in the article, or 'yes/no'. The existence of any of these answer formats depends on whether the main question can be answered given the article; if not, they are left empty.
:::{admonition,note} Notes
Since this task uses ChunkTeacher, it should be used with streaming.
:::
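Following the note above, a sketch that requests the streamed datatype so the chunks are not all loaded into memory:

```python
# Sketch: ChunkTeacher-backed tasks like natural_questions should be read
# with a streaming datatype, per the note above.
from parlai.scripts.display_data import DisplayData

DisplayData.main(task='natural_questions', datatype='train:stream')
```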
Question Answering In Context¶
Usage: --task quac
Question Answering in Context is a dataset for modeling, understanding, and participating in information seeking dialog. Data instances consist of an interactive dialog between two crowd workers: (1) a student who poses a sequence of freeform questions to learn as much as possible about a hidden Wikipedia text, and (2) a teacher who answers the questions by providing short excerpts (spans) from the text. QuAC introduces challenges not found in existing machine comprehension datasets: its questions are often more open-ended, unanswerable, or only meaningful within the dialog context.
Squad2¶
Usage: --task squad2
Open-domain QA dataset answerable from a given paragraph from Wikipedia.
Wikimovies¶
Usage: --task wikimovies
Closed-domain QA dataset asking templated questions about movies, answerable from Wikipedia.
Insuranceqa¶
Usage: --task insuranceqa
Task which requires agents to identify high quality answers composed by professionals with deep domain knowledge.
Ms Marco¶
Usage: --task ms_marco
A large scale Machine Reading Comprehension Dataset with questions sampled from real anonymized user queries and contexts from web documents.
Qangaroo¶
Usage: --task qangaroo
Reading comprehension with multiple hops, including two datasets: WIKIHOP, built on Wikipedia, and MEDHOP, built on paper abstracts from PubMed.
Eli5¶
Usage: --task eli5
This dataset contains Question and Answer data from Reddit explainlikeimfive posts and comments.
Dream¶
Usage: --task dream
A multiple-choice answering dataset based on multi-turn, multi-party dialogue.
Commonsenseqa¶
Usage: --task commonsenseqa
CommonSenseQA is a multiple-choice QA dataset that relies on commonsense knowledge to predict correct answers.
Reasoning Tasks¶
Choice Of Plausible Alternatives¶
Usage: --task copa
The Choice Of Plausible Alternatives (COPA) evaluation provides researchers with a tool for assessing progress in open-domain commonsense causal reasoning. COPA consists of 1000 questions, split equally into development and test sets of 500 questions each.
Entailmentbank¶
Usage: --task entailment_bank
2k multi-step entailment trees, explaining the answers to ARC science questions.
Asdiv¶
Usage: --task asdiv
A diverse corpus for evaluating and developing English math word problem solvers.
TOD Tasks¶
Visual Tasks¶
Fvqa¶
Usage: --task fvqa
FVQA is a VQA dataset which requires, and supports, much deeper reasoning. It extends a conventional visual question answering dataset, which contains image-question-answer triplets, through additional image-question-answer-supporting fact tuples. The supporting fact is represented as a structural triplet, such as <Cat,CapableOf,ClimbingTrees>.
Visdial¶
Usage: --task visdial
Task which requires agents to hold a meaningful dialog about visual content.
Mnist Qa¶
Usage: --task mnist_qa
Links: code
Task which requires agents to identify which number they are seeing. From the MNIST dataset.
Clevr¶
Usage: --task clevr
A visual reasoning dataset that tests abilities such as attribute identification, counting, comparison, spatial relationships, and logical operations.
Nlvr¶
Usage: --task nlvr
Cornell Natural Language Visual Reasoning (NLVR) is a language grounding dataset based on pairs of natural language statements grounded in synthetic images.
Flickr30K¶
Usage: --task flickr30k
Links: website, paper1, paper2, code
30k captioned images pulled from Flickr compiled by UIUC.
Image Chat¶
Usage: --task image_chat
Links: website, website2, code
202k dialogues and 401k utterances over 202k images from the YFCC100m dataset, using 215 possible personality traits.
:::{admonition,note} Notes
If you have already downloaded the images, please specify with the --yfcc-path
flag, as the image download script takes a very long time to run
:::
all Tasks¶
Spolin¶
Usage: --task spolin
Conversation pairs from the SPOLIN dataset. The pairs abide by the Yes-and principle of improvisational theatre (improv).
common ground Tasks¶
decanlp Tasks¶
Multinli¶
Usage: --task multinli
A dataset designed for use in the development and evaluation of machine learning models for sentence understanding. Each example contains a premise and a hypothesis. The model has to predict whether the premise and hypothesis entail, contradict, or are neutral to each other.
Iwslt14¶
Usage: --task iwslt14
2014 International Workshop on Spoken Language Translation task; currently only includes en_de and de_en.
Convai Chitchat¶
Usage: --task convai_chitchat
Human-bot dialogues containing free discussions of randomly chosen paragraphs from SQuAD.
Sst Sentiment Analysis¶
Usage: --task sst
Links: website, website2, code
Dataset containing sentiment trees of movie reviews. We use the modified binary sentiment analysis subtask given by the DecaNLP paper here.
Cnn/Dm Summarisation¶
Usage: --task cnn_dm
Dataset collected from CNN and the Daily Mail with summaries as labels. Implemented as part of the DecaNLP task.
Qa-Srl Semantic Role Labeling¶
Usage: --task qasrl
QA dataset implemented as part of the DecaNLP task.
Qa-Zre Relation Extraction¶
Usage: --task qazre
Zero-shot relation extraction task implemented as part of the DecaNLP task.
Woz Restaurant Reservation (Goal-Oriented Dialogue)¶
Usage: --task woz
Dataset containing dialogues negotiating a restaurant reservation. Implemented as part of the DecaNLP task, focused on the change in the dialogue state.
Wikisql Semantic Parsing Task¶
Usage: --task wikisql
Dataset for parsing sentences to SQL code, given a table. Implemented as part of the DecaNLP task.
engaging Tasks¶
Spolin¶
Usage: --task spolin
Conversation pairs from the SPOLIN dataset. The pairs abide by the Yes-and principle of improvisational theatre (improv).
improv Tasks¶
improve Tasks¶
open-ended Tasks¶
Spolin¶
Usage: --task spolin
Conversation pairs from the SPOLIN dataset. The pairs abide by the Yes-and principle of improvisational theatre (improv).
Uncategorized Tasks¶
Bot Adversarial Dialogue¶
Usage: --task bot_adversarial_dialogue
Datasets described in the paper Recipes for Safety in Open-domain Chatbots. Datasets consist of classification tasks in which the goal is to determine if the utterance is offensive or not given a dialogue context.
Safety Mix¶
Usage: --task safety_mix
Links: code
Datasets described in the paper Learning from data in the mixed adversarial non-adversarial case: Finding the helpers and ignoring the trolls. Datasets are based on Bot Adversarial Dialogue and consist of a mixture of different troll users. Artificial noise is introduced to the dataset given the troll user type.
Glue¶
Usage: --task glue
Links: website, website2, code
GLUE, the General Language Understanding Evaluation benchmark, is a collection of resources for training, evaluating, and analyzing natural language understanding systems.
Prosocial Dialog¶
Usage: --task prosocial_dialog
Prosocial Dialog dataset of 58K dialogues between a speaker showing potentially unsafe behavior and a speaker giving constructive feedback for more socially acceptable behavior.
Reframe Unhelpful Thoughts¶
Usage: --task reframe_thoughts
Links: code
Dataset of about 10k examples of thoughts containing unhelpful thought patterns conditioned on a given persona, accompanied by about 27k positive reframes.
Superglue¶
Usage: --task superglue
Links: website, website2, code
SuperGLUE (https://super.gluebenchmark.com/) is a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, improved resources, and a new public leaderboard.
Dialogue Qe¶
Usage: --task dialogue_qe
Links: code
Human-bot dialogues labelled for quality at the level of dialogues. Can be used to train a dialogue-level metric for dialogue systems.
Wikipedia¶
Usage: --task wikipedia
Links: code
Dump of Wikipedia articles from 2/3/18.
:::{admonition,note} Notes
Specify ':full' for the full articles to be returned; otherwise defaults to ':summary', which provides the first paragraphs. To put the article in the labels and the title in the text, specify ':key-value' at the end (for a title/content key-value association).
:::
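A sketch of the teacher variants described in the note:

```python
# Sketch: Wikipedia teacher variants from the note above.
from parlai.scripts.display_data import DisplayData

DisplayData.main(task='wikipedia:summary')         # default: first paragraphs
DisplayData.main(task='wikipedia:full')            # full articles
DisplayData.main(task='wikipedia:full:key-value')  # title/content as key-value
```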
Decanlp: The Natural Language Decathlon¶
Usage: --task decanlp
A collection of 10 tasks (SQuAD, IWSLT, CNN/DM, MNLI, SST, QA-SRL, QA-ZRE, WOZ, WikiSQL and MWSC) designed to challenge a model with a range of different tasks. Note that we use IWSLT 2014 instead of 2016/2013test/2014test for train/dev/test as given in the DecaNLP paper.
Dialogue Safety¶
Usage: --task dialogue_safety
Several datasets described in the paper Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack. All datasets are classification tasks in which the goal is to determine if the text is offensive or 'safe'.
Selfchat¶
Usage: --task self_chat
Links: code
Not a dataset, but a generic world for model self-chats.
Funpedia¶
Usage: --task funpedia
Links: code
Task for rephrasing sentences from Wikipedia conditioned on a persona.
Genderationbiascontroltask¶
Usage: --task genderation_bias:controllable_task
Links: code
A teacher that wraps other ParlAI tasks and appends control tokens to the text field indicating the presence of gender words in the label(s).
Sensitive Topics Evaluation Topics Valid Teacher¶
Usage: --task sensitive_topics_evaluation
Task for evaluating a classifier trained to identify conversational messages on the following sensitive topics: Politics, Drugs, Medical Advice, Religion, Relationships & Dating / NSFW.