TFIDF Retriever
The TFIDF Retriever is an agent that constructs a TF-IDF matrix for all entries in a given task. It generates responses by returning the highest-scoring documents for a query. It uses a SQLite database for storing the sparse TF-IDF matrix, adapted from here.
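To make the retrieval step concrete, here is a minimal sketch of TF-IDF scoring and top-k retrieval in plain Python. It is not the agent's implementation: the whitespace tokenizer, the smoothed IDF formula, the log-scaled term frequencies, and the helper names `tokenize`, `build_index`, and `retrieve` are all assumptions chosen for illustration, and the index is kept in memory rather than in SQLite.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercased whitespace split; a stand-in for a real tokenizer (assumption).
    return text.lower().split()


def build_index(docs):
    """Return per-document term counts and an IDF weight for every term."""
    counts = [Counter(tokenize(d)) for d in docs]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    n = len(docs)
    # Smoothed IDF; one common variant among several (assumption).
    idf = {
        t: math.log((n - df + 0.5) / (df + 0.5) + 1.0)
        for t, df in doc_freq.items()
    }
    return counts, idf


def retrieve(query, docs, counts, idf, k=1):
    """Return the k documents with the highest TF-IDF dot-product score."""
    q = Counter(tokenize(query))
    scores = []
    for i, c in enumerate(counts):
        # Dot product of the query and document TF-IDF vectors
        # over the terms they share.
        score = sum(
            (math.log1p(q[t]) * idf[t]) * (math.log1p(c[t]) * idf[t])
            for t in q
            if t in c
        )
        scores.append((score, i))
    # Highest score first; ties broken by document order.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [docs[i] for _, i in scores[:k]]


docs = [
    "i love hiking in the mountains",
    "my favorite food is pizza",
    "i have two dogs and a cat",
]
counts, idf = build_index(docs)
top = retrieve("do you have any dogs", docs, counts, idf, k=1)
print(top)
```

Only the third document shares terms with the query ("have" and "dogs"), so it is the one returned. A real index would store the matrix sparsely, since most term-document entries are zero.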
Basic Examples
Construct a TFIDF matrix for use in retrieval for the personachat task:
parlai train_model -m tfidf_retriever -t personachat -mf /tmp/personachat_tfidf -dt train:ordered -eps 1
After construction, load and evaluate that model on the Persona-Chat test set.
parlai eval_model -t personachat -mf /tmp/personachat_tfidf -dt test
Alternatively, interact with a Wikipedia-based TFIDF model from the model zoo:
parlai interactive -mf zoo:wikipedia_full/tfidf_retriever/model
TfidfRetrieverAgent Options
Retriever Arguments
| Argument | Description |
|---|---|
| | Number of CPU processes (for tokenizing, etc.) |
| | Use up to N-size n-grams (e.g. 2 = unigrams + bigrams) |
| | Number of buckets to use for hashing n-grams |
| | String option specifying tokenizer type to use. |
| | How many docs to retrieve. |
| | Whether to remove the title from the retrieved passage |
| | Whether to retrieve the stored key or the stored value. For example, if you want to return the text of an example, use keys here; if you want to return the label, use values here. |
| | Whether to index into database by doc id as an integer. This defaults to true for DBs built using ParlAI. |
| | Number of past utterances to remember when building flattened batches of data in multi-example episodes. |
| | Specifies whether or not to include labels as past utterances when building flattened batches of data in multi-example episodes. |
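The n-gram size and hash-bucket options interact: every unigram, bigram, and so on up to the chosen size is mapped to one of a fixed number of buckets, which become the columns of the TF-IDF matrix. The sketch below illustrates that idea only. The helper names `ngrams` and `hash_ngram`, the use of `zlib.crc32` as the hash function, and the bucket count of 2**24 are assumptions for illustration and are not taken from the agent's code.

```python
import zlib


def ngrams(tokens, n):
    """Return all 1-grams through n-grams of a token list, joined by spaces."""
    out = []
    for size in range(1, n + 1):
        for i in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[i:i + size]))
    return out


def hash_ngram(gram, num_buckets):
    # crc32 is a deterministic stand-in hash (assumption); any stable hash
    # reduced modulo num_buckets gives the same kind of column index.
    return zlib.crc32(gram.encode("utf-8")) % num_buckets


tokens = "the quick brown fox".split()
grams = ngrams(tokens, 2)  # n=2 means unigrams plus bigrams
bucket_ids = [hash_ngram(g, 2 ** 24) for g in grams]
print(grams)
print(bucket_ids)
```

Fewer buckets mean a smaller matrix but more collisions, where unrelated n-grams share a column; more buckets reduce collisions at the cost of memory.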