Evaluation

From Evaluate a RAG application:

Transclude of Evaluate-a-RAG-application#^be63e8

From Evaluation Metrics For Information Retrieval:

  • Order-unaware:
    • Precision: fraction of the retrieved results that are relevant to the query.
    • Recall: fraction of all relevant results that were actually retrieved.
  • Order-aware
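
A minimal sketch of the two order-unaware metrics for a single query. It assumes results and relevance judgments are given as document ids; the function name `precision_recall` and the sample ids are hypothetical, chosen only for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: iterable of document ids returned by the retriever.
    relevant:  iterable of document ids judged relevant for the query.
    Order is ignored, which is what makes these metrics order-unaware.
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against empty inputs to avoid division by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical example: 4 results returned, 3 documents are truly relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d7"}
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Here two of the four retrieved documents are relevant (precision 2/4) and two of the three relevant documents were found (recall 2/3). Shuffling `retrieved` leaves both numbers unchanged.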

Multilingual

For dense vectors we don’t have to do anything special, provided the embedding model is multilingual: foundation models have already been trained on text in which terms from different languages appear together, so the embeddings already capture cross-lingual similarity. For sparse vectors (TF-IDF) there are dedicated approaches, such as [2209.14281] Multilingual Search with Subword TF-IDF, which is already implemented in GitHub - artitw/text2text: Text2Text Language Modeling Toolkit.
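
A rough sketch of why sub-word units help sparse multilingual search. This is not the method from the paper nor the text2text API: as an assumption, it swaps the learned subword tokenizer for plain character trigrams and uses a smoothed IDF, and all function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Split text into overlapping character n-grams (a stand-in for subwords)."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_tfidf(docs, n=3):
    """Return (vectors, idf); each vector maps an n-gram to its TF-IDF weight."""
    counts_per_doc = [Counter(char_ngrams(d, n)) for d in docs]
    doc_freq = Counter()
    for counts in counts_per_doc:
        doc_freq.update(counts.keys())
    # Smoothed IDF so that n-grams present in every document keep a small weight.
    idf = {g: math.log((1 + len(docs)) / (1 + df)) + 1.0 for g, df in doc_freq.items()}
    vectors = [{g: tf * idf[g] for g, tf in counts.items()} for counts in counts_per_doc]
    return vectors, idf


def embed_query(query, idf, n=3):
    """Weight the query's n-grams with the corpus IDF; unseen n-grams are dropped."""
    counts = Counter(char_ngrams(query, n))
    return {g: tf * idf[g] for g, tf in counts.items() if g in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical mixed-language corpus.
docs = [
    "The cat sleeps on the sofa",
    "El gato duerme en el sofá",
    "Stock markets fell sharply today",
]
vectors, idf = build_tfidf(docs)

# "gatos" never appears as a whole word, but it shares trigrams with "gato".
query_vec = embed_query("gatos", idf)
scores = [cosine(query_vec, v) for v in vectors]
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

A word-level TF-IDF would give the query "gatos" no match at all, since that exact token is absent from the corpus; the shared trigrams are what let the Spanish document score above zero without any language-specific tokenizer or stemmer.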