Retrieval-augmented generation example that answers questions from arXiv abstracts and titles.
- Copy `secrets-example.json` and replace the key with your own.
- Fetch `arxiv-metadata-oai-snapshot.json`: `kaggle datasets download -d Cornell-University/arxiv`
- Run `preprocess_dataset.py`
  - Input file: `arxiv-metadata-oai-snapshot.json`
  - Output file: `documents.json` (a bit smaller)
- Run `docker compose up -d` to start MeiliSearch and Qdrant
- Then run `ingest_to_meilisearch.py` and `ingest_to_qdrant.py`
  - You'll want a GPU 😁, use `nvitop` to check it's using GPU.
  - Example performance: g5.xlarge (1x A10G), ~600k abstracts, ~12 minutes
- Finally, run `query.py` to ask some questions.
- You can connect to a nice server to test Meilisearch keyword lookup on http://localhost:8080/
- `cli.py` could be useful but at the moment only exposes `meilisearch_index` and `meilisearch_client`
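
To give a rough idea of what the preprocessing step might look like, here is a minimal sketch. It assumes the Kaggle snapshot is a JSON Lines file whose records carry `id`, `title` and `abstract` fields, and that `documents.json` is a plain JSON array. The function names `slim_records` and `preprocess` are hypothetical, and the actual `preprocess_dataset.py` may keep different fields or use a different output layout.

```python
import json
from typing import Iterable, Iterator


def slim_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one small dict per arXiv metadata record (assumed JSON Lines input)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        yield {
            "id": record["id"],
            # Collapse the hard line breaks found inside titles and abstracts.
            "title": " ".join(record["title"].split()),
            "abstract": " ".join(record["abstract"].split()),
        }


def preprocess(src_path: str, dst_path: str) -> int:
    """Write the slimmed records to dst_path as a JSON array and return the count."""
    with open(src_path, encoding="utf-8") as src:
        documents = list(slim_records(src))
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(documents, dst)
    return len(documents)
```

Under these assumptions, `preprocess("arxiv-metadata-oai-snapshot.json", "documents.json")` would produce the smaller file. Note that the sketch holds all slimmed records in memory before writing.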

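For the final question-answering step, the retrieved abstracts have to be handed to a language model together with the question. The sketch below shows one simple way to assemble such a prompt. `build_prompt`, its `max_docs` parameter and the prompt wording are all hypothetical, and `query.py` may do this quite differently. Documents are assumed to be dicts with `title` and `abstract` keys.

```python
from typing import Sequence


def build_prompt(question: str, documents: Sequence[dict], max_docs: int = 3) -> str:
    """Format retrieved documents and the question into a single prompt string."""
    context_blocks = []
    # Number each document so the model (and the reader) can refer back to it.
    for number, doc in enumerate(documents[:max_docs], start=1):
        context_blocks.append(f"[{number}] {doc['title']}\n{doc['abstract']}")
    context = "\n\n".join(context_blocks)
    return (
        "Answer the question using only the arXiv abstracts below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Capping the number of documents keeps the prompt short; the right limit depends on the model's context window and how long the abstracts are.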