We provide detailed instructions for evaluation below. Before running our evaluation script, please make sure the structure of your model outputs matches ours.
Download our dataset from Hugging Face.
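As a minimal sketch, the download can be done with the Hugging Face CLI; the dataset repository ID below is a placeholder, not the actual one.

```bash
# Requires the Hugging Face CLI: pip install -U huggingface_hub
# <ORG>/<DATASET> is a placeholder -- substitute the actual ProbMed dataset ID.
huggingface-cli download <ORG>/<DATASET> --repo-type dataset --local-dir .
```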
Clone the official repos of the following open-source models into your project root folder (an example set of clone commands follows this list):
- LLaVAv1, v1.6 [repo]
- LLaVA-Med [repo]
- MiniGPTv2 [repo]
- CheXagent [repo]
- BiomedGPT [repo]
- Med-Flamingo [repo]
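For example, from the project root; the repository URLs are placeholders for the links above, and only the first two models are shown.

```bash
# Replace <...> with the official repository URLs linked above.
git clone <LLaVA-repo-url> LLaVA
git clone <LLaVA-Med-repo-url> LLaVA-Med
# ... and so on for MiniGPTv2, CheXagent, BiomedGPT, and Med-Flamingo
```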
Set up the environment for each open-source model as instructed by its original repo and run inference. For the API-based models (GPT-4o, GPT-4V, and Gemini Pro), set up your API key in the provided scripts under the /inference folder.
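As a sketch, assuming the API scripts read the keys from environment variables; the variable names below are common conventions, not necessarily what the provided scripts use, so if they expect the key inline, edit the scripts under /inference directly.

```bash
# Assumption: the scripts read keys from these environment variables.
# If they expect the key hard-coded, edit the files under /inference instead.
export OPENAI_API_KEY="sk-..."   # GPT-4o / GPT-4V
export GOOGLE_API_KEY="..."      # Gemini Pro
```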
For the open-source models, we also provide our inference scripts for reference. To use them, move the scripts under the /inference folder into the corresponding folders you cloned from the original repos, following the paths given in model_inference.sh.
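For example; the script file names below are illustrative only, so check model_inference.sh for the exact file names and destination paths.

```bash
# Illustrative file names -- consult model_inference.sh for the real paths.
cp inference/llava_v1_infer.py  LLaVA/
cp inference/llavamed_infer.py  LLaVA-Med/
```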
After setting up, run inference.sh to get model outputs on the question files.
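For example, from the project root (assuming the script needs no extra arguments; check it for any required model or path flags):

```bash
bash inference.sh
```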
After getting the output, run calculate_score.py to get scores for all models.
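For example (assuming the script locates the response files itself; pass paths explicitly if it exposes arguments for them):

```bash
python calculate_score.py
```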
Your folder structure should look like this:
project-root
├── LLaVA
│   └── ...
├── LLaVA-Med
│   └── ...
├── ...
├── probmed.json
├── response_file
│   ├── llava_v1.json
│   ├── llavamed.json
│   └── xxx.json
└── ablation
    ├── ablation.json
    ├── llava_v1.jsonl
    ├── llavamed.jsonl
    └── xxx.json
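A quick, optional sanity check before scoring (file names as in the tree above; xxx stands for any additional model's output file):

```bash
# Verify the key inputs from the tree above exist before running calculate_score.py.
ls probmed.json
ls response_file/llava_v1.json response_file/llavamed.json
ls ablation/ablation.json ablation/llava_v1.jsonl ablation/llavamed.jsonl
```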