These are 3D visual grounding models adapted for the mmscan-devkit. Currently, two models have been released: EmbodiedScan and ScanRefer.
- Follow the ScanRefer instructions to set up the environment. For data preparation you do not need to load the datasets; you only need to download the preprocessed GloVe embeddings (~990 MB) and put them under `data/`.
- Install the MMScan API.
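  A minimal install sketch, assuming the devkit root is a standard `pip`-installable package; the clone path is a placeholder:

  ```bash
  # Install the MMScan API into the current environment (editable install).
  cd path/to/mmscan-devkit
  pip install -e .
  ```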
- Overwrite `CONF.PATH.OUTPUT` in `lib/config.py` to point to your desired output directory.
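  One possible way to do this, assuming `CONF.PATH.OUTPUT` is assigned at the top level of `lib/config.py` (the target path below is a placeholder):

  ```bash
  # Rewrite the CONF.PATH.OUTPUT assignment in ScanRefer's config to a custom directory.
  sed -i 's|^CONF.PATH.OUTPUT.*|CONF.PATH.OUTPUT = "/path/to/scanrefer_outputs"|' lib/config.py
  ```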
- Run the following command to train ScanRefer (one GPU):

  ```bash
  python -u scripts/train.py --use_color --epoch {10/25/50}
  ```
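  For example, to train with the 50-epoch setting reported in the table below:

  ```bash
  python -u scripts/train.py --use_color --epoch 50
  ```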
- Run the following command to evaluate ScanRefer (one GPU):

  ```bash
  python -u scripts/train.py --use_color --eval_only --use_checkpoint "path/to/pth"
  ```
| Epoch | gTop-1 @ 0.25 | gTop-1 @ 0.50 | Config | Download |
|---|---|---|---|---|
| 50 | 4.74 | 2.52 | config | model / log |
- Follow the EmbodiedScan instructions to set up the environment. Download the Multi-View 3D Detection model's weights and change the `load_from` path in the config file under `configs/grounding` to the path where the weights are saved.
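  One possible way to do this, assuming `load_from` is assigned at the top level of the config (the checkpoint path below is a placeholder):

  ```bash
  # Point the grounding config at the downloaded Multi-View 3D Detection checkpoint.
  sed -i 's|^load_from.*|load_from = "path/to/mv-3ddet.pth"|' \
      configs/grounding/pcd_4xb24_mmscan_vg_num256.py
  ```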
- Install the MMScan API.
- Run the following command to train EmbodiedScan (single or multiple GPUs):

  ```bash
  # Single-GPU training
  python tools/train.py configs/grounding/pcd_4xb24_mmscan_vg_num256.py --work-dir=path/to/save

  # Multi-GPU training
  python tools/train.py configs/grounding/pcd_4xb24_mmscan_vg_num256.py --work-dir=path/to/save --launcher="pytorch"
  ```
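  With `--launcher="pytorch"`, the script expects to be started by a PyTorch distributed launcher; a sketch assuming 4 GPUs (matching the `4xb24` config name), which applies to `tools/test.py` in the same way:

  ```bash
  # Launch 4 distributed workers, one per GPU.
  torchrun --nproc_per_node=4 tools/train.py \
      configs/grounding/pcd_4xb24_mmscan_vg_num256.py \
      --work-dir=path/to/save --launcher="pytorch"
  ```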
- Run the following command to evaluate EmbodiedScan (single or multiple GPUs):

  ```bash
  # Single-GPU testing
  python tools/test.py configs/grounding/pcd_4xb24_mmscan_vg_num256.py path/to/load_pth

  # Multi-GPU testing
  python tools/test.py configs/grounding/pcd_4xb24_mmscan_vg_num256.py path/to/load_pth --launcher="pytorch"
  ```
| Input Modality | Det Pretrain | Epoch | gTop-1 @ 0.25 | gTop-1 @ 0.50 | Config | Download |
|---|---|---|---|---|---|---|
| Point Cloud | ✔ | 12 | 19.66 | 8.82 | config | model / log |
These are 3D question answering models adapted for the mmscan-devkit. Currently, two models have been released: LL3DA and LEO.
- Follow the LL3DA instructions to set up the environment. For data preparation you do not need to load the datasets; you only need to (1) download the released pre-trained weights and put them under `./pretrained`, and (2) download the pre-processed BERT embedding weights and store them under `./bert-base-embedding`.
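  A sketch of the expected layout (download sources are given in the LL3DA instructions and are omitted here):

  ```bash
  # Create the two folders the scripts expect, then drop the downloaded files into them.
  mkdir -p pretrained bert-base-embedding
  # ./pretrained           <- released pre-trained weights
  # ./bert-base-embedding  <- pre-processed BERT embedding weights
  ```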
- Install the MMScan API.
- Edit the configuration in `./scripts/opt-1.3b/eval.mmscanqa.sh` and `./scripts/opt-1.3b/tuning.mmscanqa.sh`.
- Run the following command to train LL3DA (4 GPUs):

  ```bash
  bash scripts/opt-1.3b/tuning.mmscanqa.sh
  ```
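  If you need to control which 4 GPUs are used, one common approach (assuming the script honors `CUDA_VISIBLE_DEVICES`, as standard PyTorch launchers do) is:

  ```bash
  # Restrict the run to GPUs 0-3.
  CUDA_VISIBLE_DEVICES=0,1,2,3 bash scripts/opt-1.3b/tuning.mmscanqa.sh
  ```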
- Run the following command to evaluate LL3DA (4 GPUs):

  ```bash
  bash scripts/opt-1.3b/eval.mmscanqa.sh
  ```
Optional: you can additionally score the results with the GPT evaluator. After evaluation, `qa_pred_gt_val.json` is generated under the checkpoint folder, and `--tmp_path` is a directory used for temporary storage:

```bash
python eval_utils/evaluate_gpt.py --file path/to/qa_pred_gt_val.json --tmp_path path/to/tmp --api_key your_api_key --eval_size -1 --nproc 4
```
| Detector | Captioner | Iters | Overall GPT Score | Download |
|---|---|---|---|---|
| Vote2Cap-DETR | LL3DA | 100k | 45.7 | model / log |
- Follow the LEO instructions to set up the environment. For data preparation you do not need to load the datasets; you only need to (1) download Vicuna-7B and update `cfg_path` in `configs/llm/*.yaml`, and (2) download `sft_noact.pth` and store it under `./weights`.
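  A sketch of this step, assuming the model is fetched with `huggingface-cli` (the Vicuna-7B repository id and the `sft_noact.pth` source are placeholders; see the LEO instructions for the exact locations):

  ```bash
  # Download Vicuna-7B locally, then point cfg_path in configs/llm/*.yaml at it.
  huggingface-cli download <vicuna-7b-repo-id> --local-dir path/to/vicuna-7b
  # Place the downloaded checkpoint where the scripts expect it.
  mkdir -p weights && mv path/to/sft_noact.pth weights/
  ```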
- Install the MMScan API.
- Edit the configuration in `scripts/train_tuning_mmscan.sh` and `scripts/test_tuning_mmscan.sh`.
- Run the following command to train LEO (4 GPUs):

  ```bash
  bash scripts/train_tuning_mmscan.sh
  ```
- Run the following command to evaluate LEO (4 GPUs):

  ```bash
  bash scripts/test_tuning_mmscan.sh
  ```
Optional: you can additionally score the results with the GPT evaluator. After evaluation, `test_embodied_scan_l_complete.json` is generated under the checkpoint folder, and `--tmp_path` is a directory used for temporary storage:

```bash
python evaluator/GPT_eval.py --file path/to/test_embodied_scan_l_complete.json --tmp_path path/to/tmp --api_key your_api_key --eval_size -1 --nproc 4
```
| LLM | 2D Backbone | 3D Backbone | Epoch | Overall GPT Score | Config | Download |
|---|---|---|---|---|---|---|
| Vicuna7b | ConvNeXt | PointNet++ | 1 | 54.6 | config | model |