These are 3D visual grounding models adapted for the mmscan-devkit. Currently, two models have been released: EmbodiedScan and ScanRefer.
- Follow the ScanRefer instructions to set up the environment. For data preparation you do not need to load the datasets; you only need to download the preprocessed GloVe embeddings (~990 MB) and put them under `data/`.
- Install the MMScan API.
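  A minimal install sketch, assuming the devkit root is a standard `pip`-installable package; the clone path is a placeholder:

  ```bash
  # Install the MMScan API into the current environment (editable install).
  cd path/to/mmscan-devkit
  pip install -e .
  ```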
- Overwrite `CONF.PATH.OUTPUT` in `lib/config.py` to point to your desired output directory.
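  One possible way to do this, assuming `CONF.PATH.OUTPUT` is assigned at the top level of `lib/config.py` (the target path below is a placeholder):

  ```bash
  # Rewrite the CONF.PATH.OUTPUT assignment in ScanRefer's config to a custom directory.
  sed -i 's|^CONF.PATH.OUTPUT.*|CONF.PATH.OUTPUT = "/path/to/scanrefer_outputs"|' lib/config.py
  ```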
- Run the following command to train ScanRefer (one GPU):

  ```bash
  python -u scripts/train.py --use_color --epoch {10/25/50}
  ```
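  For example, to train with the 50-epoch setting reported in the table below:

  ```bash
  python -u scripts/train.py --use_color --epoch 50
  ```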
- Run the following command to evaluate ScanRefer (one GPU):

  ```bash
  python -u scripts/train.py --use_color --eval_only --use_checkpoint "path/to/pth"
  ```
| Epoch | gTop-1 @ 0.25 | gTop-1 @ 0.50 | Config | Download |
|---|---|---|---|---|
| 50 | 4.74 | 2.52 | config | model / log |
- Follow the EmbodiedScan instructions to set up the environment. Download the Multi-View 3D Detection model's weights and change the `load_from` path in the config file under `configs/grounding` to the path where the weights are saved.
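  One possible way to do this, assuming `load_from` is assigned at the top level of the config (the checkpoint path below is a placeholder):

  ```bash
  # Point the grounding config at the downloaded Multi-View 3D Detection checkpoint.
  sed -i 's|^load_from.*|load_from = "path/to/mv-3ddet.pth"|' \
      configs/grounding/pcd_4xb24_mmscan_vg_num256.py
  ```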
- Install the MMScan API.
- Run the following command to train EmbodiedScan (single or multiple GPUs):

  ```bash
  # Single-GPU training
  python tools/train.py configs/grounding/pcd_4xb24_mmscan_vg_num256.py --work-dir=path/to/save

  # Multi-GPU training
  python tools/train.py configs/grounding/pcd_4xb24_mmscan_vg_num256.py --work-dir=path/to/save --launcher="pytorch"
  ```
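  With `--launcher="pytorch"`, the script expects to be started by a PyTorch distributed launcher; a sketch assuming 4 GPUs (matching the `4xb24` config name), which applies to `tools/test.py` in the same way:

  ```bash
  # Launch 4 distributed workers, one per GPU.
  torchrun --nproc_per_node=4 tools/train.py \
      configs/grounding/pcd_4xb24_mmscan_vg_num256.py \
      --work-dir=path/to/save --launcher="pytorch"
  ```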
- Run the following command to evaluate EmbodiedScan (single or multiple GPUs):

  ```bash
  # Single-GPU testing
  python tools/test.py configs/grounding/pcd_4xb24_mmscan_vg_num256.py path/to/load_pth

  # Multi-GPU testing
  python tools/test.py configs/grounding/pcd_4xb24_mmscan_vg_num256.py path/to/load_pth --launcher="pytorch"
  ```
| Input Modality | Det Pretrain | Epoch | gTop-1 @ 0.25 | gTop-1 @ 0.50 | Config | Download |
|---|---|---|---|---|---|---|
| Point Cloud | ✔ | 12 | 19.66 | 8.82 | config | model / log |
These are 3D question answering models adapted for the mmscan-devkit. Currently, two models have been released: LL3DA and LEO.
- Follow the LL3DA instructions to set up the environment. For data preparation you do not need to load the datasets; you only need to (1) download the released pre-trained weights and put them under `./pretrained`, and (2) download the pre-processed BERT embedding weights and store them under `./bert-base-embedding`.
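  A sketch of the expected layout (download sources are given in the LL3DA instructions and are omitted here):

  ```bash
  # Create the two folders the scripts expect, then drop the downloaded files into them.
  mkdir -p pretrained bert-base-embedding
  # ./pretrained           <- released pre-trained weights
  # ./bert-base-embedding  <- pre-processed BERT embedding weights
  ```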
- Install the MMScan API.
- Edit the configuration in `./scripts/opt-1.3b/eval.mmscanqa.sh` and `./scripts/opt-1.3b/tuning.mmscanqa.sh`.
- Run the following command to train LL3DA (4 GPUs):

  ```bash
  bash scripts/opt-1.3b/tuning.mmscanqa.sh
  ```
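  If you need to control which 4 GPUs are used, one common approach (assuming the script honors `CUDA_VISIBLE_DEVICES`, as standard PyTorch launchers do) is:

  ```bash
  # Restrict the run to GPUs 0-3.
  CUDA_VISIBLE_DEVICES=0,1,2,3 bash scripts/opt-1.3b/tuning.mmscanqa.sh
  ```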
- Run the following command to evaluate LL3DA (4 GPUs):

  ```bash
  bash scripts/opt-1.3b/eval.mmscanqa.sh
  ```
Optional: you can additionally score the results with the GPT evaluator. After evaluation, `qa_pred_gt_val.json` is generated under the checkpoint folder, and `--tmp_path` is a directory used for temporary storage:

```bash
python eval_utils/evaluate_gpt.py --file path/to/qa_pred_gt_val.json --tmp_path path/to/tmp --api_key your_api_key --eval_size -1 --nproc 4
```
| Detector | Captioner | Iters | Overall GPT Score | Download |
|---|---|---|---|---|
| Vote2Cap-DETR | LL3DA | 100k | 45.7 | model / log |
- Follow the LEO instructions to set up the environment. For data preparation you do not need to load the datasets; you only need to (1) download Vicuna-7B and update `cfg_path` in `configs/llm/*.yaml`, and (2) download `sft_noact.pth` and store it under `./weights`.
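  A sketch of this step, assuming the model is fetched with `huggingface-cli` (the Vicuna-7B repository id and the `sft_noact.pth` source are placeholders; see the LEO instructions for the exact locations):

  ```bash
  # Download Vicuna-7B locally, then point cfg_path in configs/llm/*.yaml at it.
  huggingface-cli download <vicuna-7b-repo-id> --local-dir path/to/vicuna-7b
  # Place the downloaded checkpoint where the scripts expect it.
  mkdir -p weights && mv path/to/sft_noact.pth weights/
  ```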
- Install the MMScan API.
- Edit the configuration in `scripts/train_tuning_mmscan.sh` and `scripts/test_tuning_mmscan.sh`.
- Run the following command to train LEO (4 GPUs):

  ```bash
  bash scripts/train_tuning_mmscan.sh
  ```
- Run the following command to evaluate LEO (4 GPUs):

  ```bash
  bash scripts/test_tuning_mmscan.sh
  ```
Optional: you can additionally score the results with the GPT evaluator. After evaluation, `test_embodied_scan_l_complete.json` is generated under the checkpoint folder, and `--tmp_path` is a directory used for temporary storage:

```bash
python evaluator/GPT_eval.py --file path/to/test_embodied_scan_l_complete.json --tmp_path path/to/tmp --api_key your_api_key --eval_size -1 --nproc 4
```
| LLM | 2D Backbone | 3D Backbone | Epoch | Overall GPT Score | Config | Download |
|---|---|---|---|---|---|---|
| Vicuna7b | ConvNeXt | PointNet++ | 1 | 54.6 | config | model |