3D Visual Grounding Models

These are 3D visual grounding models adapted for the mmscan-devkit. Currently, two models have been released: EmbodiedScan and ScanRefer.

ScanRefer

  1. Follow the ScanRefer instructions to set up the environment. For data preparation, you do not need to download the full datasets; only download the preprocessed GloVe embeddings (~990 MB) and put them under data/.

  2. Install MMScan API.

  3. Set CONF.PATH.OUTPUT in lib/config.py to your desired output directory (see the sketch after this list).

  4. Run the following command to train ScanRefer (one GPU):

    python -u scripts/train.py --use_color --epoch {10/25/50}
  5. Run the following command to evaluate ScanRefer (one GPU):

    python -u scripts/train.py --use_color --eval_only --use_checkpoint "path/to/pth"
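
The config change in step 3 is a one-line edit. A minimal sketch, assuming CONF is the config object defined in lib/config.py (the path below is a placeholder, not a value from this README):

    # lib/config.py -- point ScanRefer's outputs at your own directory.
    # CONF is the config object this file defines; only the OUTPUT entry changes.
    CONF.PATH.OUTPUT = "/data/scanrefer/outputs"  # placeholder; checkpoints and logs land here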

Results and Models

Epoch gTop-1 @ 0.25 gTop-1 @ 0.50 Config Download
50 4.74 2.52 config model | log

EmbodiedScan

  1. Follow the EmbodiedScan instructions to set up the environment. Download the Multi-View 3D Detection model's weights and set the "load_from" path in the config file under configs/grounding to where the weights are saved (see the sketch after this list).

  2. Install MMScan API.

  3. Run one of the following commands to train EmbodiedScan (single or multiple GPUs):

    # Single GPU training
    python tools/train.py configs/grounding/pcd_4xb24_mmscan_vg_num256.py --work-dir=path/to/save
    
    # Multiple GPUs training
    python tools/train.py configs/grounding/pcd_4xb24_mmscan_vg_num256.py --work-dir=path/to/save --launcher="pytorch"
  4. Run one of the following commands to evaluate EmbodiedScan (single or multiple GPUs):

    # Single GPU testing
    python tools/test.py configs/grounding/pcd_4xb24_mmscan_vg_num256.py path/to/load_pth
    
    # Multiple GPUs testing
    python tools/test.py configs/grounding/pcd_4xb24_mmscan_vg_num256.py path/to/load_pth --launcher="pytorch"
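
The load_from edit in step 1 is a single assignment. A minimal sketch, assuming the Python-based config style EmbodiedScan uses (the checkpoint path is a placeholder):

    # configs/grounding/pcd_4xb24_mmscan_vg_num256.py -- initialize from the
    # Multi-View 3D Detection weights downloaded in step 1.
    load_from = "/path/to/mv-3ddet.pth"  # placeholder; use your local weights path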

Results and Models

Input Modality Det Pretrain Epoch gTop-1 @ 0.25 gTop-1 @ 0.50 Config Download
Point Cloud 12 19.66 8.82 config model | log

3D Question Answering Models

These are 3D question answering models adapted for the mmscan-devkit. Currently, two models have been released: LL3DA and LEO.

LL3DA

  1. Follow the LL3DA instructions to set up the environment. For data preparation, you do not need to download the full datasets; you only need to:

    (1) Download the released pre-trained weights and put them under ./pretrained.

    (2) Download the pre-processed BERT embedding weights and store them under the ./bert-base-embedding folder.

  2. Install MMScan API.

  3. Edit the configuration in ./scripts/opt-1.3b/eval.mmscanqa.sh and ./scripts/opt-1.3b/tuning.mmscanqa.sh.

  4. Run the following command to train LL3DA (4 GPUs):

    bash scripts/opt-1.3b/tuning.mmscanqa.sh
  5. Run the following command to evaluate LL3DA (4 GPUs):

    bash scripts/opt-1.3b/eval.mmscanqa.sh

    Optional: after evaluation, you can score the results with the GPT evaluator. 'qa_pred_gt_val.json' is generated under the checkpoint folder, and tmp_path is used for temporary storage (a sanity-check sketch follows this list).

    python eval_utils/evaluate_gpt.py --file path/to/qa_pred_gt_val.json \
        --tmp_path path/to/tmp --api_key your_api_key --eval_size -1 --nproc 4
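
Before spending API calls on the GPT evaluator, it can help to confirm the dump loads cleanly. A minimal sketch, assuming only that qa_pred_gt_val.json is a single JSON document (its internal schema is not documented here):

    # check_dump.py -- sanity-check the evaluation dump before GPT scoring.
    import json
    from pathlib import Path

    dump = Path("path/to/qa_pred_gt_val.json")  # generated under the checkpoint folder
    with dump.open() as f:
        results = json.load(f)  # raises json.JSONDecodeError if the dump is malformed
    print(f"loaded {len(results)} top-level entries from {dump}")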

Results and Models

Detector Captioner Iters Overall GPT Score Download
Vote2Cap-DETR LL3DA 100k 45.7 model | log

LEO

  1. Follow the LEO instructions to set up the environment. For data preparation, you do not need to download the full datasets; you only need to:

    (1) Download Vicuna-7B and update cfg_path in configs/llm/*.yaml.

    (2) Download sft_noact.pth and store it under the ./weights folder (a pre-flight check sketch follows this list).

  2. Install MMScan API.

  3. Edit the configuration in scripts/train_tuning_mmscan.sh and scripts/test_tuning_mmscan.sh.

  4. Run the following command to train LEO (4 GPUs):

    bash scripts/train_tuning_mmscan.sh
  5. Run the following command to evaluate LEO (4 GPUs):

    bash scripts/test_tuning_mmscan.sh

    Optional: after evaluation, you can score the results with the GPT evaluator. 'test_embodied_scan_l_complete.json' is generated under the checkpoint folder, and tmp_path is used for temporary storage.

    python evaluator/GPT_eval.py --file path/to/test_embodied_scan_l_complete.json \
        --tmp_path path/to/tmp --api_key your_api_key --eval_size -1 --nproc 4
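
The two assets from step 1 are easy to misplace, and a missing path only surfaces at launch time. A minimal pre-flight sketch with placeholder paths (the Vicuna-7B directory is whatever you set cfg_path to):

    # preflight.py -- verify LEO's step-1 assets before running the tuning script.
    from pathlib import Path

    assert Path("weights/sft_noact.pth").is_file(), \
        "missing sft_noact.pth under ./weights (step 1, item 2)"
    assert Path("/path/to/vicuna-7b").is_dir(), \
        "download Vicuna-7B and update cfg_path in configs/llm/*.yaml (step 1, item 1)"
    print("LEO assets in place")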

Results and Models

LLM 2D Backbone 3D Backbone Epoch Overall GPT Score Config Download
Vicuna7b ConvNeXt PointNet++ 1 54.6 config model