Custom Dataset

To train on your own dataset, follow these three steps:

Data Preparation

Data Structure

For simplicity, reorganize your dataset as follows.

├── your_dataset
│   ├── images
│   │   ├── training
│   │   │   ├── xxx{img_suffix}
│   │   │   ├── yyy{img_suffix}
│   │   │   ├── zzz{img_suffix}
│   │   ├── validation
│   ├── annotations
│   │   ├── training
│   │   │   ├── xxx{seg_map_suffix}
│   │   │   ├── yyy{seg_map_suffix}
│   │   │   ├── zzz{seg_map_suffix}
│   │   ├── validation

Images and labels are stored separately, and each is split into a training set and a validation set. The four directory paths above are specified in the dataset script relative to the dataset root path, so you can rename them as you like.
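As a quick sanity check before training, the layout above can be verified (or created) with a few lines of Python. This is a minimal sketch: `EXPECTED_DIRS`, `missing_dirs`, and `create_layout` are hypothetical names, not part of the repository, and the directory names simply mirror the tree above.

```python
from pathlib import Path

# Hypothetical sub-directory names that mirror the tree above.
# Rename them here if your dataset script uses different relative paths.
EXPECTED_DIRS = (
    "images/training",
    "images/validation",
    "annotations/training",
    "annotations/validation",
)


def missing_dirs(dataset_root):
    """Return the expected sub-directories that do not exist under dataset_root."""
    root = Path(dataset_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


def create_layout(dataset_root):
    """Create the four sub-directories; existing ones are left untouched."""
    root = Path(dataset_root)
    for d in EXPECTED_DIRS:
        (root / d).mkdir(parents=True, exist_ok=True)
```

Calling `missing_dirs("your_dataset")` returns an empty list when all four directories are in place.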

Annotations Format

Only gray-scale annotation images are supported for now, so convert your label images to gray-scale first if needed.

The intensity of each pixel is its class index. You can also set an ignore_index; pixels with that value are excluded from the metric computation.
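If your labels are stored as color images, they need to be mapped to class indices first. The following is a minimal sketch using NumPy, assuming a known color palette; `PALETTE`, `IGNORE_INDEX = 255`, and `rgb_to_index` are illustrative assumptions, not values or functions taken from the repository.

```python
import numpy as np

IGNORE_INDEX = 255  # assumed value; use whatever your config expects

# Hypothetical palette: RGB color -> class index.
PALETTE = {
    (0, 0, 0): 0,
    (128, 0, 0): 1,
    (0, 128, 0): 2,
}


def rgb_to_index(label_rgb, palette=PALETTE, ignore_index=IGNORE_INDEX):
    """Map an (H, W, 3) color label to an (H, W) uint8 class-index map.

    Pixels whose color is not in the palette are set to ignore_index.
    """
    label_rgb = np.asarray(label_rgb, dtype=np.uint8)
    index_map = np.full(label_rgb.shape[:2], ignore_index, dtype=np.uint8)
    for color, class_id in palette.items():
        # True where all three channels match this palette color.
        mask = np.all(label_rgb == np.array(color, dtype=np.uint8), axis=-1)
        index_map[mask] = class_id
    return index_map
```

The resulting array should be saved in a lossless format such as PNG, since lossy compression would alter the class indices.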

Write A New Script

1. Copy an existing dataset script in src/datasets, and replace the original dataset name with your dataset name.
2. Change the default value of the num_classes parameter.
3. Override the __init__ function to set the required parameters, especially self.file_list, which is a list of image/label path pairs.
   - ADE20K scans all files in image_dir and replaces img_suffix with seg_map_suffix in each filename.
   - Cityscapes scans for files with the given suffixes and pairs them in sorted order.
   - You can follow either of the two methods above to build the image/label correspondence for your dataset. Both require setting img_suffix and seg_map_suffix in the right place.
4. In __init__.py, add your dataset to the if-elif structure in the get_dataset function.
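The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the repository's code: `MyDataset`, its constructor arguments, the two pairing helpers, and the default suffixes are all hypothetical, and the real base class and signatures come from the script you copied. Both helpers scan a single flat directory.

```python
import os


def pairs_by_suffix_replacement(image_dir, label_dir, img_suffix, seg_map_suffix):
    """ADE20K-style: derive each label filename from the image filename."""
    file_list = []
    for name in sorted(os.listdir(image_dir)):
        if not name.endswith(img_suffix):
            continue
        # Swap the image suffix for the label suffix.
        label_name = name[: len(name) - len(img_suffix)] + seg_map_suffix
        file_list.append(
            (os.path.join(image_dir, name), os.path.join(label_dir, label_name))
        )
    return file_list


def pairs_by_sorted_order(image_dir, label_dir, img_suffix, seg_map_suffix):
    """Cityscapes-style: list both sides by suffix and pair them in sorted order."""
    images = sorted(n for n in os.listdir(image_dir) if n.endswith(img_suffix))
    labels = sorted(n for n in os.listdir(label_dir) if n.endswith(seg_map_suffix))
    if len(images) != len(labels):
        raise ValueError(f"{len(images)} images but {len(labels)} labels")
    return [
        (os.path.join(image_dir, i), os.path.join(label_dir, l))
        for i, l in zip(images, labels)
    ]


class MyDataset:
    """Hypothetical dataset; only the fields discussed above are shown."""

    def __init__(self, dataset_root, mode="training", num_classes=3,
                 img_suffix=".jpg", seg_map_suffix=".png"):
        self.num_classes = num_classes
        self.file_list = pairs_by_suffix_replacement(
            os.path.join(dataset_root, "images", mode),
            os.path.join(dataset_root, "annotations", mode),
            img_suffix,
            seg_map_suffix,
        )


def get_dataset(name, **kwargs):
    """Sketch of the if-elif dispatch; the existing branches are omitted here."""
    if name == "MyDataset":
        return MyDataset(**kwargs)
    else:
        raise ValueError(f"Unsupported dataset: {name}")
```

Suffix replacement is the simpler choice when image and label filenames share a stem. Sorted pairing works when they do not, but it silently mispairs files if the two sorted orders diverge, so checking the counts (as above) is a cheap safeguard.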

Create Yaml Config

1. Copy an existing YAML config.
2. Change the following parameters in Data: DATASET, DATA_PATH, NUM_CLASSES.
3. Change any other parameters as needed. To do this, you should understand what each parameter in the config means.
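To make the three edited fields concrete, here is a small Python sketch that treats a loaded config as a nested dict. It assumes the YAML file parses into a dict with a top-level `Data` section; the values in `base_config` and the helper `customize_config` are hypothetical placeholders, not taken from the repository's configs.

```python
import copy

# Hypothetical base config, as it might look after loading a YAML file.
base_config = {
    "Data": {
        "DATASET": "ADE20K",
        "DATA_PATH": "/path/to/ade20k",
        "NUM_CLASSES": 150,
    },
}


def customize_config(base, dataset, data_path, num_classes):
    """Return a copy of base with the three Data fields replaced."""
    cfg = copy.deepcopy(base)  # leave the original config untouched
    cfg["Data"].update(
        DATASET=dataset,
        DATA_PATH=data_path,
        NUM_CLASSES=num_classes,
    )
    return cfg


my_config = customize_config(base_config, "MyDataset", "/path/to/your_dataset", 3)
```

DATASET must match the name handled in get_dataset, and NUM_CLASSES must match the class indices used in your annotations.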

You can now run the training command on your own dataset.
