torchvision_tutorial.rst

File metadata and controls

506 lines (385 loc) ยท 27.5 KB

TorchVision Object Detection Finetuning Tutorial

Tip

To get the most of this tutorial, we suggest using the Colab version. This will allow you to experiment with the information presented below.

In this tutorial, we will be finetuning a pre-trained Mask R-CNN model on the Penn-Fudan Database for Pedestrian Detection and Segmentation. It contains 170 images with 345 instances of pedestrians (translator's note: each instance records a person's bounding-box coordinates and a pixel-level person mask), and we will use it to illustrate how to use the new features in torchvision in order to train an instance segmentation model on a custom dataset.

Defining the Dataset

The reference scripts for training object detection, instance segmentation and person keypoint detection allow for easily supporting adding new custom datasets. The dataset should inherit from the standard torch.utils.data.Dataset class, and implement the __len__ and __getitem__ methods.

The only specificity that we require is that the dataset __getitem__ should return:

  • image: a PIL (Python Imaging Library) image of size (H, W)
  • target: a dict containing the following fields
    • boxes (FloatTensor[N, 4]): the coordinates of the N bounding boxes in [x0, y0, x1, y1] format, ranging from 0 to W for x and from 0 to H for y
    • labels (Int64Tensor[N]): the label for each bounding box. 0 always represents the background class.
    • image_id (Int64Tensor[1]): an image identifier. It should be unique between all the images in the dataset, and is used during evaluation
    • area (Tensor[N]): the area of the bounding box. This is used during evaluation with the COCO metric, to separate the metric scores between small, medium and large boxes.
    • iscrowd (UInt8Tensor[N]): instances with iscrowd=True will be ignored during evaluation.
    • (optionally) masks (UInt8Tensor[N, H, W]): the segmentation masks for each one of the objects
    • (optionally) keypoints (FloatTensor[N, K, 3]): for each one of the N objects, it contains the K keypoints in [x, y, visibility] format, defining the object. visibility=0 means that the keypoint is not visible. Note that for data augmentation, the notion of flipping a keypoint is dependent on the data representation, and you should probably adapt references/detection/transforms.py for your new keypoint representation
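
As an illustrative sketch (the box coordinates and counts below are invented for this example), a valid target dict for one image with two instances can be built like this:

```python
import torch

# Synthetic example of a target dict for one image with two instances,
# following the field conventions listed above (values are made up).
num_objs = 2
boxes = torch.tensor([[10.0, 20.0, 50.0, 80.0],
                      [60.0, 30.0, 90.0, 70.0]], dtype=torch.float32)
target = {
    "boxes": boxes,
    # both instances belong to class 1; 0 is reserved for the background
    "labels": torch.ones((num_objs,), dtype=torch.int64),
    "image_id": torch.tensor([0]),
    # area = (x1 - x0) * (y1 - y0), used by the COCO evaluation
    "area": (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1]),
    # mark nothing as crowd
    "iscrowd": torch.zeros((num_objs,), dtype=torch.uint8),
}
```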

๋ชจ๋ธ์ด ์œ„์˜ ๋ฐฉ๋ฒ•๋Œ€๋กœ ๋ฆฌํ„ด์„ ํ•˜๋ฉด, ํ•™์Šต๊ณผ ํ‰๊ฐ€ ๋‘˜ ๋‹ค์— ๋Œ€ํ•ด์„œ ๋™์ž‘์„ ํ•  ๊ฒƒ์ด๋ฉฐ ํ‰๊ฐ€ ์Šคํฌ๋ฆฝํŠธ๋Š” pip install pycocotools` ๋กœ ์„ค์น˜ ๊ฐ€๋Šฅํ•œ pycocotools ๋ฅผ ์‚ฌ์šฉํ•˜๊ฒŒ ๋  ๊ฒƒ์ž…๋‹ˆ๋‹ค.

Note

On Windows, install pycocotools from gautamchitnis with the command pip install git+https://github.com/gautamchitnis/cocoapi.git@cocodataset-master#subdirectory=PythonAPI

One note on the labels: the model considers class 0 as background. If your dataset does not contain the background class, you should not have 0 in your labels. For example, assuming you have just two classes, cat and dog, you can define 1 (not 0) to represent cats and 2 to represent dogs. So, for instance, if one of the images has both classes, your labels tensor should look like [1, 2].
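
The cat-and-dog case above boils down to a trivial check:

```python
import torch

# 1 = cat, 2 = dog; 0 is reserved for the background class
labels = torch.tensor([1, 2], dtype=torch.int64)

# the dataset must never emit label 0 for an actual object
assert (labels > 0).all()
```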

Additionally, if you want to use aspect ratio grouping during training (so that each batch only contains images with similar aspect ratios), then it is recommended to also implement a get_height_and_width method, which returns the height and the width of the image. If this method is not provided, we query all elements of the dataset via __getitem__, which loads the image in memory and is slower than if a custom method is provided.
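
A minimal sketch of such a method, assuming the same folder layout as the PennFudan dataset below (the name get_height_and_width comes from the reference scripts; the body shown here is an assumption). It relies on PIL's Image.open being lazy: it reads only the file header, so no pixel data is decoded.

```python
import os
import tempfile
from PIL import Image

def get_height_and_width(root, imgs, idx):
    # Standalone sketch of the method body; inside a Dataset it would use
    # self.root and self.imgs. Image.open only reads the file header,
    # so this avoids decoding the full image the way __getitem__ does.
    img_path = os.path.join(root, "PNGImages", imgs[idx])
    with Image.open(img_path) as img:
        width, height = img.size
    return height, width

# quick demonstration on a throwaway 40x30 pixel PNG
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "PNGImages"))
Image.new("RGB", (40, 30)).save(os.path.join(root, "PNGImages", "demo.png"))
hw = get_height_and_width(root, ["demo.png"], 0)
```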

Writing a custom dataset for PennFudan

PennFudan ๋ฐ์ดํ„ฐ์…‹์„ ์œ„ํ•œ ์ฝ”๋“œ๋ฅผ ์ž‘์„ฑํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. `๋‹ค์šด๋กœ๋“œ ํ›„ ์••์ถ• ํŒŒ์ผ์„ ํ•ด์ œํ•˜๋ฉด<https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip>`__, ๋‹ค์Œ์˜ ํด๋” ๊ตฌ์กฐ๋ฅผ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

PennFudanPed/
  PedMasks/
    FudanPed00001_mask.png
    FudanPed00002_mask.png
    FudanPed00003_mask.png
    FudanPed00004_mask.png
    ...
  PNGImages/
    FudanPed00001.png
    FudanPed00002.png
    FudanPed00003.png
    FudanPed00004.png

Here is one example of a pair of images and segmentation masks:

../../_static/img/tv_tutorial/tv_image01.png

../../_static/img/tv_tutorial/tv_image02.png

๊ฐ ์ด๋ฏธ์ง€์—๋Š” ํ•ด๋‹นํ•˜๋Š” ๋ถ„ํ•  ๋งˆ์Šคํฌ๊ฐ€ ์žˆ์œผ๋ฉฐ, ์—ฌ๊ธฐ์„œ ๊ฐ๊ฐ์˜ ์ƒ‰์ƒ์€ ๋‹ค๋ฅธ ์ธ์Šคํ„ด์Šค์— ํ•ด๋‹นํ•ฉ๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ์…‹์— ํ•ด๋‹นํ•˜๋Š” torch.utils.data.Dataset` ํด๋ž˜์Šค๋ฅผ ์ž‘์„ฑํ•ฉ์‹œ๋‹ค.

import os
import numpy as np
import torch
from PIL import Image


class PennFudanDataset(torch.utils.data.Dataset):
    def __init__(self, root, transforms):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to
        # ensure that they are aligned with the masks
        self.imgs = list(sorted(os.listdir(os.path.join(root, "PNGImages"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "PedMasks"))))

    def __getitem__(self, idx):
        # load the image and the mask
        img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
        mask_path = os.path.join(self.root, "PedMasks", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance,
        # with 0 being background
        mask = Image.open(mask_path)
        # convert the PIL Image into a numpy array
        mask = np.array(mask)
        # instances are encoded as different colors
        obj_ids = np.unique(mask)
        # first id is the background, so remove it
        obj_ids = obj_ids[1:]

        # split the color-encoded mask into a set of binary masks
        masks = mask == obj_ids[:, None, None]

        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])

        # convert everything into a torch.Tensor
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # there is only one class (person)
        labels = torch.ones((num_objs,), dtype=torch.int64)
        masks = torch.as_tensor(masks, dtype=torch.uint8)

        # image_id is an Int64Tensor[1], as described above
        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)

        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["masks"] = masks
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target

    def __len__(self):
        return len(self.imgs)
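
The color-encoded-mask handling in __getitem__ can be sanity-checked on a tiny synthetic mask (the array below is made up for illustration):

```python
import numpy as np

# a 3x4 "mask image": 0 is background, 1 and 2 are two instances
mask = np.array([
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [2, 2, 0, 0],
])
obj_ids = np.unique(mask)[1:]           # drop the background id -> [1, 2]
# broadcasting yields one binary mask per instance, shape (2, 3, 4)
masks = mask == obj_ids[:, None, None]

boxes = []
for m in masks:
    pos = np.where(m)
    boxes.append([int(pos[1].min()), int(pos[0].min()),
                  int(pos[1].max()), int(pos[0].max())])
```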

๋ฐ์ดํ„ฐ์…‹ ์ฝ”๋“œ๋Š” ์—ฌ๊ธฐ๊นŒ์ง€์ž…๋‹ˆ๋‹ค. ์ด์ œ ์ด ๋ฐ์ดํ„ฐ์…‹์— ๋Œ€ํ•œ ์˜ˆ์ธก์„ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ๋Š” ๋ชจ๋ธ์„ ์ •์˜ํ•ด ๋ด…์‹œ๋‹ค.

๋ชจ๋ธ ์ •์˜ํ•˜๊ธฐ

In this tutorial, we will be using Mask R-CNN, which is based on top of Faster R-CNN. Faster R-CNN is a model that predicts both bounding boxes and class scores for potential objects in the image.

../../_static/img/tv_tutorial/tv_image03.png

Mask R-CNN adds an extra branch (layer) into Faster R-CNN, which also predicts segmentation masks for each instance.

../../_static/img/tv_tutorial/tv_image04.png

There are two common situations where one might want to modify one of the available models in the torchvision model zoo (translator's note: a collection of pre-trained models). The first is when we want to start from a pre-trained model, and just finetune the last layer. The other is when we want to replace the backbone of the model with a different one (for faster predictions, for example). (Translator's note: swapping the backbone from ResNet101 to MobileNetV2 can be expected to speed up inference, at the possible cost of recognition accuracy.)

๋‹ค์Œ ์„น์…˜์—์„œ ์šฐ๋ฆฌ๊ฐ€ ์–ด๋–ป๊ฒŒ ํ•  ์ˆ˜ ์žˆ๋Š”์ง€ ์•Œ์•„ ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

1 - Finetuning from a pretrained model

Let's suppose that you want to start from a model pre-trained on COCO and want to finetune it for your particular classes. Here is a possible way of doing it:

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# load a model pre-trained on COCO
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# replace the classifier with a new one, that has a user-defined num_classes
num_classes = 2  # 1 class (person) + background
# get the number of input features for the classifier
in_features = model.roi_heads.box_predictor.cls_score.in_features
# replace the pre-trained head with a new one
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

2 - Modifying the model to add a different backbone

import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

# load a pre-trained model for classification and return only the features
backbone = torchvision.models.mobilenet_v2(weights="DEFAULT").features
# FasterRCNN needs to know the number of output channels in a backbone.
# For mobilenet_v2, it's 1280, so we need to add it here
backbone.out_channels = 1280

# let's make the RPN (Region Proposal Network) generate 5 x 3 anchors per
# spatial location, with 5 different sizes and 3 different aspect ratios.
# We have a Tuple[Tuple[int]] because each feature map could
# potentially have different sizes and aspect ratios
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))

# let's define what the feature maps that we will use to perform the
# region of interest cropping are, as well as the size of the crop after
# rescaling.
# if your backbone returns a Tensor, featmap_names is expected to be ['0'].
# More generally, the backbone should return an OrderedDict[Tensor],
# and in featmap_names you can choose which feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'],
                                                output_size=7,
                                                sampling_ratio=2)

# put the pieces together inside a FasterRCNN model
model = FasterRCNN(backbone,
                   num_classes=2,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)
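
As a quick sanity check on the anchor configuration above: with 5 sizes and 3 aspect ratios on the single feature map, the RPN places 5 x 3 = 15 anchors at every spatial location, one per (size, aspect ratio) combination:

```python
from itertools import product

sizes = (32, 64, 128, 256, 512)
aspect_ratios = (0.5, 1.0, 2.0)

# one anchor shape per (size, aspect ratio) combination at each location
anchor_shapes = list(product(sizes, aspect_ratios))
```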

PennFudan ๋ฐ์ดํ„ฐ์…‹์„ ์œ„ํ•œ ์ธ์Šคํ„ด์Šค ๋ถ„ํ•  ๋ชจ๋ธ

In our case, given that our dataset is very small, we want to finetune from a pre-trained model, so we will be following approach number 1.

Here we also want to compute the instance segmentation masks, so we will be using Mask R-CNN:

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor


def get_model_instance_segmentation(num_classes):
    # load an instance segmentation model pre-trained on COCO
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

    # get the number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    # get the number of input features for the mask classifier
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    hidden_layer = 256
    # replace the mask predictor with a new one
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask,
                                                       hidden_layer,
                                                       num_classes)

    return model

That's it. This will make the model ready to be trained and evaluated on your custom dataset.

Putting everything together

In references/detection/, we have a number of helper functions to simplify training and evaluating detection models. Here, we will use references/detection/engine.py, references/detection/utils.py and references/detection/transforms.py. Just copy everything under references/detection to your folder and use them here.

๋ฐ์ดํ„ฐ ์ฆ๊ฐ• / ๋ณ€ํ™˜์„ ์œ„ํ•œ ๋„์›€ ํ•จ์ˆ˜๋ฅผ ์ž‘์„ฑํ•ด ๋ด…์‹œ๋‹ค

import torch
import transforms as T

def get_transform(train):
    transforms = []
    transforms.append(T.PILToTensor())
    transforms.append(T.ToDtype(torch.float, scale=True))
    if train:
        # during training, randomly flip the images and
        # targets horizontally with probability 0.5
        transforms.append(T.RandomHorizontalFlip(0.5))
    return T.Compose(transforms)

(Optional) Testing the forward() method

๋ฐ์ดํ„ฐ์…‹์„ ๋ฐ˜๋ณตํ•˜๊ธฐ ์ „์—, ์ƒ˜ํ”Œ ๋ฐ์ดํ„ฐ๋กœ ํ•™์Šต๊ณผ ์ถ”๋ก  ์‹œ ๋ชจ๋ธ์ด ์˜ˆ์ƒ๋Œ€๋กœ ๋™์ž‘ํ•˜๋Š”์ง€ ์‚ดํŽด๋ณด๋Š” ๊ฒƒ์ด ์ข‹์Šต๋‹ˆ๋‹ค.

import torchvision
import utils

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
dataset = PennFudanDataset('PennFudanPed', get_transform(train=True))
data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=2, shuffle=True, num_workers=4,
    collate_fn=utils.collate_fn)
# For Training
images, targets = next(iter(data_loader))
images = list(image for image in images)
targets = [{k: v for k, v in t.items()} for t in targets]
output = model(images, targets)   # returns losses and detections
# For inference
model.eval()
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
predictions = model(x)            # returns predictions

Let's now write the main function which performs the training and the validation:

from engine import train_one_epoch, evaluate
import utils


def main():
    # train on the GPU, or on the CPU if a GPU is not available
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

    # our dataset has two classes only - background and person
    num_classes = 2
    # use our dataset and defined transformations
    dataset = PennFudanDataset('PennFudanPed', get_transform(train=True))
    dataset_test = PennFudanDataset('PennFudanPed', get_transform(train=False))

    # split the dataset into train and test sets
    # (here, the last 50 images are held out for testing, the rest is used for training)
    indices = torch.randperm(len(dataset)).tolist()
    dataset = torch.utils.data.Subset(dataset, indices[:-50])
    dataset_test = torch.utils.data.Subset(dataset_test, indices[-50:])

    # define training and validation data loaders
    data_loader = torch.utils.data.DataLoader(
        dataset, batch_size=2, shuffle=True, num_workers=4,
        collate_fn=utils.collate_fn)

    data_loader_test = torch.utils.data.DataLoader(
        dataset_test, batch_size=1, shuffle=False, num_workers=4,
        collate_fn=utils.collate_fn)

    # get the model using our helper function
    model = get_model_instance_segmentation(num_classes)

    # move the model to the right device
    model.to(device)

    # construct an optimizer
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=0.005,
                                momentum=0.9, weight_decay=0.0005)
    # and a learning rate scheduler
    lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                                   step_size=3,
                                                   gamma=0.1)

    # let's train it for 10 epochs
    num_epochs = 10

    for epoch in range(num_epochs):
        # train for one epoch, printing every 10 iterations
        train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
        # update the learning rate
        lr_scheduler.step()
        # evaluate on the test dataset
        evaluate(model, data_loader_test, device=device)

    print("That's it!")

You should get as output for the first epoch:

Epoch: [0]  [ 0/60]  eta: 0:01:18  lr: 0.000090  loss: 2.5213 (2.5213)  loss_classifier: 0.8025 (0.8025)  loss_box_reg: 0.2634 (0.2634)  loss_mask: 1.4265 (1.4265)  loss_objectness: 0.0190 (0.0190)  loss_rpn_box_reg: 0.0099 (0.0099)  time: 1.3121  data: 0.3024  max mem: 3485
Epoch: [0]  [10/60]  eta: 0:00:20  lr: 0.000936  loss: 1.3007 (1.5313)  loss_classifier: 0.3979 (0.4719)  loss_box_reg: 0.2454 (0.2272)  loss_mask: 0.6089 (0.7953)  loss_objectness: 0.0197 (0.0228)  loss_rpn_box_reg: 0.0121 (0.0141)  time: 0.4198  data: 0.0298  max mem: 5081
Epoch: [0]  [20/60]  eta: 0:00:15  lr: 0.001783  loss: 0.7567 (1.1056)  loss_classifier: 0.2221 (0.3319)  loss_box_reg: 0.2002 (0.2106)  loss_mask: 0.2904 (0.5332)  loss_objectness: 0.0146 (0.0176)  loss_rpn_box_reg: 0.0094 (0.0123)  time: 0.3293  data: 0.0035  max mem: 5081
Epoch: [0]  [30/60]  eta: 0:00:11  lr: 0.002629  loss: 0.4705 (0.8935)  loss_classifier: 0.0991 (0.2517)  loss_box_reg: 0.1578 (0.1957)  loss_mask: 0.1970 (0.4204)  loss_objectness: 0.0061 (0.0140)  loss_rpn_box_reg: 0.0075 (0.0118)  time: 0.3403  data: 0.0044  max mem: 5081
Epoch: [0]  [40/60]  eta: 0:00:07  lr: 0.003476  loss: 0.3901 (0.7568)  loss_classifier: 0.0648 (0.2022)  loss_box_reg: 0.1207 (0.1736)  loss_mask: 0.1705 (0.3585)  loss_objectness: 0.0018 (0.0113)  loss_rpn_box_reg: 0.0075 (0.0112)  time: 0.3407  data: 0.0044  max mem: 5081
Epoch: [0]  [50/60]  eta: 0:00:03  lr: 0.004323  loss: 0.3237 (0.6703)  loss_classifier: 0.0474 (0.1731)  loss_box_reg: 0.1109 (0.1561)  loss_mask: 0.1658 (0.3201)  loss_objectness: 0.0015 (0.0093)  loss_rpn_box_reg: 0.0093 (0.0116)  time: 0.3379  data: 0.0043  max mem: 5081
Epoch: [0]  [59/60]  eta: 0:00:00  lr: 0.005000  loss: 0.2540 (0.6082)  loss_classifier: 0.0309 (0.1526)  loss_box_reg: 0.0463 (0.1405)  loss_mask: 0.1568 (0.2945)  loss_objectness: 0.0012 (0.0083)  loss_rpn_box_reg: 0.0093 (0.0123)  time: 0.3489  data: 0.0042  max mem: 5081
Epoch: [0] Total time: 0:00:21 (0.3570 s / it)
creating index...
index created!
Test:  [ 0/50]  eta: 0:00:19  model_time: 0.2152 (0.2152)  evaluator_time: 0.0133 (0.0133)  time: 0.4000  data: 0.1701  max mem: 5081
Test:  [49/50]  eta: 0:00:00  model_time: 0.0628 (0.0687)  evaluator_time: 0.0039 (0.0064)  time: 0.0735  data: 0.0022  max mem: 5081
Test: Total time: 0:00:04 (0.0828 s / it)
Averaged stats: model_time: 0.0628 (0.0687)  evaluator_time: 0.0039 (0.0064)
Accumulating evaluation results...
DONE (t=0.01s).
Accumulating evaluation results...
DONE (t=0.01s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.606
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.984
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.780
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.313
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.582
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.612
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.270
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.672
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.672
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.650
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.755
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.664
IoU metric: segm
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.704
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.979
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.871
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.325
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.488
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.727
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.316
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.748
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.749
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.650
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.673
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.758

So after one epoch of training, we obtain a COCO-style mAP of 60.6, and a mask mAP of 70.4.

After training for 10 epochs, I got the following metrics:

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.799
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.969
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.935
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.349
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.592
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.831
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.324
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.844
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.844
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.400
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.777
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.870
IoU metric: segm
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.761
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.969
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.919
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.341
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.464
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.788
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.303
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.799
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.799
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.400
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.769
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.818

But what do the predictions look like? Let's take one image in the dataset and verify:

../../_static/img/tv_tutorial/tv_image05.png

The trained model predicts 9 instances of person in this image. Let's have a look at a couple of them:

../../_static/img/tv_tutorial/tv_image06.png

../../_static/img/tv_tutorial/tv_image07.png

The results look pretty good!

Wrapping up

In this tutorial, you have learned how to create your own training pipeline for instance segmentation models on a custom dataset. For that, you wrote a torch.utils.data.Dataset class that returns the images and the ground truth boxes and segmentation masks. You also leveraged a Mask R-CNN model pre-trained on COCO train2017 in order to perform transfer learning on this new dataset.

For a more complete example, which includes multi-machine / multi-GPU training, check references/detection/train.py, which is present in the torchvision repository.

You can download the full source code for this tutorial here.