
RetinaNet: Architecture and Training Method

MewwSikk 2025. 6. 13. 21:20

1. Object Detection and the Background Behind RetinaNet


Object detection is a computer vision task that identifies both the location and the class of the objects in an image at the same time.

Detectors broadly fall into two families: two-stage and one-stage.

- Two-stage (e.g. Faster R-CNN)
First extracts candidate regions (region proposals), then predicts the class of each one.
Having two steps makes these detectors accurate, but also slower and more complex.

- One-stage (e.g. YOLO, SSD)
Processes the whole image in a single pass, so speed is the main advantage.
The drawback was lower accuracy compared with two-stage detectors.

Because of this structural difference, one-stage detectors were consistently faster but fell short on accuracy.
One reason is an imbalance during training: there are far too many background samples and far too few samples that actually contain an object.

The paper that tackled this problem head-on is Focal Loss for Dense Object Detection.

What is dense object detection?
-> Predicting objects from several predefined anchor boxes at every location (grid cell) across the input image.



2. RetinaNet Architecture: FPN and Two Separate Subnets


RetinaNet has a simple but powerful structure.

It can be broken down into the following three parts.

1) Backbone: ResNet + FPN

Starting from the features extracted by the backbone, a top-down pathway and lateral connections are combined to build the FPN feature maps (P3~P7).

This structure lets the detector use feature maps at several resolutions at once, which is what makes multi-scale object detection possible inside a single network.


2) Detection Head (subnets)

* One thing to note here: the feature map at each FPN level (P3 ~ P7) is passed to the subnets as-is, without any upsampling.

The classification subnet is structured as follows:
(3 x 3 conv, 256 filters + ReLU) x 4 + (3 x 3 conv, num_anchors * num_classes filters)
-> Outputs a per-class probability for each anchor at each location.

The box regression subnet has a similar structure:
(3 x 3 conv, 256 filters + ReLU) x 4 + (3 x 3 conv, num_anchors * 4 filters)
-> Predicts how far (offset) each anchor has to move to reach its ground-truth box.
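To make the two layer recipes above concrete, here is a small sketch that only does the bookkeeping: it lists the layers of each subnet and their channel counts. The helper names (head_layers, conv_params), the 256 input channels, and the 80-class label set are my own assumptions for illustration, not something taken from the paper's code.

```python
def head_layers(num_anchors, outputs_per_anchor, width=256, depth=4):
    """Describe one subnet as (kernel, in_channels, out_channels, activation) tuples."""
    # Four identical 3x3 conv + ReLU layers that keep the channel width.
    layers = [(3, width, width, "relu") for _ in range(depth)]
    # Final 3x3 conv: one group of outputs per anchor at every location.
    layers.append((3, width, num_anchors * outputs_per_anchor, None))
    return layers


def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of a single k x k convolution."""
    return kernel * kernel * in_channels * out_channels + out_channels


num_anchors = 9    # 3 scales x 3 aspect ratios
num_classes = 80   # assumption: a COCO-sized label set

cls_head = head_layers(num_anchors, num_classes)  # classification subnet
box_head = head_layers(num_anchors, 4)            # box regression subnet

cls_out_channels = cls_head[-1][2]  # num_anchors * num_classes
box_out_channels = box_head[-1][2]  # num_anchors * 4
cls_param_count = sum(conv_params(k, i, o) for k, i, o, _ in cls_head)
box_param_count = sum(conv_params(k, i, o) for k, i, o, _ in box_head)
```

With these assumed numbers the classification head ends in 720 channels and the box head in 36, so the only structural difference between the two subnets is the width of the last convolution.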


3) Anchors


(It took me quite a while to understand this concept. I hope those of you reading this get it faster...)



In RetinaNet, each location gets 9 anchors in total, formed by combining 3 scales with 3 aspect ratios.
- 3 scales = {1.0, 2^(1/3), 2^(2/3)}
- 3 aspect ratios = {1:1, 1:2, 2:1}
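The 3 x 3 combination can be sketched in a few lines. This is a minimal illustration, assuming a base anchor size of 32 pixels (the size the paper uses for P3) and assuming that a ratio means height divided by width; the function name anchor_shapes is hypothetical.

```python
import math

SCALES = [2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3)]
RATIOS = [0.5, 1.0, 2.0]  # assumption: ratio = height / width


def anchor_shapes(base_size):
    """Return the 9 (width, height) pairs for one pyramid level."""
    shapes = []
    for scale in SCALES:
        # The scale changes the anchor area, the ratio only reshapes it.
        area = (base_size * scale) ** 2
        for ratio in RATIOS:
            width = math.sqrt(area / ratio)
            height = width * ratio
            shapes.append((width, height))
    return shapes


shapes = anchor_shapes(32)  # assumption: 32 px base size for the finest level
```

Every anchor that shares a scale keeps the same area, so the aspect ratio stretches the box without making it effectively larger or smaller.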

These anchors are placed on the FPN feature maps from P3 through P7 (one set per cell),
and the anchor size changes with the resolution (stride) of each feature map.
For example, P3 has stride 8, so every 8 × 8 pixel region of the original image maps to a single feature-map pixel, while P7 has stride 128 and covers 128 × 128 regions.

In other words, each pixel of an FPN feature map represents a specific location in the original image,
and 9 anchor boxes are placed on top of that single pixel.

As a result, levels with a larger stride handle larger objects and levels with a smaller stride handle smaller objects,
so RetinaNet densely generates tens of thousands of anchors per image and has the foundation to detect everything from small to large objects.
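To get a feel for "tens of thousands", the anchor count can be worked out by hand. The sketch below assumes a hypothetical 640 x 640 input and assumes each feature map size is the input size divided by the stride, rounded up; real implementations may round or pad differently.

```python
import math

STRIDES = {"P3": 8, "P4": 16, "P5": 32, "P6": 64, "P7": 128}
ANCHORS_PER_CELL = 9  # 3 scales x 3 aspect ratios


def anchor_counts(height, width):
    """Number of anchors on each pyramid level for one input image."""
    counts = {}
    for level, stride in STRIDES.items():
        rows = math.ceil(height / stride)  # assumption: ceil rounding
        cols = math.ceil(width / stride)
        counts[level] = rows * cols * ANCHORS_PER_CELL
    return counts


counts = anchor_counts(640, 640)  # hypothetical input size
total = sum(counts.values())
```

For this input, P3 alone contributes 57,600 of the 76,725 anchors, which shows how strongly the finest level dominates the anchor set.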


Summary


A pixel on an FPN level = an anchor center
- 9 anchors per pixel (scale × aspect ratio combinations)
- Larger stride means larger anchors and lower resolution
- All of P3~P7 are used to handle objects at multiple scales effectively
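The "pixel = anchor center" idea from the summary can be sketched as a mapping from a feature-map cell back to image coordinates. The half-cell offset (+0.5) is a common convention that I am assuming here, and the function names are hypothetical.

```python
def cell_center(row, col, stride):
    """Image-space (x, y) of the center of one feature-map cell."""
    # Assumption: the anchor sits in the middle of the stride x stride region.
    return ((col + 0.5) * stride, (row + 0.5) * stride)


def boxes_at_cell(row, col, stride, shapes):
    """Place every (width, height) shape on the cell center as (x1, y1, x2, y2)."""
    cx, cy = cell_center(row, col, stride)
    return [(cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2) for w, h in shapes]


# Two illustrative shapes on the top-left cell of a stride-8 level.
example_boxes = boxes_at_cell(0, 0, 8, [(32.0, 32.0), (64.0, 16.0)])
```

Boxes near the image border can extend outside the image (negative coordinates here); whether to clip or keep them is an implementation choice this sketch leaves open.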



3. How RetinaNet Is Trained: Anchor-Based Training


RetinaNet is an anchor-based dense detection model. Each image has tens of thousands of anchors, and for each of them the training procedure decides whether it takes part in learning.

1) How labels are defined


For each anchor, the IoU with the GT boxes is computed and a label is assigned by the following rule.
- IoU >= 0.5 -> Positive (foreground)
- IoU < 0.4 -> Negative (background)
- In between, [0.4, 0.5) -> Ignore
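The labeling rule above can be sketched directly. Boxes are assumed to be (x1, y1, x2, y2) tuples, and each anchor is compared against its best-matching GT box; the function names are my own.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_label(anchor, gt_boxes, pos_thr=0.5, neg_thr=0.4):
    """Label one anchor from its highest IoU with any GT box."""
    best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
    if best >= pos_thr:
        return "positive"
    if best < neg_thr:
        return "negative"
    return "ignore"  # IoU in [0.4, 0.5): excluded from training


gt = [(0.0, 0.0, 10.0, 10.0)]
label_full = assign_label((0.0, 0.0, 10.0, 10.0), gt)    # IoU 1.0
label_partial = assign_label((0.0, 0.0, 10.0, 4.5), gt)  # IoU 0.45
label_far = assign_label((20.0, 20.0, 30.0, 30.0), gt)   # IoU 0.0
```

An image with no GT boxes falls through to "negative" for every anchor, which matches the intuition that everything in it is background.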

2) How the training loss is composed


- The classification loss uses Focal Loss.
- The regression loss is the commonly used Smooth L1 Loss.
- Only positive anchors (foreground) are regression targets; negative anchors (background) contribute to the classification loss only.
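The "positives only" rule for the regression loss can be sketched like this. The beta value of 1.0 and averaging over the number of positive anchors are assumptions on my part; implementations differ in both.

```python
def smooth_l1(x, beta=1.0):
    """Quadratic near zero, linear for large errors."""
    ax = abs(x)
    if ax < beta:
        return 0.5 * ax * ax / beta
    return ax - 0.5 * beta


def regression_loss(labels, pred_offsets, target_offsets):
    """Smooth L1 over the 4 box offsets, using positive anchors only."""
    total = 0.0
    num_pos = 0
    for label, pred, target in zip(labels, pred_offsets, target_offsets):
        if label != "positive":
            continue  # negative and ignored anchors add nothing here
        num_pos += 1
        total += sum(smooth_l1(p - t) for p, t in zip(pred, target))
    # Assumption: average over positives; guard against images without any.
    return total / max(num_pos, 1)


labels = ["positive", "negative", "ignore"]
preds = [(0.5, 0.0, 0.0, 0.0), (9.0, 9.0, 9.0, 9.0), (9.0, 9.0, 9.0, 9.0)]
targets = [(0.0, 0.0, 0.0, 0.0)] * 3
loss = regression_loss(labels, preds, targets)
```

The large errors on the second and third anchors do not affect the result, because only the positive anchor is counted.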

With this setup, the model can deal with far more anchors than earlier detectors and still train accurately.



4. Focal Loss: The Key Idea That Solves the Imbalance Problem


With the standard CE loss, easy background samples are so numerous that they dominate the total loss. As a result, the model fails to learn well from the few object samples that are actually hard.

To solve this, RetinaNet proposes Focal Loss.

Focal Loss: Formula Derivative and Graph Interpretation - https://small0753.tistory.com/m/30
The details of Focal Loss are covered in that post.
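For a quick feel of the effect before reading that post, here is a minimal sketch of the binary form from the referenced paper, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The defaults alpha = 0.25 and gamma = 2 are the values the paper reports as working best; they are not something this post derives.

```python
import math


def cross_entropy(p, y):
    """Binary CE for predicted foreground probability p and label y (0 or 1)."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """CE scaled by alpha_t and by the modulating factor (1 - p_t)^gamma."""
    p_t = p if y == 1 else 1.0 - p               # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha   # class balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Easy background anchor: the model already says "1% object".
easy_ratio = focal_loss(0.01, 0) / cross_entropy(0.01, 0)
# Hard foreground anchor: the model says "10% object" for a real object.
hard_ratio = focal_loss(0.1, 1) / cross_entropy(0.1, 1)
```

The easy background sample keeps less than a thousandth of its CE loss, while the hard foreground sample keeps about a fifth, so the many easy negatives stop dominating the total.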


5. RetinaNet's Significance and Where Things Went Next


Structural significance

With nothing more than an FPN and two heads, RetinaNet reached accuracy that surpassed the two-stage models of its time, and it went on to influence many of the object detection models that followed.

Above all, Focal Loss addressed the class imbalance problem through the design of the loss function itself, and in that sense it can be said to have changed the paradigm of object detection research.

Since then the field has been evolving quickly toward anchor-free methods, attention-based heads, transformer-based backbones and more, but RetinaNet stands at the starting point as the turning-point model that proved "one-stage models can pursue accuracy and efficiency at the same time".

⸻

As I was wrapping up this post, a couple of questions came to mind:

"But why the RetinaNet architecture in particular? Would it have worked with a different one?"
"Why use ResNet as the backbone? There are so many other networks out there..."

In fact, RetinaNet is less a model that designed everything from scratch and more a very clever integration of techniques that had already been validated at the time.
• FPN had already delivered strong gains in Faster R-CNN, proving the value of multi-scale representations,
• ResNet was the most trusted backbone because it trains stably even in deep networks,
• and the anchor-based design, used successfully in models such as SSD, was the most general and intuitive detection framework of the day.

In short, RetinaNet placed a new idea (Focal Loss) on top of a stable architecture in a practical way, and that is the key reason the model became widely used and went on to influence its successors.


6. Wrapping Up


RetinaNet is not just one more model; it is a case where the limits of one-stage detectors were overcome through architecture and loss function design.
For anyone studying or implementing object detection, RetinaNet is a reference point worth working through at least once.

Reference:
Focal Loss for Dense Object Detection: https://arxiv.org/abs/1708.02002