MewwSikk

1. Introduction



์ด ๋…ผ๋ฌธ์€ long-tailed distribution(๊ธด ๊ผฌ๋ฆฌ ๋ถ„ํฌ)์„ ๊ฐ–๋Š” ๋ฐ์ดํ„ฐ์…‹์—์„œ deep neural network๊ฐ€ ์„ฑ๋Šฅ ์ €ํ•˜๋ฅผ ๊ฒช๋Š” ๋ฌธ์ œ์— ์ฃผ๋ชฉํ•ฉ๋‹ˆ๋‹ค.

ํŠนํžˆ, ํด๋ž˜์Šค ๋ถˆ๊ท ํ˜• ์ƒํ™ฉ์—์„œ ํ”ํžˆ ์‚ฌ์šฉ๋˜๋Š” inverse class frequency re-weighting์˜ ํ•œ๊ณ„๋ฅผ ์ง€์ ํ•˜๋ฉฐ, ์ƒˆ๋กœ์šด ์†์‹ค ํ•จ์ˆ˜์ธ Class-Balanced Loss๋ฅผ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค.

ํ•ต์‹ฌ ์•„์ด๋””์–ด๋Š” โ€œ๋งŽ์€ ๋ฐ์ดํ„ฐ๊ฐ€ ๋” ๋‚ซ๋‹คโ€๋Š” ์ง๊ด€์— ๊ธฐ๋ฐ˜ํ•˜๋˜, ๋ฐ์ดํ„ฐ๊ฐ€ ์„œ๋กœ ์ค‘๋ณต๋˜๊ฑฐ๋‚˜ ์œ ์‚ฌํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ์ ์„ ๊ณ ๋ คํ•ด, ์‹ค์ œ๋กœ ๋ชจ๋ธ ํ•™์Šต์— ๊ธฐ์—ฌํ•˜๋Š” Effective Number of Samples(์œ ํšจ ์ƒ˜ํ”Œ ์ˆ˜)๋ฅผ ์ •์˜ํ•˜๊ณ , ์ด์— ๊ธฐ๋ฐ˜ํ•˜์—ฌ ์†์‹ค ํ•จ์ˆ˜๋ฅผ ์กฐ์ •ํ•ฉ๋‹ˆ๋‹ค.

2. ๊ด€๋ จ ์—ฐ๊ตฌ (Related Work)


๋ถˆ๊ท ํ˜• ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ๊ธฐ์กด ์ ‘๊ทผ๋ฒ•์€ ํฌ๊ฒŒ ๋‘ ๊ฐ€์ง€๋กœ ๋‚˜๋‰ฉ๋‹ˆ๋‹ค.

1. Re-sampling
- Over-sampling: repeatedly reuse minority-class data -> risk of overfitting
- Under-sampling: discard part of the majority-class data -> risk of losing important samples

2. Cost-sensitive Re-weighting
- Assign per-class weights inside the loss function
- Inverse class frequency is the usual choice
- More recent work softens this by using the inverse square root of the frequency

๊ทธ๋Ÿฌ๋‚˜ ์ด๋Ÿฐ ๋ฐฉ๋ฒ•๋“ค์€ ํด๋ž˜์Šค ๊ฐ„ ์ •๋ณด๋Ÿ‰์˜ ์‹ค์งˆ์  ๊ธฐ์—ฌ๋„๋ฅผ ๋ฐ˜์˜ํ•˜์ง€ ๋ชปํ•œ๋‹ค๋Š” ํ•œ๊ณ„๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค.

3. ์ด๋ก ์  ํ”„๋ ˆ์ž„์›Œํฌ (Theoretical Framework)


1. ์•„์ด๋””์–ด์˜ ์ถœ๋ฐœ์ : Random Covering Problem

Class Balanced Loss์˜ ํ•ต์‹ฌ ์•„์ด๋””์–ด๋Š” Random Covering Problem์—์„œ ์ถœ๋ฐœํ•ฉ๋‹ˆ๋‹ค. ์ด ๋ฌธ์ œ๋Š” โ€œ์–ด๋–ค ๊ณต๊ฐ„์„ ๋ฌด์ž‘์œ„๋กœ ์„ ํƒ๋œ ์ž‘์€ ์˜์—ญ๋“ค๋กœ ์–ผ๋งˆ๋‚˜ ์ž˜ ๋ฎ์„ ์ˆ˜ ์žˆ๋Š”๊ฐ€?โ€๋ฅผ ๋ฌป๋Š” ๊ณ ์ „์ ์ธ ํ™•๋ฅ  ๋ฌธ์ œ์ž…๋‹ˆ๋‹ค.


2. ํด๋ž˜์Šค์˜ Feature ๊ณต๊ฐ„ ์ •์˜
- ์–ด๋–ค ํด๋ž˜์Šค์˜ feature space๋ฅผ S ๋ผ๊ณ  ์ •์˜ํ•ฉ๋‹ˆ๋‹ค.
- ์ด ๊ณต๊ฐ„์˜ ์ „์ฒด ๋ถ€ํ”ผ(Volume)๋Š” N >= 1์ž…๋‹ˆ๋‹ค.
-> ์ง๊ด€์ ์œผ๋กœ ๋ณด๋ฉด, ํด๋ž˜์Šค์˜ ๋‚ด๋ถ€ ๋‹ค์–‘์„ฑ ๋˜๋Š” ํ‘œํ˜„์˜ ๋‹ค์–‘์„ฑ ์ •๋„๋ฅผ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค.

3. ๊ฐ ์ƒ˜ํ”Œ์˜ ์˜๋ฏธ์™€ ์ค‘๋ณต ๊ฐ€์ •
- ๊ฐ ์ƒ˜ํ”Œ์€ ๋ถ€ํ”ผ(Volume) = 1 ์ธ ์ž‘์€ ์˜์—ญ์œผ๋กœ ๊ฐ„์ฃผ๋ฉ๋‹ˆ๋‹ค.
- ์ƒ˜ํ”Œ ๊ฐ„ ์ค‘๋ณต(overlap)์ด ๊ฐ€๋Šฅํ•˜๋‹ค๊ณ  ๊ฐ€์ •ํ•ฉ๋‹ˆ๋‹ค.
-> ์ด ์ค‘๋ณต์€ ๊ณง ์ •๋ณด์˜ ์ค‘๋ณต์„ฑ์„ ์˜๋ฏธํ•˜๋ฉฐ, ์‹ค์ œ ๋ฐ์ดํ„ฐ๊ฐ€ ๋น„์Šทํ•œ ์ƒํ™ฉ์„ ๋ฐ˜์˜ํ•ฉ๋‹ˆ๋‹ค.


4. ์ƒˆ๋กœ์šด ์ƒ˜ํ”Œ์˜ ์ค‘๋ณต ํ™•๋ฅ 
- (n - 1) ๊ฐœ์˜ ์ƒ˜ํ”Œ์„ ์ด๋ฏธ ๋ฝ‘์€ ์ƒํƒœ์—์„œ, ์ƒˆ๋กœ์šด ์ƒ˜ํ”Œ์ด ๊ธฐ์กด๊ณผ ์™„์ „ํžˆ ๊ฒน์น  ํ™•๋ฅ ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

p = E_{n-1} / N

- ์—ฌ๊ธฐ์„œ E_(n-1)์€ n๋ฒˆ์งธ ์ƒ˜ํ”Œ์„ ๋ฝ‘๊ธฐ ์ด์ „๊นŒ์ง€์˜ ์œ ํšจ ์ƒ˜ํ”Œ ์ˆ˜(Effective Number)์ž…๋‹ˆ๋‹ค.

+ ๋ถ€๋ถ„์ ์œผ๋กœ ๊ฒน์น˜๋Š” ์ƒํ™ฉ์€ ๋ชจ๋ธ ๋‹จ์ˆœํ™”๋ฅผ ์œ„ํ•ด ๋ฌด์‹œํ•ฉ๋‹ˆ๋‹ค.
์ฆ‰, ์ƒ˜ํ”Œ์€ ์™„์ „ํžˆ ๊ฒน์น˜๊ฑฐ๋‚˜, ์™„์ „ํžˆ ๊ฒน์น˜์ง€ ์•Š์Œ ๋‘ ๊ฐ€์ง€ ๊ฒฝ์šฐ๋กœ๋งŒ ๋ด…๋‹ˆ๋‹ค.

2. Effective Number of Samples ๊ท€๋‚ฉ์  ๋„์ถœ



4. Class-Balanced Loss
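The section body is missing from the extraction, but the loss the paper proposes weights each sample's loss by the inverse effective number of its class, $(1-\beta)/(1-\beta^{n_y})$. A minimal numpy sketch (function names are mine, not the paper's):

```python
import numpy as np

def cb_weights(samples_per_class, beta=0.9999):
    """Class-Balanced weights w_i = (1 - beta) / (1 - beta^{n_i}),
    normalized so they sum to the number of classes."""
    n = np.asarray(samples_per_class, dtype=np.float64)
    effective_num = (1.0 - np.power(beta, n)) / (1.0 - beta)
    w = 1.0 / effective_num
    return w / w.sum() * len(n)

def cb_softmax_ce(logits, labels, samples_per_class, beta=0.9999):
    """Softmax cross-entropy with Class-Balanced re-weighting."""
    w = cb_weights(samples_per_class, beta)
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]
    return (w[labels] * nll).mean()

# Toy batch over 3 classes with long-tailed counts
logits = np.array([[2.0, 0.5, -1.0], [0.1, 1.2, 0.3]])
labels = np.array([0, 1])
loss = cb_softmax_ce(logits, labels, samples_per_class=[1000, 100, 10])
```

Note the limiting behavior: beta -> 0 makes every weight 1 (plain cross-entropy), while beta -> 1 recovers inverse class frequency, so the CB term smoothly interpolates between the two.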



+ ์ ์šฉ ์˜ˆ์‹œ

4.0.1. Softmax Cross-Entropy Loss

4.0.2. Sigmoid Cross-Entropy Loss

4.0.3. Focal Loss
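The three variants above apply the same CB weight to different base losses. As one example, here is a sketch of the class-balanced focal loss on one-vs-all sigmoid outputs (my own numpy rendering, not the paper's code):

```python
import numpy as np

def cb_focal_loss(logits, labels, samples_per_class, beta=0.9999, gamma=2.0):
    """Class-Balanced focal loss over per-class sigmoid outputs.
    labels are integer class indices; logits have shape (batch, classes)."""
    n = np.asarray(samples_per_class, dtype=np.float64)
    w = (1.0 - beta) / (1.0 - np.power(beta, n))
    w = w / w.sum() * len(n)                  # normalize to sum to #classes

    num_classes = logits.shape[1]
    targets = np.eye(num_classes)[labels]     # one-hot targets
    p = 1.0 / (1.0 + np.exp(-logits))         # sigmoid probabilities
    p_t = np.where(targets == 1, p, 1.0 - p)  # prob of the correct decision
    # focal term (1 - p_t)^gamma down-weights easy, confident decisions
    focal = -np.power(1.0 - p_t, gamma) * np.log(np.clip(p_t, 1e-12, None))
    # each sample is weighted by the CB weight of its ground-truth class
    return (w[labels][:, None] * focal).sum(axis=1).mean()
```

With gamma = 0 and beta = 0 this reduces to plain sigmoid cross-entropy, which is a handy sanity check when wiring it into a training loop.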


5. ์‹คํ—˜ (Experiments)


1. Datasets
- Long-tailed CIFAR-10/100: imbalance level controlled via μ
- iNaturalist 2017/2018: real-world long-tailed distributions
- ImageNet (ILSVRC 2012)
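The long-tailed CIFAR variants are typically built by exponentially decaying the per-class sample count; the sketch below uses the common parametrization where the imbalance factor is the ratio between the largest and smallest class (details may differ slightly from the paper's exact script):

```python
import numpy as np

def long_tailed_counts(n_max=5000, num_classes=10, imbalance=100):
    """Per-class counts decaying exponentially from n_max down to
    n_max / imbalance -- the usual recipe for long-tailed CIFAR."""
    mu = (1.0 / imbalance) ** (1.0 / (num_classes - 1))
    return np.array([int(round(n_max * mu ** i)) for i in range(num_classes)])

counts = long_tailed_counts()  # CIFAR-10 with imbalance factor 100
# head class keeps 5000 samples, tail class only 50
```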

2. ์ฃผ์š” ๊ฒฐ๊ณผ
- CB Loss๋Š” ๊ธฐ์กด loss ๋Œ€๋น„ ์ผ๊ด€๋œ ์„ฑ๋Šฅ ํ–ฅ์ƒ ์ œ๊ณต

- CIFAR-10: ํฐ beta์—์„œ ์„ฑ๋Šฅ์ด ์ข‹์Œ -> coarse-grained class
-> ํด๋ž˜์Šค ๊ฐ„ ๊ฒน์น˜๋Š” ์˜์—ญ์ด ์ ๊ณ (๋‚ด๋ถ€์  ๋‹ค์–‘์„ฑ์ด ํผ) ์ปค๋‹ค๋ž€ ์ •๋ณด๊ณต๊ฐ„(N)์„ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค๋Š” ์˜๋ฏธ๋กœ ํ•ด์„ ๊ฐ€๋Šฅ

- CIFAR-100: ์ž‘์€ beta๊ฐ€ ์œ ๋ฆฌ -> fine-grained class
-> ํด๋ž˜์Šค ๊ฐ„ ๊ฒน์น˜๋Š” ์˜์—ญ์ด ๋งŽ๊ณ (๋‚ด๋ถ€์  ๋‹ค์–‘์„ฑ์ด ์ ์Œ) ์ž‘์€ ์ •๋ณด๊ณต๊ฐ„(N)์„ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค๋Š” ์˜๋ฏธ๋กœ ํ•ด์„ ๊ฐ€๋Šฅ


6. ๊ฒฐ๋ก  (Conclusion)

- Effective Number of Samples๋ผ๋Š” ๊ฐœ๋… ๋„์ž…์„ ํ†ตํ•ด ํด๋ž˜์Šค๋ณ„ ๋ฐ์ดํ„ฐ ์œ ํšจ ์ƒ˜ํ”Œ๊ณผ ์ •๋ณด๋Ÿ‰์„ ์ •๋Ÿ‰ํ™”
- ์†์‹ค ํ•จ์ˆ˜์— ํด๋ž˜์Šค๋ณ„๋กœ ์œ ํšจ ์ƒ˜ํ”Œ ์ˆ˜๋ฅผ ๋ฐ˜์˜ํ•œ ๊ฐ€์ค‘์น˜๋ฅผ ์ ์šฉํ•จ์œผ๋กœ์จ ๋ถˆ๊ท ํ˜• ๋ฐ์ดํ„ฐ์—์„œ๋„ ๊ฐ•๊ฑดํ•œ ํ•™์Šต ๊ฐ€๋Šฅ
- ํŠน์ • ๋ชจ๋ธ์ด๋‚˜ ์†์‹ค ํ•จ์ˆ˜์— ์ข…์†๋˜์ง€ ์•Š์œผ๋ฉฐ ๋‹ค์–‘ํ•œ ์ƒํ™ฉ์— ์ ์šฉ ๊ฐ€๋Šฅ
-> Loss ๋‚ด๋ถ€์— ๋“ค์–ด๊ฐ€๋Š” ๋ณ€์ˆ˜๋กœ ๊ตฌ์„ฑ๋˜์ง€ ์•Š์•˜๊ณ , ์ƒ์ˆ˜๋กœ ๋“ค์–ด๊ฐ€๊ธฐ ๋•Œ๋ฌธ์— Loss์™€ ๋…๋ฆฝ์ ์ด๋‹ค.


7. ๋น„ํŒ์  ๊ณ ์ฐฐ ๋ฐ ํ›„์† ์—ฐ๊ตฌ ๋ฐฉํ–ฅ(*์€ ์ค‘์š”๋„)

* ํด๋ž˜์Šค๋ณ„ ์ •๋ณด ๊ณต๊ฐ„์˜ ํฌ๊ธฐ N ์€ ์‹ค์ œ๋กœ ์ธก์ •ํ•˜๊ฑฐ๋‚˜ ์ถ”์ •ํ•˜๊ธฐ ์–ด๋ ค์šฐ๋ฏ€๋กœ, ๋…ผ๋ฌธ์—์„œ๋Š” ๋ชจ๋“  ํด๋ž˜์Šค์— ๋™์ผํ•œ beta๋ฅผ ์ ์šฉํ–ˆ์Šต๋‹ˆ๋‹ค. ์ด๋Š” ํด๋ž˜์Šค ๋ณ„๋กœ ์กฐ์ • ๊ฐ€๋Šฅํ•œ beta_i์— ๋Œ€ํ•œ ์—ฐ๊ตฌ์˜ ๊ฐ€๋Šฅ์„ฑ์„ ์ œ์‹œํ–ˆ์Šต๋‹ˆ๋‹ค.

** ์œ ํšจ ์ƒ˜ํ”Œ ์ˆ˜๋Š” ์ƒ˜ํ”Œ ๊ฐ„์˜ ์œ ์‚ฌ๋„์— ์˜์กดํ•˜์ง€๋งŒ, ํ˜„์žฌ๋Š” ์ด๋ฅผ ๋‹จ์ˆœํžˆ ํ™•๋ฅ ์ ์œผ๋กœ ๊ฐ€์ •ํ–ˆ์œผ๋ฏ€๋กœ feature-level similarity๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•œ ์‹ค์ธก ๊ธฐ๋ฐ˜ ์œ ํšจ ์ƒ˜ํ”Œ ์ˆ˜ ๊ณ„์‚ฐ๋ฒ• ์—ฐ๊ตฌ๊ฐ€ ํ•„์š”ํ•˜๋‹ค๊ณ  ํŒ๋‹จํ•˜์˜€์Šต๋‹ˆ๋‹ค.

* CB Loss๋Š” ์ด ๋ฐ์ดํ„ฐ ์ˆ˜์— ๋”ฐ๋ผ scaling ๋˜๋ฏ€๋กœ batch-level normalization์ด ํ•„์š”ํ•œ ์ƒํ™ฉ์—์„œ๋Š” ๋ณด์ • ๋ฐฉ์‹์— ๋Œ€ํ•œ ์ถ”๊ฐ€ ์—ฐ๊ตฌ๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.

REFERENCE
Class-Balanced Loss Based on Effective Number of Samples: https://arxiv.org/abs/1901.05555
