Pulse · AI News

RiT: Representation Image Transformer, a DINOv2-based Diffusion Transformer

DINOv2 · 2026-05-21

The researchers investigated whether a pretrained representation space such as DINOv2 offers a distribution better suited to flow-matching training.

DINOv2 has an intrinsic dimensionality similar to that of pixel space, but it compared favorably on effective rank, covariance conditioning, excess kurtosis, and on-manifold interpolation error.
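To make three of these statistics concrete, here is a minimal NumPy sketch. The definitions are common textbook ones and are assumptions, since the summary does not say how the paper computes them: effective rank as the exponential of the entropy of the normalized covariance eigenvalues, conditioning as the ratio of the largest to the smallest covariance eigenvalue, and excess kurtosis as the per-dimension fourth standardized moment minus 3, averaged over dimensions. The function names are hypothetical.

```python
import numpy as np


def _covariance_eigenvalues(features):
    """Ascending eigenvalues of the sample covariance of an (n_samples, dim) array."""
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (features.shape[0] - 1)
    # eigvalsh handles symmetric matrices; clip tiny negative round-off to zero.
    return np.clip(np.linalg.eigvalsh(cov), 0.0, None)


def effective_rank(features):
    """exp(entropy) of the normalized covariance spectrum (assumed definition)."""
    eig = _covariance_eigenvalues(features)
    p = eig / eig.sum()
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(np.exp(-np.sum(p * np.log(p))))


def covariance_condition_number(features, eps=1e-12):
    """Largest over smallest covariance eigenvalue; eps guards division by zero."""
    eig = _covariance_eigenvalues(features)
    return float(eig[-1] / max(eig[0], eps))


def mean_excess_kurtosis(features):
    """Per-dimension excess kurtosis (0 for a Gaussian), averaged over dimensions."""
    centered = features - features.mean(axis=0, keepdims=True)
    var = np.mean(centered ** 2, axis=0)
    fourth = np.mean(centered ** 4, axis=0)
    return float(np.mean(fourth / var ** 2 - 3.0))
```

Under these definitions, an isotropic Gaussian sample has effective rank close to its dimension, a condition number close to 1, and excess kurtosis close to 0, so it serves as a reference point when comparing feature spaces.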

RiT applies an x-prediction Diffusion Transformer on top of DINOv2 features and achieves an FID of 1.45 on ImageNet 256x256 without guidance, outperforming DiT^DH-XL with fewer parameters.
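As a rough illustration of what x-prediction means in a flow-matching setup, the sketch below assumes a linear interpolation path x_t = t * x + (1 - t) * noise, which is a common convention but not one the summary confirms for RiT. Under that assumption the model regresses the clean sample x directly, and the velocity needed for sampling is recovered as (x_hat - x_t) / (1 - t). The function names, the `eps` clamp, and the unweighted squared-error loss are all hypothetical choices made for illustration.

```python
import numpy as np


def _broadcast_time(t, x):
    """Reshape per-sample times of shape (batch,) to broadcast against x."""
    return t.reshape(-1, *([1] * (x.ndim - 1)))


def interpolate(x, noise, t):
    """Assumed linear path: x_t = t * x + (1 - t) * noise, with t in [0, 1]."""
    t = _broadcast_time(t, x)
    return t * x + (1.0 - t) * noise


def velocity_from_x_prediction(x_hat, x_t, t, eps=1e-3):
    """Convert a clean-sample prediction into a velocity along the assumed path.

    On the linear path the true velocity is x - noise = (x - x_t) / (1 - t).
    The denominator is clamped at eps (an assumption) to stay finite near t = 1.
    """
    t = _broadcast_time(t, x_t)
    return (x_hat - x_t) / np.maximum(1.0 - t, eps)


def x_prediction_loss(x_hat, x):
    """Unweighted mean squared error between predicted and clean samples."""
    return float(np.mean((x_hat - x) ** 2))
```

With a perfect prediction (x_hat equal to x), the recovered velocity equals x - noise, which is the flow-matching target for this path, so the x-prediction and velocity-prediction views agree up to the 1 / (1 - t) scaling.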

#diffusion #representationlearn #flowmatching #DINOv2 #ImageNet
