오블완 (3 posts)

[Paper Review] YOLO-World: Real-Time Open-Vocabulary Object Detection

https://arxiv.org/abs/2401.17270
"The You Only Look Once (YOLO) series of detectors have established themselves as efficient and practical tools. However, their reliance on predefined and trained object categories limits their applicability in open scenarios. Addressing this limitation, we.."

[Paper Review] Grounded-VideoLLM: Sharpening Fine-Grained Temporal Grounding in Video Large Language Models

https://arxiv.org/abs/2410.03290
"Video Large Language Models (Video-LLMs) have demonstrated remarkable capabilities in coarse-grained video understanding; however, they struggle with fine-grained temporal grounding. In this paper, we introduce Grounded-VideoLLM, a novel Video-LLM adept at.."

[Paper Review] VideoMamba: State Space Model for Efficient Video Understanding

https://arxiv.org/abs/2403.06977
"Addressing the dual challenges of local redundancy and global dependencies in video understanding, this work innovatively adapts the Mamba to the video domain. The proposed VideoMamba overcomes the limitations of existing 3D convolution neural networks and.."

Abstract: This work applies the Mamba model to the video domain..