List: AI (10)
정화 코딩
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08071.pdf
https://arxiv.org/abs/2407.01851
Leveraging Large Language Models' remarkable proficiency in text-based tasks, recent works on Multi-modal LLMs (MLLMs) extend them to other modalities like vision and audio. However, the progress in these directions has been ..
Text ↔ Image Datasets
Flickr30k Entities
- Extends the original Flickr30k (images + sentence captions) with bounding-box annotations for each noun phrase
- Images + 5 captions per image + noun phrase ↔ bounding box links within each caption ⇒ usable without extra preprocessing
- 31,783 images, 8.7 objects per image on average, 276K boxes in total
- https://arxiv.org/abs/1505.04870
- https://github.com/BryanPlummer/flickr30k_entities
- https://bryanplummer.com/Flickr30kEntities/
Visual Genome (VG)
- Flickr-based images + for each image ..
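To make the annotation structure concrete, here is a minimal parsing sketch, assuming the bracketed [/EN#<chain-id>/<type> <phrase>] markup used in the dataset's Sentences files; the example line and helper names below are hypothetical, not part of the official toolkit.

```python
import re

# Hypothetical example line in the Flickr30k Entities Sentences/*.txt style,
# where each noun phrase is wrapped as [/EN#<chain-id>/<type> <phrase>].
line = ("[/EN#38/people A man] in [/EN#39/clothing a blue shirt] "
        "is standing on [/EN#40/scene a ladder] .")

# Capture chain id, entity type(s), and the phrase text.
PHRASE = re.compile(r"\[/EN#(\d+)/(\S+) ([^\]]+)\]")

def parse_caption(raw: str):
    """Return the plain caption and its annotated noun phrases."""
    phrases = [
        {"chain_id": int(m.group(1)),
         "types": m.group(2).split("/"),   # a phrase may carry several types
         "phrase": m.group(3)}
        for m in PHRASE.finditer(raw)
    ]
    plain = PHRASE.sub(lambda m: m.group(3), raw)  # strip annotation markup
    return plain, phrases

caption, phrases = parse_caption(line)
print(caption)  # "A man in a blue shirt is standing on a ladder ."
print(phrases)  # chain ids link each phrase to boxes in the Annotations files
```

The chain id is what ties a phrase to its bounding boxes, so the same id appearing in several captions refers to the same object in the image.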
ImageBind: One Embedding Space To Bind Them All
https://arxiv.org/abs/2305.05665
We present ImageBind, an approach to learn a joint embedding across six different modalities - images, text, audio, depth, thermal, and IMU data. We show that all combinations of paired data are not necessary to train such a joint embedding, and only image..
1. Introduction
Idea: exploit the binding ability of images -> various sensors and ..
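The core idea in the preview (binding other modalities to images with a contrastive objective) can be illustrated with a short sketch. Below is a minimal NumPy version of a symmetric InfoNCE loss over paired image/audio embeddings; the random placeholder embeddings and the temperature of 0.07 are assumptions for illustration, not ImageBind's actual encoders or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def info_nce(anchor, other, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings."""
    a, b = l2_normalize(anchor), l2_normalize(other)
    logits = a @ b.T / temperature                 # (B, B) cosine similarities
    # Log-softmax along rows; matched pairs sit on the diagonal.
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_p_t = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    return (-np.diag(log_p).mean() - np.diag(log_p_t).mean()) / 2

# Placeholder embeddings standing in for encoder outputs (B=8, d=512).
img = rng.normal(size=(8, 512))   # image encoder output
aud = rng.normal(size=(8, 512))   # audio encoder output for the paired clips
print(info_nce(img, aud))
```

Because every modality is trained against images this way, modality pairs that never co-occur in training data (e.g., audio and depth) still land in a shared space through the image anchor.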
AudioCLIP: Extending CLIP to Image, Text and Audio
https://arxiv.org/abs/2106.13043
In the past, the rapidly evolving field of sound classification greatly benefited from the application of methods from other domains. Today, we observe the trend to fuse domain-specific tasks and approaches together, which provides the community with new o..
1. Introduction
- Progress in the field of audio classification. But until now, only audio ..
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Paper: https://arxiv.org/abs/2104.12763
Multi-modal reasoning systems rely on a pre-trained object detector to extract regions of interest from the image. However, this crucial module is typically used as a black box, trained independently of the downstream task and on a fixed vocabulary of obje..
GitHub (code): https://github.com..
https://dl.acm.org/doi/10.1145/3534678.3539384
1. Introduction
Federated Learning (FL)
- A distributed learning framework in which multiple clients train a model together without sharing their data with one another
- Significance: lower communication cost and privacy protection
Multimodal Federated Learning (MFL)
- Background: advances in sensor technology and the growing variety of data -> FL was extended into MFL
- Multiple clients, each collecting data through a different combination of sensors (modalities), train a model together without sharing the data
Limitations of existing FL/MFL research
- Most focus on statistical heterogeneity, i.e., each client's data dist..
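As a concrete reference for the FL setup described above, here is a minimal sketch of the classic FedAvg loop (local SGD on each client's private data, then a size-weighted average at the server); the linear-regression clients and hyperparameters are placeholder assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(42)

def client_update(global_w, X, y, lr=0.1, epochs=5):
    """Local SGD on one client's private data (linear regression as a stand-in)."""
    w = global_w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # MSE gradient
        w -= lr * grad
    return w

def fedavg(client_weights, client_sizes):
    """Server step: average client models, weighted by local dataset size."""
    total = sum(client_sizes)
    return sum(n / total * w for w, n in zip(client_weights, client_sizes))

# Three clients with private datasets of different sizes (never sent to the server).
datasets = [(rng.normal(size=(n, 3)), rng.normal(size=n)) for n in (50, 120, 80)]
global_w = np.zeros(3)

for _ in range(10):  # communication rounds: only model weights travel
    local = [client_update(global_w, X, y) for X, y in datasets]
    global_w = fedavg(local, [len(y) for _, y in datasets])

print(global_w)  # jointly trained model; raw data stayed on each client
```

MFL keeps this round structure but must also handle clients whose local models consume different modality combinations, which is where the heterogeneity issues discussed above come in.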
Towards Multi-modal Transformers in Federated Learning
https://arxiv.org/abs/2404.12467
Multi-modal transformers mark significant progress in different domains, but siloed high-quality data hinders their further improvement. To remedy this, federated learning (FL) has emerged as a promising privacy-preserving paradigm for training models wit..
0. Abstract
Multi-modal transformers: images and text..
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
https://arxiv.org/abs/2104.12763
Multi-modal reasoning systems rely on a pre-trained object detector to extract regions of interest from the image. However, this crucial module is typically used as a black box, trained independently of the downstream task and on a fixed vocabulary of obje..
0. Abstract
Multi-modal reasoni..