Posts in category: multimodal learning (4)
https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08071.pdf
https://arxiv.org/abs/2407.01851
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
Leveraging Large Language Models' remarkable proficiency in text-based tasks, recent works on Multi-modal LLMs (MLLMs) extend them to other modalities like vision and audio. However, the progress in these directions has been ..
https://arxiv.org/abs/2305.05665
ImageBind: One Embedding Space To Bind Them All
We present ImageBind, an approach to learn a joint embedding across six different modalities - images, text, audio, depth, thermal, and IMU data. We show that all combinations of paired data are not necessary to train such a joint embedding, and only image ..
1. Introduction
Idea: exploit images' binding capability -> align various sensors with ..
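To make the binding idea above concrete, here is a minimal, hypothetical sketch (not ImageBind's actual code): each non-image modality is aligned to the image embedding space with a contrastive InfoNCE loss, so modalities that are never paired with each other (e.g. audio and depth) still become comparable through images. All encoder shapes, dimensions, and names below are illustrative assumptions.

```python
# Minimal sketch of the binding idea: align every non-image modality to the
# image embedding space with a contrastive loss. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM = 512  # shared embedding size (hypothetical)

class ModalityEncoder(nn.Module):
    """Stand-in encoder: projects flat features into the shared space."""
    def __init__(self, in_dim: int):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(in_dim, 1024), nn.GELU(),
                                  nn.Linear(1024, EMBED_DIM))

    def forward(self, x):
        return F.normalize(self.proj(x), dim=-1)  # unit-norm embeddings

def info_nce(anchor, positive, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings."""
    logits = anchor @ positive.t() / temperature
    targets = torch.arange(anchor.size(0))  # i-th anchor matches i-th positive
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2

# One encoder per modality; training only needs image-X pairs.
image_enc = ModalityEncoder(in_dim=2048)
audio_enc = ModalityEncoder(in_dim=128)
depth_enc = ModalityEncoder(in_dim=256)

# Toy paired batches: (image, audio) pairs and (image, depth) pairs.
img_a, aud = torch.randn(8, 2048), torch.randn(8, 128)
img_d, dep = torch.randn(8, 2048), torch.randn(8, 256)

loss = (info_nce(image_enc(img_a), audio_enc(aud))
        + info_nce(image_enc(img_d), depth_enc(dep)))
loss.backward()  # audio and depth are now indirectly "bound" via images
```

After such training, an audio embedding can be compared directly against a depth embedding even though no (audio, depth) pairs were ever seen; the image space acts as the common anchor.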
https://dl.acm.org/doi/10.1145/3534678.3539384
1. Introduction
Federated Learning (FL): a distributed learning framework in which multiple clients jointly train a model without sharing their data with one another. Significance: reduced communication cost and privacy protection.
Multimodal Federated Learning (MFL). Background: advances in sensor technology and the growing variety of data types -> MFL emerged as an extension of FL. Multiple clients jointly train a model on data each collected with a different combination of sensors (modalities), again without sharing the data.
Limitations of existing FL/MFL research: most focus on statistical heterogeneity, i.e., differences in each client's data distribution ..
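As context for the FL definition in the notes above, here is a minimal FedAvg-style sketch of that training loop: clients train locally, and only model weights, never raw data, travel to the server, which averages them into the next global model. The toy model, data, and hyperparameters are assumptions for illustration; the paper itself addresses heterogeneity beyond plain averaging.

```python
# Minimal FedAvg-style sketch of the federated loop: local training on
# private data, server-side weight averaging. Toy placeholders throughout.
import copy
import torch
import torch.nn as nn

def local_update(global_model, x, y, lr=0.01, epochs=1):
    """One client's local training; its raw data (x, y) never leaves it."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.mse_loss(model(x), y).backward()
        opt.step()
    return model.state_dict()

def fed_avg(client_states):
    """Server step: element-wise average of the clients' weights."""
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = torch.stack([s[key] for s in client_states]).mean(dim=0)
    return avg

global_model = nn.Linear(16, 1)  # shared global model (toy)
# Each client holds its own private dataset (random placeholders here).
clients = [(torch.randn(32, 16), torch.randn(32, 1)) for _ in range(3)]

for _ in range(5):  # communication rounds
    states = [local_update(global_model, x, y) for x, y in clients]
    global_model.load_state_dict(fed_avg(states))
```

Statistical heterogeneity, the limitation the notes mention, shows up here when each client's (x, y) comes from a different distribution: plain averaging then pulls the global model toward conflicting local optima.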
https://arxiv.org/abs/2404.12467
Towards Multi-modal Transformers in Federated Learning
Multi-modal transformers mark significant progress in different domains, but siloed high-quality data hinders their further improvement. To remedy this, federated learning (FL) has emerged as a promising privacy-preserving paradigm for training models with ..
0. Abstract
Multi-modal transformers: images and text ..