List: Clip (2)
정화 코딩
ImageBind: One Embedding Space To Bind Them All
https://arxiv.org/abs/2305.05665
We present ImageBind, an approach to learn a joint embedding across six different modalities - images, text, audio, depth, thermal, and IMU data. We show that all combinations of paired data are not necessary to train such a joint embedding, and only image...
1. Introduction
Idea: the binding ability of images -> various sensors and ..
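The ImageBind abstract says a joint embedding can be trained without every combination of paired data, using image-paired data instead. One way to picture this: each non-image modality is pulled toward the image embedding of its paired sample with a CLIP-style contrastive objective. The NumPy code below is a minimal sketch of a symmetric InfoNCE loss between image embeddings and one other modality. It assumes that objective form. The function names, the temperature of 0.07 and the random toy embeddings are illustrative assumptions, not ImageBind's released code.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale each embedding to unit length so dot products become cosine similarities
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(logits, axis=-1):
    # Numerically stable log-softmax along the given axis
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_infonce(img_emb, other_emb, temperature=0.07):
    """Assumed CLIP-style contrastive loss: row i of img_emb and row i of
    other_emb form a positive pair, and all other rows in the batch are negatives."""
    q = l2_normalize(img_emb)
    k = l2_normalize(other_emb)
    logits = q @ k.T / temperature          # (B, B) scaled cosine similarities
    idx = np.arange(logits.shape[0])
    # image -> other modality: each row's diagonal entry is the positive
    loss_i2m = -log_softmax(logits, axis=1)[idx, idx].mean()
    # other modality -> image: each column's diagonal entry is the positive
    loss_m2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return (loss_i2m + loss_m2i) / 2

# Toy usage (hypothetical data): audio embeddings that match their paired
# images should get a much lower loss than unrelated audio embeddings.
rng = np.random.default_rng(0)
images = rng.normal(size=(8, 32))
audio_paired = images + 0.01 * rng.normal(size=(8, 32))
audio_unrelated = rng.normal(size=(8, 32))
print(symmetric_infonce(images, audio_paired), symmetric_infonce(images, audio_unrelated))
```

In this sketch, the same loss would be applied separately to each (image, modality) pair, such as (image, audio) or (image, depth). No audio-depth pairs appear anywhere in training. Any alignment between those two modalities could come only through the shared image space.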
AudioCLIP: Extending CLIP to Image, Text and Audio
https://arxiv.org/abs/2106.13043
In the past, the rapidly evolving field of sound classification greatly benefited from the application of methods from other domains. Today, we observe the trend to fuse domain-specific tasks and approaches together, which provides the community with new o...
1. Introduction
- Advances in the field of audio classification. However, until now only audio..