Multimedia

Authors and titles for recent submissions

[ total of 25 entries: 1-25 ]

Fri, 1 Mar 2024

[1]  arXiv:2402.18702 [pdf, ps, other]
Title: Characterizing Multimedia Information Environment through Multi-modal Clustering of YouTube Videos
Comments: 14 pages, In the 4th International Conference on SMART MULTIMEDIA, 2024
Subjects: Multimedia (cs.MM)
[2]  arXiv:2402.19330 (cross-list from cs.CV) [pdf, other]
Title: A Novel Approach to Industrial Defect Generation through Blended Latent Diffusion Model with Online Adaptation
Comments: 13 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[3]  arXiv:2402.18927 (cross-list from cs.CV) [pdf, other]
Title: Edge Computing Enabled Real-Time Video Analysis via Adaptive Spatial-Temporal Semantic Filtering
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Networking and Internet Architecture (cs.NI)
[4]  arXiv:2402.18844 (cross-list from cs.CV) [pdf, other]
Title: Deep Learning for 3D Human Pose Estimation and Mesh Recovery: A Survey
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[5]  arXiv:2402.18761 (cross-list from eess.IV) [pdf, other]
Title: Exploration of Learned Lifting-Based Transform Structures for Fully Scalable and Accessible Wavelet-Like Image Compression
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

Thu, 29 Feb 2024

[6]  arXiv:2402.18400 [pdf, other]
Title: Balanced Similarity with Auxiliary Prompts: Towards Alleviating Text-to-Image Retrieval Bias for CLIP in Zero-shot Learning
Subjects: Multimedia (cs.MM)
[7]  arXiv:2402.18107 [pdf, other]
Title: Multimodal Interaction Modeling via Self-Supervised Multi-Task Learning for Review Helpfulness Prediction
Comments: 10 pages, 4 figures, 4 tables
Subjects: Multimedia (cs.MM)
[8]  arXiv:2402.18208 (cross-list from cs.SI) [pdf, other]
Title: Shorts on the Rise: Assessing the Effects of YouTube Shorts on Long-Form Video Content
Subjects: Social and Information Networks (cs.SI); Multimedia (cs.MM)
[9]  arXiv:2402.18122 (cross-list from cs.CV) [pdf, other]
Title: G4G: A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[10]  arXiv:2402.17723 (cross-list from cs.CV) [pdf, other]
Title: Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
Comments: Accepted to CVPR 2024. Project website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Tue, 27 Feb 2024

[11]  arXiv:2402.15513 [pdf, other]
Title: Investigating the Generalizability of Physiological Characteristics of Anxiety
Journal-ref: 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2023, pp. 4848-4855
Subjects: Multimedia (cs.MM); Machine Learning (cs.LG); Signal Processing (eess.SP); Medical Physics (physics.med-ph)
[12]  arXiv:2402.16665 (cross-list from cs.HC) [pdf, other]
Title: The Interaction Fidelity Model: A Taxonomy to Distinguish the Aspects of Fidelity in Virtual Reality
Comments: 34 pages incl. references and appendix
Subjects: Human-Computer Interaction (cs.HC); Graphics (cs.GR); Multimedia (cs.MM)
[13]  arXiv:2402.16366 (cross-list from cs.CV) [pdf, other]
Title: SPC-NeRF: Spatial Predictive Compression for Voxel Based Radiance Field
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[14]  arXiv:2402.16364 (cross-list from cs.CL) [pdf, other]
Title: Where Do We Go from Here? Multi-scale Allocentric Relational Inference from Natural Spatial Descriptions
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[15]  arXiv:2402.16318 (cross-list from cs.CV) [pdf, other]
Title: Gradient-Guided Modality Decoupling for Missing-Modality Robustness
Comments: AAAI24
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[16]  arXiv:2402.16153 (cross-list from cs.SD) [pdf, other]
Title: ChatMusician: Understanding and Generating Music Intrinsically with LLM
Comments: GitHub: this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[17]  arXiv:2402.16110 (cross-list from cs.IR) [pdf, other]
Title: Disentangled Graph Variational Auto-Encoder for Multimodal Recommendation with Interpretability
Comments: 12 pages, 7 figures
Subjects: Information Retrieval (cs.IR); Multimedia (cs.MM)
[18]  arXiv:2402.15923 (cross-list from cs.LG) [pdf, other]
Title: Predicting Outcomes in Video Games with Long Short Term Memory Networks
Comments: 7 pages, 2 Figures, 2 Tables. Kittimate Chulajata and Sean Wu are considered co-first authors
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[19]  arXiv:2402.15746 (cross-list from cs.CV) [pdf, other]
Title: Intelligent Director: An Automatic Framework for Dynamic Visual Composition using ChatGPT
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[20]  arXiv:2402.15695 (cross-list from cs.HC) [pdf, ps, other]
Title: Applied User Research in Virtual Reality: Tools, Methods, and Challenges
Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)

Mon, 26 Feb 2024

[21]  arXiv:2402.15444 (cross-list from cs.AI) [pdf, other]
Title: Unleashing the Power of Imbalanced Modality Information for Multi-modal Knowledge Graph Completion
Comments: Accepted by LREC-COLING 2024
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[22]  arXiv:2402.15300 (cross-list from cs.CV) [pdf, other]
Title: Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[23]  arXiv:2402.15096 (cross-list from cs.LG) [pdf, other]
Title: Multimodal Transformer With a Low-Computational-Cost Guarantee
Comments: Accepted to ICASSP 2024 (5 pages)
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[24]  arXiv:2402.14947 (cross-list from cs.HC) [pdf, other]
Title: An Avalanche of Images on Telegram Preceded Russia's Full-Scale Invasion of Ukraine
Comments: 20 pages, 7 figures
Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Social and Information Networks (cs.SI)

Fri, 23 Feb 2024

[25]  arXiv:2402.14326 [pdf, other]
Title: Think before You Leap: Content-Aware Low-Cost Edge-Assisted Video Semantic Segmentation
Comments: Accepted by ACM Multimedia 2023
Subjects: Multimedia (cs.MM)