Sound

Authors and titles for recent submissions

[ total of 48 entries: 1-25 | 26-48 ]

Fri, 1 Mar 2024

[1]  arXiv:2402.19443 [pdf, other]
Title: Probing the Information Encoded in Neural-based Acoustic Models of Automatic Speech Recognition Systems
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[2]  arXiv:2402.19355 [pdf, other]
Title: Unraveling Adversarial Examples against Speaker Identification -- Techniques for Attack Detection and Victim Model Classification
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[3]  arXiv:2402.19325 [pdf, other]
Title: Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?
Comments: Submitted to Odyssey 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[4]  arXiv:2402.19333 (cross-list from cs.CL) [pdf, other]
Title: Compact Speech Translation Models via Discrete Speech Units Pretraining
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[5]  arXiv:2402.19172 (cross-list from eess.SP) [pdf, other]
Title: Point Processes and spatial statistics in time-frequency analysis
Comments: Submitted
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS); Probability (math.PR)
[6]  arXiv:2402.19106 (cross-list from eess.AS) [pdf, other]
Title: A SOUND APPROACH: Using Large Language Models to generate audio descriptions for egocentric text-audio retrieval
Comments: 9 pages, 2 figures, 9 tables, Accepted at ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS); Information Retrieval (cs.IR); Sound (cs.SD)
[7]  arXiv:2402.18968 (cross-list from eess.AS) [pdf, other]
Title: Ambisonics Networks -- The Effect Of Radial Functions Regularization
Comments: To be published in ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[8]  arXiv:2402.18932 (cross-list from eess.AS) [pdf, other]
Title: Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data
Comments: To appear in ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[9]  arXiv:2402.18923 (cross-list from cs.CL) [pdf, other]
Title: Inappropriate Pause Detection In Dysarthric Speech Using Large-Scale Speech Recognition
Comments: Accepted to ICASSP 2024
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Thu, 29 Feb 2024

[10]  arXiv:2402.18275 [pdf, other]
Title: Investigation of Adapter for Automatic Speech Recognition in Noisy Environment
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[11]  arXiv:2402.18204 [pdf, other]
Title: ConvDTW-ACS: Audio Segmentation for Track Type Detection During Car Manufacturing
Comments: 12 pages, 2 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[12]  arXiv:2402.18085 [pdf, other]
Title: AI-assisted Tagging of Deepfake Audio Calls using Challenge-Response
Comments: Dataset will be made public by end of March 2024
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[13]  arXiv:2402.17785 [pdf, other]
Title: ByteComposer: a Human-like Melody Composition Method based on Language Model Agent
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[14]  arXiv:2402.18056 (cross-list from eess.IV) [pdf, other]
Title: Improvement Of Audiovisual Quality Estimation Using A Nonlinear Autoregressive Exogenous Neural Network And Bitstream Parameters
Subjects: Image and Video Processing (eess.IV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[15]  arXiv:2402.18007 (cross-list from cs.LG) [pdf, other]
Title: Mixer is more than just a model
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[16]  arXiv:2402.17907 (cross-list from eess.AS) [pdf, other]
Title: NIIRF: Neural IIR Filter Field for HRTF Upsampling and Personalization
Comments: Accepted to ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[17]  arXiv:2402.17775 (cross-list from eess.SP) [pdf, other]
Title: Wavelet Scattering Transform for Bioacustics: Application to Watkins Marine Mammal Sound Database
Authors: Davide Carbone (1 and 2), Alessandro Licciardi (1 and 2) ((1) Politecnico di Torino, (2) Istituto Nazionale di Fisica Nucleare Sezione di Torino)
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[18]  arXiv:2402.17723 (cross-list from cs.CV) [pdf, other]
Title: Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
Comments: Accepted to CVPR 2024. Project website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Wed, 28 Feb 2024 (showing first 7 of 13 entries)

[19]  arXiv:2402.17645 [pdf, other]
Title: SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation
Comments: project page: this https URL code: this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[20]  arXiv:2402.17496 [pdf, other]
Title: Emotional Voice Messages (EMOVOME) database: emotion recognition in spontaneous voice messages
Authors: Lucía Gómez Zaragozá (1), Rocío del Amor (1), Elena Parra Vargas (1), Valery Naranjo (1), Mariano Alcañiz Raya (1), Javier Marín-Morales (1) ((1) HUMAN-tech Institute, Universitat Politècnica de València, Valencia, Spain)
Comments: 10 pages, 6 figures, submitted to Scientific Data
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[21]  arXiv:2402.17482 [pdf, ps, other]
Title: Automated Classification of Phonetic Segments in Child Speech Using Raw Ultrasound Imaging
Journal-ref: Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 1: BIOIMAGING, 2024, pages 326-331
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[22]  arXiv:2402.17259 [pdf, other]
Title: EDTC: enhance depth of text comprehension in automated audio captioning
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[23]  arXiv:2402.17127 [pdf, other]
Title: Experimental Study: Enhancing Voice Spoofing Detection Models with wav2vec 2.0
Comments: 5 pages
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[24]  arXiv:2402.16927 [pdf, ps, other]
Title: The ICASSP 2024 Audio Deep Packet Loss Concealment Challenge
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[25]  arXiv:2402.17735 (cross-list from eess.AS) [pdf, other]
Title: High-Fidelity Neural Phonetic Posteriorgrams
Comments: Accepted to ICASSP 2024 Workshop on Explainable Machine Learning for Speech and Audio
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)