Welcome To WUJNS
Journal of Wuhan University, English Edition | Wuhan University Journal of Natural Sciences
Shallow Convolutional Neural Networks for Acoustic Scene Classification
LU Lu, YANG Yuhong, JIANG Yuzhi, AI Haojun, TU Weiping
1. National Engineering Research Center for Multimedia Software, Wuhan University, Wuhan 430072, Hubei, China; 2. The Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan 430072, Hubei, China; 3. Collaborative Innovation Center of Geospatial Technology, Wuhan 430072, Hubei, China
Recently, deep neural networks, including convolutional neural networks (CNNs), have been widely applied to acoustic scene classification (ASC). Motivated by the fact that some simplified CNNs have shown improvements over deep CNNs such as the Visual Geometry Group Net (VGG-Net), we simplify a VGG-Net style architecture into a shallow CNN with improved performance. Max pooling and batch normalization are also applied for better accuracy. In a series of controlled tests on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 datasets, our shallow CNN achieves a 6.7% improvement over the VGG-Net style CNN while reducing time complexity to 5% of that network's.
Key words: acoustic scene classification; convolutional neural networks; Mel-spectrogram
CLC number: TP 391
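
The abstract compares the time complexity of a shallow CNN against a deeper VGG-Net style CNN but does not say how that complexity is measured. One common convention is to count the multiply-accumulate operations (MACs) of the convolutional layers, where each layer costs input channels × kernel area × output channels × output feature-map area. The sketch below illustrates that count under this assumption. The `ConvLayer` class, the `total_macs` helper, and every layer shape in `deep` and `shallow` are hypothetical values chosen for illustration; they are not the architectures or figures from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ConvLayer:
    """Shape of one convolutional layer (hypothetical helper, square kernels only)."""
    in_channels: int
    out_channels: int
    kernel: int       # side length of the square kernel
    out_height: int   # height of the output feature map
    out_width: int    # width of the output feature map

    def macs(self) -> int:
        # Each output value needs in_channels * kernel * kernel multiply-accumulates,
        # and there are out_channels * out_height * out_width output values.
        return (self.in_channels * self.kernel * self.kernel
                * self.out_channels * self.out_height * self.out_width)


def total_macs(layers: Iterable[ConvLayer]) -> int:
    """Sum the multiply-accumulate counts of all convolutional layers."""
    return sum(layer.macs() for layer in layers)


# Illustrative shapes only (assumed, NOT taken from the paper).
deep = [
    ConvLayer(1, 32, 3, 64, 64),
    ConvLayer(32, 32, 3, 64, 64),
    ConvLayer(32, 64, 3, 32, 32),
    ConvLayer(64, 64, 3, 32, 32),
]
shallow = [
    ConvLayer(1, 32, 5, 32, 32),
]

if __name__ == "__main__":
    ratio = total_macs(shallow) / total_macs(deep)
    print(f"deep MACs:    {total_macs(deep)}")
    print(f"shallow MACs: {total_macs(shallow)}")
    print(f"shallow / deep ratio: {ratio:.4f}")
```

Because the cost of a layer scales with the product of its input and output channel counts, stacked layers with many channels dominate the total, which is why removing layers can cut the count so sharply. Fully connected layers, pooling, and batch normalization are ignored in this sketch.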