Shallow Convolutional Neural Networks for Acoustic Scene Classification
LU Lu, YANG Yuhong, JIANG Yuzhi, AI Haojun, TU Weiping
1. National Engineering Research Center for Multimedia Software, Wuhan University, Wuhan 430072, Hubei, China; 2. The Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan 430072, Hubei, China; 3. Collaborative Innovation Center of Geospatial Technology, Wuhan 430072, Hubei, China
Recently, deep neural networks, including convolutional neural networks (CNNs), have been widely applied to acoustic scene classification (ASC). Some simplified CNNs have outperformed deep CNNs such as the Visual Geometry Group Net (VGG-Net). Motivated by this, we investigate how to simplify a VGG-Net style architecture into a shallow CNN with improved performance. We also apply max pooling and batch normalization to further improve accuracy. We ran a series of controlled experiments on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 datasets. Compared with the VGG-Net style CNN, our shallow CNN achieves a 6.7% improvement and reduces the time complexity to 5% of the original.
Key words: acoustic scene classification; convolutional neural networks; Mel-spectrogram
CLC number: TP 391
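The abstract names the ingredients of the shallow network: convolution over Mel-spectrogram input, batch normalization, and max pooling. It does not give the exact configuration. As a rough illustration only, the NumPy sketch below chains one convolution, batch normalization, ReLU, and 2x2 max-pooling stage. Several details are assumptions for illustration and are not the configuration used by the authors: the helper names, the 8 kernels of size 3x3, the ReLU activation, the layer ordering, and the 40 Mel bands x 100 frames input.

```python
import numpy as np

def conv2d(x, w, b):
    """Valid 2-D convolution (cross-correlation, as in most CNN libraries).
    x: (C_in, H, W) input, e.g. a log-Mel-spectrogram with C_in = 1.
    w: (C_out, C_in, kH, kW) kernels; b: (C_out,) biases."""
    c_out, c_in, kh, kw = w.shape
    _, h, wd = x.shape
    out = np.empty((c_out, h - kh + 1, wd - kw + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = x[:, i:i + kh, j:j + kw]             # (C_in, kH, kW)
            # Sum over C_in, kH, kW for every output channel at once.
            out[:, i, j] = np.tensordot(w, patch, axes=3) + b
    return out

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for x of shape (N, C, H, W):
    each channel is normalized with statistics computed over N, H and W."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

def max_pool(x, size=2):
    """Non-overlapping max pooling over the last two axes of (N, C, H, W);
    trailing rows/columns that do not fill a whole window are dropped."""
    n, c, h, w = x.shape
    h2, w2 = h // size, w // size
    x = x[:, :, :h2 * size, :w2 * size]
    return x.reshape(n, c, h2, size, w2, size).max(axis=(3, 5))

def conv_bn_pool_block(batch, w, b, gamma, beta):
    """Hypothetical block: conv -> batch norm -> ReLU -> 2x2 max pool."""
    feats = np.stack([conv2d(x, w, b) for x in batch])  # (N, C_out, H', W')
    feats = batch_norm(feats, gamma, beta)
    feats = np.maximum(feats, 0.0)                      # ReLU (assumed activation)
    return max_pool(feats, 2)

rng = np.random.default_rng(0)
# Hypothetical batch: 4 log-Mel-spectrograms, 1 channel, 40 Mel bands x 100 frames.
batch = rng.standard_normal((4, 1, 40, 100))
w = rng.standard_normal((8, 1, 3, 3)) * 0.1             # 8 assumed 3x3 kernels
b = np.zeros(8)
gamma, beta = np.ones(8), np.zeros(8)
out = conv_bn_pool_block(batch, w, b, gamma, beta)
print(out.shape)  # (4, 8, 19, 49)
```

The 3x3 convolution shrinks the 40x100 input to 38x98, and pooling halves it to 19x49. Each pooling stage cuts the spatial size, and therefore the cost of the next convolution, by about a factor of four. This is one reason the number and placement of conv/pool stages strongly affect a CNN's time complexity.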
