In addition to the common use of speech-to-text algorithms, the analysis of audio data can yield valuable knowledge: from sounds made by mechanical devices that precede an approaching failure to the dynamics of a crowd of people, audio carries important insights.
Fortunately, when existing building blocks are used, automatic analysis of audio data is not a complicated task. One effective approach is to use an audio feature extraction library, followed by machine learning. Useful open-source libraries that extract audio features include Gist, Kaldi, Essentia, jAudio, and many more. jAudio is a comprehensive, easy-to-use Java-based library that outputs the audio features as XML files and can be run from a simple command line. Once the audio features are extracted, they can be analyzed with any machine learning tool or library.
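To make the idea concrete, here is a minimal sketch of the features-then-learning workflow in Python. It uses numpy to hand-compute a few standard audio features (RMS energy, zero-crossing rate, spectral centroid) as illustrative stand-ins for the much richer feature sets that libraries such as jAudio or Essentia produce; the synthetic "low hum" and "high whine" signals are assumptions for demonstration, not real machine recordings.

```python
import numpy as np

def extract_features(signal, sr):
    """Compute a few standard audio features from a 1-D signal.
    (Illustrative stand-ins for the richer feature sets that
    dedicated libraries such as jAudio or Essentia produce.)"""
    # Root-mean-square energy: overall loudness of the clip.
    rms = np.sqrt(np.mean(signal ** 2))
    # Zero-crossing rate: how often the waveform changes sign,
    # a rough proxy for noisiness / high-frequency content.
    zcr = np.mean(np.abs(np.diff(np.sign(signal)))) / 2
    # Spectral centroid: the "center of mass" of the spectrum, in Hz.
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    centroid = np.sum(freqs * spectrum) / np.sum(spectrum)
    return np.array([rms, zcr, centroid])

# Synthetic example: a low hum vs. a high-pitched whine, standing in
# for, say, "healthy" vs. "failing" machine sounds.
sr = 8000
t = np.linspace(0, 1, sr, endpoint=False)
low = np.sin(2 * np.pi * 100 * t)
high = np.sin(2 * np.pi * 2000 * t)

f_low = extract_features(low, sr)
f_high = extract_features(high, sr)
# The high-pitched signal has a much higher spectral centroid and
# zero-crossing rate, so a classifier can easily separate the two.
print(f_low, f_high)
```

Feature vectors like these can then be fed to any classifier (a decision tree, SVM, neural network, and so on) from a machine learning library of choice.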
Another approach to automatic audio analysis is to convert the audio samples into spectrograms and then analyze them as 2D images. A spectrogram is a visual representation of sound in which the x-axis is time, the y-axis is frequency, and the pixel intensity is the volume. Once converted into spectrograms, the visualized sounds can be analyzed with image analysis libraries, including deep learning.
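The conversion itself can be sketched with a short-time Fourier transform. The following is a minimal numpy-only sketch (real pipelines would more commonly use a dedicated routine such as scipy.signal.spectrogram or matplotlib's specgram); the window and hop sizes, and the test tone that jumps in pitch halfway through, are illustrative assumptions.

```python
import numpy as np

def spectrogram(signal, sr, win=256, hop=128):
    """Short-time Fourier transform magnitudes as a 2-D array:
    rows are frequency bins, columns are time frames, and the
    values are magnitudes -- in effect, an image of the sound."""
    window = np.hanning(win)
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        frame = signal[start:start + win] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    # Transpose so the y-axis is frequency and the x-axis is time.
    return np.array(frames).T

sr = 8000
t = np.linspace(0, 1, sr, endpoint=False)
# A tone that jumps from 500 Hz to 1500 Hz halfway through; the jump
# shows up as a visible step in the spectrogram "image".
signal = np.concatenate([
    np.sin(2 * np.pi * 500 * t[: sr // 2]),
    np.sin(2 * np.pi * 1500 * t[sr // 2 :]),
])

spec = spectrogram(signal, sr)
print(spec.shape)  # (frequency_bins, time_frames)
```

The resulting 2D array can be saved as an image or fed directly to a convolutional network, at which point the task becomes ordinary image classification.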
Despite their simplicity and speed, these approaches can extract substantial information from audio data.