
Diverse Musical Instrument Detection Using Machine-Learning Models

November 4, 2022

This blog post is based on a research paper written by Cultural Infusion’s Chief Technology Officer, Rezza Moieni, and University of Melbourne data science student and Cultural Infusion intern, Kaile Wang. The paper looks at the universalisation of music. Rezza has developed an instrument-detection system using machine-learning technology that is able to provide information on under-recognised traditional musical instruments and sounds. All the music samples were provided by Cultural Infusion’s Sound Infusion.

Globalisation has increased the spread of cultures around the world, each with their own unique histories and traditions. More and more people are becoming interested in exploring new musical instruments and their associated culture, but struggle to find reliable information on them; there is little information available to the general public and mainstream media has yet to tap into this growing field of interest. Although many of us can recognise familiar instruments just by listening, we lack tools to identify new instruments.

Currently, there are no applications on the market that can accurately detect and identify traditional and cultural instruments. Shazam is a popular application available to smartphone users that can identify attributes of a song, such as the artist and album. PixelPlayer is another popular music application on the market. It can separate the sound of each instrument from solos and duets for the purpose of audio editing and adjusting. But although these apps are music related, they centre on song identification and audio separation, and are unable to identify specific instruments in a song.

After observing this gap in the market, we wanted to meet the unmet demand for identifying and acknowledging cultural instruments. After all, we are in the business of strengthening the public’s awareness of cultural diversity.

Our aim for this project is to build a functional website that can distinguish up to 200 musical instruments in sounds and music from cultures all around the world. Users can upload solo audio clips or on-the-spot recordings, which our machine-learning system analyses to return information specific to each instrument. The technology encourages users to conduct further research on the instruments and sounds.

How it works

Our cultural instrument detection website is a supervised learning system built on audio signal processing and machine learning. We tested a variety of sound-processing approaches and found that Mel Frequency Cepstral Coefficients (MFCCs) worked best. MFCCs turn the raw audio signal into a compact set of numerical features that summarise its short-term spectral shape on the mel scale, a frequency scale that approximates how humans perceive pitch. These features, rather than the raw audio, are what the model learns from.
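To make the idea of MFCC features more concrete, here is a minimal sketch of how the coefficients for one short frame of audio can be computed using only Python’s standard library. It is an illustration under stated assumptions, not the pipeline used in the paper: the frame length (256 samples), sample rate (8 kHz), number of mel filters (20), number of coefficients (13) and the synthetic test tones are all choices made for this example.

```python
import cmath
import math

def hz_to_mel(hz):
    # A commonly used formula for converting frequency in Hz to mels.
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Apply a Hann window, then return the power of DFT bins 0..N/2."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        # Naive DFT: slow, but fine for a single short frame.
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + (high - low) * i / (n_filters + 1)
                  for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    filters = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):      # rising edge of the triangle
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge of the triangle
            row[k] = (right - k) / (right - centre)
        filters.append(row)
    return filters

def dct2(values, n_out):
    """Unnormalised DCT-II, keeping the first n_out coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n_out)]

def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    """MFCCs for one frame: power spectrum -> mel energies -> log -> DCT."""
    spectrum = power_spectrum(frame)
    energies = [sum(w * p for w, p in zip(row, spectrum))
                for row in mel_filterbank(n_filters, len(frame), sample_rate)]
    # A small constant avoids log(0) when a filter captures no energy.
    log_energies = [math.log(e + 1e-10) for e in energies]
    return dct2(log_energies, n_coeffs)

def tone(freqs, sample_rate=8000, n=256):
    """Synthetic test signal: an equal mix of sine waves (illustrative only)."""
    return [sum(math.sin(2 * math.pi * f * t / sample_rate) for f in freqs)
            / len(freqs) for t in range(n)]

# Same pitch (220 Hz), different timbre: a pure tone versus one with overtones.
pure = mfcc_frame(tone([220.0]), 8000)
rich = mfcc_frame(tone([220.0, 440.0, 660.0, 880.0]), 8000)
print([round(c, 2) for c in pure])
print([round(c, 2) for c in rich])
```

The two printed vectors differ even though both tones share the same pitch, which is why this kind of feature is useful for telling instruments apart: it captures the overall shape of the spectrum rather than just the note being played. A real system would compute these coefficients over many overlapping frames of each clip, typically with an audio library rather than hand-written code.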

Instrument classification is a Music Information Retrieval (MIR) task, in which the computer’s main goal is to understand and identify the semantics of music. Because the system is built on machine learning, it can be refined and improved over time as it is trained on more audio.
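As a rough illustration of what supervised learning means here, the sketch below trains a very simple classifier on labelled feature vectors and then predicts the label of an unseen one. Everything in it is an assumption made for the example: the nearest-centroid method is a stand-in rather than the model used in the paper, and the two-dimensional feature values are invented toy numbers, not real MFCCs or real results.

```python
import math
from collections import defaultdict

def fit_centroids(features, labels):
    """Training step: average the feature vectors that share each label."""
    grouped = defaultdict(list)
    for vec, label in zip(features, labels):
        grouped[label].append(vec)
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in grouped.items()}

def predict(centroids, vec):
    """Prediction step: pick the label whose centroid is closest to vec."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))

# Invented toy features (hypothetical values, for illustration only).
train_x = [[1.0, 0.2], [1.2, 0.1], [-0.8, 0.9], [-1.0, 1.1]]
train_y = ["aerophone", "aerophone", "membranophone", "membranophone"]

centroids = fit_centroids(train_x, train_y)
print(predict(centroids, [0.9, 0.3]))
print(predict(centroids, [-0.7, 0.8]))
```

The labelled examples are what make the approach supervised: the system learns what each category looks like from clips whose instrument is already known, then applies that to new audio. Adding more labelled clips and retraining is one way such a system can improve.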

To test the effectiveness of the MFCC-based system, and mindful of the underrepresentation of Indigenous music in the mainstream market, we collected and tested rare instrumental audio samples provided by Cultural Infusion’s digital cultural music arrangement tool, Sound Infusion.


We tested our cultural instrument detection system on 81 types of instrumental audio clips, each 10 to 30 seconds long. The above table displays the audio files that were successfully identified. The sounds were segmented into five categories: aerophone, chordophone, idiophone, membranophone and vocal. The MFCC-based system was able to separate and identify multi-instrumental polyphonic music, alongside solo vocals and monophonic sounds. This technology is unprecedented on the market, with the potential to support people in recognising a variety of cultural musical instruments.

In future work, we will seek to improve the model by going beyond the use of our authorised audio clips to introduce a variety of data. This can be achieved by using licence-free music databases to generate samples that better predict the use of our website in real, everyday scenarios.

Sound Infusion is an online platform that enables users to explore and arrange the world’s largest database of instrument samples. We value interculturalism, and take people on a global musical journey, allowing them to hear, play and arrange diverse sounds. Our instrument-detection system gives users the opportunity to discover new instruments and learn about their cultural significance.

To read Rezza’s research paper ‘Multi-Instrument Detection in Culture Musics Using Machine Learning Models’, click here.

