Thursday, December 25, 2014

More Is Yet to Come...

Hello Everyone,

Well, it's the end of my third semester, and I must say it was a hectic one.

The most challenging thing I did this semester was a semester project for one of my courses, which I describe below.

The project was about comparing audio files and finding a match between them, so it wasn't a surprise that it was titled "Detect Audio Misappropriations". When I first started working on it, I was curious about how an audio file is actually structured inside, beyond the fact that all the data is binary. As I learnt more about the different audio file formats, the picture became clear. One thing I liked about this project was the problem statement, which was very well structured: it came in bits and pieces every week, and each week I had to dig into some new concepts. I will try to describe the whole process I followed, but one of the concepts I struggled with was the Fast Fourier Transform, which is a topic from Electrical Engineering. I really wish I had taken some Electrical Engineering classes to build up the basics. I also feel that I made it tougher for myself by writing the algorithm myself. This was a team project, and we were supposed to handle 3 file extensions: ".wav", ".mp3" and ".ogg".

For every format, the process was as follows:

a) Validate the format by reading the file's header data (every format has its own header, and the header holds specific values for each format).
b) If the header was valid, the file was converted to a canonical format, which in our case was ".wav".
c) Then the audio data of the file (the samples) was read.
d) 16384 samples were read in each iteration, and this was termed 1 bin of samples.
e) Each bin was passed through a Hanning window and a Fast Fourier Transform, and a single value was generated for every bin.
f) Finally, these values were compared between two files to detect a match.
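
To make steps (d) and (e) a bit more concrete, here is a minimal Python sketch. It is not the code from our project. The function names are made up for illustration, and the way a bin is reduced to a single value (the index of its strongest frequency) is only an assumption, since there are many ways to do that. The FFT is a plain recursive radix-2 version, so the bin size has to be a power of two (16384 is) and at least 4.

```python
import cmath
import math

BIN_SIZE = 16384  # samples per bin, as in step (d)


def hann_window(n):
    """Return n Hann (Hanning) window coefficients."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def fft(values):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def bin_value(samples):
    """Reduce one bin to a single value.

    Assumption for illustration: the value is the index of the strongest
    frequency in the bin, ignoring the DC component at index 0.
    """
    window = hann_window(len(samples))
    spectrum = fft([s * w for s, w in zip(samples, window)])
    half = len(spectrum) // 2  # for real input the upper half mirrors the lower
    magnitudes = [abs(c) for c in spectrum[1:half]]
    return 1 + max(range(len(magnitudes)), key=magnitudes.__getitem__)


def fingerprint(samples, bin_size=BIN_SIZE):
    """Return one value per complete bin; a trailing partial bin is dropped."""
    return [bin_value(samples[i:i + bin_size])
            for i in range(0, len(samples) - bin_size + 1, bin_size)]
```

The window is applied before the transform so that the abrupt edges of each bin do not smear energy across the whole spectrum.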

This is only a brief description of the process; it involves some more important concepts that would be too detailed for this post.

While I was working on this project, I also tried to implement one of the great research papers by Haitsma & Kalker (http://ismir2002.ismir.net/proceedings/02-FP04-2.pdf). We were able to implement about 90% of this paper, but we got stuck because we did not know enough about terms such as frequency bands. I would say there is scope for improving what we have implemented so far, particularly in handling silence in audio.

For now, the developed product takes two directories or files, compares every file in the first directory to every file in the second directory, and reports a match if two files share more than 5 seconds of the same audio content. A future improvement would be handling silence, as I mentioned above.
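
The "more than 5 seconds" rule can be sketched as follows, assuming each file has already been reduced to a list with one value per bin. This is only an illustration and not the code from our repository: the function names are hypothetical, the 44100 Hz sample rate is an assumed default, and the comparison only lines bins up with bins, so it would miss a match that starts in the middle of a bin.

```python
def longest_common_run(a, b):
    """Length of the longest run of consecutive equal values shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the run that ended at the previous element of both lists.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def is_match(fp_a, fp_b, sample_rate=44100, bin_size=16384, min_seconds=5.0):
    """Report a match when the shared run of bins lasts more than min_seconds.

    sample_rate is an assumed default; bin_size is the 16384 samples per bin.
    """
    seconds_per_bin = bin_size / sample_rate
    return longest_common_run(fp_a, fp_b) * seconds_per_bin > min_seconds
```

With these assumed numbers, one bin covers about 0.37 seconds, so a match needs at least 14 consecutive bins to agree.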

This process was interesting to come up with, and I really thank my friend Raymond Li, who was a great classmate to work with.

Below is the GitHub link to the project:
https://github.com/vishalrajpal/Assignment-7

Please feel free to drop your suggestions. Thanks for reading.

Vishal