CNN implementation in Python with PyTorch, on audio (.wav) files (94%+ accuracy on test).
- Shilat Givati
- Tomer Shay
Implementation of a neural network on audio files, using gcommand_dataset.py, which converts each .wav file into a 2D matrix (of 161 x 101).
The audio files in this dataset are about 1 second long, and there are 30 possible commands that can be heard in the files.
You can see the diagram of the Neural Network:
In short, the model has 6 convolutional layers, with Batch Normalization, ReLU and Max Pooling after each one. These are followed by a Flatten layer and 4 Fully Connected layers. The network outputs 30 values, one per command.
- Dropout (after the first fully-connected layer) = 0.1
- Epochs = 15
- Batch Size = 64
- Optimizer = Adam
- Learning Rate = 0.0001
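The description above can be sketched in PyTorch roughly as follows. This is a minimal sketch and not the code from `ex5.py`: the class name `SpeechCNN`, the channel widths, the 3x3 kernels with padding 1, the hidden sizes of the fully-connected layers, the position of the ReLU activations between them, and the cross-entropy loss are all assumptions chosen for illustration. Only the layer counts, the dropout rate, the optimizer, the learning rate, the 161 x 101 input and the 30 outputs come from the text.

```python
import torch
from torch import nn

NUM_CLASSES = 30             # from the README: 30 possible commands
INPUT_SHAPE = (1, 161, 101)  # one channel, 161 x 101 matrix per audio clip


def conv_block(in_ch, out_ch):
    """Conv -> BatchNorm -> ReLU -> MaxPool, one block per convolutional layer."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),  # assumed kernel/padding
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        nn.MaxPool2d(2),  # halves both spatial dimensions (rounding down)
    )


class SpeechCNN(nn.Module):  # hypothetical name
    def __init__(self, channels=(16, 32, 64, 128, 128, 256)):  # assumed widths
        super().__init__()
        blocks, in_ch = [], 1
        for out_ch in channels:
            blocks.append(conv_block(in_ch, out_ch))
            in_ch = out_ch
        self.features = nn.Sequential(*blocks)
        # Six 2x2 poolings shrink 161 x 101 down to 2 x 1.
        flat = channels[-1] * 2 * 1
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flat, 256), nn.ReLU(),   # assumed hidden sizes
            nn.Dropout(0.1),                   # dropout after the first FC layer
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, NUM_CLASSES),
        )

    def forward(self, x):
        return self.classifier(self.features(x))


model = SpeechCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
criterion = nn.CrossEntropyLoss()  # assumed loss function

# One training step on a small random batch (the README uses batches of 64).
inputs = torch.randn(8, *INPUT_SHAPE)
targets = torch.randint(0, NUM_CLASSES, (8,))
optimizer.zero_grad()
logits = model(inputs)
loss = criterion(logits, targets)
loss.backward()
optimizer.step()
```

With these assumed settings the flattened feature vector has 512 entries, and `logits` has one row of 30 scores per clip.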
For the network to run properly, the audio files must be organized within folders as follows:
- The `gcommands` folder next to the `ex5.py` file
- A `gcommands/train` folder with subfolders (with the names of the labels), so that inside each folder are the `.wav` files associated with the same label.
- A `gcommands/validate` folder with subfolders (with the names of the labels), so that inside each folder are the `.wav` files associated with the same label.
- A `gcommands/test` folder with a single subfolder (its name is irrelevant) that contains the test `.wav` files.
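Before a long training run it can help to verify this layout. The helper below is a hypothetical sketch that is not part of the repository: the function name `check_layout` and its messages are made up for illustration, and it only checks the rules listed above (label subfolders with `.wav` files under `train` and `validate`, and at least one subfolder with `.wav` files under `test`).

```python
from pathlib import Path


def subfolders(folder):
    """Return the immediate subfolders of `folder`, or [] if it does not exist."""
    return [d for d in folder.iterdir() if d.is_dir()] if folder.is_dir() else []


def check_layout(root="gcommands"):
    """Return a list of problems found in the dataset layout (empty if none)."""
    root = Path(root)
    problems = []

    # train/ and validate/ need one subfolder per label, each holding .wav files.
    for split in ("train", "validate"):
        label_dirs = subfolders(root / split)
        if not label_dirs:
            problems.append(f"{root / split}: expected one subfolder per label")
        for label_dir in label_dirs:
            if not any(label_dir.glob("*.wav")):
                problems.append(f"{label_dir}: no .wav files")

    # test/ needs a subfolder (any name) that holds the test .wav files.
    test_dirs = subfolders(root / "test")
    if not any(any(d.glob("*.wav")) for d in test_dirs):
        problems.append(f"{root / 'test'}: expected a subfolder with .wav files")

    return problems
```

Calling `check_layout("gcommands")` from the folder that contains `ex5.py` returns an empty list when the structure matches the description.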
The program code exports a total of 2 files:
- A `test_y` file that contains the predictions for the test.
- The `BestModelcpu.png` or `BestModelcuda.png` file (based on the device on which the code runs), which contains a graph of the accuracy percentage and loss values of the training and the validation as a function of the epochs.
The program takes one optional argument, which can be cuda. If it is, the program checks whether cuda can be used and, if so, runs on it. If no argument is given, or the argument is not cuda, the program runs on the cpu.
Running example:
$ python3 ex5.py cuda
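The device choice described above can be sketched as a small pure function. This is an illustration under assumptions and not the code from `ex5.py`: the name `choose_device_name` is hypothetical, and the exact, case-sensitive comparison against `cuda` is assumed.

```python
def choose_device_name(argv, cuda_available):
    """Return "cuda" only if it was requested as the first argument and is usable.

    `argv` is the full argument list (script name first); `cuda_available`
    tells whether a CUDA device can actually be used on this machine.
    """
    requested = len(argv) > 1 and argv[1] == "cuda"  # assumed exact match
    return "cuda" if requested and cuda_available else "cpu"


# Requesting cuda on a machine without it falls back to the cpu.
print(choose_device_name(["ex5.py", "cuda"], cuda_available=False))  # prints "cpu"
```

In a PyTorch script this could be used as `torch.device(choose_device_name(sys.argv, torch.cuda.is_available()))`, which keeps the decision logic testable without a GPU.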
Note that to use the dataset referenced in this repo, you need to download it first (about 1GB). You can also use Google Colab to run this program.
- Open the terminal.
- Clone the project by:
$ git clone https://github.com/tomershay100/Speech-Recognition-with-PyTorch.git
- Run the `ex5.py` file:
$ python3 ex5.py cuda
The repository contains several additional files: a report in Hebrew that describes the code and the model, a graph of the train and validation success rates per epoch, and a diagram of the network structure.