Welcome to python_speech_features’s documentation!
This library provides common speech features for automatic speech recognition (ASR), including MFCCs and filterbank energies. If you are not sure what MFCCs are and would like to know more, have a look at this MFCC tutorial: http://www.practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/.
You will need numpy and scipy to run this code. The source for this project is available at https://github.com/jameslyons/python_speech_features.
features.mfcc() - Mel Frequency Cepstral Coefficients
features.fbank() - Filterbank Energies
features.logfbank() - Log Filterbank Energies
features.ssc() - Spectral Subband Centroids
To use MFCC features:
    from features import mfcc
    from features import logfbank
    import scipy.io.wavfile as wav

    (rate, sig) = wav.read("file.wav")
    mfcc_feat = mfcc(sig, rate)
    fbank_feat = logfbank(sig, rate)

    print(fbank_feat[1:3, :])
From here you can, for example, write the features to a file.
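As a minimal sketch of writing features to a file, the snippet below stores a feature matrix with numpy and reads it back. It uses a randomly generated stand-in for mfcc_feat so that it runs without a WAV file. The (5, 13) shape, the file names, and the temporary output directory are assumptions made for illustration and are not part of this library.

```python
import os
import tempfile

import numpy as np

# Stand-in for the result of mfcc(sig, rate): a 2-D array with one row per
# frame and one column per coefficient. The shape (5, 13) is illustrative only.
rng = np.random.default_rng(0)
mfcc_feat = rng.standard_normal((5, 13))

# Hypothetical output location; replace with any writable directory.
out_dir = tempfile.mkdtemp()

# Option 1: NumPy's binary .npy format (compact, exact round trip).
npy_path = os.path.join(out_dir, "mfcc_feat.npy")
np.save(npy_path, mfcc_feat)
restored = np.load(npy_path)

# Option 2: plain-text CSV, one frame per line (readable by other tools).
csv_path = os.path.join(out_dir, "mfcc_feat.csv")
np.savetxt(csv_path, mfcc_feat, delimiter=",")
restored_csv = np.loadtxt(csv_path, delimiter=",")
```

The .npy file is usually the better choice when the features will be loaded again from Python, while the CSV file is easier to inspect or to pass to other programs.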