An accent is the distinctive way words are pronounced. Every speaker has an accent, which varies by gender, age, formality, social class, geographical region, and native language. Those who speak English as a second language have what many regard as “foreign” accents, but even native speakers have some sort of accent, however subtle.
When we’re born, we’re blessed with the ability to create whatever sounds we choose. Unfortunately, the decision to pick up an accent really isn’t up to us. Each language around the world focuses on differing sounds. It’s nurture and not nature that determines the particular strengths we develop in speaking over the weaknesses we “inherit”.
Accent recognition is an important task but a complex problem due to the numerous characteristics that set accents apart. Accents differ by voice quality, phoneme pronunciation, and prosody. Since it is difficult to extract these exact features, existing work uses alternate features such as spectral features, which captures the frequency of speech. Such features include the Mel-Frequency Cepstral Coefficient (MFCC), Spectrogram, Chromagram, Spectral Centroid, and Spectral Roll-off, which are extracted from raw audio samples.In this work, these five features were used to train a 2-layer Convolution Neural Network on a dataset of six distinct language-accents, namely English, French, German, Italian, UK, US. Your Task is to build a machine learning model that predicts to which class a user’s accent belongs to based on the Explanatory variables, obtained using MFCC on the original time domain soundtrack of the maximum 1s of reading of a word