hindi-audio_speech-to-text: a dataset for fine-tuning speech models for the Hindi language

🗂️ Dataset Schema Description

This dataset provides paired audio and text samples for training and evaluating speech recognition (speech-to-text) models. Each record represents a single audio clip along with URLs to its audio file, transcription, and metadata.

Fields:

user_id – An anonymized identifier representing the speaker or user who provided the audio recording.

recording_id – A unique identifier assigned to each audio sample in the dataset.

language – Two-letter ISO 639-1 language code of the spoken audio (e.g., "hi" for Hindi, "en" for English).

duration – The total length of the audio clip in seconds. Useful for filtering long or short samples and batching data during model training.

rec_url_gcp – Direct URL to the raw audio file stored on cloud infrastructure (e.g., Google Cloud Storage). This serves as the main input for model training or inference.

transcription_url – URL to the corresponding ground-truth transcription text for each audio file. This acts as the label or target text for supervised learning tasks.

metadata_url – Link to additional metadata about the recording (e.g., device type, accent, background noise, recording conditions). While optional, it can provide valuable insights for analysis, model robustness, and domain adaptation.
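
The fields above can be modeled and filtered as in the minimal sketch below. It assumes each record has already been loaded as a dict keyed by the field names listed here (the on-disk file format is not stated, so loading is left out). The names `Recording`, `parse_row`, `select_clips`, and `make_batches`, the 1 to 20 second duration window, and the example.com URLs are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Recording:
    user_id: str
    recording_id: str
    language: str
    duration: float          # clip length in seconds
    rec_url_gcp: str
    transcription_url: str
    metadata_url: str = ""   # the schema describes metadata as optional


def parse_row(row: dict) -> Recording:
    """Build a Recording from a dict keyed by the schema's field names."""
    return Recording(
        user_id=str(row["user_id"]),
        recording_id=str(row["recording_id"]),
        language=str(row["language"]).strip().lower(),
        duration=float(row["duration"]),
        rec_url_gcp=row["rec_url_gcp"],
        transcription_url=row["transcription_url"],
        metadata_url=row.get("metadata_url") or "",
    )


def select_clips(records: Iterable[Recording], language: str = "hi",
                 min_s: float = 1.0, max_s: float = 20.0) -> List[Recording]:
    """Keep clips in one language within a duration window, shortest first.

    The default window is an illustrative assumption, not a dataset rule.
    """
    kept = [r for r in records
            if r.language == language and min_s <= r.duration <= max_s]
    return sorted(kept, key=lambda r: r.duration)


def make_batches(records: List[Recording], batch_size: int) -> List[List[Recording]]:
    """Split a duration-sorted list into consecutive batches so clips of
    similar length end up together, which reduces padding."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


# Hypothetical rows; the URLs are placeholders, not real dataset links.
sample_rows = [
    {"user_id": "u1", "recording_id": "r1", "language": "hi", "duration": "4.2",
     "rec_url_gcp": "https://example.com/r1.wav",
     "transcription_url": "https://example.com/r1.txt",
     "metadata_url": "https://example.com/r1.json"},
    {"user_id": "u2", "recording_id": "r2", "language": "en", "duration": "3.0",
     "rec_url_gcp": "https://example.com/r2.wav",
     "transcription_url": "https://example.com/r2.txt"},
    {"user_id": "u1", "recording_id": "r3", "language": "hi", "duration": "35.0",
     "rec_url_gcp": "https://example.com/r3.wav",
     "transcription_url": "https://example.com/r3.txt"},
]

hindi_clips = select_clips(parse_row(row) for row in sample_rows)
batches = make_batches(hindi_clips, batch_size=2)
```

Sorting by duration before batching is one common way to use the duration field; filtering on language first keeps non-Hindi clips out of a Hindi fine-tuning run.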

💡 Usage

This dataset is ideal for:

Speech-to-text (ASR) model training and evaluation

Audio feature extraction and preprocessing

Multilingual speech research

Acoustic environment analysis and speaker variation studies
