
Hindi Audio Speech to Text or Vice Versa

@kaggle.rohansinghjadoan_hindi_audio_speech_detection

A Hindi audio speech-to-text dataset for fine-tuning speech models on the Hindi language.

🗂️ Dataset Schema Description

This dataset provides paired audio and text samples for training and evaluating speech recognition or text-to-speech models. Each record represents a unique audio clip along with its metadata, transcription, and reference URLs.

Fields:

user_id – An anonymized identifier representing the speaker or user who provided the audio recording.

recording_id – A unique identifier assigned to each audio sample in the dataset.

language – ISO 639-1 language code of the spoken audio (e.g., "hi" for Hindi, "en" for English).

duration – The total length of the audio clip in seconds. Useful for filtering long or short samples and batching data during model training.

rec_url_gcp – Direct URL to the raw audio file stored on cloud infrastructure (e.g., Google Cloud Storage). This serves as the main input for model training or inference.

transcription_url – URL to the corresponding ground-truth transcription text for each audio file. This acts as the label or target text for supervised learning tasks.

metadata_url – Link to additional metadata about the recording (e.g., device type, accent, background noise, recording conditions). While optional, it can provide valuable insights for analysis, model robustness, and domain adaptation.
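
As a rough illustration of how these fields might be used together, the sketch below parses records from CSV text, keeps Hindi clips within a duration window, and groups them into batches of similar length. The CSV layout, the sample rows, the example.com URLs, the 1 to 30 second window, and the helper names (Recording, read_recordings, filter_recordings, duration_batches) are all assumptions made for illustration; the dataset page does not specify a file format or recommended thresholds.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Recording:
    """One dataset record, mirroring the seven fields described above."""
    user_id: str
    recording_id: str
    language: str
    duration: float  # length of the clip in seconds
    rec_url_gcp: str
    transcription_url: str
    metadata_url: str  # empty string when no metadata link is given


def read_recordings(csv_text: str) -> List[Recording]:
    """Parse CSV text whose header row uses the field names listed above."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        Recording(
            user_id=row["user_id"],
            recording_id=row["recording_id"],
            language=row["language"],
            duration=float(row["duration"]),
            rec_url_gcp=row["rec_url_gcp"],
            transcription_url=row["transcription_url"],
            metadata_url=row.get("metadata_url") or "",
        )
        for row in reader
    ]


def filter_recordings(recs, language="hi", min_seconds=1.0, max_seconds=30.0):
    """Keep clips in one language whose duration lies inside the window."""
    return [
        r for r in recs
        if r.language == language and min_seconds <= r.duration <= max_seconds
    ]


def duration_batches(recs, batch_size):
    """Sort by duration so each batch holds clips of similar length."""
    ordered = sorted(recs, key=lambda r: r.duration)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


# Made-up sample rows (hypothetical IDs and placeholder URLs, not real data).
SAMPLE_CSV = """\
user_id,recording_id,language,duration,rec_url_gcp,transcription_url,metadata_url
u1,r1,hi,4.2,https://example.com/r1.wav,https://example.com/r1.txt,https://example.com/r1.json
u2,r2,en,3.0,https://example.com/r2.wav,https://example.com/r2.txt,https://example.com/r2.json
u1,r3,hi,45.0,https://example.com/r3.wav,https://example.com/r3.txt,https://example.com/r3.json
u3,r4,hi,2.5,https://example.com/r4.wav,https://example.com/r4.txt,
u2,r5,hi,9.9,https://example.com/r5.wav,https://example.com/r5.txt,https://example.com/r5.json
"""

if __name__ == "__main__":
    hindi = filter_recordings(read_recordings(SAMPLE_CSV))
    for batch in duration_batches(hindi, batch_size=2):
        print([(r.recording_id, r.duration) for r in batch])
```

Sorting by duration before batching is one common way to reduce padding when clips are later padded to the longest item in a batch; other bucketing schemes would work equally well here.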

💡 Usage

This dataset is ideal for:

Speech-to-text (ASR) model training and evaluation

Audio feature extraction and preprocessing

Multilingual speech research

Acoustic environment analysis and speaker variation studies
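
For the evaluation use case, a model's output is typically compared against the ground-truth text behind transcription_url. The dataset page does not name a metric; the sketch below assumes word error rate (WER), a common choice for ASR, computed as the word-level edit distance divided by the number of reference words. The function name word_error_rate and the whitespace tokenization are illustrative assumptions, and real pipelines often normalize punctuation and numerals first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcription is empty")

    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("मेरा नाम राम है", "मेरा नाम श्याम है"))
```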

