Hindi Audio Speech-to-Text or Vice Versa
Hindi audio speech-to-text data for fine-tuning speech models on the Hindi language.
@kaggle.rohansinghjadoan_hindi_audio_speech_detection
🗂️ Dataset Schema Description
This dataset provides paired audio-to-text samples for training and evaluating speech recognition or text-to-speech models. Each record represents a unique audio clip along with its metadata, transcription, and reference URLs.
Fields:
user_id – An anonymized identifier representing the speaker or user who provided the audio recording.
recording_id – A unique identifier assigned to each audio sample in the dataset.
language – ISO language code of the spoken audio (e.g., "hi" for Hindi, "en" for English).
duration – The total length of the audio clip in seconds. Useful for filtering long or short samples and batching data during model training.
rec_url_gcp – Direct URL to the raw audio file stored on cloud infrastructure (e.g., Google Cloud Storage). This serves as the main input for model training or inference.
transcription_url – URL to the corresponding ground-truth transcription text for each audio file (stored as transcription_url_gcp in the table schema). This acts as the label or target text for supervised learning tasks.
metadata_url – Link to additional metadata about the recording, e.g., device type, accent, background noise, recording conditions (stored as metadata_url_gcp in the table schema). While optional, it can help with analysis, robustness evaluation, and domain adaptation.
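The duration field lends itself to simple filtering and length-based batching. The following is a minimal sketch, assuming the records have already been loaded into memory as dictionaries keyed by the field names above. The helper names filter_by_duration and bucket_by_duration, the thresholds, and the sample rows are hypothetical and not part of the dataset.

```python
from collections import defaultdict

# Hypothetical sample rows; real rows would come from the dataset file.
records = [
    {"recording_id": 1, "language": "hi", "duration": 4},
    {"recording_id": 2, "language": "hi", "duration": 31},
    {"recording_id": 3, "language": "hi", "duration": 12},
    {"recording_id": 4, "language": "hi", "duration": 0},
]


def filter_by_duration(rows, min_seconds=1, max_seconds=30):
    """Keep rows whose duration lies within [min_seconds, max_seconds]."""
    return [r for r in rows if min_seconds <= r["duration"] <= max_seconds]


def bucket_by_duration(rows, bucket_seconds=10):
    """Group rows of similar length so a batch needs less padding."""
    buckets = defaultdict(list)
    for r in rows:
        # Integer division maps 0-9 s to bucket 0, 10-19 s to bucket 1, ...
        buckets[r["duration"] // bucket_seconds].append(r)
    return dict(buckets)


kept = filter_by_duration(records)
buckets = bucket_by_duration(kept)
```

The 1 to 30 second window is only an illustrative choice; suitable limits depend on the model and the actual duration distribution of the clips.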
💡 Usage
This dataset is ideal for:
Speech-to-text (ASR) model training and evaluation
Audio feature extraction and preprocessing
Multilingual speech research
Acoustic environment analysis and speaker variation studies
CREATE TABLE hindi_audio_detection (
"user_id" BIGINT,
"recording_id" BIGINT,
"language" VARCHAR,
"duration" BIGINT,
"rec_url_gcp" VARCHAR,
"transcription_url_gcp" VARCHAR,
"metadata_url_gcp" VARCHAR
);
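The table definition above can be tried out locally. The following is a rough sketch using Python's built-in sqlite3 module with an in-memory database; SQLite is an assumption chosen for portability, and the inserted rows and example.com URLs are hypothetical placeholders rather than real dataset entries.

```python
import sqlite3

# In-memory SQLite database (an assumption; the hosting engine may differ).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE hindi_audio_detection (
        "user_id" BIGINT,
        "recording_id" BIGINT,
        "language" VARCHAR,
        "duration" BIGINT,
        "rec_url_gcp" VARCHAR,
        "transcription_url_gcp" VARCHAR,
        "metadata_url_gcp" VARCHAR
    )
    """
)

# Hypothetical placeholder rows; these URLs are not real dataset links.
rows = [
    (101, 1, "hi", 4, "https://example.com/a1.wav",
     "https://example.com/t1.txt", "https://example.com/m1.json"),
    (101, 2, "hi", 31, "https://example.com/a2.wav",
     "https://example.com/t2.txt", "https://example.com/m2.json"),
    (102, 3, "hi", 12, "https://example.com/a3.wav",
     "https://example.com/t3.txt", "https://example.com/m3.json"),
]
conn.executemany(
    "INSERT INTO hindi_audio_detection VALUES (?, ?, ?, ?, ?, ?, ?)", rows
)

# Count clips and total seconds per language, keeping clips of 1 to 30 seconds.
summary = conn.execute(
    """
    SELECT language, COUNT(*) AS clips, SUM(duration) AS total_seconds
    FROM hindi_audio_detection
    WHERE duration BETWEEN 1 AND 30
    GROUP BY language
    """
).fetchall()
```

A query of this shape gives a quick view of how much usable audio remains after a duration filter.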