Vu Trong Phung Novels Audio Dataset by Kaggle | Other

Vietnamese TTS Dataset: Novels by Vu Trong Phung

By Thông Nguyễn (from Hugging Face)

About this dataset

This dataset consists of audio files containing Vietnamese text-to-speech recordings, intended for training a Vietnamese text-to-speech (TTS) model on novels written by Vu Trong Phung. The dataset is approximately 5.4 GB in size and has a total duration of 35.9 hours.

The dataset primarily includes the following files:

audio: A folder containing the recorded Vietnamese text-to-speech audio files. These files are the main content for training a TTS model.

train.csv: A CSV file with metadata about the training audio. Its only documented column is 'audio' (see Columns).

The purpose of this dataset is to facilitate research and development in Vietnamese language processing, particularly in the field of text-to-speech technology. By utilizing novels written by Vu Trong Phung as a source material, this dataset offers an extensive collection of diverse linguistic patterns and expressions commonly found in Vietnamese literature.

Researchers, developers, and enthusiasts working on TTS models can benefit from accessing this dataset to improve existing models or build new ones specifically tailored for the Vietnamese language. Additionally, linguists and language researchers can leverage this dataset to analyze and study various aspects of spoken Vietnamese.

By offering a substantial amount of speech data, this Kaggle dataset aims to support speech synthesis work that targets the specific linguistic characteristics of Vietnamese.

How to use the dataset

Understand the Dataset Structure:

The dataset consists of two main components: audio files and a train.csv file.

The audio files are in the form of Vietnamese text-to-speech recordings.

The train.csv file contains metadata and additional information related to the audio data.

Explore the Audio Files:

Use an appropriate tool or library to access and listen to the audio files.

Familiarize yourself with different styles, tones, and accents present in the recordings.

Take note of any unique characteristics or patterns within the audio data.
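
As a minimal sketch of this exploration step, the snippet below reads basic properties of a recording with Python's standard wave module. It assumes the files are uncompressed PCM WAV; MP3 files would need a third-party decoder. The helper names describe_wav and total_duration_hours, and the idea of scanning a folder recursively, are illustrative assumptions rather than part of the dataset.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Return channel count, sample rate, and duration for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


def total_duration_hours(folder):
    """Sum the duration of every .wav file found under `folder`."""
    seconds = sum(
        describe_wav(p)["duration_s"] for p in Path(folder).rglob("*.wav")
    )
    return seconds / 3600
```

Calling total_duration_hours on the extracted audio folder gives a figure that can be compared with the 35.9 hours stated above, which is a quick way to confirm that the download is complete.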

Utilize Metadata from train.csv:

Open and analyze the train.csv file using spreadsheet software or programming libraries.

Study the 'audio' column, which is the only column documented for this file.

If other attributes such as speaker name, novel title, or chapter number are available, extract them for further analysis or processing.
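
A minimal sketch of this inspection step, using only Python's standard csv module, is shown below. The function name summarize_csv is hypothetical, and the code makes no assumption about which columns exist: it reports whatever header the file actually has, along with how many rows leave each column empty.

```python
import csv
from collections import Counter


def summarize_csv(path, encoding="utf-8"):
    """Return header names, row count, and per-column counts of empty cells."""
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle)
        columns = list(reader.fieldnames or [])
        empty = Counter()
        n_rows = 0
        for row in reader:
            n_rows += 1
            for name in columns:
                # Short rows yield None for missing cells, so treat None as empty.
                if not (row.get(name) or "").strip():
                    empty[name] += 1
    return {"columns": columns, "rows": n_rows, "empty": dict(empty)}
```

Running summarize_csv("train.csv") before writing any parsing logic shows whether attributes beyond 'audio' are present at all.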

Preprocess Data as Per Your Requirements:

Preprocess the data according to the requirements of your TTS model. For example, normalize volume levels, filter out noise, or trim unwanted segments.
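
Volume normalization can be sketched as simple peak normalization. The example below assumes the audio has already been decoded to signed 16-bit PCM samples; the function name peak_normalize and the default target of 90% of full scale are illustrative choices, not values prescribed by the dataset.

```python
from array import array


def peak_normalize(samples, target_peak=0.9, full_scale=32767):
    """Scale 16-bit PCM samples so the loudest one reaches target_peak of full scale."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return array("h", samples)  # silence or empty input: nothing to scale
    gain = target_peak * full_scale / peak
    # Clamp after rounding so the result always fits in a signed 16-bit sample.
    return array(
        "h",
        (max(-full_scale, min(full_scale, round(s * gain))) for s in samples),
    )
```

Peak normalization only equalizes the maximum amplitude. Loudness-based normalization is a different technique and may suit recordings with very different dynamics better.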

Train/Develop a Vietnamese TTS Model:

Use this dataset to train your own Vietnamese text-to-speech model with a deep learning architecture such as Tacotron or WaveNet, combined with whatever preprocessing steps your pipeline requires.

Cross-reference with Other Sources (Optional):

Consider cross-referencing this dataset with other available data sources, such as text transcripts or translations of the novels by Vu Trong Phung.

This can help in aligning the audio data with corresponding transcriptions, enabling further analysis or improving the quality of the Text-to-Speech model.
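
One simple way to line up recordings with external transcripts is to match files by name. The sketch below assumes a hypothetical layout in which each audio file has a transcript with the same file stem (for example ch01.wav and ch01.txt). The dataset itself does not document such transcripts, so the function name pair_by_stem, the folder arguments, and the extensions are all assumptions.

```python
from pathlib import Path


def pair_by_stem(audio_dir, text_dir, audio_ext=".wav", text_ext=".txt"):
    """Match audio files to transcript files that share a file stem."""
    texts = {p.stem: p for p in Path(text_dir).glob(f"*{text_ext}")}
    pairs, missing = [], []
    for audio in sorted(Path(audio_dir).glob(f"*{audio_ext}")):
        match = texts.get(audio.stem)
        if match is None:
            missing.append(audio)  # no transcript found for this recording
        else:
            pairs.append((audio, match))
    return pairs, missing
```

The returned missing list makes gaps visible early, before any alignment or training code silently skips unmatched recordings.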

Experiment and Iterate:

Iterate over different training strategies or preprocessing techniques to improve your TTS model based on feedback loops.

Fine-tune hyperparameters and evaluate the performance against relevant quality metrics to achieve desired results.
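
Comparing iterations fairly requires a validation set that stays the same across experiments. The sketch below shows one way to make a reproducible split of file names using a fixed random seed; the function name split_files and the default 10% validation fraction are illustrative assumptions.

```python
import random


def split_files(names, val_fraction=0.1, seed=0):
    """Return (train, validation) lists from a reproducible shuffle of names."""
    ordered = sorted(names)  # sort first so input order does not affect the split
    random.Random(seed).shuffle(ordered)
    if not ordered:
        return [], []
    n_val = max(1, int(len(ordered) * val_fraction))
    return ordered[n_val:], ordered[:n_val]
```

Because the names are sorted before shuffling and the seed is fixed, the same files land in the validation set on every run, so metric changes reflect the model rather than the split.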

Respect Copyright and Licensing:

Keep in mind that this dataset is derived from Vu Trong Phung's novels, and confirm that your intended use is consistent with the terms listed under License.

Research Ideas

Speech synthesis research: This dataset can be used for training and evaluating Vietnamese text-to-speech (TTS) models. Researchers can develop new algorithms and techniques for generating high-quality speech in Vietnamese using the novels by Vu Trong Phung as training data.

Audiobook production: The dataset can be used to create audiobooks in Vietnamese. By converting novels written by Vu Trong Phung into audio format, publishers or individuals can provide an audio alternative to reading, making the books accessible to visually impaired individuals and to those who prefer listening.

Language learning applications: The dataset can be used in language learning platforms or applications that teach Vietnamese pronunciation and listening skills. Learners can listen to the synthesized speech generated from the novels by Vu Trong Phung to improve their understanding of the language's intonation, phonetics, and natural rhythm.

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: train.csv

Column name	Description
audio	This column contains the audio recordings in Vietnamese text-to-speech format. The audio files are provided in various formats such as .mp3 or .wav. (Audio File)

Acknowledgements

If you use this dataset in your research, please credit Thông Nguyễn (from Hugging Face).

Tables

Train

@kaggle.thedevastator_vu_trong_phung_novels_audio_dataset.train

2.85 KB
2 rows
2 columns


CREATE TABLE train (
  "a" VARCHAR,
  "dio" VARCHAR
);