WORD DIFFICULTY PREDICTION

Dataset For Word Difficulty Prediction

@kaggle.kkhandekar_word_difficulty

About this Dataset

Context

Most text-simplification systems require an indicator of word complexity. The prevalent approaches to word difficulty prediction are based on manual feature engineering. Deep-learning-based models remain largely unexplored because of their comparatively poor performance. We explore the use of one such model for predicting the difficulty of words, treating the problem as binary classification. We trained traditional machine learning models and evaluated their performance on the task. One of our primary aims was to remove the dependency on the frequency of previously acquired words when measuring difficulty. We then analyzed a convolutional neural network based prediction model that operates at the character level and evaluated its efficiency against the other models.
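As a rough illustration of what "operating at the character level" can mean, the sketch below turns a word into a fixed-length list of character indices, the kind of input a character-level model typically consumes. The alphabet, the padding index 0, the default max_len of 20, and the function name encode_word are all assumptions made for this example; the encoding actually used in the publication is not described on this page.

```python
import string

# Assumed alphabet: lowercase ASCII letters. Index 0 is reserved for
# padding and for any character outside the alphabet.
ALPHABET = string.ascii_lowercase
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def encode_word(word, max_len=20):
    """Map a word to a list of exactly max_len character indices.

    Hypothetical helper: longer words are truncated, shorter ones are
    right-padded with 0.
    """
    indices = [CHAR_TO_INDEX.get(ch, 0) for ch in word.lower()[:max_len]]
    return indices + [0] * (max_len - len(indices))
```

A model would usually map each index to an embedding or one-hot vector before applying convolutions; that step is left out here.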

This dataset contains 40,481 data instances. The column headers are as follows:

  • Word
  • Length
  • Freq_HAL
  • Log_Freq_HAL
  • I_Mean_RT
  • I_Zscore
  • I_SD
  • Obs
  • I_Mean_Accuracy

I_Zscore determines the difficulty of a word. The value ranges between 0 and 1, with 0 being SIMPLE and 1 being DIFFICULT.

Content

The data is in CSV format. Please check the research paper for how to obtain the difficulty label from I_Zscore.
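For readers who want a quick binary label before consulting the paper, a simple cut-off on I_Zscore is one possible starting point. The sketch below is only a placeholder: the function name difficulty_label and the default threshold of 0.5 are assumptions for illustration, not the rule defined in the research publication.

```python
def difficulty_label(i_zscore, threshold=0.5):
    """Return 1 (DIFFICULT) or 0 (SIMPLE) for a single I_Zscore value.

    The threshold is a hypothetical placeholder; replace it with the
    rule given in the research paper once you have it.
    """
    return 1 if i_zscore >= threshold else 0
```

Keeping the threshold as a parameter makes it easy to swap in the paper's rule later without touching the calling code.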

Acknowledgements

Thank you to Avishek Garain, Arpan Basu & Sudip Kumar Naskar [citation] (https://ieee-dataport.org/open-access/dataset-word-difficulty-prediction)

Other details of the dataset, and the method for obtaining the difficulty labels, are given in the linked research publication. For open access to the publication, visit https://garain.codes

Tables

Worddifficulty

@kaggle.kkhandekar_word_difficulty.worddifficulty
  • 990.09 KB
  • 40481 rows
  • 9 columns

CREATE TABLE worddifficulty (
  "word" VARCHAR,
  "length" BIGINT,
  "freq_hal" BIGINT,
  "log_freq_hal" DOUBLE,
  "i_mean_rt" DOUBLE,
  "i_zscore" DOUBLE,
  "i_sd" DOUBLE,
  "obs" DOUBLE,
  "i_mean_accuracy" DOUBLE
);
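
If you work from the CSV file rather than the SQL table, the column types in the CREATE TABLE statement can be applied while reading. The following is a minimal sketch using only the Python standard library; the function name read_rows is hypothetical, and it assumes the CSV header row uses the column names listed earlier (matched case-insensitively) and that empty cells should become None.

```python
import csv

# Column types mirrored from the CREATE TABLE statement
# (VARCHAR -> str, BIGINT -> int, DOUBLE -> float).
COLUMN_TYPES = {
    "word": str,
    "length": int,
    "freq_hal": int,
    "log_freq_hal": float,
    "i_mean_rt": float,
    "i_zscore": float,
    "i_sd": float,
    "obs": float,
    "i_mean_accuracy": float,
}


def read_rows(lines):
    """Yield one typed dict per CSV record.

    `lines` is any iterable of text lines, such as an open file handle.
    Header names are lowercased so that "Freq_HAL" maps to "freq_hal".
    """
    for raw in csv.DictReader(lines):
        row = {}
        for name, value in raw.items():
            if name is None:
                continue  # surplus cells beyond the header are ignored
            key = name.strip().lower()
            cast = COLUMN_TYPES.get(key, str)
            # Missing or empty cells become None instead of raising.
            row[key] = None if value is None or value == "" else cast(value)
        yield row
```

To read a real file, open it with newline="" as the csv module recommends and pass the handle to read_rows; the file name depends on where you saved the download.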
