Baselight

Is This Sentence Completed?

Classify whether a sentence has been finished or not

@kaggle.johoetter_is_this_sentence_completed

About this Dataset

Context

Predicting whether a sentence is finished is a high-level classification task in NLP. A model for it can help, for example, to detect sentences that users forgot to finish or that leave too much room for interpretation. In many applications, this goes a long way toward cleaning text data.

With this dataset, you can build a classification model for such a task.

Content

Each item consists of a sentence and its target label, is_finished. The goal is to predict whether a sentence is finished, for example:

  • finished: "Kaggle is such a great platform, where Data Scientists from all over the world can share their ideas and data!"
  • not finished: "I believe that we should just" [... just what?]

The data was collected from various news headlines. The labeling is weakly supervised using our labeling software onetask: we labeled the data both programmatically, using labeling functions (e.g. dependency parsers), and manually.
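
To make the idea of programmatic labeling concrete, here is a minimal Python sketch of two heuristic labeling functions and a simple way to combine their votes. It does not use onetask's API and does not reproduce the labeling functions actually used for this dataset; the word list, the function names and the "unfinished wins" rule are all assumptions made for illustration.

```python
import re

# Hypothetical word list: function words that rarely end a complete English sentence.
DANGLING_ENDINGS = {
    "a", "an", "the", "and", "or", "but", "because", "that", "which",
    "to", "of", "with", "just", "should", "if",
}

UNFINISHED, FINISHED, ABSTAIN = 0, 1, -1


def lf_dangling_last_word(sentence: str) -> int:
    """Vote UNFINISHED if the last word is a dangling function word, else abstain."""
    words = re.findall(r"[A-Za-z']+", sentence.lower())
    if words and words[-1] in DANGLING_ENDINGS:
        return UNFINISHED
    return ABSTAIN


def lf_terminal_punctuation(sentence: str) -> int:
    """Vote FINISHED if the sentence ends with '.', '!' or '?', else abstain."""
    return FINISHED if sentence.rstrip().endswith((".", "!", "?")) else ABSTAIN


def weak_label(sentence: str) -> int:
    """Combine votes (assumed rule): UNFINISHED wins, then FINISHED, else abstain."""
    votes = [lf_dangling_last_word(sentence), lf_terminal_punctuation(sentence)]
    if UNFINISHED in votes:
        return UNFINISHED
    if FINISHED in votes:
        return FINISHED
    return ABSTAIN


print(weak_label("I believe that we should just"))  # dangling "just"
```

Since headlines often carry no terminal punctuation, heuristics like these will abstain on many rows, which is why a weakly supervised setup typically combines them with stronger signals and manual labels.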

Acknowledgements

Thanks to my colleague Henrik Wenck, who provided me with the idea to publish this task on Kaggle 🙏

Inspiration

Let's build and discuss ideas! From my point of view, this task can be approached, for example, by:

  • parsing text with linguistic algorithms to detect, e.g., dependencies within the text that indicate whether a sentence is finished
  • using a recurrent architecture such as an RNN
  • using vanilla algorithms with fine-tuned embeddings that represent the context of a sentence
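
Before trying any of the ideas above, it can help to have a very simple baseline to compare against. The Python sketch below is a tiny Naive Bayes classifier over the last two tokens of each sentence. It is a swapped-in stand-in, not one of the listed approaches: it uses no parser, no recurrent network and no embeddings. The training sentences are invented for illustration and are not taken from the dataset.

```python
import math
from collections import Counter, defaultdict


def features(sentence, n_last=2):
    """Position-tagged last tokens, e.g. ['-2:should', '-1:just']."""
    last = sentence.lower().split()[-n_last:]
    return [f"{i - len(last)}:{tok}" for i, tok in enumerate(last)]


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            for feat in features(sentence):
                self.feature_counts[label][feat] += 1
                self.vocab.add(feat)
        return self

    def predict(self, sentence):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for feat in features(sentence):
                score += math.log((self.feature_counts[label][feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data; True means is_finished.
train = [
    ("Markets close higher after strong earnings report", True),
    ("City council approves new transit budget", True),
    ("Scientists discover water on distant planet", True),
    ("I believe that we should just", False),
    ("The committee decided to", False),
    ("Markets close higher after the", False),
]
model = NaiveBayes().fit([s for s, _ in train], [y for _, y in train])
print(model.predict("We think they should just"))
```

On the real table, the same interface could be fitted on a training split and scored on a held-out split, giving a reference number for the more capable models to beat.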

Tables

Finished Sentences

@kaggle.johoetter_is_this_sentence_completed.finished_sentences
  • 2.37 MB
  • 53149 rows
  • 2 columns

CREATE TABLE finished_sentences (
  "sentence" VARCHAR,
  "is_finished" VARCHAR
);
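
Because is_finished is stored as VARCHAR rather than as a boolean, the label has to be parsed before training. The following Python sketch shows one way to do that for a CSV export of the table. The CSV layout, the header names matching the column names, and the accepted string encodings ("true"/"false", "1"/"0", "yes"/"no") are assumptions; check the actual values in the table before relying on them.

```python
import csv
import io

# Assumed encodings of the VARCHAR label; adjust after inspecting the real data.
TRUE_STRINGS = {"true", "1", "yes"}
FALSE_STRINGS = {"false", "0", "no"}


def parse_is_finished(raw: str) -> bool:
    """Convert the textual label to a bool, failing loudly on unknown values."""
    value = raw.strip().lower()
    if value in TRUE_STRINGS:
        return True
    if value in FALSE_STRINGS:
        return False
    raise ValueError(f"Unexpected is_finished value: {raw!r}")


def load_rows(handle):
    """Read (sentence, is_finished) pairs from a CSV file-like object."""
    reader = csv.DictReader(handle)
    return [(row["sentence"], parse_is_finished(row["is_finished"])) for row in reader]


# Illustrative in-memory sample standing in for a real export file.
sample = io.StringIO(
    "sentence,is_finished\n"
    '"I believe that we should just",False\n'
    '"Kaggle is such a great platform!",True\n'
)
rows = load_rows(sample)
print(rows)
```

Raising on unknown values, instead of silently mapping them to False, makes an unexpected label encoding visible right away.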