Is This Sentence Completed? by Kaggle | Technology and IT

About this Dataset

Is This Sentence Completed?

Context

Predicting whether a sentence is finished is a high-level NLP classification task. A model for it can help, for example, to detect sentences that users forgot to finish or that leave too much room for interpretation. In many applications, this helps considerably when cleaning text data.

With this dataset, you can build a classification model for such a task.

Content

Each item consists of a sentence and its target, is_finished. Your goal is to predict whether a sentence is finished or not, e.g.:

finished: "Kaggle is such a great platform, where Data Scientists from all over the world can share their ideas and data!"
not finished: "I believe that we should just" [... just what?]

The data was collected from various news headlines. The labeling is weakly supervised using our labeling software onetask, i.e. we labeled the data both programmatically, using labeling functions (e.g. dependency parsers), and manually.
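
To illustrate what programmatic labeling combined with manual labels can look like, here is a minimal Python sketch. It is not the onetask implementation: the two labeling functions, the word list, and the majority-vote rule are all hypothetical stand-ins. Each function votes finished, not finished, or abstains, and a manual label, when given, overrides the votes.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

FINISHED, NOT_FINISHED, ABSTAIN = 1, 0, -1

# Hypothetical list of words that rarely end a complete sentence.
DANGLING_WORDS = {"a", "an", "the", "and", "or", "but", "to", "of", "just", "that", "with"}


def lf_dangling_last_word(sentence: str) -> int:
    """Vote NOT_FINISHED when the last word is a typical dangling word."""
    words = sentence.lower().rstrip(".!?\"' ").split()
    if words and words[-1] in DANGLING_WORDS:
        return NOT_FINISHED
    return ABSTAIN


def lf_terminal_punctuation(sentence: str) -> int:
    """Vote FINISHED when the sentence ends with '.', '!' or '?'."""
    return FINISHED if sentence.rstrip().endswith((".", "!", "?")) else ABSTAIN


# Assumed set of labeling functions; a real setup would use more (e.g. a dependency parser).
LABELING_FUNCTIONS = (lf_dangling_last_word, lf_terminal_punctuation)


def weak_label(
    sentence: str,
    labeling_functions: Sequence[Callable[[str], int]] = LABELING_FUNCTIONS,
    manual_label: Optional[int] = None,
) -> int:
    """Combine labeling-function votes by majority; a manual label always wins."""
    if manual_label is not None:
        return manual_label
    votes = [lf(sentence) for lf in labeling_functions]
    votes = [vote for vote in votes if vote != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    # A tie between the two labels is treated as "no decision".
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]
```

Since headlines often lack terminal punctuation, many sentences would stay unlabeled under these two functions alone, which is where the manual labels come in.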

Acknowledgements

Thanks to my colleague Henrik Wenck, who provided me with the idea to publish this task on Kaggle 🙏

Inspiration

Let's build and discuss ideas! From my point of view, this task can be approached in several ways, e.g. by:

parsing the text with linguistic algorithms to detect, e.g., dependencies that indicate whether a sentence is finished or not
using a recurrent architecture such as an RNN
using vanilla algorithms with fine-tuned embeddings, representing the context of a sentence
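
Before trying any of these ideas, a trivial baseline can help put their scores in context. The sketch below is not one of the approaches listed above; it is a swapped-in frequency baseline that looks only at the final token of a sentence. The class name `LastTokenBaseline`, the smoothing scheme, and the toy training sentences are all made up for illustration and are not taken from the dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def last_token(sentence: str) -> str:
    """Return the lowercased final token without surrounding punctuation."""
    tokens = sentence.lower().split()
    return tokens[-1].strip(".,!?;:\"'") if tokens else ""


class LastTokenBaseline:
    """Estimate P(finished | last token), smoothed toward the overall label prior."""

    def __init__(self, smoothing: float = 1.0) -> None:
        self.smoothing = smoothing
        # token -> [count not finished, count finished]
        self.counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
        self.prior = 0.5

    def fit(self, examples: Iterable[Tuple[str, bool]]) -> "LastTokenBaseline":
        totals = [0, 0]
        for sentence, is_finished in examples:
            label = int(is_finished)
            self.counts[last_token(sentence)][label] += 1
            totals[label] += 1
        if sum(totals):
            self.prior = totals[1] / sum(totals)
        return self

    def predict_proba(self, sentence: str) -> float:
        not_finished, finished = self.counts.get(last_token(sentence), (0, 0))
        # Unseen tokens fall back to the prior; seen tokens move away from it.
        return (finished + self.smoothing * self.prior) / (
            finished + not_finished + self.smoothing
        )

    def predict(self, sentence: str) -> bool:
        return self.predict_proba(sentence) >= 0.5


# Toy examples invented for this sketch, not rows from the dataset.
toy_examples = [
    ("Stocks fall as investors weigh the", False),
    ("Mayor promises to rebuild the", False),
    ("Company plans to", False),
    ("Stocks fall as investors weigh risks", True),
    ("Mayor opens new bridge", True),
    ("Company plans expansion", True),
]
model = LastTokenBaseline().fit(toy_examples)
```

A parser-based, recurrent, or embedding-based model should clearly beat this kind of baseline to justify its extra complexity.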

Tables

Finished Sentences

@kaggle.johoetter_is_this_sentence_completed.finished_sentences

2.37 MB
53149 rows
2 columns


CREATE TABLE finished_sentences (
  "sentence" VARCHAR,
  "is_finished" VARCHAR
);
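
Because the schema stores is_finished as VARCHAR, the target has to be converted before training. The following Python sketch assumes the table is exported as a CSV file with a header row and that the labels are encoded as strings such as "True" and "False"; neither is confirmed by the schema, so the accepted spellings may need adjusting. The helper names `parse_is_finished` and `read_sentences` are hypothetical.

```python
import csv
import io
from typing import List, TextIO, Tuple

# Assumed string encodings of the label; adjust to what the file actually contains.
TRUE_STRINGS = {"true", "1", "yes"}
FALSE_STRINGS = {"false", "0", "no"}


def parse_is_finished(value: str) -> bool:
    """Convert the VARCHAR label into a boolean, failing loudly on unknown values."""
    normalized = value.strip().lower()
    if normalized in TRUE_STRINGS:
        return True
    if normalized in FALSE_STRINGS:
        return False
    raise ValueError(f"Unexpected is_finished value: {value!r}")


def read_sentences(handle: TextIO) -> List[Tuple[str, bool]]:
    """Read (sentence, is_finished) pairs from a CSV with the two schema columns."""
    reader = csv.DictReader(handle)
    return [(row["sentence"], parse_is_finished(row["is_finished"])) for row in reader]


# In-memory sample built from the two example sentences in the description.
SAMPLE_CSV = '''sentence,is_finished
"I believe that we should just",False
"Kaggle is such a great platform, where Data Scientists from all over the world can share their ideas and data!",True
'''
rows = read_sentences(io.StringIO(SAMPLE_CSV))
```

For a real export, the same function can be called on a file opened with `open(path, newline="", encoding="utf-8")`, where `path` points to the downloaded file.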
