Baselight

Classification Of Robots From Their Conversation

Develop an ML model that predicts the robot from its sequence

@kaggle.msk1097_classification_of_robots_from_their_conversation

About this Dataset


**Description:**
Five robots, named (quite unexpectedly) **0, 1, 2, 3, 4**, are having a uniform conversation in which each of them spits out a series of 10 numbers at a time, taking turns in round-robin fashion. The task is to train a model that, given the 10 numbers spoken by a robot, predicts which robot said them, with good accuracy. A log of a long conversation between these five robots is provided; this is your dataset.
**A snippet of their conversation:**

Task: develop an ML model that predicts the robot from its sequence with good accuracy.
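One possible starting point is a nearest-centroid baseline. This is not the dataset author's intended solution. The method averages each robot's 10-number vectors into a single centroid, then assigns a new sequence to the robot whose centroid is closest in Euclidean distance. The sketch below uses only the Python standard library. The function names are hypothetical, and the toy rows in the demo are made up for illustration, so they say nothing about how well this baseline does on the real conversation log.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def fit_centroids(x: Sequence[Sequence[float]], y: Sequence[int]) -> Dict[int, List[float]]:
    """Average the 10-number feature vectors of each robot into one centroid per label."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for features, label in zip(x, y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids: Dict[int, List[float]], features: Sequence[float]) -> int:
    """Return the label whose centroid is closest (Euclidean distance) to the features."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


def accuracy(centroids: Dict[int, List[float]], x: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(predict(centroids, f) == label for f, label in zip(x, y))
    return correct / len(y)


if __name__ == "__main__":
    # Toy, made-up rows (NOT from the dataset): each robot's numbers sit near its own label.
    x_toy = [[0.1] * 10, [0.0] * 10, [1.0] * 10, [1.2] * 10, [4.0] * 10, [3.9] * 10]
    y_toy = [0, 0, 1, 1, 4, 4]
    model = fit_centroids(x_toy, y_toy)
    print(predict(model, [1.1] * 10))  # 1
```

A baseline this simple mainly gives a reference accuracy to compare more expressive models against.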

Help Notes:

  1. It is a classification problem.
  2. The file has 500,001 lines.
  3. The first column is `source`; it gives the label for every row. The label takes one of the values 0, 1, 2, 3, 4, so there are five possible labels (the five robots).
  4. The features are the sequence of 10 numbers in each row, stored in columns `num1` through `num10`.
  5. When testing or validating your model, the input is a sequence of 10 numbers (i.e. any row of the dataset without the first column), and the output is the predicted `source`, one of 0, 1, 2, 3, 4. These labels will usually be one-hot encoded as 10000, 01000, 00100, 00010, 00001.
  6. You will train and test your model not on a single input sequence but on a set of inputs, traditionally called `x_train`; the corresponding labels (the first column here) go in `y_train`.
  7. The dataset is large, so don't try to use all of it: sample from it, and split the sample into training, validation, and test sets.
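The notes above describe a small data-preparation pipeline. The sketch below assumes the file is a CSV whose header row names the columns `source` and `num1` through `num10`, and that labels are stored as integers. With a header, the 500,001 lines would be one header plus 500,000 rows, but check this against the actual file. All helper names are hypothetical.

The sketch does four things:
- It draws a uniform sample in a single pass using reservoir sampling (Algorithm R). This is a swapped-in technique, chosen so the whole file never has to be held in memory.
- It separates features from labels.
- It one-hot encodes the labels.
- It shuffles the sample into training, validation, and test sets.

```python
import csv
import io
import random
from typing import Iterable, List, Tuple

FEATURE_COLUMNS = [f"num{i}" for i in range(1, 11)]  # assumed header names num1 ... num10
LABEL_COLUMN = "source"  # assumed header name of the label column
NUM_ROBOTS = 5


def reservoir_sample(rows: Iterable[dict], k: int, rng: random.Random) -> List[dict]:
    """Uniformly sample k rows in one pass (Algorithm R) without loading the whole file."""
    sample: List[dict] = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


def to_xy(rows: List[dict]) -> Tuple[List[List[float]], List[int]]:
    """Split rows into feature vectors (num1..num10) and integer labels (source)."""
    x = [[float(row[c]) for c in FEATURE_COLUMNS] for row in rows]
    y = [int(row[LABEL_COLUMN]) for row in rows]
    return x, y


def one_hot(label: int, num_classes: int = NUM_ROBOTS) -> List[int]:
    """Encode e.g. label 2 as [0, 0, 1, 0, 0], the '00100' form from the notes."""
    return [1 if i == label else 0 for i in range(num_classes)]


def split(
    rows: Iterable[dict], val_frac: float = 0.15, test_frac: float = 0.15, seed: int = 0
) -> Tuple[List[dict], List[dict], List[dict]]:
    """Shuffle the rows, then cut them into training, validation and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


if __name__ == "__main__":
    # Tiny made-up CSV in the assumed layout; NOT real data from the dataset.
    header = LABEL_COLUMN + "," + ",".join(FEATURE_COLUMNS)
    lines = [header] + [
        f"{i % NUM_ROBOTS}," + ",".join(str(i * 10 + j) for j in range(10)) for i in range(20)
    ]
    reader = csv.DictReader(io.StringIO("\n".join(lines)))
    sample = reservoir_sample(reader, k=10, rng=random.Random(42))
    train, val, test = split(sample, val_frac=0.2, test_frac=0.2)
    x_train, y_train = to_xy(train)
    y_train_onehot = [one_hot(label) for label in y_train]
    print(len(train), len(val), len(test))  # 6 2 2
```

For the real file, pass `csv.DictReader` over the opened CSV (opened with `newline=""`) to `reservoir_sample` in place of the in-memory demo text. Choose `k` based on how much data your model and machine can comfortably handle.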
