**Description:**
Five robots, named quite unexpectedly **0, 1, 2, 3, 4**, are having a uniform conversation, where each of them spits out a series of 10 numbers at a time in round-robin fashion. The task is to train a model that can predict, with good accuracy, which robot spoke a given sequence of 10 numbers. A log of a long conversation between these 5 robots has been provided; this is your dataset.
**A snippet of their conversation:**
Task - Develop an ML model that predicts the robot from its sequence of numbers with good accuracy.
Help Notes:
- It is a classification problem.
- The file has 500001 lines.
- The first column is source. This column gives the label for every row. The label can take the values 0, 1, 2, 3, 4, so there are 5 possible labels (the five robots).
- The features are the sequence of 10 numbers. For each row these are num1, num2, num3, num4, num5, num6, num7, num8, num9, num10.
- In this classification problem, your input while testing/validating your model will be a sequence of 10 numbers, i.e. any row from the dataset (without the first column), and the output will be the predicted source, with possible values 0, 1, 2, 3, 4 (which will mostly be one-hot encoded, making them 10000, 01000, 00100, 00010, 00001).
- You will be training and testing your model not on a single input sequence but on a whole set of inputs, traditionally called x-train. Your labels (the first column here) would sit in y-train.
- Also, the dataset is big; don't try to use all of the data, sample it. Make training, validation and test sets out of it.
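
The notes above (sample the log, separate the label column from the 10 features, one-hot encode the labels, and make training, validation and test sets) can be sketched roughly as follows. This is only a minimal illustration using the Python standard library, not a prescribed solution. It assumes the log is comma-separated and starts with a header line; the file name `robots.csv`, the helper names, the sample size and the 70/15/15 split are all hypothetical choices made for the example.

```python
import csv
import random

NUM_ROBOTS = 5  # labels 0..4, as described in the notes


def one_hot(label, num_classes=NUM_ROBOTS):
    """Turn a label such as 2 into [0, 0, 1, 0, 0]."""
    vec = [0] * num_classes
    vec[label] = 1
    return vec


def read_rows(lines, has_header=True):
    """Parse lines into (features, label) pairs.

    Column 0 is the source (label); columns 1..10 are num1..num10.
    ASSUMPTION: comma-separated values with one header line.
    """
    reader = csv.reader(lines)
    if has_header:
        next(reader)  # skip the header line
    return [([float(v) for v in row[1:11]], int(row[0]))
            for row in reader if row]


def sample_rows(rows, k, seed=0):
    """Draw a reproducible random sample of at most k rows."""
    return random.Random(seed).sample(rows, min(k, len(rows)))


def split(rows, train_frac=0.7, val_frac=0.15):
    """Cut already-shuffled rows into training, validation and test sets."""
    n_train = round(len(rows) * train_frac)
    n_val = round(len(rows) * val_frac)
    return (rows[:n_train],
            rows[n_train:n_train + n_val],
            rows[n_train + n_val:])


def to_xy(rows):
    """Separate features (x) from one-hot encoded labels (y)."""
    x = [features for features, _ in rows]
    y = [one_hot(label) for _, label in rows]
    return x, y


# Hypothetical usage (file name and sample size are assumptions):
# with open("robots.csv", newline="") as f:
#     rows = read_rows(f)
# train, val, test = split(sample_rows(rows, 50_000))
# x_train, y_train = to_xy(train)
```

Because `sample_rows` returns the rows in random order, `split` can simply slice them. In practice the resulting lists would usually be converted to arrays for whichever ML library is used to build the classifier.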