Baselight

All Shark Tank (US) Pitches & Deals

Can you predict whether a company or product pitch will succeed on Shark Tank?

@kaggle.neiljs_all_shark_tank_us_pitches_deals

About this Dataset

Context

Shark Tank is a great show based on an interesting concept wherein entrepreneurs and founders pitch their businesses in front of seasoned investors (aka sharks) who decide whether or not to invest in the businesses based on multiple parameters.

The show has versions in many regions. This dataset covers the US version, which features Mark Cuban, Robert Herjavec, Daymond John, Kevin O'Leary, Barbara Corcoran, and Lori Greiner, along with occasional guest sharks.

The investment decisions made on the show are only handshake deals. They are followed by detailed due diligence and a final investment decision, and many of the deals struck on the show do not go through.

Among many other considerations, some of the major factors in a shark's decision to make a deal are:

  1. The relevance of the business to their fields of interest and exposure (Daymond for fashion, Lori for QVC, Kevin for wines, etc.)
  2. The pitch quality (preparation, energy, etc. of the presenter)
  3. Health of the business (financials, debts, etc.)
  4. Valuation (the most important)

Since elements such as pitch quality, the exact financials disclosed, and the specifics of the conversation between the sharks and the presenters can be considered copyrighted by the show, I collected only the publicly available details from websites: the pitches, the results (deal = YES or NO), and the associated shark(s). I then consolidated and cleaned the data and present it in this dataset.

The idea is that a learning algorithm based on text vectors might be able to predict, given a description of a new pitch, how likely the pitch is to succeed in the Shark Tank, and even which shark might be most interested in it.

Content

The dataset contains the following columns:

  1. Season_Epi_code - The data spans all 8 seasons of Shark Tank (US), and this code gives the season and the episode for indexing purposes. Format = SEE, i.e. the season number followed by a two-digit episode number (101 = 1st season, 1st episode; 826 = 8th season, 26th episode)

  2. Pitched_Business_Identifier - A short name of the pitched business

  3. Pitched_Business_Desc - A brief description of the pitched business. Text from more than one source has been combined here, so some descriptions may be repetitive and others very short.

  4. Deal_Status - Whether the pitched business got a deal in the episode, meaning that at least one shark and the presenters agreed on a particular deal. Format = (YES = 1, NO = 0)

  5. Deal_Shark - Which of the most common sharks agreed to a deal with the presenters in the episode.
    Format = either a single shark's initials or the initials of several sharks separated by '+'

Initials used:
BC - Barbara Corcoran
DJ - Daymond John
KOL - Kevin O'Leary
LG - Lori Greiner
MC - Mark Cuban
RH - Robert Herjavec
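
The two encoded columns above can be decoded with a few lines of Python. This is only a minimal sketch: the function names are hypothetical, and it assumes that Deal_Shark is blank when no deal was made, which the description does not state. Initials that are not in the list above are passed through unchanged.

```python
# Initials exactly as listed in the dataset description.
SHARK_INITIALS = {
    "BC": "Barbara Corcoran",
    "DJ": "Daymond John",
    "KOL": "Kevin O'Leary",
    "LG": "Lori Greiner",
    "MC": "Mark Cuban",
    "RH": "Robert Herjavec",
}


def split_season_episode(code):
    """Split a SEE-style code such as 826 into (season, episode)."""
    # The last two digits are the episode; whatever precedes them is the season.
    season, episode = divmod(int(code), 100)
    return season, episode


def parse_deal_sharks(value):
    """Expand a '+'-separated initials string into a list of shark names."""
    # Assumption: a missing or blank value means no listed shark made a deal.
    if value is None or not str(value).strip():
        return []
    initials = [part.strip() for part in str(value).split("+") if part.strip()]
    # Unknown initials are returned as-is rather than guessed.
    return [SHARK_INITIALS.get(code, code) for code in initials]
```

For example, split_season_episode(826) separates the season from the episode, and parse_deal_sharks("MC+LG") expands the initials of a two-shark deal.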

Note: While I have tried my best to collect, consolidate and clean the data, I do not make any claims of completeness or accuracy of data in the dataset. The user assumes the entire risk with respect to the use of this dataset.

Acknowledgements

ABC for producing such an entertaining, educational and well-managed show.
Photo by Jakob Owens on Unsplash

Inspiration

The idea is that a learning algorithm based on text vectors might be able to predict, given a description of a new pitch, how likely the pitch is to succeed in the Shark Tank, and even which shark might be most interested in it.
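
As one possible starting point, here is a minimal sketch of such a model: a bag-of-words multinomial naive Bayes classifier written with the Python standard library only. It is an illustration, not the method used in any of the planned solutions. The function names are hypothetical, and the four training rows are invented toy data standing in for the Pitched_Business_Desc and Deal_Status columns; they are not taken from the dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a description and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(descriptions, labels):
    """Collect per-class word counts from descriptions labelled 0 or 1."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for text, label in zip(descriptions, labels):
        word_counts[label].update(tokenize(text))
    vocabulary = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocabulary


def predict_deal_probability(model, text):
    """Estimate the probability that a pitch description gets a deal."""
    word_counts, class_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label in (0, 1):
        # Smoothed log prior, so a class with no examples is not -infinity.
        score = math.log((class_counts[label] + 1) / (total_docs + 2))
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing of the per-class word likelihood.
            likelihood = (word_counts[label][token] + 1) / (
                total_words + len(vocabulary)
            )
            score += math.log(likelihood)
        log_scores[label] = score
    # Normalise the two log scores into a probability for label 1.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights[1] / (weights[0] + weights[1])


# Invented toy rows (NOT from the dataset), in the shape of
# Pitched_Business_Desc and Deal_Status.
toy_descriptions = [
    "organic dog treats subscription box",
    "smart kitchen gadget for busy cooks",
    "luxury gold plated phone case",
    "novelty gold phone stickers",
]
toy_deal_status = [1, 1, 0, 0]

model = train_naive_bayes(toy_descriptions, toy_deal_status)
print(predict_deal_probability(model, "dog treats box"))
```

On the real data, the same functions would be trained on the Pitched_Business_Desc column with Deal_Status as the label. Predicting the interested shark would need one such model per shark, or a multi-class variant.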

I plan to cover the 5 most interesting solutions (EDA as well as actual prediction models) in a series of blog posts on thinkpatcri.com.
