Dataset Information
Additional Information
The data relate to direct marketing campaigns of a Portuguese banking institution. The campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (a bank term deposit) would be subscribed ('yes') or not ('no').
There are four datasets:
- bank-additional-full.csv with all examples (41188) and 20 inputs, ordered by date (from May 2008 to November 2010), very close to the data analyzed in [Moro et al., 2014].
- bank-additional.csv with 10% of the examples (4119), randomly selected from bank-additional-full.csv, and 20 inputs.
- bank-full.csv with all examples and 17 inputs, ordered by date (older version of this dataset with fewer inputs).
- bank.csv with 10% of the examples and 17 inputs, randomly selected from bank-full.csv (older version of this dataset with fewer inputs).
The smaller datasets are provided for testing more computationally demanding machine learning algorithms (e.g., SVM).
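The 10% subsets described above correspond to drawing rows at random without replacement. The following is a minimal sketch of that idea, assuming only the row count given in the list; the helper name `sample_indices` and the seed are illustrative, and the result will not reproduce the published subset files.

```python
import random


def sample_indices(n_rows, fraction=0.10, seed=0):
    """Return sorted row indices for a random subsample without replacement."""
    k = round(n_rows * fraction)   # number of rows to keep
    rng = random.Random(seed)      # local generator, so the draw is repeatable
    # Sorting keeps the sampled rows in their original (date) order.
    return sorted(rng.sample(range(n_rows), k))


idx = sample_indices(41188)
print(len(idx))  # 4119
```

Sorting the indices is a design choice: it preserves the date ordering of the full file, which matters if the subsample is later split chronologically.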
The classification goal is to predict whether the client will subscribe to a term deposit (variable y, yes/no).
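Most classifiers expect a numeric target, so the yes/no labels are usually mapped to 1/0 first. A small sketch of that step follows; the function name `encode_target` and the example labels are made up for illustration and are not taken from the dataset.

```python
def encode_target(labels):
    """Map 'yes'/'no' strings to 1/0; any other value raises KeyError."""
    mapping = {"yes": 1, "no": 0}
    return [mapping[label.strip().lower()] for label in labels]


y = encode_target(["no", "yes", "no", "no"])  # made-up labels
positive_rate = sum(y) / len(y)               # share of 'yes' outcomes
print(y, positive_rate)  # [0, 1, 0, 0] 0.25
```

Checking the positive rate on the real file before modeling is worthwhile, since it shows how imbalanced the two classes are.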
Data Dictionary
- age: Age of the client.
- job: Type of job the client holds.
- marital: Marital status of the client.
- education: Client's education level.
- default: Indicates if the client has credit in default.
- balance: Average yearly balance of the client, in euros.
- housing: Indicates if the client has a housing loan.
- loan: Indicates if the client has a personal loan.
- contact: Communication type used for the contact.
- day: Day of the month of the last contact.
- month: Month of the last contact.
- duration: Duration of the last contact in seconds.
- campaign: Number of contacts performed during this campaign.
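- pdays: Number of days since the client was last contacted in a previous campaign (999 indicates the client was not previously contacted; the older bank-full.csv and bank.csv files use -1 instead).
- previous: Number of contacts performed with this client before this campaign.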
- poutcome: Outcome of the previous marketing campaign.
- y: Indicates if the client subscribed to a term deposit.
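The dictionary above can be turned into a small typed parser. The sketch below uses only the Python standard library and two made-up rows. It assumes the file is semicolon-delimited with a header row using the column names listed above; both assumptions should be checked against the actual file. Treating -1 as a second "not contacted" marker alongside the documented 999 is also an assumption.

```python
import csv
import io

# Two made-up rows using the column names from the data dictionary.
SAMPLE = """age;job;marital;education;default;balance;housing;loan;contact;day;month;duration;campaign;pdays;previous;poutcome;y
35;technician;married;secondary;no;1200;yes;no;cellular;5;may;180;2;999;0;unknown;no
52;retired;single;primary;no;300;no;no;telephone;12;jun;420;1;90;3;success;yes
"""

NUMERIC = ("age", "balance", "day", "duration", "campaign", "pdays", "previous")
BINARY = ("default", "housing", "loan", "y")
NOT_CONTACTED = {999, -1}  # 999 per the dictionary; -1 is an assumption
YES_NO = {"yes": True, "no": False}


def parse_row(row):
    """Convert one CSV row (a dict of strings) to typed Python values."""
    out = dict(row)
    for name in NUMERIC:
        out[name] = int(row[name])
    for name in BINARY:
        # Values other than yes/no (e.g. 'unknown') become None.
        out[name] = YES_NO.get(row[name])
    if out["pdays"] in NOT_CONTACTED:
        out["pdays"] = None  # no previous contact, so no day count
    return out


reader = csv.DictReader(io.StringIO(SAMPLE), delimiter=";")
rows = [parse_row(r) for r in reader]
print(rows[0]["pdays"], rows[1]["pdays"], rows[1]["y"])  # None 90 True
```

Replacing the sentinel with `None` keeps it from being read as a real day count, for example when averaging pdays across clients.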