German Credit History
financial and banking details for customers
@kaggle.ashrafkhan94_german_credit_history
The German credit dataset describes financial and banking details for customers. The task is to classify each customer as good or bad, which is assumed to mean predicting whether the customer will repay a loan or credit. The dataset contains 1,000 examples and 20 input variables: 7 numerical (integer) and 13 categorical. The input variables are:
Status of existing checking account
Duration in months
Credit history
Purpose
Credit amount
Savings account
Present employment since
Installment rate in percentage of disposable income
Personal status and sex
Other debtors
Present residence since
Property
Age in years
Other installment plans
Housing
Number of existing credits at this bank
Job
Number of dependents
Telephone
Foreign worker
Some of the categorical variables have an ordinal relationship, such as Savings account, although most do not. There are two outcome classes, 1 for good customers and 2 for bad customers. Good customers are the default or negative class, whereas bad customers are the exception or positive class. A total of 70 percent of the examples are good customers, whereas the remaining 30 percent of examples are bad customers.
Good Customers: Negative or majority class (70%).
Bad Customers: Positive or minority class (30%).
A cost matrix provided with the dataset assigns a different penalty to each type of misclassification error. Specifically, a false negative (marking a bad customer as good) costs five, and a false positive (marking a good customer as bad) costs one.
Cost for False Negative: 5
Cost for False Positive: 1
This suggests that the positive class is the focus of the prediction task and that it is more costly to the bank or financial institution to give money to a bad customer than to not give money to a good customer. This must be taken into account when selecting a performance metric.
CREATE TABLE german (
"a11" VARCHAR,
"n_6" BIGINT, -- 6
"a34" VARCHAR,
"a43" VARCHAR,
"n_1169" BIGINT, -- 1169
"a65" VARCHAR,
"a75" VARCHAR,
"n_4" BIGINT, -- 4
"a93" VARCHAR,
"a101" VARCHAR,
"n_4_1" BIGINT, -- 4.1
"a121" VARCHAR,
"n_67" BIGINT, -- 67
"a143" VARCHAR,
"a152" VARCHAR,
"n_2" BIGINT, -- 2
"a173" VARCHAR,
"n_1" BIGINT, -- 1
"a192" VARCHAR,
"a201" VARCHAR,
"n_1_1" BIGINT -- 1.1
);
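
The column names in the table are not descriptive, so a mapping to readable names may help. The sketch below is hypothetical: it assumes the columns appear in the same order as the attribute list above and that the final column, `n_1_1`, holds the class label (1 = good, 2 = bad). The source does not state this ordering. The readable names and the `rename_query` helper are invented for illustration.

```python
# Hypothetical mapping from the auto-generated column names in the
# CREATE TABLE statement to readable names. It assumes the columns follow
# the order of the attribute list and that the last column is the class
# label (1 = good, 2 = bad).
COLUMN_NAMES = {
    "a11": "checking_account_status",
    "n_6": "duration_months",
    "a34": "credit_history",
    "a43": "purpose",
    "n_1169": "credit_amount",
    "a65": "savings_account",
    "a75": "employment_since",
    "n_4": "installment_rate",
    "a93": "personal_status_sex",
    "a101": "other_debtors",
    "n_4_1": "residence_since",
    "a121": "property",
    "n_67": "age_years",
    "a143": "other_installment_plans",
    "a152": "housing",
    "n_2": "existing_credits",
    "a173": "job",
    "n_1": "dependents",
    "a192": "telephone",
    "a201": "foreign_worker",
    "n_1_1": "credit_class",
}

TARGET = "n_1_1"  # assumed to be the outcome column
inputs = [c for c in COLUMN_NAMES if c != TARGET]
numeric_inputs = [c for c in inputs if c.startswith("n_")]
categorical_inputs = [c for c in inputs if c.startswith("a")]


def rename_query(table="german"):
    """Build a SELECT that exposes the readable names as quoted aliases."""
    cols = ",\n  ".join(
        f'"{old}" AS "{new}"' for old, new in COLUMN_NAMES.items()
    )
    return f"SELECT\n  {cols}\nFROM {table};"


print(len(numeric_inputs), len(categorical_inputs))
print(rename_query())
```

Under this assumed ordering, the schema yields 7 `BIGINT` inputs and 13 `VARCHAR` inputs, which agrees with the 7 numerical and 13 categorical variables described above. That agreement supports the mapping but does not confirm it.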