Lending Club Data Analysis

Essay by Ashutosh Lall • December 1, 2018 • Research Paper • 2,623 Words (11 Pages) • 1,470 Views

Lending Club Data Analysis

- Dr. Soper

ISDS 577: Project Phase – 1

Presented by:

Ashutosh Lall

Gurleen Kaur

Tanisha Munshi

Wenjie Li

Contents

Introduction

A. Data

B. Research questions

C. Analytical methods

D. Figures and tables

E. References

F. Data Visualizations

Introduction

Lending Club is a peer-to-peer lending company based in the United States, in which investors provide funds to potential borrowers and earn a profit that depends on the risk they take.

Lending Club enables borrowers to take out unsecured personal loans between $1,000 and $40,000. The standard loan period is three years. Investors can search and browse the loan listings on the Lending Club website and select the loans they want to invest in based on the information supplied about the borrower, the loan amount, the loan grade, and the loan purpose. Investors make money from interest. Lending Club makes money by charging borrowers an origination fee and investors a service fee.

A. Data

1. Where will you get the data for your project?

The data for our project is sourced from:

https://www.kaggle.com/wendykan/lending-club-loan-data/home

These files contain complete loan data for all loans issued from 2007 through 2015, including the current loan status (Current, Late, Fully Paid, etc.) and the latest payment information.

2. Are there any legal or privacy concerns associated with your group possessing these data? If so, what measures will you take to protect your data?

There are no legal or privacy concerns associated with possessing this data, since it is obtained from an open data source (Kaggle).

3. In what format(s) will your data be provided to you?

The data is available to us in csv (comma-separated values) format. We will read this csv file in Python and convert it to .xlsx format so that it can be used in Tableau to create visualizations.

4. Will it be necessary to combine two or more datasets for your project? If so, how do you plan to accomplish this?

For our project, we aim to build a model that predicts whether a loan is good or bad based on various factors. Our current dataset includes complete loan information from 2007 Q1 through 2015 Q4. For prediction, we will extract the 2016 Q1 data from Lending Club’s official website and feed it to our prediction model.

5. Will it be necessary to clean your data in any way? If so, what approach do you plan to use to clean your data?

To analyze the data efficiently, we will need to clean it, since it contains missing values and irrelevant variables. The following steps will be taken to clean the data:

-Handling missing values: Our dataset contains 75 variables, many of which have missing values. We will therefore remove every variable with 25% or more missing data, since such variables will not contribute to our analysis.

-Removing irrelevant variables: After removing the variables with 25% or more missing data, we plan to walk through the remaining variables slice by slice, keeping what we need and removing the rest. Some variables in our dataset, such as id and member_id, have no predictive power. Others, such as url, emp_title, desc, and title, are descriptive in nature and will also be removed.

-Creating dummy variables and reducing categories: After removing irrelevant variables, we will create dummy variables for categorical variables such as grade, sub_grade, home_ownership, purpose, initial_list_status, application_type, and emp_length_int. Because the categories of every variable except emp_length_int are well defined in our dataset, we will keep all of them. For emp_length_int, we will reduce the categories to three, including if_emp_short_term and if_emp_mid_term.

-Correlation analysis: After creating dummy variables and reducing categories, we will check the correlation between the variables and remove variables that are strongly correlated with one another to obtain a more accurate result.
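
The cleaning steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: the function names, the toy rows, and their values are hypothetical, and only the 25% threshold and the column names come from the text. On the real csv file a data-frame library would likely be more convenient.

```python
import math

def missing_fraction(rows, column):
    """Share of rows in which `column` is None or an empty string."""
    missing = sum(1 for row in rows if row.get(column) in (None, ""))
    return missing / len(rows)

def drop_sparse_columns(rows, threshold=0.25):
    """Drop every column whose missing share is at or above `threshold`."""
    columns = list(rows[0].keys())
    keep = [c for c in columns if missing_fraction(rows, c) < threshold]
    return [{c: row[c] for c in keep} for row in rows]

def drop_columns(rows, unwanted):
    """Remove identifier-like or free-text columns by name."""
    return [{c: v for c, v in row.items() if c not in unwanted} for row in rows]

def add_dummies(rows, column):
    """Replace a categorical column with one 0/1 column per observed level."""
    levels = sorted({row[column] for row in rows})
    result = []
    for row in rows:
        new_row = {c: v for c, v in row.items() if c != column}
        for level in levels:
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        result.append(new_row)
    return result

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy rows standing in for the loan file (all values are made up).
rows = [
    {"id": "1", "desc": "", "grade": "A", "loan_amnt": 1000.0, "installment": 30.0},
    {"id": "2", "desc": "", "grade": "B", "loan_amnt": 2000.0, "installment": 61.0},
    {"id": "3", "desc": "car", "grade": "A", "loan_amnt": 3000.0, "installment": 90.0},
    {"id": "4", "desc": "", "grade": "C", "loan_amnt": 4000.0, "installment": 122.0},
]

cleaned = drop_sparse_columns(rows)      # removes "desc" (75% missing)
cleaned = drop_columns(cleaned, {"id"})  # removes the identifier
cleaned = add_dummies(cleaned, "grade")  # grade -> grade_A, grade_B, grade_C
r = pearson([row["loan_amnt"] for row in cleaned],
            [row["installment"] for row in cleaned])
```

In this toy data r is close to 1 for loan_amnt and installment, so one of the pair would be a candidate for removal in the correlation step. The report does not state what cutoff counts as a strong correlation, so that value would still need to be chosen.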

6. Which attributes/variables will your dataset contain?

After data pre-processing, our dataset will contain the following variables:

Variable name	Description
loan_amnt	The listed amount of the loan applied for by the borrower.
funded_amnt_inv	The total amount committed by the investors for that loan.
term	The number of payments on the loan (36 or 60 months).
int_rate	Interest Rate on the loan.
installment	The monthly payment owed by the borrower if the loan originates.
grade	LC-assigned loan grade.
subgrade	LC-assigned loan subgrade.
home_ownership	The home ownership status provided by the borrower during registration.
annual_inc	The self-reported annual income provided by the borrower.
issue_d	The month in which the loan was funded.
purpose	A category provided by the borrower for the loan request.
addr_state	The state provided by the borrower in the loan application.
dti	Borrower’s debt-to-income ratio.
delinq_2yrs	Number of 30+ days past due incidences of delinquency in the borrower’s credit line for the past 2 years.
earliest_cr_line	The month the borrower’s earliest reported credit line was opened.
inq_last_6mnths	The number of inquiries in the past 6 months.
open_acc	The number of open accounts in the borrower’s credit line.
pub_rec	Number of derogatory public records.
revol_bal	Total credit revolving balance.
revol_util	Revolving line utilization rate.
total_acc	The total number of credit lines in the borrower’s credit file.
initial_list_status	The initial listing status of the loan.
collections_12_mnths_ex_med	Number of collections in 12 months excluding medical collections.
application_type	Indicates whether the loan application is individual or joint.
acc_now_delinq	The number of accounts in which the borrower is now delinquent.
tot_coll_amnt	Total collection amounts ever owed.
tot_cur_bal	Total current balance of all accounts.
total_rev_hi_lim	Total revolving high credit/credit limit.
loan_condition	Target variable which represents the status of the loan (1 = bad, 0 = good).
emp_length_int	Employment length of the borrower.
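
As a rough illustration of the loan_condition target in the table above, the sketch below maps a loan status string to 1 (bad) or 0 (good). The set of statuses counted as bad is an assumption made for this example only; the report does not list them, and the function name is hypothetical.

```python
# Hypothetical set of statuses treated as "bad"; the report does not list them.
BAD_STATUSES = {"Charged Off", "Default", "Late (31-120 days)"}

def loan_condition(loan_status, bad_statuses=BAD_STATUSES):
    """Return 1 for a bad loan and 0 for a good one."""
    return 1 if loan_status in bad_statuses else 0

# Example labels for a few status values.
labels = [loan_condition(s) for s in ("Fully Paid", "Charged Off", "Current")]
```

Passing the set of bad statuses as a parameter keeps the labeling rule easy to change once the definition of a bad loan is fixed.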

...
