Validation of Risk Recommendation model
Dear reader, we at Pexitics have launched the ItP (Intention to Pay) Q12 customer survey, which helps profile borrowers across LMRS (Law Abidance, Morality, Responsibility and Self Preservation). This profile creates an LMRS score. The LMRS score is segmented to create the decision recommendation: AVOID; CAUTION-High; CAUTION-Medium; CAUTION-Low; PROCEED.
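The actual ItP Q12 micro-segmentation model and its cut-offs are not described in this post. To make the idea of "segmenting a score into five ordered recommendations" concrete, here is a minimal banding sketch. It assumes a hypothetical 0-100 LMRS score where a higher value means lower risk. The bin edges are purely hypothetical placeholders, and the real model is likely more involved than simple thresholds.

```python
import pandas as pd

# HYPOTHETICAL bin edges on a HYPOTHETICAL 0-100 LMRS scale (higher = lower risk).
# These are NOT the ItP Q12 cut-offs; they only illustrate score banding.
BINS = [0, 20, 40, 60, 80, 100]
LABELS = ["AVOID", "CAUTION-High", "CAUTION-Medium", "CAUTION-Low", "PROCEED"]

def segment_lmrs(scores: pd.Series) -> pd.Series:
    """Map scores to one of five ordered decision labels using fixed bins."""
    # include_lowest=True puts a score of exactly 0 into the first band
    return pd.cut(scores, bins=BINS, labels=LABELS, include_lowest=True)

print(segment_lmrs(pd.Series([5, 35, 55, 72, 95])).tolist())
```

`pd.cut` returns an ordered categorical, so the five labels keep their risk ordering for later sorting or grouping.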
Please check out the details of the ItP Q12 score at https://pexitics.com/index.php/pexitics/behavioral-risk-for-loans/
As mentioned, the ItP Q12 score is most effective at assessing credit risk for New-to-Credit customers and for customers whose credit bureau score is sub-optimal (below an organization's lending norms).
In this exercise, I will validate the accuracy of these decisions using Python.
Step 1: We start with Exploratory Data Analysis (EDA) to understand the dataset.
import pandas as pd
# load the dataset (replace the placeholder with the path to your CSV file)
df = pd.read_csv("path/to/dataset.csv")
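The pandas-profiling package is a very smart way to do the EDA. Note that it is a separate package, not part of pandas itself. Newer releases are published as ydata-profiling and are imported as ydata_profiling.
If you do not have it installed, run: conda install -c conda-forge pandas-profiling (or pip install pandas-profiling)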
# Exploratory analysis
from pandas_profiling import ProfileReport
profile = ProfileReport(df, explorative=True)
# to view the result in a Jupyter notebook or Google Colab
profile.to_widgets()
# to save the results of pandas-profiling to an HTML file
profile.to_file("EDA.html")
The profiling report shows the basics of the data, starting with:
1. Overview of the data
Dataset statistics
a. Number of variables: 14
b. Number of observations: 3944
c. Missing cells: 0
d. Missing cells (%): 0.0%
e. Duplicate rows: 0
f. Duplicate rows (%): 0.0%
g. Total size in memory: 1.5 MiB
h. Average record size in memory: 401.3 B
Variable types
a. Numeric (NUM): 8
b. Categorical (CAT): 6
2. A numeric and graphical description of each variable. Delinquency is the target ('Y') variable, which we want to predict from the variables scored through the ItP Q12: L, M, R and S (Law Abidance, Morality, Responsibility and Self Preservation). The combination of these variables creates the LMRS node. The DECISION variable is the outcome of the micro-segmentation model, which produces the 5 decisions.
3. Correlations
4. Missing values per variable
5. Top rows printed
6. Bottom rows printed
Thus, this is a comprehensive Exploratory Data Analysis report that is easy to run and save.
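If pandas-profiling is not available, several of the headline "Dataset statistics" can be recomputed with plain pandas. The sketch below is one way to do it, and the helper name `overview` is hypothetical. It uses a simple dtype check for numeric columns, and the profiler's own NUM/CAT type inference may classify some columns differently. For example, a 0/1 flag such as Delinquency may be reported as categorical by the profiler.

```python
import pandas as pd

def overview(df: pd.DataFrame) -> dict:
    """Recompute a few of the 'Dataset statistics' reported by the profiler."""
    n_rows, n_cols = df.shape
    n_cells = n_rows * n_cols
    missing = int(df.isna().sum().sum())   # total missing cells
    dupes = int(df.duplicated().sum())     # fully duplicated rows
    # Plain dtype check; the profiler's NUM/CAT inference may differ.
    n_numeric = len(df.select_dtypes(include="number").columns)
    return {
        "variables": n_cols,
        "observations": n_rows,
        "missing_cells": missing,
        "missing_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": dupes,
        "duplicate_pct": 100 * dupes / n_rows if n_rows else 0.0,
        "numeric_vars": n_numeric,
        "other_vars": n_cols - n_numeric,
    }

# Hypothetical toy frame; with the real data you would call overview(df).
toy = pd.DataFrame({"L": [1, 2, 2], "DECISION": ["AVOID", "PROCEED", "PROCEED"]})
print(overview(toy))
```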
Step 2: Next, we need to understand the effectiveness of the LMRS Decision Triad. You can read more about the LMRS Decision Triad (matrix) in the whitepaper here: https://pexitics.com/wp-content/uploads/2022/04/ItP-Q12-A-detailed-Whitepaper.pdf
The default-customer flag in this dataset is Delinquency. The important point is that although the decisions are divided into 5 parts, the actual on-the-ground decision is primarily binary: AVOID or PROCEED.
Let us build a frequency table (a cross-tabulation of DECISION against Delinquency) to understand the effectiveness of the ItP Q12 recommendations.
# frequency table of DECISION vs Delinquency, with row and column totals
table = pd.crosstab(df.DECISION, df.Delinquency, margins=True)
# export the table as a CSV file
table.to_csv("table.csv")
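Raw counts are hard to compare across decisions of different sizes. A row-normalised cross-tabulation, or equivalently a group mean, gives the delinquency rate within each decision. This is a minimal sketch. It assumes Delinquency is coded 1 for a delinquent customer and 0 otherwise (the post does not state the coding), and the toy data is hypothetical.

```python
import pandas as pd

# Hypothetical toy data; in the post, the real frame is `df` loaded from the CSV.
toy = pd.DataFrame({
    "DECISION": ["AVOID", "AVOID", "PROCEED", "PROCEED", "PROCEED", "CAUTION-Medium"],
    "Delinquency": [1, 0, 0, 0, 1, 0],
})

# normalize="index" divides each row by its total, so every DECISION row sums to 1.
rates = pd.crosstab(toy["DECISION"], toy["Delinquency"], normalize="index")
print(rates.round(2))

# Delinquency rate per decision, assuming 1 marks a delinquent customer.
delinquency_rate = toy.groupby("DECISION")["Delinquency"].mean()
print(delinquency_rate)
```

If the recommendations work as intended, the delinquency rate should fall steadily from AVOID through the CAUTION bands to PROCEED.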
Let us now create the confusion matrix to better understand the model's effectiveness.
Note: The matrix is created ONLY for the data with a clear lending recommendation (AVOID + PROCEED). CAUTION-Medium is not used because, as per the ItP Q12 Decision Triad, it refers the case to credit. Thus, a clear recommendation is either AVOID or PROCEED.
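The confusion matrix can be built with pandas alone. The sketch below is one possible approach, and the helper `lending_confusion` is hypothetical. It keeps only AVOID and PROCEED rows, treats AVOID as a prediction of delinquency, and assumes Delinquency is 1 for delinquent customers and 0 otherwise. Both of these are assumptions, not details confirmed by the post. It reports two possible readings of "correct non-delinquent prediction" because the post does not say which one it uses.

```python
import pandas as pd

def lending_confusion(df: pd.DataFrame):
    """Confusion matrix and headline rates for the AVOID / PROCEED subset.

    Assumptions: Delinquency is 1 = delinquent, 0 = not; AVOID = predicted delinquent.
    """
    clear = df[df["DECISION"].isin(["AVOID", "PROCEED"])]
    predicted = (clear["DECISION"] == "AVOID").astype(int)
    actual = clear["Delinquency"].astype(int)

    cm = pd.crosstab(actual.rename("Actual"), predicted.rename("Predicted"))
    # make sure both classes appear even if one is absent from the data
    cm = cm.reindex(index=[0, 1], columns=[0, 1], fill_value=0)

    tn, fp = cm.loc[0, 0], cm.loc[0, 1]
    fn, tp = cm.loc[1, 0], cm.loc[1, 1]
    metrics = {
        "accuracy": (tp + tn) / cm.to_numpy().sum(),
        "precision_delinquent": tp / (tp + fp),  # correct AVOIDs / all AVOIDs
        "npv_non_delinquent": tn / (tn + fn),    # correct PROCEEDs / all PROCEEDs
        "specificity": tn / (tn + fp),           # non-delinquents given PROCEED
    }
    return cm, metrics

# Hypothetical toy data; the CAUTION-Medium row is excluded by the filter.
toy = pd.DataFrame({
    "DECISION": ["AVOID"] * 3 + ["PROCEED"] * 4 + ["CAUTION-Medium"],
    "Delinquency": [1, 0, 0, 0, 0, 0, 1, 1],
})
cm, m = lending_confusion(toy)
print(cm)
print(m)
```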
Insight: This model is very effective in its decision recommendations.
It has a high accuracy of 94%. Breaking this down, the precision for Delinquency is 80%, meaning that 80% of all Delinquency predictions are correct. Also, the loss of business is minimal, since the correct non-delinquent prediction rate is 98%.
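For reference, these are the standard definitions behind such figures. They assume AVOID is read as a delinquency prediction and PROCEED as a non-delinquent prediction. TP is delinquent customers flagged AVOID, FP is non-delinquent customers flagged AVOID, TN is non-delinquent customers given PROCEED, and FN is delinquent customers given PROCEED. The post does not say whether the 98% figure is the share of PROCEED decisions that turn out non-delinquent (negative predictive value, NPV) or the share of non-delinquent customers who receive PROCEED (specificity), so both are listed.

```latex
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision}_{\text{Delinquent}} &= \frac{TP}{TP + FP} \\
\text{NPV} &= \frac{TN}{TN + FN} \\
\text{Specificity} &= \frac{TN}{TN + FP}
\end{aligned}
```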
Did you find this interesting? Continue reading my blog to find out whether the model's effectiveness is consistent across loan products.