R-Squared

Halil İbrahim Hatun
3 min read · Sep 19, 2022


In data science, we build regression models to estimate one variable (the dependent variable) from one or more other variables (the independent variables). So how do we judge the performance of a regression model we have built?
One of the metrics used to measure regression performance is R-squared.

What is R-Squared?

Let’s go through an example to explain what R-squared is. Suppose we have a dataset. After the necessary preprocessing, we fit a regression model to it, which gives us a regression line. Is this line a good fit? Could it be better? R-squared is a metric that helps answer such questions. It expresses how well the regression line fits the data as a single number that is at most 1; the closer it is to 1, the better the fit.

How to calculate R-Squared?

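R-squared is calculated by dividing the sum of the squared distances of each point from the regression line (the residual sum of squares) by the sum of the squared distances of each point from the mean (the total sum of squares), and subtracting the result from 1. In short: R-squared = 1 - (residual sum of squares / total sum of squares).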
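To make the calculation concrete, here is a minimal sketch in plain Python. The function name `r_squared` and the four data points are made up for illustration; they are not taken from the dataset used later in this article.

```python
def r_squared(y_true, y_pred):
    """Return 1 - SS_res / SS_tot for two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: squared distances from the regression line.
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    # Total sum of squares: squared distances from the mean.
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return 1 - ss_res / ss_tot


# Made-up observations and predictions, for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 6.5, 9.5]
score = r_squared(y_true, y_pred)
print(score)  # approximately 0.95 (SS_res = 1, SS_tot = 20)
```

A model that always predicts the mean gets a score of 0 with this formula, and a perfect model gets 1.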

Some care is needed when interpreting the R-squared metric. Intuitively, a low R-squared suggests that the model fits poorly and a high R-squared suggests that it fits well, but this is not always the case. It depends on the data: a model can reach a high R-squared by overfitting, and a low R-squared can still be reasonable when the data are inherently noisy. Therefore, it is not correct to evaluate a model's performance using R-squared alone.

Adjusted R-Squared

As the number of independent variables increases, the R-Squared measured on the fitted data increases as well, or at least never decreases, even when the added variable is irrelevant. For example, let’s say we’re calculating the R-Squared of a house price model. Next, we add a feature called the average height of previous homeowners to the data. This feature has nothing to do with house prices, but R-Squared will still be at least as high, and typically slightly higher. In other words, the model would suggest that houses whose previous owners were taller on average are more expensive. This conclusion is wrong. We use the Adjusted R-squared metric to correct for this as much as possible. It takes the number of independent variables into account and penalizes variables that do not improve the model.
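The usual formula is Adjusted R-squared = 1 - (1 - R-squared) * (n - 1) / (n - p - 1), where n is the number of samples and p is the number of independent variables. Below is a small sketch of it; the function name and the numbers passed in are hypothetical and only show how the penalty works.

```python
def adjusted_r_squared(r2, n_samples, n_features):
    """Penalize R-squared for the number of independent variables."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Hypothetical numbers: the same 50 rows, where one extra irrelevant
# feature nudges R-squared from 0.800 to 0.801.
before = adjusted_r_squared(0.800, n_samples=50, n_features=5)
after = adjusted_r_squared(0.801, n_samples=50, n_features=6)
print(before, after)
```

In this made-up case R-squared goes up slightly, while Adjusted R-squared goes down, because the small gain does not justify the extra variable.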

Let’s examine the R-Squared and Adjusted R-Squared metrics by applying them to a dataset.

First, we import the libraries and methods that we use.

We delete the “Posted On” feature because it is not needed for the regression model.
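Dropping a column like this might look as follows in pandas. This is only a sketch: the tiny DataFrame below is made up, and only the “Posted On” column name comes from the article; the other column names and all values are assumptions.

```python
import pandas as pd

# Tiny made-up frame; only the "Posted On" column name comes from the article.
df = pd.DataFrame(
    {
        "Posted On": ["2022-05-18", "2022-05-13", "2022-05-16"],
        "Size": [1100, 800, 1000],
        "Rent": [10000, 20000, 17000],
    }
)

# drop returns a new frame, so we reassign the result.
df = df.drop(columns=["Posted On"])
print(df.columns.tolist())
```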

Preprocessing Part

Model Split Part

[Image: shape check of the split data]

Regression Part

[Image: results of the main regression part]
[Image: scoring DataFrame]
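As a rough stand-in for the regression and scoring steps, here is a self-contained sketch on synthetic data. It is not the code or the dataset from this article: it uses NumPy's least squares instead of a modeling library, scores on the same data it was fitted on, and the variables `size`, `price`, and `owner_height` are invented to mirror the house price example above.

```python
import numpy as np


def fit_and_score(X, y):
    """Fit ordinary least squares with an intercept; return (R2, adjusted R2)."""
    n_samples, n_features = X.shape
    design = np.column_stack([np.ones(n_samples), X])  # add intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1 - ss_res / ss_tot
    adj = 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)
    return r2, adj


rng = np.random.default_rng(0)
n = 200
size = rng.uniform(40, 200, n)             # synthetic house size
price = 3.0 * size + rng.normal(0, 25, n)  # synthetic price, depends on size only
owner_height = rng.normal(170, 10, n)      # synthetic feature unrelated to price

r2_base, adj_base = fit_and_score(size.reshape(-1, 1), price)
r2_extra, adj_extra = fit_and_score(np.column_stack([size, owner_height]), price)

print(f"size only:           R2={r2_base:.4f}  adjusted R2={adj_base:.4f}")
print(f"size + owner height: R2={r2_extra:.4f}  adjusted R2={adj_extra:.4f}")
```

Because the second model contains the first one, its in-sample R-squared can never be lower. Adjusted R-squared is always at or below R-squared, and the gap widens as more variables are added.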

Visualization of R-Squared And Adjusted R-Squared Values
