correlation coefficients. Some of the values are positive and some are negative, and every value on the diagonal is 1.0, because each variable is perfectly correlated with itself.
Correlation Coefficients:
          0         1         2         3
0  1.000000  0.999898 -0.647059  0.993540
1  0.999898  1.000000 -0.657860  0.991822
2 -0.647059 -0.657860  1.000000 -0.556357
3  0.993540  0.991822 -0.556357  1.000000
Each column of a DataFrame represents a feature or attribute that can be treated as a random variable. So, when we call the function DataFrame.corr(), it computes the correlation coefficient for every pair of columns and returns the results as a correlation matrix.
So, if we want to find the correlation coefficient of variables 0 and 1, we need to look at row 0 and column 1, or row 1 and column 0 (both values are the same because the matrix is symmetric). A positive value indicates that the variables are positively correlated. In other words, as one variable increases, the other tends to increase as well. A negative correlation coefficient indicates that the variables are negatively correlated. In other words, as one variable increases, the other tends to decrease.
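The lookup described above can be sketched as follows. This is only a minimal illustration: the DataFrame here holds made-up values, not the data used earlier in the article, and the names df, corr, r_01, r_10 and r_02 are introduced just for this example.

```python
import pandas

# Hypothetical data chosen only for illustration (not the article's DataFrame).
df = pandas.DataFrame({
    0: [1, 2, 3, 4, 5],
    1: [2, 4, 6, 8, 11],
    2: [9, 7, 6, 4, 1],
})

corr = df.corr()  # correlation matrix of all column pairs

# The coefficient for variables 0 and 1 appears at both [0, 1] and [1, 0].
r_01 = corr.loc[0, 1]
r_10 = corr.loc[1, 0]
print(r_01, r_10)

# Column 2 falls as column 0 rises, so this coefficient is negative.
r_02 = corr.loc[0, 2]
print(r_02)
```

Reading a single entry with .loc is handy when only one pair of variables matters and printing the whole matrix would be noise.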
Moreover, the correlation coefficient is always a value between -1 and +1. The closer its absolute value is to 1, the more strongly the variables are correlated, and a value close to 0 indicates a weak correlation.
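To see where the -1 to +1 range comes from, it may help to compute the coefficient by hand. The sketch below assumes the Pearson correlation coefficient, which is the default method in pandas. The sample values and the helper function pearson() are hypothetical and exist only for this illustration.

```python
import math
import pandas

# Hypothetical sample values for illustration.
x = pandas.Series([10, 52, 9, 30])
y = pandas.Series([12, 61, 10, 33])

def pearson(a, b):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    da = a - a.mean()  # deviations from the mean of a
    db = b - b.mean()  # deviations from the mean of b
    return (da * db).sum() / math.sqrt((da ** 2).sum() * (db ** 2).sum())

r_manual = pearson(x, y)
r_pandas = x.corr(y)  # Series.corr uses the Pearson method by default
print(r_manual, r_pandas)
```

Because the numerator can never exceed the denominator in absolute value, the result always stays within -1 and +1, and the two printed numbers should agree up to floating-point rounding.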
Now, let’s look at the following DataFrame.
DataFrame:
    0   1  2      3
0  10  12  A   True
1  52  61  B  False
2   9  10  C   True
Only column 0 and column 1 contain numbers. Column 2 contains strings and column 3 contains boolean values. Now, we want to compute the correlation coefficient of column 0 and column 1 only. How should we do that?
We can use the following Python code to compute the correlation coefficient of two columns of a DataFrame.
import pandas

df2 = pandas.DataFrame([[10, 12, 'A', True], [52, 61, 'B', False], [9, 10, 'C', True]])
print("DataFrame: \n", df2)
print("Correlation Coefficients of column 0 and column 1: \n", df2.iloc[:, [0, 1]].corr())
Here, we are using DataFrame.iloc[] to select the first and the second columns only. Please note that …
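If only the single coefficient is needed rather than a 2x2 matrix, one possible alternative is Series.corr(), which returns a plain number. The sketch below reuses the same df2 as above. The variable names r and r_matrix are introduced here only for illustration.

```python
import pandas

df2 = pandas.DataFrame([[10, 12, 'A', True], [52, 61, 'B', False], [9, 10, 'C', True]])

# Series.corr returns one number rather than a 2x2 matrix.
r = df2[0].corr(df2[1])
print("Correlation coefficient of column 0 and column 1:", r)

# The same value, read from the matrix produced by the iloc-based approach.
r_matrix = df2.iloc[:, [0, 1]].corr().loc[0, 1]
print(r_matrix)
```

Both approaches should give the same coefficient. The matrix form scales better when more than two columns are selected, while the Series form is shorter for a single pair.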





