# MM018 – Lies, Damned Lies! Why is my ML/DL Model lying to me?

Today I want to explore why models sometimes look good but are terribly wrong. Metrics like RMSE, R² or accuracy can look good (actually, damn good), yet the model fails in the real world. What is going on? Let's run some experiments and understand the basics.

I want to generate some noisy parabolic data (a polynomial curve of degree 2) and see whether I can build a model that fits the parabola.

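```python
import pandas as pd
import pylab  # matplotlib's pylab interface: NumPy and pyplot functions in one namespace

degrees = (2, 4, 8, 16)  # polynomial orders I want to fit
xVals1, yVals1 = getData('Dataset 1.txt')  # first dataset (getData is a helper not shown here)
pdval1 = pd.Series(yVals1)
pdval1.plot();
```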
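The post doesn't show how `Dataset 1.txt` and `Dataset 2.txt` were produced, or what file format `getData` expects. Here is a minimal sketch. It assumes the data is a parabola y = ax² + bx + c plus Gaussian noise. `gen_noisy_parabolic_data` is a hypothetical helper, and the coefficients, noise level and x range are illustrative assumptions, not the values behind the original files.

```python
import numpy as np

def gen_noisy_parabolic_data(a, b, c, x_vals, noise_sd, seed=None):
    """Hypothetical helper: sample y = a*x**2 + b*x + c plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x_vals, dtype=float)
    y_true = a * x**2 + b * x + c                  # the underlying parabola
    noise = rng.normal(0.0, noise_sd, size=x.shape)
    return x, y_true + noise

# Two independent samples of the same parabola (illustrative values)
x_vals = np.arange(-10, 11)                        # 21 points from -10 to 10
x1, y1 = gen_noisy_parabolic_data(3, 0, 0, x_vals, noise_sd=35, seed=1)
x2, y2 = gen_noisy_parabolic_data(3, 0, 0, x_vals, noise_sd=35, seed=2)
```

Both samples come from the same curve, so any difference between them is pure noise. The experiments in this post rely on exactly that.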

OK, now I want to build my models by fitting polynomial curves of degree 2, 4, 8 and 16.

To judge whether a model is good, I will use the R² metric (the coefficient of determination). I won't go into the math here. For a least-squares fit evaluated on its own training data, R² ranges from 0 to 1, and a value of 1 means the model fits the dataset perfectly.
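The post uses an `rSquared` helper without showing it. Here is a minimal sketch. It assumes `rSquared` computes the usual coefficient of determination $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$, where $y_i$ are the observed values, $\hat{y}_i$ the predictions and $\bar{y}$ the mean of the observations. The name `r_squared` is hypothetical.

```python
import numpy as np

def r_squared(observed, predicted):
    """Hypothetical R^2 helper (the post's rSquared is not shown): 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)          # residual sum of squares
    ss_tot = np.sum((observed - observed.mean()) ** 2)    # total sum of squares
    return 1.0 - ss_res / ss_tot

y = [0, 1, 2, 3.1]
print(r_squared(y, y))                        # perfect predictions -> 1.0
print(r_squared(y, [np.mean(y)] * len(y)))    # always predicting the mean -> 0.0
```

The 0-to-1 range only holds when the model is scored on the data it was fitted to. On new data, nothing stops `ss_res` from exceeding `ss_tot`, and then R² goes negative.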

```python
models1 = genFits(xVals1, yVals1, degrees)  # train the models
testFits(models1, degrees, xVals1, yVals1, 'DataSet 1.txt')  # test and show the fits
```
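`genFits` and `testFits` are also helpers that aren't listed in the post. Here is a minimal equivalent. It assumes `genFits` fits one least-squares polynomial per degree and `testFits` scores each fit with R²; the original also plots the curves, which this sketch skips. `gen_fits` and `score_fits` are hypothetical names, and the noisy parabola is illustrative data.

```python
import numpy as np

def r_squared(observed, predicted):
    observed = np.asarray(observed, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def gen_fits(x_vals, y_vals, degrees):
    """Hypothetical: one least-squares polynomial per degree, coefficients highest power first."""
    return [np.polyfit(x_vals, y_vals, d) for d in degrees]

def score_fits(models, degrees, x_vals, y_vals):
    """Hypothetical: return {degree: R^2} for each model evaluated on (x_vals, y_vals)."""
    return {d: r_squared(y_vals, np.polyval(m, x_vals))
            for m, d in zip(models, degrees)}

# Illustrative data: a noisy parabola (assumed, not the post's files)
rng = np.random.default_rng(0)
x = np.arange(-10, 11, dtype=float)
y = 3 * x**2 + rng.normal(0, 35, size=x.shape)

degrees = (2, 4, 8, 16)
models = gen_fits(x, y, degrees)
for d, r2 in score_fits(models, degrees, x, y).items():
    print(f"degree {d:2d}: R^2 = {r2:.5f}")
```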

The 16th-degree curve seems to be a very good fit. Let's try the second generated dataset. Remember, we started from a parabolic curve (degree 2) and simply added some noise.

```python
xVals2, yVals2 = getData('Dataset 2.txt')
pdval2 = pd.Series(yVals2)
pdval2.plot();
```

```python
models2 = genFits(xVals2, yVals2, degrees)
testFits(models2, degrees, xVals2, yVals2, 'DataSet 2.txt')
```
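Why does degree 16 keep winning? On the training data, a higher-degree least-squares polynomial can always do at least as well as a lower-degree one, because it contains the lower-degree curves as special cases. Taken to the extreme, a polynomial of degree n − 1 passes through any n points exactly. It therefore scores R² = 1 even on pure noise. The sketch below assumes 8 random points purely for illustration.

```python
import numpy as np

def r_squared(observed, predicted):
    observed = np.asarray(observed, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(42)
x = np.arange(8, dtype=float)            # 8 points
y = rng.normal(0, 1, size=x.shape)       # pure noise: there is no curve to find

for d in (1, 3, 5, 7):
    coeffs = np.polyfit(x, y, d)
    print(f"degree {d}: training R^2 = {r_squared(y, np.polyval(coeffs, x)):.6f}")
```

Training R² only rewards flexibility. It cannot tell a real pattern from noise.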

Again, degree 16 seems to be the best model we can get. But why is R² so good? What are we missing? Did adding some noise really turn a parabola into a 16th-degree curve?
Let's swap models and datasets, so that each model is tested on the dataset it was not trained on. This is a simple form of what the literature calls cross-validation.

```python
# dataset 2 against model 1
testFits(models1, degrees, xVals2, yVals2, 'DataSet 2/Model 1')
```

```python
# dataset 1 against model 2
testFits(models2, degrees, xVals1, yVals1, 'DataSet 1/Model 2')
```
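Swapping the two datasets amounts to two-fold cross-validation: each model is scored only on data it never saw. When only one dataset is available, k-fold cross-validation does the same thing by holding out one slice at a time. Below is a minimal NumPy-only sketch. `kfold_r2`, k = 5 and the data are illustrative assumptions; libraries such as scikit-learn offer ready-made versions.

```python
import numpy as np

def r_squared(observed, predicted):
    observed = np.asarray(observed, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def kfold_r2(x, y, degree, k=5, seed=0):
    """Hypothetical helper: mean held-out R^2 of a polynomial fit over k folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))                     # shuffle once
    folds = np.array_split(idx, k)                    # k roughly equal slices
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coeffs = np.polyfit(x[train], y[train], degree)                 # fit on k-1 folds
        scores.append(r_squared(y[test], np.polyval(coeffs, x[test])))  # score the held-out fold
    return float(np.mean(scores))

rng = np.random.default_rng(1)
x = np.linspace(-10, 10, 41)
y = 3 * x**2 + rng.normal(0, 35, size=x.shape)    # illustrative noisy parabola
for d in (2, 4, 8, 16):
    print(f"degree {d:2d}: mean held-out R^2 = {kfold_r2(x, y, d):.3f}")
```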

OK! Now we have evidence that the 16th-degree model is not the best one for our datasets. But we still have to choose between degree 2 and degree 4.
How can we decide?
Well, there are two ways:

1. A rule of thumb, such as Occam's razor: prefer the simplest model that explains the data.
2. An empirical approach: generate more datasets and test the models against them.

I would like to suggest the second one!
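The empirical route can be automated. Draw many training/test pairs from the same noisy parabola, fit each candidate degree on the training sample, score it on the independent test sample, and average. `avg_test_r2` is a hypothetical helper, and the parabola, noise level, x grid and trial count are illustrative assumptions.

```python
import numpy as np

def r_squared(observed, predicted):
    observed = np.asarray(observed, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def avg_test_r2(degrees, trials=100, noise_sd=35.0, seed=0):
    """Hypothetical: average out-of-sample R^2 per degree over fresh train/test pairs."""
    rng = np.random.default_rng(seed)
    x = np.arange(-10, 11, dtype=float)
    y_true = 3 * x**2                                  # assumed underlying parabola
    totals = {d: 0.0 for d in degrees}
    for _ in range(trials):
        y_train = y_true + rng.normal(0, noise_sd, size=x.shape)
        y_test = y_true + rng.normal(0, noise_sd, size=x.shape)    # independent sample
        for d in degrees:
            coeffs = np.polyfit(x, y_train, d)                     # fit on the training sample
            totals[d] += r_squared(y_test, np.polyval(coeffs, x))  # score on the test sample
    return {d: totals[d] / trials for d in degrees}

print(avg_test_r2((2, 4)))
```

If the two averages come out close, the extra terms in the degree-4 model buy nothing on new data. In that case the simpler degree-2 model is the safer choice.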
Let's try another example and generate a line:

```python
xVals = (0, 1, 2, 3)
yVals = xVals
pylab.plot(xVals, yVals, label = 'Actual values');
```

```python
a, b, c = pylab.polyfit(xVals, yVals, 2)  # let's try to fit the data (a line) with a degree-2 polynomial
print('a =', round(a, 4), 'b =', round(b, 4), 'c =', round(c, 4))  # let's get the coefficients
```

```
a = -0.0 b = 1.0 c = 0.0
```
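`pylab.polyfit` is NumPy's `numpy.polyfit`, re-exported through matplotlib's pylab module. Modern matplotlib discourages pylab, so this sketch imports NumPy directly. The coefficients come back highest power first, which is why the result unpacks into a, b, c for y = ax² + bx + c. `numpy.polyval` expects the same order.

```python
import numpy as np

x = np.array([0, 1, 2, 3], dtype=float)
a, b, c = np.polyfit(x, x, 2)            # fit a parabola to the line y = x
print(round(a, 4), round(b, 4), round(c, 4))

# polyval with (a, b, c) is the same as writing the polynomial out by hand
assert np.allclose(np.polyval((a, b, c), x), a * x**2 + b * x + c)
```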

As you can see, the fit recovered the line, because the model is:

y = ax^2 + bx + c

so, plugging in the coefficients above, we get:

y = x

which is exactly our line. Let's look at the predictions:

```python
estYVals = pylab.polyval((a, b, c), xVals)
pylab.plot(xVals, estYVals, 'r--', label = 'Predictive values')
print('R-squared = ', rSquared(yVals, estYVals))
pylab.legend(loc = 'best');
```

```
R-squared =  1.0
```

Perfect match! Now let's widen the horizon (extend the domain to x = 20) and recompute R²:

```python
pylab.figure()
# Extend domain
xVals = xVals + (20,)
yVals = xVals
pylab.plot(xVals, yVals, label = 'Actual values')
estYVals = pylab.polyval((a, b, c), xVals)
pylab.plot(xVals, estYVals, 'r--', label = 'Predictive values')
print('R-squared = ', rSquared(yVals, estYVals))
pylab.legend(loc = 'best');
```

```
R-squared =  1.0
```

Perfect model! OK, the basics look solid, so let's add some noise.

```python
# almost a line
pylab.figure()
xVals = (0, 1, 2, 3)
yVals = (0, 1, 2, 3.1)  # just slightly noisy
pylab.plot(xVals, yVals, label = 'Actual values');
```

```python
model = pylab.polyfit(xVals, yVals, 2)
print(model)
estYVals = pylab.polyval(model, xVals)
pylab.plot(xVals, estYVals, 'r--', label = 'Predicted values')
print('R-squared = ', rSquared(yVals, estYVals))
pylab.legend(loc = 'best');
```

```
[0.025 0.955 0.005]
R-squared =  0.9999057936881771
```

Remember the equation above? Now we have a parabolic model:

``y = 0.025x^2 + 0.955x + 0.005``
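Subtracting the line $y = x$ that the points were meant to follow gives the model's error at any $x$:

$$
e(x) = \hat{y}(x) - x = 0.025x^2 + 0.955x + 0.005 - x = 0.025x^2 - 0.045x + 0.005
$$

Between $x = 0$ and $x = 3$ the model stays within 0.1 of the line: $e(0) = 0.005$, $e(1) = -0.015$, $e(2) = 0.015$, $e(3) = 0.095$. Once $x$ leaves that range, however, the $0.025x^2$ term grows without bound.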

But is it correct? Let's widen the horizon again:

```python
pylab.figure()
# Extend domain
xVals = xVals + (20,)
yVals = xVals
pylab.plot(xVals, yVals, label = 'Actual values')
estYVals = pylab.polyval(model, xVals)
pylab.plot(xVals, estYVals, 'r--', label = 'Predicted values')
print('R-squared = ', rSquared(yVals, estYVals))
pylab.legend(loc = 'best');
```

```
R-squared =  0.7026164813486402
```

Well, that was close. As you can see, an R² of 0.9999 is NOT the only thing we need to build a good machine learning model. We always have to test the model under the right conditions, or we can easily fall into a trap!
But was it really just a close call? Obviously not! See what happens when we extend the domain to 1000:

```python
pylab.figure()
# Extend domain
xVals = xVals + (1000,)
yVals = xVals
pylab.plot(xVals, yVals, label = 'Actual values')
estYVals = pylab.polyval(model, xVals)
pylab.plot(xVals, estYVals, 'r--', label = 'Predicted values')
print('R-squared = ', rSquared(yVals, estYVals))
pylab.legend(loc = 'best');
```

```
R-squared =  -753.8806225945068
```
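Where does such a dramatic number come from? The sketch below recomputes both extended-domain scores by hand. It assumes `rSquared` is the standard 1 − SS_res/SS_tot and uses the coefficients printed above. At x = 1000 the model predicts about 25,955 against a true value of 1000. That single squared error (about 6.2 × 10⁸) dwarfs the total variation of the targets (about 8.2 × 10⁵), which drives R² far below zero.

```python
import numpy as np

def r_squared(observed, predicted):
    observed = np.asarray(observed, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)          # residual sum of squares
    ss_tot = np.sum((observed - observed.mean()) ** 2)    # total sum of squares
    return 1.0 - ss_res / ss_tot

model = (0.025, 0.955, 0.005)     # coefficients printed earlier, highest power first

for x_vals in [(0, 1, 2, 3, 20), (0, 1, 2, 3, 20, 1000)]:
    x = np.array(x_vals, dtype=float)
    y = x                          # the true relationship is the line y = x
    y_hat = np.polyval(model, x)
    print(f"max x = {x.max():6.0f}: worst error = {np.abs(y_hat - y).max():9.3f}, "
          f"R^2 = {r_squared(y, y_hat):.4f}")
```

This prints R² ≈ 0.7026 and R² ≈ −753.8806, matching the values above.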

R-squared = -753.88: our model is a disaster!! A negative R² means the model does worse than simply predicting the mean of the actual values.
In my opinion, testing the model is the most important and critical phase of a machine learning project, because it's easy to be deceived by good numbers and lucky runs.

What do you think?
Keep in touch!
Bye,
Graziano

Categories: Analytics