MM018 – Lies, Damned Lies! Why is my ML/DL Model lying to me?
Today I want to explore why models sometimes look good on paper but are terribly wrong. Why are metrics like RMSE, R2 or accuracy good, sometimes damn good, while the model fails in the real world? What is happening? Let's run some experiments and understand the basics.
I want to start from some noisy parabolic data (a polynomial curve of degree 2) and see whether I can build a model that fits the parabola.
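import pandas as pd
import pylab
degrees = (2, 4, 8, 16)  # polynomial degrees I want to fit
# getData, genFits, testFits and rSquared are helper functions not listed in this post
xVals1, yVals1 = getData('Dataset 1.txt')  # first dataset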
pdval1 = pd.Series(yVals1)
pdval1.plot();
![](https://i.imgur.com/hVKiK8c.png)
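The contents of `Dataset 1.txt` are not shown here, so the following is only a sketch of how a comparable noisy parabola could be generated. The function name `make_noisy_parabola`, the coefficients, the noise level and the seed are all assumptions for illustration, not the values behind the plot above.

```python
import numpy as np

def make_noisy_parabola(n_points=30, a=3.0, b=0.0, c=0.0, noise_std=20.0, seed=0):
    """Return x and y = a*x^2 + b*x + c plus Gaussian noise (illustrative values only)."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-10.0, 10.0, n_points)
    y = a * x**2 + b * x + c + rng.normal(0.0, noise_std, size=n_points)
    return x, y

# Two different seeds give two noisy samples of the same underlying parabola.
x_demo, y_demo = make_noisy_parabola(seed=0)
x_other, y_other = make_noisy_parabola(seed=1)
print(x_demo[:3], y_demo[:3])
```

Changing only the seed keeps the underlying curve and changes the noise, which is the situation the two datasets in this post are meant to represent.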
OK, now I want to build my models by fitting polynomial curves of degree 2, 4, 8 and 16.
To judge whether a model is good, I will use the R2 metric. I will not explain the mathematics behind R2, but on the data used for fitting it goes from 0 to 1: if it equals 1, the model fits the dataset perfectly. On other data it can even drop below 0, as the last example in this post shows.
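For reference, R2 compares the model's squared errors with the spread of the observed values around their mean: R2 = 1 - SS_res / SS_tot. The `rSquared` helper used in this post is not listed, so the function below is only a minimal sketch of that standard definition, assuming NumPy. The name `r_squared` is my own.

```python
import numpy as np

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)        # variation the model fails to explain
    ss_tot = np.sum((observed - observed.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

perfect = r_squared([0, 1, 2, 3], [0, 1, 2, 3])             # predictions equal the data
mean_only = r_squared([0, 1, 2, 3], [1.5, 1.5, 1.5, 1.5])   # always predict the mean
reversed_fit = r_squared([0, 1, 2, 3], [3, 2, 1, 0])        # worse than predicting the mean
print(perfect, mean_only, reversed_fit)
```

Predicting the mean everywhere gives 0, and anything worse than that gives a negative value, so R2 has no lower bound in general.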
models1 = genFits(xVals1, yVals1, degrees) # train the model
testFits(models1, degrees, xVals1, yVals1,'DataSet 1.txt') #test and show the fit
![](https://i.imgur.com/XgeLt9u.png)
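The helpers `genFits` and `testFits` are not listed in this post. Here is a minimal sketch of what they might look like, assuming `genFits` calls `polyfit` once per degree and `testFits` reports R2 for each fitted model (the real `testFits` also draws the curves, which is left out here). The names `gen_fits` and `score_fits` and the synthetic data are my own assumptions.

```python
import numpy as np

def r_squared(observed, predicted):
    observed = np.asarray(observed, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def gen_fits(x, y, degrees):
    """Fit one least-squares polynomial per requested degree."""
    return [np.polyfit(x, y, d) for d in degrees]

def score_fits(models, degrees, x, y):
    """Return {degree: R2} for each model evaluated on (x, y)."""
    return {d: r_squared(y, np.polyval(m, x)) for m, d in zip(models, degrees)}

# Illustrative data only: a parabola plus Gaussian noise.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 40)
y = 3.0 * x**2 + rng.normal(0.0, 0.3, size=x.size)

degrees = (2, 4, 8, 16)
models = gen_fits(x, y, degrees)
scores = score_fits(models, degrees, x, y)
for degree, score in scores.items():
    print(f"degree {degree:2d}: R2 on the fitting data = {score:.4f}")
```

On the data used for fitting, R2 cannot go down as the degree goes up, because every higher-degree polynomial contains the lower-degree ones as special cases.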
It seems that the 16th-degree curve is a very good fit. Let's try the second generated dataset. Remember that we started from a parabolic curve (degree 2) and only added some noise.
xVals2, yVals2 = getData('Dataset 2.txt')
pdval2 = pd.Series(yVals2)
pdval2.plot();
![](https://i.imgur.com/XWinaI9.png)
models2 = genFits(xVals2, yVals2, degrees)
testFits(models2, degrees, xVals2, yVals2, 'DataSet 2.txt')
![](https://i.imgur.com/rQc92yd.png)
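Again, degree 16 seems to be the best model we can get. But why is R2 so good, and what are we missing? Did adding some noise really turn our parabola into a 16th-degree curve? No: a higher-degree polynomial has more coefficients to tune, so it can bend to follow the noise in the data it was fitted on. This is overfitting, and R2 measured on the fitting data rewards it.
Let's swap the models and the datasets. In the literature this idea is called cross-validation: we test each model on a dataset that was not used to fit it.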
#dataset 2 against model 1
testFits(models1, degrees, xVals2, yVals2,'DataSet 2/Model 1')
![](https://i.imgur.com/JbwrzQI.png)
#dataset 1 against model 2
testFits(models2, degrees, xVals1, yVals1,'DataSet 1/Model 2')
![](https://i.imgur.com/VO1gB76.png)
OK! Now we have evidence that the 16th degree is not the best model for our datasets. But we still have to choose between degree 2 and degree 4.
How can we decide?
Well, we have 2 ways:
- A rule of thumb, like Occam's razor: the simplest model that explains the data is the best one.
- An empirical rule, like generating more datasets and testing them against the models.
I would suggest the second one!
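The empirical rule can be sketched like this: fit each degree on one noisy sample, score it on a fresh sample drawn the same way, repeat many times and compare the average R2. Everything about the data here (the true parabola, the noise level, the number of trials, the seed) is an assumption chosen for illustration, not the datasets used above.

```python
import numpy as np

def r_squared(observed, predicted):
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def sample(rng, x, noise_std=0.3):
    """Draw one noisy sample of the assumed true curve y = 3x^2."""
    return 3.0 * x**2 + rng.normal(0.0, noise_std, size=x.size)

rng = np.random.default_rng(42)
x = np.linspace(-1.0, 1.0, 30)
degrees = (2, 4, 8, 16)
n_trials = 200

train_scores = {d: [] for d in degrees}
test_scores = {d: [] for d in degrees}
for _ in range(n_trials):
    y_train = sample(rng, x)  # data used to fit the model
    y_test = sample(rng, x)   # fresh data the model has never seen
    for d in degrees:
        model = np.polyfit(x, y_train, d)
        predicted = np.polyval(model, x)
        train_scores[d].append(r_squared(y_train, predicted))
        test_scores[d].append(r_squared(y_test, predicted))

mean_train = {d: float(np.mean(s)) for d, s in train_scores.items()}
mean_test = {d: float(np.mean(s)) for d, s in test_scores.items()}
for d in degrees:
    print(f"degree {d:2d}: mean train R2 = {mean_train[d]:.3f}, "
          f"mean test R2 = {mean_test[d]:.3f}")
```

In a setup like this the training R2 keeps climbing with the degree, while the average R2 on fresh data should be higher for degree 2 than for degree 16, since the extra coefficients only chase noise.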
Let's try another example. This time let's generate a line:
xVals = (0,1,2,3)
yVals = xVals
pylab.plot(xVals, yVals, label = 'Actual values');
![](https://i.imgur.com/QAuv9Ho.png)
a,b,c = pylab.polyfit(xVals, yVals, 2) #let's try to fit the data (a line) with a polynomial curve of degree 2
print('a =', round(a, 4), 'b =', round(b, 4), 'c =', round(c, 4)) #let's get the coefficients
a = -0.0 b = 1.0 c = 0.0
As you can see, the library understood that we had a line. The model is:
y = ax^2 + bx + c
so, if we insert the coefficients above (a = 0, b = 1, c = 0), we have:
y = x
which is exactly our line. Let's see the predictions:
estYVals = pylab.polyval((a,b,c), xVals)
pylab.plot(xVals, estYVals, 'r--', label = 'Predictive values')
print('R-squared = ', rSquared(yVals, estYVals))
pylab.legend(loc = 'best');
R-squared = 1.0
![](https://i.imgur.com/3akPSTV.png)
Perfect match! Now let's widen the horizon and recompute the R2 metric:
pylab.figure()
#Extend domain
xVals = xVals + (20,)
yVals = xVals
pylab.plot(xVals, yVals, label = 'Actual values')
estYVals = pylab.polyval((a,b,c), xVals)
pylab.plot(xVals, estYVals, 'r--', label = 'Predictive values')
print('R-squared = ', rSquared(yVals, estYVals))
pylab.legend(loc = 'best');
R-squared = 1.0
![](https://i.imgur.com/AL2a409.png)
Perfect model! OK, now that we know the basics are solid, let's add some noise.
#almost a line
pylab.figure()
xVals = (0,1,2,3)
yVals = (0,1,2,3.1) #just slightly noisy
pylab.plot(xVals, yVals, label = 'Actual values');
![](https://i.imgur.com/9rRG0H7.png)
model = pylab.polyfit(xVals, yVals, 2)
print(model)
estYVals = pylab.polyval(model, xVals)
pylab.plot(xVals, estYVals, 'r--', label = 'Predicted values')
print('R-squared = ', rSquared(yVals, estYVals))
pylab.legend(loc = 'best');
[0.025 0.955 0.005]
R-squared = 0.9999057936881771
![](https://i.imgur.com/PAvIRQL.png)
Remember the equation from before? Now we have a parabolic model:
y = 0.025x^2 + 0.955x + 0.005
But is it correct? Let's widen the horizon again:
pylab.figure()
#Extend domain
xVals = xVals + (20,)
yVals = xVals
pylab.plot(xVals, yVals, label = 'Actual values')
estYVals = pylab.polyval(model, xVals)
pylab.plot(xVals, estYVals, 'r--', label = 'Predicted values')
print('R-squared = ', rSquared(yVals, estYVals))
pylab.legend(loc = 'best');
R-squared = 0.7026164813486402
![](https://i.imgur.com/lKymTXO.png)
Well, close call. As you can see, an R2 of 0.9999 is not the ONLY thing we need to reach to create a perfect machine learning model. We always have to test it under the right conditions, or we can fall into a trap!
But was it really a close call? Obviously not! See what happens when we extend the range to 1000!
pylab.figure()
#Extend domain
xVals = xVals + (1000,)
yVals = xVals
pylab.plot(xVals, yVals, label = 'Actual values')
estYVals = pylab.polyval(model, xVals)
pylab.plot(xVals, estYVals, 'r--', label = 'Predicted values')
print('R-squared = ', rSquared(yVals, estYVals))
pylab.legend(loc = 'best');
R-squared = -753.8806225945068
![](https://i.imgur.com/cjGZKEb.png)
R-squared = -753.88. A negative R2 means the model predicts worse than simply guessing the mean of the actual values every time. Our model is a disaster!!
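As a cross-check, the same four slightly noisy points can be fitted with a straight line and with a parabola, and both models scored inside and outside the fitted range. This is a sketch assuming NumPy in place of `pylab`, with my own `r_squared` helper standing in for `rSquared`, which is not listed in this post.

```python
import numpy as np

def r_squared(observed, predicted):
    observed = np.asarray(observed, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.array([0.0, 1.0, 2.0, 3.1])  # almost a line

# Wider domain; the true relationship is assumed to stay y = x.
x_wide = np.array([0.0, 1.0, 2.0, 3.0, 20.0, 1000.0])
y_wide = x_wide

results = {}
for degree in (1, 2):
    model = np.polyfit(x_train, y_train, degree)
    results[degree] = (
        r_squared(y_train, np.polyval(model, x_train)),  # inside the fitted range
        r_squared(y_wide, np.polyval(model, x_wide)),    # including the far-away points
    )
    print(f"degree {degree}: R2 fitted range = {results[degree][0]:.4f}, "
          f"R2 wide range = {results[degree][1]:.4f}")
```

The straight line scores slightly lower on the four fitted points, yet it is the one that holds up once x = 20 and x = 1000 are included. That is the Occam's razor argument in practice.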
In my opinion, testing the model is the most important and critical phase of a machine learning project, because it is easy to be deceived by good numbers and lucky runs.
What do you think?
Keep in touch!
Bye,
Graziano