MM018 – Lies, Damned Lies! Why is my ML/DL Model lying to me?
Today I want to explore why models can look good and still be terribly wrong: why metrics like RMSE, R2, or accuracy can be excellent, even damn good, while the model fails in the real world. What happens? Let's run some experiments and understand the basics.
I want to generate some noisy parabolic data (a polynomial curve of degree 2) and see whether I can build a model that fits the parabola.
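One note before we start: the helper functions used in this post (getData, genFits, testFits, rSquared) are not shown. To keep things reproducible, here is a minimal sketch of how the noisy data could be generated and read back; the file format and the noise level (Gaussian with standard deviation 35) are arbitrary choices of this sketch:

import random
import pylab

def genNoisyParabolicData(a, b, c, xVals, fName):
    """Write y = a*x^2 + b*x + c plus Gaussian noise to a file."""
    with open(fName, 'w') as f:
        f.write('x y\n') # header line
        for x in xVals:
            y = a*x**2 + b*x + c + random.gauss(0, 35)
            f.write(str(x) + ' ' + str(y) + '\n')

def getData(fileName):
    """Read the x/y pairs back as arrays."""
    xVals, yVals = [], []
    with open(fileName) as f:
        f.readline() # skip the header
        for line in f:
            x, y = line.split()
            xVals.append(float(x))
            yVals.append(float(y))
    return pylab.array(xVals), pylab.array(yVals)

# e.g. genNoisyParabolicData(3, 0, 0, range(-10, 11), 'Dataset 1.txt')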
import pandas as pd

degrees = (2, 4, 8, 16) # polynomial curve orders that I want to fit
xVals1, yVals1 = getData('Dataset 1.txt') # first dataset
pdval1 = pd.Series(yVals1)
pdval1.plot(); # quick look at the noisy data

Ok, now I want to generate my models by fitting polynomial curves of degree 2, 4, 8 and 16.
To judge whether a model is good, I will use the R2 metric. I won't explain the mathematical details, but in short: R2 = 1 - (residual sum of squares) / (total sum of squares). It is at most 1: a value of 1 means the model fits the dataset perfectly, 0 means it does no better than always predicting the mean, and it can even go negative for a model worse than that (keep this in mind for later).
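The rSquared helper isn't shown in the post; one possible implementation, just to make that definition concrete:

import numpy as np

def rSquared(observed, predicted):
    # R2 = 1 - (residual sum of squares) / (total sum of squares)
    ssRes = np.sum((np.array(observed) - np.array(predicted))**2)
    ssTot = np.sum((np.array(observed) - np.mean(observed))**2)
    return 1 - ssRes/ssTot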
models1 = genFits(xVals1, yVals1, degrees) # train the model
testFits(models1, degrees, xVals1, yVals1,'DataSet 1.txt') #test and show the fit
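For reference, here is a plausible sketch of the two helpers just used above: genFits fits one polynomial per degree, and testFits plots each fit with its R2 (the exact plotting details are my assumption):

def genFits(xVals, yVals, degrees):
    models = []
    for d in degrees:
        models.append(pylab.polyfit(xVals, yVals, d)) # least-squares fit of degree d
    return models

def testFits(models, degrees, xVals, yVals, title):
    pylab.plot(xVals, yVals, 'o', label='Data')
    for model, d in zip(models, degrees):
        estYVals = pylab.polyval(model, xVals)
        r2 = rSquared(yVals, estYVals)
        pylab.plot(xVals, estYVals,
                   label='Degree ' + str(d) + ', R2 = ' + str(round(r2, 5)))
    pylab.legend(loc='best')
    pylab.title(title)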

It seems that the 16th-degree curve is a very good fit. Let's try with a second generated dataset. Remember, we started from a parabolic curve (degree 2) and then just added some noise.
xVals2, yVals2 = getData('Dataset 2.txt')
pdval2 = pd.Series(yVals2)
pdval2.plot();

models2 = genFits(xVals2, yVals2, degrees)
testFits(models2, degrees, xVals2, yVals2, 'DataSet 2.txt')

Again, it seems that degree 16 is the best model we can get. But why is R2 so good, and what are we missing? We added noise, and the 16th-degree polynomial has enough free parameters to fit that noise too: this is overfitting.
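A tiny self-contained experiment (my own toy data, not the datasets above) makes the point, reusing the rSquared sketched earlier:

import numpy as np

rng = np.random.default_rng(0)
xs = np.linspace(-1, 1, 30)
ys = xs**2 + rng.normal(0, 0.1, size=len(xs)) # a noisy parabola

for degree in (2, 16):
    coeffs = np.polyfit(xs, ys, degree)
    estYs = np.polyval(coeffs, xs)
    print(degree, round(rSquared(ys, estYs), 5)) # degree 16 scores higher in-sample

The degree-2 model is nested inside the degree-16 one, so the training R2 of the bigger model can never be lower; the extra fit it gains comes from the noise, not from the signal.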
Let's try swapping models and datasets; in the literature this kind of check is called cross-validation: we test each model on data it was not trained on.
#dataset 2 against model 1
testFits(models1, degrees, xVals2, yVals2,'DataSet 2/Model 1')

#dataset 1 against model 2
testFits(models2, degrees, xVals1, yVals1,'DataSet 1/Model 2')

Ok! Now we have evidence that degree 16 is not the best model for our datasets. But we still have to choose between degree 2 and degree 4.
How can we decide?
Well, we have two ways:
- A rule of thumb, like Occam's razor: the simplest model that explains the data is the best one.
- An empirical rule: generate more datasets and test the models against them (sketched below).
I would suggest the second one!
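Here is a minimal sketch of that empirical approach, reusing the helpers sketched above (the test file names are placeholders):

import numpy as np

def avgOutOfSampleR2(trainFile, testFiles, degrees):
    xTrain, yTrain = getData(trainFile)
    models = genFits(xTrain, yTrain, degrees)
    for model, d in zip(models, degrees):
        scores = [rSquared(yTest, pylab.polyval(model, xTest))
                  for xTest, yTest in map(getData, testFiles)]
        print('Degree', d, '-> mean out-of-sample R2 =',
              round(float(np.mean(scores)), 4))

# e.g. avgOutOfSampleR2('Dataset 1.txt', ['Dataset 2.txt'], degrees)
# The degree with the best average score on unseen data wins.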
Let's try another example: let's generate a line.
xVals = (0,1,2,3)
yVals = xVals
pylab.plot(xVals, yVals, label = 'Actual values');

a, b, c = pylab.polyfit(xVals, yVals, 2) # let's fit the data (a line) with a degree-2 polynomial curve
print('a =', round(a, 4), 'b =', round(b, 4), 'c =', round(c, 4)) # let's look at the coefficients
a = -0.0 b = 1.0 c = 0.0
As you can see, the library understood that we had a line, because the model is:
y = ax^2 + bx + c
so, if we insert the coefficients above we have:
y = x
exactly our line. Let's see the predictions:
estYVals = pylab.polyval((a,b,c), xVals)
pylab.plot(xVals, estYVals, 'r--', label = 'Predictive values')
print('R-squared = ', rSquared(yVals, estYVals))
pylab.legend(loc = 'best');
R-squared = 1.0

Perfect match! Now let's widen the horizon and recompute the R2 metric:
pylab.figure()
#Extend domain
xVals = xVals + (20,)
yVals = xVals
pylab.plot(xVals, yVals, label = 'Actual values')
estYVals = pylab.polyval((a,b,c), xVals)
pylab.plot(xVals, estYVals, 'r--', label = 'Predictive values')
print('R-squared = ', rSquared(yVals, estYVals))
pylab.legend(loc = 'best');
R-squared = 1.0

A perfect model! Ok, now that we've checked that the basics are solid, let's add some noise.
#almost a line
pylab.figure()
xVals = (0,1,2,3)
yVals = (0,1,2,3.1) # just slightly noisy
pylab.plot(xVals, yVals, label = 'Actual values');

model = pylab.polyfit(xVals, yVals, 2)
print(model)
estYVals = pylab.polyval(model, xVals)
pylab.plot(xVals, estYVals, 'r--', label = 'Predicted values')
print('R-squared = ', rSquared(yVals, estYVals))
pylab.legend(loc = 'best');
[0.025 0.955 0.005]
R-squared = 0.9999057936881771

Remember the equation before? Now we have a parabolic model:
y = 0.025x^2 + 0.955x + 0.005
But is it correct? Let's widen the horizon again.
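A quick sanity check first: at x = 20 the fitted parabola predicts
y = 0.025(20)^2 + 0.955(20) + 0.005 = 10 + 19.1 + 0.005 ≈ 29.1
while the true line gives y = 20, an error of about 45%.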
pylab.figure()
#Extend domain
xVals = xVals + (20,)
yVals = xVals # assume the true values still follow the line y = x
pylab.plot(xVals, yVals, label = 'Actual values')
estYVals = pylab.polyval(model, xVals)
pylab.plot(xVals, estYVals, 'r--', label = 'Predicted values')
print('R-squared = ', rSquared(yVals, estYVals))
pylab.legend(loc = 'best');
R-squared = 0.7026164813486402

Well, that was a close call. As you can see, an R2 of 0.9999 is not the only thing we need to call a machine learning model good; we always have to test it under the right conditions, or we can simply fall into a trap!
But was it really just a close call? Obviously not! See what happens with a range of 1000.
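A quick calculation shows why: at x = 1000 the quadratic term dominates completely,
y = 0.025(1000)^2 + 0.955(1000) + 0.005 ≈ 25,955
against a true value of 1,000.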
pylab.figure()
#Extend domain
xVals = xVals + (1000,)
yVals = xVals # again, assume the true values follow y = x
pylab.plot(xVals, yVals, label = 'Actual values')
estYVals = pylab.polyval(model, xVals)
pylab.plot(xVals, estYVals, 'r--', label = 'Predicted values')
print('R-squared = ', rSquared(yVals, estYVals))
pylab.legend(loc = 'best');
R-squared = -753.8806225945068

R-squared = -753.88. A negative R2 means the model's squared error is larger than the error of simply predicting the mean of the data: our model is a disaster!!
In my opinion, testing the model is the most important and critical phase of a machine learning project, because it's easy to be deceived by good numbers and lucky runs.
What do you think?
Keep in touch!
Bye,
Graziano