# MM018 – Lies, Damned Lies! Why is my ML/DL Model lying to me?

Today I want to explore why models sometimes look good but are terribly wrong: why metrics like RMSE, R2, or accuracy can be good, even damn good, while in the real world the model is not. What happens? So, let’s do some experiments and understand the basics.

I want to generate some noisy parabolic data (polynomial curve with degree equal to 2) and try to understand if I can create a model that fits the parabola.
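
Here is a minimal sketch of how such data could be generated; the parabola coefficients, the Gaussian noise level, and the "x y per line" file format are illustrative assumptions, not necessarily what I used:

```
import random

def genNoisyParabolicData(a, b, c, xVals, fName):
    # Write y = a*x**2 + b*x + c plus Gaussian noise to a file,
    # one whitespace-separated "x y" pair per line (assumed format)
    with open(fName, 'w') as f:
        for x in xVals:
            theoreticalVal = a*x**2 + b*x + c
            f.write(str(x) + ' ' + str(theoreticalVal + random.gauss(0, 35)) + '\n')

# Two datasets from the same parabola, two independent noise draws
xValsGen = range(-10, 11, 1)
genNoisyParabolicData(3, 0, 0, xValsGen, 'Dataset 1.txt')
genNoisyParabolicData(3, 0, 0, xValsGen, 'Dataset 2.txt')
```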

`degrees = (2, 4, 8, 16) # polynomial degrees I want to fit to the data`

`xVals1, yVals1 = getData('Dataset 1.txt') #first dataset`
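
`getData` is just a small loader helper; a minimal sketch, assuming the file stores one whitespace-separated "x y" pair per line (matching the generator above):

```
def getData(fileName):
    # Read one whitespace-separated "x y" pair per line
    xVals, yVals = [], []
    with open(fileName) as dataFile:
        for line in dataFile:
            x, y = line.split()
            xVals.append(float(x))
            yVals.append(float(y))
    return xVals, yVals
```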

```
import pandas as pd

pdval1 = pd.Series(yVals1)  # quick look at the raw y values
pdval1.plot();
```

Ok, now I want to generate my models by fitting polynomial curves of degree 2, 4, 8, and 16.

To understand if my model is good, I will use the R2 metric. I will not go into the mathematics of R2 here, but it is at most 1: if it is equal to 1, the model fits the dataset **perfectly**; near 0 it does no better than predicting the mean; and it can even go negative when the model is worse than that (remember this for later).
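
For reference, the `rSquared` helper used throughout this post can be written in a few lines; this is a sketch assuming the standard 1 - SSE/SST definition:

```
import numpy as np

def rSquared(observed, predicted):
    # 1 - SSE/SST: 1 for a perfect fit, about 0 for a model no
    # better than the mean, negative for one worse than the mean
    observed = np.array(observed)
    predicted = np.array(predicted)
    sse = ((predicted - observed)**2).sum()
    sst = ((observed - observed.mean())**2).sum()
    return 1 - sse/sst
```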

`models1 = genFits(xVals1, yVals1, degrees) # train the model`

`testFits(models1, degrees, xVals1, yVals1,'DataSet 1.txt') #test and show the fit`
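
`genFits` and `testFits` are helpers too; a minimal sketch of what they could look like, reusing `rSquared` from above (the plotting details are my assumptions):

```
import pylab

def genFits(xVals, yVals, degrees):
    # One least-squares polynomial fit per requested degree
    return [pylab.polyfit(xVals, yVals, d) for d in degrees]

def testFits(models, degrees, xVals, yVals, title):
    # Plot the data and every fitted curve, labelled with its R2
    pylab.figure()
    pylab.plot(xVals, yVals, 'o', label='Data')
    for model, d in zip(models, degrees):
        estYVals = pylab.polyval(model, xVals)
        r2 = rSquared(yVals, estYVals)
        pylab.plot(xVals, estYVals,
                   label='Degree ' + str(d) + ', R2 = ' + str(round(r2, 5)))
    pylab.title(title)
    pylab.legend(loc='best')
```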

It seems that the 16th-degree curve is a very good fit. Let’s try with the second generated dataset. Remember, we started from a parabolic curve (degree 2) and then just added some noise.

```
xVals2, yVals2 = getData('Dataset 2.txt')
pdval2 = pd.Series(yVals2)
pdval2.plot();
```

```
models2 = genFits(xVals2, yVals2, degrees)
testFits(models2, degrees, xVals2, yVals2, 'DataSet 2.txt')
```

Again, it seems that degree 16 is the best model we can get. But why is R2 so good? What are we missing? We generated the data from a parabola, yet by adding noise we ended up preferring a 16th-degree model: the extra degrees of freedom are not capturing the curve, they are chasing the noise.
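
To see how extreme this can get, here is a tiny side demonstration (mine, not part of the experiment above): given enough degrees of freedom, a polynomial can "explain" pure noise perfectly.

```
import numpy as np
import pylab

xs = np.arange(10)
ys = np.random.normal(0, 1, 10)   # pure noise, no structure at all
model = pylab.polyfit(xs, ys, 9)  # degree = number of points - 1
print(rSquared(ys, pylab.polyval(model, xs)))  # ~1.0
```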

Let’s try to swap the models and datasets; in the literature, this is called cross-validation: we test each model against the dataset it was not trained on.

```
#dataset 2 against model 1
testFits(models1, degrees, xVals2, yVals2,'DataSet 2/Model 1')
```

```
#dataset 1 against model 2
testFits(models2, degrees, xVals1, yVals1,'DataSet 1/Model 2')
```

Ok! Now we have evidence that the 16th degree is not the best model for our datasets. But we still have to choose between degree 2 and degree 4.

How can we decide?

Well, we have two ways:

- A rule of thumb, like Occam’s razor: the simplest explanation is usually the best one.
- An empirical rule: generate more datasets and test the models against them.

I would like to suggest the second one!
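
Sticking with the first experiment, the empirical check could look like this sketch, reusing the hypothetical helpers from above:

```
# Score the stored fits on freshly generated datasets: the degree
# whose R2 holds up across new noise draws is the one to trust
for run in range(3):
    fName = 'Fresh dataset ' + str(run) + '.txt'
    genNoisyParabolicData(3, 0, 0, range(-10, 11, 1), fName)
    xNew, yNew = getData(fName)
    testFits(models1, degrees, xNew, yNew, 'Fresh data ' + str(run) + '/Model 1')
```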

Let’s try another example and generate a line:

```
import pylab

xVals = (0,1,2,3)
yVals = xVals
pylab.plot(xVals, yVals, label = 'Actual values');
```

```
a,b,c = pylab.polyfit(xVals, yVals, 2) # fit the data (a line) with a degree-2 polynomial curve
print('a =', round(a, 4), 'b =', round(b, 4), 'c =', round(c, 4)) # show the coefficients
```

`a = -0.0 b = 1.0 c = 0.0`

As you can see, the library figured out that we had a line, because the model is:

y = ax^2 + bx + c

so, if we insert the coefficients above we have:

y = x

exactly our line. Let’s see the predictions:

```
estYVals = pylab.polyval((a,b,c), xVals)
pylab.plot(xVals, estYVals, 'r--', label = 'Predictive values')
print('R-squared = ', rSquared(yVals, estYVals))
pylab.legend(loc = 'best');
```

`R-squared = 1.0`

Perfect match! Now let’s widen the horizon and recompute the R2 metric:

```
pylab.figure()
#Extend domain
xVals = xVals + (20,)
yVals = xVals
pylab.plot(xVals, yVals, label = 'Actual values')
estYVals = pylab.polyval((a,b,c), xVals)
pylab.plot(xVals, estYVals, 'r--', label = 'Predictive values')
print('R-squared = ', rSquared(yVals, estYVals))
pylab.legend(loc = 'best');
```

`R-squared = 1.0`

A perfect model! Ok, now that we know the basics are solid, let’s add some noise.

```
#almost a line
pylab.figure()
xVals = (0,1,2,3)
yVals = (0,1,2,3.1) #just slightly noised
pylab.plot(xVals, yVals, label = 'Actual values');
```

```
model = pylab.polyfit(xVals, yVals, 2)
print(model)
estYVals = pylab.polyval(model, xVals)
pylab.plot(xVals, estYVals, 'r--', label = 'Predicted values')
print('R-squared = ', rSquared(yVals, estYVals))
pylab.legend(loc = 'best');
```

```
[0.025 0.955 0.005]
R-squared = 0.9999057936881771
```

Remember the equation from before? Now we have a parabolic model:

`y = 0.025x^2 + 0.955x + 0.005`

But is it correct? Let’s widen the horizon again:

```
pylab.figure()
#Extend domain
xVals = xVals + (20,)
yVals = xVals
pylab.plot(xVals, yVals, label = 'Actual values')
estYVals = pylab.polyval(model, xVals)
pylab.plot(xVals, estYVals, 'r--', label = 'Predicted values')
print('R-squared = ', rSquared(yVals, estYVals))
pylab.legend(loc = 'best');
```

`R-squared = 0.7026164813486402`

Well, that was a close call. At x = 20 the model predicts 0.025(20^2) + 0.955(20) + 0.005 = 29.105 instead of 20, and R2 drops to 0.70. As you can see, a 0.9999 R2 is not the **only** thing we have to reach to create a good machine learning model. We always have to test it under the right conditions, or we can fall straight into a trap!

But was it really just a close call? Obviously not! See what happens when we extend the range to 1000!

```
pylab.figure()
#Extend domain
xVals = xVals + (1000,)
yVals = xVals
pylab.plot(xVals, yVals, label = 'Actual values')
estYVals = pylab.polyval(model, xVals)
pylab.plot(xVals, estYVals, 'r--', label = 'Predicted values')
print('R-squared = ', rSquared(yVals, estYVals))
pylab.legend(loc = 'best');
```

`R-squared = -753.8806225945068`

R-squared = -753.88. A negative R2 means the model does worse than simply predicting the mean of the actual values: at x = 1000 the quadratic term dominates, and the model predicts about 0.025(1000^2) + 0.955(1000) ≈ 25955 instead of 1000. Our model is a **disaster**!!

In my opinion, **testing the model is the most important and critical phase in a machine learning project**, because it’s easy to be deceived by good numbers and lucky runs.

What do you think?

Keep in touch!

Bye,

Graziano