We can use least-squares regression to generate a curve that sufficiently describes the relationship between a dependent variable and one or more independent variables.
Linear Regression
Best Criteria
For a set of paired observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, the best-fit line is the one that minimizes the sum of the squares of the residuals:

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^2$$

where $e_i$ is the residual error for the $i$-th data point, and where $a_0$ and $a_1$ are the intercept and the slope of the line, respectively.
Least-Squares Fit
The expression for a straight line is

$$y = a_0 + a_1 x + e$$

however, we set the coefficients so that the sum of the squares of the residuals, $S_r = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^2$, is minimized. Differentiating $S_r$ with respect to each coefficient and setting the result to zero yields the normal equations, whose solution is

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}, \qquad a_0 = \bar{y} - a_1 \bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the means of $x$ and $y$, respectively.
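The least-squares slope and intercept can be computed directly from running sums over the data; a minimal sketch in Python (the function name `linear_fit` is our own, not from any library):

```python
def linear_fit(xs, ys):
    """Least-squares fit of y = a0 + a1*x; returns (a0, a1)."""
    n = len(xs)
    sx = sum(xs)                              # sum of x_i
    sy = sum(ys)                              # sum of y_i
    sxy = sum(x * y for x, y in zip(xs, ys))  # sum of x_i * y_i
    sxx = sum(x * x for x in xs)              # sum of x_i^2
    a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope
    a0 = sy / n - a1 * sx / n                 # intercept: ybar - a1 * xbar
    return a0, a1

# Points that lie exactly on y = 1 + 2x recover a0 = 1, a1 = 2.
a0, a1 = linear_fit([0, 1, 2, 3], [1, 3, 5, 7])
```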
Quantification of Error
The least-squares regression provides the best estimates of $a_0$ and $a_1$ as long as the following criteria are met:
- Along the entire range of data, the distance between the regression line and the data points is of similar magnitude.
- The distribution of the points with respect to the line is normal.
Furthermore, the standard error of the estimate can be obtained, assuming that the criteria are met, using

$$s_{y/x} = \sqrt{\frac{S_r}{n - 2}}$$

The standard error of the estimate tells us the spread of the data around the regression line; in contrast, the standard deviation only tells us the spread of the data around the mean. This helps us assess the accuracy of our fit, especially when comparing different regression methods.
To determine the goodness of our fit:
- Determine $S_t = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2$, or the total sum of the squares around the mean for the dependent variable.
  - This is the magnitude of the residual error associated with the dependent variable before the regression.
- Solve for the regression line.
- Determine $S_r = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^2$, or the sum of the squares of the residuals around the regression line.
  - This is the magnitude of the residual error that remains after the regression; it is also referred to as the unexplained sum of the squares.
- Solve for the coefficient of determination, $r^2$.
  - A perfect fit has $S_r = 0$ and $r^2 = 1$.
  - The result indicates the percentage of the original uncertainty explained by the linear model.

The coefficient of determination can mathematically be expressed as

$$r^2 = \frac{S_t - S_r}{S_t}$$

where its square root, $r$, is the correlation coefficient.
WARNING
Just because $r^2$ is close to $1$ does not mean that the fit is necessarily good (e.g., in cases where the relationship between the variables is not even linear).
TIP
If $s_{y/x} < s_y$, then the linear regression model has merit; otherwise, it does not.
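The fit statistics in this section ($S_t$, $S_r$, $s_y$, $s_{y/x}$, and $r^2$) take only a few lines to compute; a sketch under the same notation (the function name `fit_statistics` is our own):

```python
import math

def fit_statistics(xs, ys, a0, a1):
    """Return (s_y, s_yx, r2) for the fitted line y = a0 + a1*x."""
    n = len(xs)
    ybar = sum(ys) / n
    # Total sum of squares around the mean of the dependent variable.
    st = sum((y - ybar) ** 2 for y in ys)
    # Sum of squares of the residuals around the regression line.
    sr = sum((y - a0 - a1 * x) ** 2 for x, y in zip(xs, ys))
    s_y = math.sqrt(st / (n - 1))    # standard deviation about the mean
    s_yx = math.sqrt(sr / (n - 2))   # standard error of the estimate
    r2 = (st - sr) / st              # coefficient of determination
    return s_y, s_yx, r2

# For data lying exactly on y = 1 + 2x, the fit is perfect: r2 = 1.
s_y, s_yx, r2 = fit_statistics([0, 1, 2, 3], [1, 3, 5, 7], 1.0, 2.0)
```

Comparing `s_yx` against `s_y` applies the merit check directly: the regression has merit only when the spread around the line is smaller than the spread around the mean.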
Polynomial Regression
Besides linear regression, we can also apply the least-squares procedure to polynomial regression, which is handy for trends in data that a straight line cannot acceptably represent. For example, we can extend the least-squares formula to handle a quadratic polynomial using the following equation:

$$y = a_0 + a_1 x + a_2 x^2 + e$$

To compute the three unknowns (i.e., $a_0$, $a_1$, and $a_2$), we take the derivative of the sum of the squares of the residuals with respect to each coefficient and set it to zero, which yields a system of three linear normal equations:

$$
\begin{aligned}
n a_0 + \left( \sum x_i \right) a_1 + \left( \sum x_i^2 \right) a_2 &= \sum y_i \\
\left( \sum x_i \right) a_0 + \left( \sum x_i^2 \right) a_1 + \left( \sum x_i^3 \right) a_2 &= \sum x_i y_i \\
\left( \sum x_i^2 \right) a_0 + \left( \sum x_i^3 \right) a_1 + \left( \sum x_i^4 \right) a_2 &= \sum x_i^2 y_i
\end{aligned}
$$
For an $m$-th-order polynomial, we can express its standard error of the estimate as

$$s_{y/x} = \sqrt{\frac{S_r}{n - (m + 1)}}$$

since $m + 1$ coefficients are estimated from the data. Similar to linear regression, the coefficient of determination is

$$r^2 = \frac{S_t - S_r}{S_t}$$
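The three unknowns of the quadratic fit can be found by assembling and solving its normal equations numerically; a sketch using NumPy (the function name `quadratic_fit` is our own):

```python
import numpy as np

def quadratic_fit(x, y):
    """Least-squares fit of y = a0 + a1*x + a2*x^2 via the normal equations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Coefficient matrix: sums of increasing powers of x.
    A = np.array([
        [n,             x.sum(),        (x**2).sum()],
        [x.sum(),       (x**2).sum(),   (x**3).sum()],
        [(x**2).sum(),  (x**3).sum(),   (x**4).sum()],
    ])
    # Right-hand side: sums of y weighted by powers of x.
    b = np.array([y.sum(), (x * y).sum(), (x**2 * y).sum()])
    return np.linalg.solve(A, b)  # [a0, a1, a2]

# Points sampled from y = 2 - x + 3x^2 recover [2, -1, 3].
coef = quadratic_fit([0, 1, 2, 3], [2, 4, 12, 26])
```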
Multiple Linear Regression
In a case where there are two independent variables and only a single dependent variable, the regression line becomes a regression plane, such as the function

$$y = a_0 + a_1 x_1 + a_2 x_2 + e$$

As with the previous cases, we can obtain the best coefficient values by minimizing the sum of the squares of the residuals,

$$S_r = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_{1i} - a_2 x_{2i} \right)^2$$

To obtain the coefficients that minimize $S_r$, we can solve the following system of normal equations:

$$
\begin{bmatrix}
n & \sum x_{1i} & \sum x_{2i} \\
\sum x_{1i} & \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\
\sum x_{2i} & \sum x_{1i} x_{2i} & \sum x_{2i}^2
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}
=
\begin{bmatrix} \sum y_i \\ \sum x_{1i} y_i \\ \sum x_{2i} y_i \end{bmatrix}
$$
It should be noted that multiple linear regression is not limited to two independent variables; it can be extended to $m$ dimensions:

$$y = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_m x_m + e$$

where the standard error is

$$s_{y/x} = \sqrt{\frac{S_r}{n - (m + 1)}}$$

and the coefficient of determination is

$$r^2 = \frac{S_t - S_r}{S_t}$$
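In matrix form, the normal equations for the two-variable case reduce to solving $(Z^T Z)\,a = Z^T y$, where $Z$ is the design matrix; a NumPy sketch (the function name `plane_fit` is our own):

```python
import numpy as np

def plane_fit(x1, x2, y):
    """Least-squares plane y = a0 + a1*x1 + a2*x2; returns [a0, a1, a2]."""
    x1, x2, y = (np.asarray(v, dtype=float) for v in (x1, x2, y))
    # Design matrix: a column of ones for a0, then the two regressors.
    Z = np.column_stack([np.ones_like(x1), x1, x2])
    # Solve the normal equations (Z^T Z) a = Z^T y.
    return np.linalg.solve(Z.T @ Z, Z.T @ y)

# Points sampled from y = 5 + 4*x1 - 3*x2 recover [5, 4, -3].
a = plane_fit([0, 2, 2.5, 1, 4, 7],
              [0, 1, 2, 3, 6, 2],
              [5, 10, 9, 0, 3, 27])
```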
Nonlinear Regression
Gauss-Newton Method
Sources
- Numerical Methods for Engineers by Steven Chapra and Raymond Canale (Chapter 17)
Footnotes
1. Residual error is the difference between the true value and the approximated value. ↩