We can use least-squares regression to generate a curve that sufficiently describes the relationship between $x$ and $y$. This method minimizes the discrepancy between the data points and the curve produced; thus, it is useful for objectively illustrating the general trend in the data.

Linear Regression

Best-Fit Criteria

For a set of paired observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, we can express the residual error of a particular straight-line approximation using the formula

$$e = y - a_0 - a_1 x$$

where $a_0$ and $a_1$ are coefficients characterizing the intercept and slope.1 Consequently, we can find the line that best fits the given data points using a specific criterion: the line with the smallest sum of the squares of the residual errors is the best fit (the least-squares fit). We can determine this sum using the expression

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i\right)^2$$

where $n$ is the total number of points.

Least-Squares Fit

The expression for a straight line is

$$y = a_0 + a_1 x + e$$

where $e$ is the residual error. To obtain the line with the least squares, we set the partial derivatives of $S_r$ with respect to $a_0$ and $a_1$ to zero. Doing so lets us solve for $a_1$ and $a_0$ using the formulae

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$$

$$a_0 = \bar{y} - a_1 \bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the means of $x$ and $y$.
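
As an illustrative sketch, the formulae above translate almost directly into code (the function name `linear_fit` and the sample data are only for illustration):

```python
def linear_fit(x, y):
    """Least-squares straight line y = a0 + a1*x.

    a1 = (n*sum(x*y) - sum(x)*sum(y)) / (n*sum(x^2) - sum(x)^2)
    a0 = mean(y) - a1*mean(x)
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a0 = sy / n - a1 * (sx / n)
    return a0, a1

# Example: points that lie exactly on y = 1 + 2x
a0, a1 = linear_fit([0, 1, 2, 3], [1, 3, 5, 7])
print(a0, a1)  # -> 1.0 2.0
```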

Quantification of Error

The least-squares regression provides the best estimates of $a_0$ and $a_1$ when it meets the following criteria:

  1. Along the entire range of data, the distance between the regression line and the data points is of similar magnitude.
  2. The distribution of the points with respect to the line is normal.2

Furthermore, assuming that these criteria are met, the standard error of the estimate (the "standard deviation" about the regression line) can be obtained using

$$s_{y/x} = \sqrt{\frac{S_r}{n - 2}}$$

The standard error of the estimate tells us the spread of the data around the regression line; in contrast, the standard deviation $s_y$ only tells us the spread of the data around the mean. This helps us assess the accuracy of our fit, especially when comparing different regression methods.

To determine the goodness of our fit:

  1. Determine $S_t = \sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2$, the total sum of the squares around the mean of the dependent variable.
    • This is the magnitude of the residual error associated with the dependent variable before the regression.
  2. Solve for the regression line.
  3. Determine $S_r$, the sum of the squares of the residuals around the regression line.
    • This is the magnitude of the residual error that remains after the regression; it is also referred to as the unexplained sum of the squares.
  4. Solve for the coefficient of determination, $r^2$.
    • A perfect fit has $S_r = 0$ and $r^2 = 1$.
    • The result indicates the percentage of the original uncertainty explained by the linear model.

The coefficient of determination can be expressed mathematically as

$$r^2 = \frac{S_t - S_r}{S_t}$$

where its square root $r$ is referred to as the correlation coefficient. The correlation coefficient can also be computed directly using the formula

$$r = \frac{n \sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2} \, \sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}}$$
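
To illustrate, the error measures above can be computed for an already-fitted line as follows (the helper `fit_quality` and the sample data are only for illustration):

```python
def fit_quality(x, y, a0, a1):
    """Goodness-of-fit statistics for the line y = a0 + a1*x."""
    n = len(x)
    ybar = sum(y) / n
    # Total sum of squares around the mean (pre-regression error)
    st = sum((yi - ybar) ** 2 for yi in y)
    # Sum of squares of residuals around the line (post-regression error)
    sr = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))
    syx = (sr / (n - 2)) ** 0.5        # standard error of the estimate
    r2 = (st - sr) / st                # coefficient of determination
    return st, sr, syx, r2

# Points lying exactly on y = 1 + 2x give a perfect fit: Sr = 0, r^2 = 1
st, sr, syx, r2 = fit_quality([0, 1, 2, 3], [1, 3, 5, 7], a0=1.0, a1=2.0)
print(sr, r2)  # -> 0.0 1.0
```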

WARNING

Just because $r$ or $r^2$ is close to $1$ does not mean that the fit is necessarily good (as in cases where the relationship between the variables is not even linear).

TIP

If $s_{y/x} < s_y$, then the linear regression model has merit; otherwise, it does not.

Polynomial Regression

Besides linear regression, we can also apply the least-squares procedure to polynomial regression, which is handy for trends in data that a straight line cannot acceptably represent. For example, we can extend the least-squares approach to fit a second-order (quadratic) polynomial:

$$y = a_0 + a_1 x + a_2 x^2 + e$$

To compute the three unknowns (i.e., $a_0$, $a_1$, and $a_2$), we need to solve the following system of three linear (normal) equations:

$$\begin{aligned}
n \, a_0 + \left(\sum x_i\right) a_1 + \left(\sum x_i^2\right) a_2 &= \sum y_i \\
\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 + \left(\sum x_i^3\right) a_2 &= \sum x_i y_i \\
\left(\sum x_i^2\right) a_0 + \left(\sum x_i^3\right) a_1 + \left(\sum x_i^4\right) a_2 &= \sum x_i^2 y_i
\end{aligned}$$
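
As a sketch of how this system can be solved without external libraries (the function name and sample data below are only for illustration), we can assemble the power sums and apply Gaussian elimination:

```python
def quadratic_fit(x, y):
    """Least-squares parabola y = a0 + a1*x + a2*x^2 via the normal equations."""
    # Power sums appearing in the 3x3 system; s[k] = sum of x^k, s[0] = n
    s = [sum(xi ** k for xi in x) for k in range(5)]
    b = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    A = [[s[i + j] for j in range(3)] for i in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    a = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        a[r] = (b[r] - sum(A[r][c] * a[c] for c in range(r + 1, 3))) / A[r][r]
    return a  # [a0, a1, a2]

# Points on y = 2 - x + 3x^2 are recovered exactly (up to round-off)
print(quadratic_fit([0, 1, 2, 3], [2, 4, 12, 26]))
```

Any linear solver works here; elimination is shown only because it keeps the sketch dependency-free.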

For an $m$th-order polynomial,

$$y = a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m + e$$

we can express its standard error of the estimate as

$$s_{y/x} = \sqrt{\frac{S_r}{n - (m + 1)}}$$

Similar to linear regression, the coefficient of determination is

$$r^2 = \frac{S_t - S_r}{S_t}$$

Multiple Linear Regression

In a case where there are two independent variables and only a single dependent variable, the regression line becomes a regression plane. For example, consider the following function:

$$y = a_0 + a_1 x_1 + a_2 x_2 + e$$

As with the previous cases, we can obtain the best values of the coefficients by minimizing the sum of the squares of the residuals,

$$S_r = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_{1i} - a_2 x_{2i}\right)^2$$

To obtain the coefficients that minimize the sum of the squares of the residuals, we can solve the following system of normal equations:

$$\begin{bmatrix} n & \sum x_{1i} & \sum x_{2i} \\ \sum x_{1i} & \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\ \sum x_{2i} & \sum x_{1i} x_{2i} & \sum x_{2i}^2 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} \sum y_i \\ \sum x_{1i} y_i \\ \sum x_{2i} y_i \end{Bmatrix}$$
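
To illustrate, here is a sketch that assembles and solves this 3×3 system with Cramer's rule (the function names and the sample data, chosen to lie exactly on a plane, are only for illustration):

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def plane_fit(x1, x2, y):
    """Least-squares plane y = a0 + a1*x1 + a2*x2, solving the
    normal-equations system with Cramer's rule."""
    n = len(y)
    A = [
        [n,       sum(x1),                             sum(x2)],
        [sum(x1), sum(a * a for a in x1),              sum(a * b for a, b in zip(x1, x2))],
        [sum(x2), sum(a * b for a, b in zip(x1, x2)),  sum(b * b for b in x2)],
    ]
    rhs = [sum(y),
           sum(a * c for a, c in zip(x1, y)),
           sum(b * c for b, c in zip(x2, y))]
    d = det3(A)
    coeffs = []
    for col in range(3):  # replace one column at a time with the RHS
        Ac = [row[:] for row in A]
        for r in range(3):
            Ac[r][col] = rhs[r]
        coeffs.append(det3(Ac) / d)
    return coeffs  # [a0, a1, a2]

# Data lying exactly on the plane y = 5 + 4*x1 - 3*x2
x1 = [0, 2, 2.5, 1, 4, 7]
x2 = [0, 1, 2, 3, 6, 2]
y  = [5, 10, 9, 0, 3, 27]
print(plane_fit(x1, x2, y))  # close to [5.0, 4.0, -3.0]
```

Cramer's rule is fine for a fixed 3×3 system; for more variables, a general linear solver is the better choice.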

It should be noted that multiple linear regression is not limited to two independent variables; it can be extended to $m$ dimensions, as in

$$y = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_m x_m + e$$

where the standard error of the estimate is

$$s_{y/x} = \sqrt{\frac{S_r}{n - (m + 1)}}$$

and the coefficient of determination is

$$r^2 = \frac{S_t - S_r}{S_t}$$

Nonlinear Regression

TODO

Gauss-Newton Method

TODO

Sources

  1. Numerical Methods for Engineers by Steven Chapra and Raymond Canale (Chapter 17)

Footnotes

  1. Residual error is the difference between the true value and the approximated value.

  2. normal distribution