Dec 212017
 
Text file from Microsoft BBS concerning Regression Analysis and Best Fit Lines when using Excel 3.0 or 4.0.
File XE0124.ZIP from The Programmer’s Corner in
Category Lotus and other Spreadsheets
Text file from Microsoft BBS concerning Regression Analysis and Best Fit Lines when using Excel 3.0 or 4.0.
File Name File Size Zip Size Zip Type
XE0124.TXT 29473 8377 deflated

Download File XE0124.ZIP Here

Contents of the XE0124.TXT file


======================================================================
Microsoft(R) Product Support Services Application Note (Text File)
XE0124: REGRESSION ANALYSIS AND BEST FIT LINES
======================================================================
Revision Date: 8/93
No Disk Included

The information in this Application Note applies to Microsoft Excel
versions 3.0 and 4.0 for the Macintosh(R), and Microsoft Excel
versions 3.0, 4.0 and 4.0a for Windows.

----------------------------------------------------------------------
| INFORMATION PROVIDED IN THIS DOCUMENT AND ANY SOFTWARE THAT MAY |
| ACCOMPANY THIS DOCUMENT (collectively referred to as an Application |
| Note) IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER |
| EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED |
| WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR |
| PURPOSE. The user assumes the entire risk as to the accuracy and |
| the use of this Application Note. This Application Note may be |
| copied and distributed subject to the following conditions: 1) All |
| text must be copied without modification and all pages must be |
| included; 2) If software is included, all files on the disk(s) |
| must be copied without modification (the MS-DOS(R) utility |
| diskcopy is appropriate for this purpose); 3) All components of |
| this Application Note must be distributed together; and 4) This |
| Application Note may not be distributed for profit. |
| |
| Copyright (C) 1992-1993 Microsoft Corporation. All Rights Reserved.|
| Microsoft and MS-DOS are registered trademarks and Windows is a |
| trademark of Microsoft Corporation. |
| Macintosh is a registered trademark of Apple Computer, Inc. |
| --------------------------------------------------------------------|


OVERVIEW
========

This Application Note discusses how to use Microsoft Excel functions
to perform simple, multiple, and polynomial regression analysis. It
contains examples of how to use LINEST(), LOGEST(), TREND(), and
GROWTH() to describe a best fit line or curve and to make predictions
about your data. It also outlines some of the new statistical
functions and tools available with Microsoft Excel version 4.0.


GENERAL INFORMATION
===================

Regression is a statistical method used to predict values based on
relationships in existing data. By analyzing how a single dependent
variable (y) is affected by the values of one or more independent
variables (x), you can predict what y will be given x. You can use
this information to fit a line or a curve to your existing data and to
forecast future values. The LINEST(), TREND(), LOGEST(), and GROWTH()
functions are the primary functions you will use to perform
regression analysis in Microsoft Excel versions 3.0 and 4.0.

While this Application Note focuses primarily on the functions that
can be used in Microsoft Excel versions 3.0 and 4.0, Microsoft Excel
version 4.0 offers several new functions and tools that you can use to
perform regression analysis and to create best fit lines. When one of
these new functions can be used to perform a task described in this
Application Note, the function will be noted in the appropriate
section. Following are some of these new functions:

To do this Use this function
------------------------------------------------------------
Return the correlation coefficient CORREL()
for two arrays of cells

Return a single predicted y value FORECAST()
based on a linear regression of
known x and y ranges

Return the y intercept of the INTERCEPT()
linear regression line

Calculate R2, the coefficient of RSQ()
determination

Return the slope of the linear SLOPE()
regression line

Return the standard error of the STEYX()
regression

In addition, the Analysis ToolPak add-in macro provides a special set
of analysis tools, including tools to accomplish the following:

To do this Use this analysis tool
-----------------------------------------------------------------
Predict a value based on the forecast Exponential Smoothing
for the prior period, adjusted for the
error in that prior forecast

Project values in the forecast period Moving Average
based on the average value of the
variable over a specific number of
preceding periods

Perform linear regression analysis and Regression
return statistics and plots as specified

CHOOSING THE BEST FUNCTION
===========================

Whether you are performing simple regression (one x variable),
multiple regression (two or more x variables), or polynomial
regression (one x variable raised to different powers), you will get
the most accurate results if the function that you choose to regress
your data is based on the patterns in your existing data.

When Your Data Is Linear
------------------------

If the rate of change in your data is even to such an extent that when
you plot it in a chart the pattern in your data points resembles a
line, use the linear regression functions, LINEST() and TREND(). Both
functions use the "least squares" method to calculate a straight line
that best fits your data. LINEST() returns information about the line,
such as its slope and y intercept, and TREND() returns predicted
values along the line.

In Microsoft Excel version 4.0, the Regression tool (from the Options
menu, choose Analysis Tools) performs linear regression, returns
regression statistics, calculates best fit lines, and creates best fit
line charts. For additional information, see pages 41-45 in "User's
Guide 2."

When Your Data Is Exponential
-----------------------------

If the rate of change in your data is exponential, such that when you
plot it in a chart, the pattern resembles a curve that rises or falls
at an increasingly higher rate, use the logarithmic regression
functions LOGEST() and GROWTH(). LOGEST() calculates an exponential
curve that best fits your data and, like LINEST(), returns information
about the curve. Like TREND(), GROWTH() returns predicted values along
the curve.

When Your Data Is Curvilinear
-----------------------------

To most accurately predict values when the pattern in your data is
neither linear nor exponential, use polynomial regression in
conjunction with the TREND() function to calculate a best fit curve.
For example, use this method if, when you plot your data in a chart,
it resembles a curve for which the rate of change is not dramatic or
if your data fluctuates in such a way that no linear or curved pattern
can be identified.


SIMPLE REGRESSION ANALYSIS
==========================

Your regression analysis is "simple" if you have only one independent
x variable for each dependent y variable. For example, assume you are
analyzing the sales figures for the first six months of operation for
WidgeMaker, Inc., a company that specializes in the design and
manufacture of Widgets.

NOTE: The following examples primarily use the LINEST() and
TREND() functions. Wherever these two functions are discussed,
LOGEST() and GROWTH() can be substituted if your data is
exponentially curved and a curve fit would be more accurate than a
straight line.

In the following sample data, the values in the Month column are the
independent x variables and the values in the Sales column are the y
variables. Based on this data, you can describe, calculate, and plot a
best fit line, and then predict future sales figures. Because the data
is linear, you will use the LINEST() and TREND() functions to perform
the regression analysis.

The Regression tool in Microsoft Excel version 4.0 performs each of
these tasks automatically. For additional information on using this
tool and specifying the options you want, see pages 41-45 of "User's
Guide 2." Because this tool performs linear regression, if your data
resembles an exponential curve, use LOGEST() and GROWTH(). Following
are the sales figures for WidgeMaker and the corresponding months in
both table and chart form:

| A | B
--------------------
1 | Month | Sales
--------------------
2 1 $4,200
3 2 $6,100
4 3 $7,300
5 4 $7,300
6 5 $8,700
7 6 $10,500

NOTE: The sales figures are formatted using the built-in currency
format with no decimal places. To change your number format, choose
Number from the Format menu.



To create the chart as a separate document:

1. Select cells A1:B7.

2. From the File menu, choose New. Select Chart and choose OK.

3. In the New Chart dialog box, select X-Values For XY-Chart. Choose
OK.

4 From the Gallery menu, choose XY (Scatter). Select chart type
number 2 and choose OK.

5. Select the x-axis and choose Scale from the Format menu. Change the
Minimum value to 1 and choose OK.

6. From the Chart menu, choose Attach Text. Select Chart Title and
choose OK. In the Formula bar, type:

WidgeMaker, Inc.

7. With the title selected, choose Patterns from the Format menu.
Under Border, choose Automatic and select the Shadow check box.
Choose OK.

8. From the Chart menu, choose Attach text. Select Value (Y) Axis and
choose OK. In the Formula bar, type Sales.

9. From the Chart menu, choose Attach Text. Select Category (X) Axis
and choose OK. In the Formula bar, type Month and press ENTER
(press RETURN if you are using Microsoft Excel for the Macintosh).

NOTE: In Microsoft Excel version 4.0, an easier way to create
charts is to use the ChartWizard. Select cells A1:B7 and choose
the ChartWizard button. Follow the steps, selecting the formatting
options you want. Once the chart is finished, to add additional
formatting, double-click the chart to open it in its own chart
window.

DESCRIBING A BEST FIT LINE
==========================

The equation of a straight line is y=x+, where is the slope and
is the y intercept. LINEST() returns the m and b values that describe
the line derived from your existing data.

Microsoft Excel version 4.0 provides individual functions for
calculating the slope and the y intercept when your data is linear:
SLOPE() and INTERCEPT(). For additional information, see pages 244-245
and 405-406 of the "Function Reference" version 4.0.

NOTE: If your data is exponentially curved, use LOGEST() to return
the and values that describe the curve. The equation used by
LOGEST() is y=*^x.

Finding the Slope and the Y Intercept
-------------------------------------

To calculate the values of the slope and y intercept , do the
following:

1. Select cells E2:F2.

2. Type the following formula:

=LINEST(B2:B7,A2:A7)

3. Enter the formula as an array by pressing CTRL+SHIFT+ENTER in
Microsoft Excel for Windows or COMMAND+RETURN in Microsoft Excel
for the Macintosh.

The first argument in the LINEST() function is the array containing
the known y values (which in this example are the Sales numbers). The
second argument is the array containing the known x values (in this
case, the Month numbers).

NOTE: LINEST() also takes other arguments but these aren't necessary
to this example.

The result 1122.857, in E2, is the slope and the result 3420, in cell
F2, is where the line crosses the y-axis.

| E | F
--------------------------
1 | Slope | Y intercept
--------------------------
2 1122.857 3420

CALCULATING A BEST FIT LINE
===========================

If your data is linear, use TREND() or LINEST() to calculate your best
fit line. In Microsoft Excel version 4.0, you can also use the
FORECAST() function. If your data is an exponential curve, use
LOGEST() or GROWTH().

Using TREND()
-------------

The TREND() function is the easiest and most efficient function for
calculating the points along a best fit line. To simultaneously
calculate all the values on the best fit line, do the following:

1. Select cells C2:C7 and type the following formula:

=TREND(B2:B7,A2:A7)

2. Enter the formula as an array by pressing CTRL+SHIFT+ENTER in
Microsoft Excel for Windows or COMMAND+RETURN in Microsoft Excel
for the Macintosh.

The first argument in the TREND() function is the array containing the
known y values and the second argument is the array containing the
known x values.

NOTE: TREND() also takes other arguments but these aren't necessary
to this example.

Using LINEST()
--------------

You can also use the slope and the y intercept values returned
by LINEST() to find data points on the best fit line by substituting
the y values or the x values into y=mx+b, the equation for a line. By
plugging each month number into this formula, you can calculate all
the data points for your best fit line. With the m value (1122.857) in
cell E2 and the b value (3420) in cell F2, do the following to
generate the points on your best fit line:

1. Select cell D2 and enter the following formula (it is not necessary
to enter this formula as an array):

=($E$2*A2)+$F$2

2. Select cells D2:D7.

3. From the Edit menu, choose Fill Down.

The values returned are the y values for your best fit line.

The following table shows the results of the values returned when you
use TREND() and LINEST():

| A | B | C | D | E | F
----------------------------------------------------------------------
1 | Month | Sales | Predicted Y | Predicted Y | Slope | Y Intercept
| | | TREND() | LINEST() | |
----------------------------------------------------------------------
2 1 $4,200 $4,543 $4,543 1122.857 3420
3 2 $6,100 $5,666 $5,666
4 3 $7,300 $6,789 $6,789
5 4 $7,300 $7,911 $7,911
6 5 $8,700 $9,031 $9,031
7 6 $10,500 $10,157 $10,157

NOTE: The returned values for TREND() and LINEST() are identical.

PLOTTING THE BEST FIT LINE
===========================

Once you've calculated the values on your best fit line, you can add
that line to your existing chart:

1. Select cells C1:C7. From the Edit menu, choose Copy.

2. Activate the chart document by selecting it from the Window menu.

3. From the Edit menu, choose Paste.

4. If you want different data point markers on your best fit line,
select the line and choose Pattern from the Format menu. Under
Marker, select Custom and choose the data point marker and color
you want. Choose OK.

NOTE: In Microsoft Excel 4.0, you can use the ChartWizard to
easily update your chart with the new trend data. Activate your
chart by clicking it once if it is a chart object or by selecting
the filename from the Window menu if it is a separate chart
document. Select the ChartWizard button and in the Step 1 Of 2
dialog box, change the reference from filename!$A$1:$B$7 to
filename!$A$1:$C$7. Choose Next and in the Step 2 Of 2 dialog box,
choose OK.

The resulting chart will have a straight line (best fit line) running
through your original data.



NOTE: When you display the points of a best fit line against the
original data in a chart, you will, in most cases, get the best
results by using an xy (scatter) chart. If you use a line chart, the
x values will be treated as labels rather than as values, and
crooked lines may result.

PREDICTING FUTURE VALUES
========================

In addition to returning values along the line fitted to your existing
data, you can use TREND() and LINEST() to predict future values. You
can also use the FORECAST() function in Microsoft Excel 4.0.

Using the WidgeMaker, Inc., example, suppose you want to calculate
sales figures for months 7, 8, and 9. The following examples show how
to accomplish this using the TREND() and LINEST() functions,
respectively.

Using TREND()
-------------

To predict values for months 7, 8, and 9, first enter the month
numbers for which you want predicted sales figures and then use the
TREND() function to calculate the values:

1. In cells A8:A10, type 7, 8, and 9 respectively.

2. Select cells B8:B10.

3. Type the following formula:

=TREND(B2:B7,A2:A7,A8:A10)

4. Enter the formula as an array by pressing CTRL+SHIFT+ENTER in
Microsoft Excel for Windows or COMMAND+RETURN in Microsoft Excel
for the Macintosh.

The third argument in the TREND() function is the array containing the
new x values for which you want to derive predicted y values.

The resulting values in cells B8:B10 are the predicted sales for the
next three months:

| A | B
---------------------
1 | Month | Sales
---------------------
2 1 $4,200
3 2 $6,100
4 3 $7,300
5 4 $7,300
6 5 $8,700
7 6 $10,500
8 7 $11,280
9 8 $12,403
10 9 $13,526

NOTE: In Microsoft Excel 4.0, you can use the AutoFill feature to
predict future values. Using the above data, if you wanted to
predict sales for months 7, 8, and 9, you would select cells B2:B7,
select the AutoFill handle in the lower-right corner of the selected
area and drag down three additional cells.

CAUTION: In addition to returning predicted values for months 7, 8,
and 9, the data in cells B2:B7 will be overwritten with the values
that represent the best fit line. If you do not want your original
data to be overwritten, copy it to a separate area on your worksheet
and then use AutoFill.

Using LINEST()
--------------

To obtain the new y values, you can also substitute the slope and y
intercept values that you derived with the LINEST() function and the
new x values (7, 8, and 9) into the formula, y=x+. See the
"Calculating a Best Fit Line" section for step-by-step instructions
on how to do this.


MULTIPLE REGRESSION ANALYSIS
============================

When you have two or more independent x variables for each y variable,
the regression analysis is multiple. For example, you could predict a
child's weight given his or her age and height. Assume you've
collected the following data

| A | B | C
--------------------------
1 | Age | Height | Weight
--------------------------
2 3 32 35
3 5 40 40
4 6 39 43
5 10 50 70

where the values under Weight (C2:C5) represent the y variables and
the values under Age and Height (A2:B5) are the x variables.

PREDICTING Y VALUES
===================

You can use either the TREND() or the LINEST() function to analyze the
relationship of the age and height to weight, and you can make
predictions based on the results of this analysis.

In Microsoft Excel 4.0, the Regression tool can also be used to
predict y values in a multiple regression model. The FORECAST()
function only works for simple regression. For additional information,
see pages 41-45 in "User's Guide 2."

Using TREND()
-------------

To use TREND() to predict the weight of a 9-year-old, 45-inch child,
do the following:

1. In cells A6:B6, type 9 and 45 respectively.

2. Select cell C6 and type the following formula:

=TREND(C2:C5,A2:B5,A6:B6)

3. Enter the formula as an array by pressing CTRL+SHIFT+ENTER in
Microsoft Excel for Windows or COMMAND+RETURN in Microsoft Excel
for the Macintosh.

The predicted weight is 63.42.

Using LINEST()
--------------

To predict a y value with LINEST(), you must first calculate the
slopes for each x variable and find the y intercept. Because a slope
is returned for each x variable, when you use the LINEST() function,
you must first select a range of cells that consist of a single row
and a single column plus an additional column for each x variable you
have. In this example, because you have two x variables, you will need
to select a range of three cells, three columns wide by one row.

To calculate the slopes and the y intercepts, do the following:

1. Select cells A7:C7 and type the following formula:

=LINEST(C2:C5,A2:B5)

2. Enter the formula as an array by pressing CTRL+SHIFT+ENTER in
Microsoft Excel for Windows or COMMAND+RETURN in Microsoft Excel
for Macintosh.

The following values will be returned

| A | B | C
---------------------
7 | -0.32 5.98 24

where -.32 is the slope for the second x variable (height), 5.98 is
the slope for the first x variable (age), and 24 is the y intercept.

NOTE: The slopes are in reverse order: the first slope value
corresponds to the second x variable and the second slope value
corresponds to the first x variable. The LINEST() function always
returns the slopes in backwards order when more than one x
variable is involved.

You can use the slope values and the y intercept value to make
predictions based on your data. Using the formula,
y=(1*x1)+(2*x2)+(n*xn)+, you can predict the weight
of a 45-inch 9-year old:

=(9*5.98)+(45*-0.32)+24

The result of the formula, 63.42, is the predicted weight.

Similar to LINEST(), the Regression tool in Microsoft Excel 4.0
returns the slope values and the y intercept value. As described
previously, you can plug these values into the formula, y=x+,
to predict y.

NOTE: Because the x variables are independent, there may not be a
good graphical representation for a multiple regression model. Each
x value can be plotted with its corresponding y value but the
individual lines may be completely unrelated and therefore may not
be useful.


POLYNOMIAL REGRESSION ANALYSIS
==============================

When your data is neither exponentially curved nor consistently
linear, use this method of regression. When you plot a best fit curve
calculated with polynomial regression, the curve will rise and fall
with the data.

CALCULATING A POLYNOMIAL CURVE
==============================

To calculate a polynomial curve, the y variable is regressed against
the independent x variable raised to different powers. To illustrate
this process, take the following example (assume WidgeMaker, Inc.'s
sales for the first six months of its operation are as follows):

| A | B
------------------
1 | Month | Sales
------------------
2 1 $4,200
3 2 $1,600
4 3 $5,120
5 4 $4,500
6 5 $5,400
7 6 $1,460

Fitting a straight line to this data would not accurately predict the
sales for any given month. In this case, you'll get the best results
by setting up the following polynomial regression model:

| A | B | C | D | E | F
---------------------------------------------
10 | X | X^2 | X^3 | X^4 | Sales | Trend
---------------------------------------------
11 1 1 1 1 $4,200 $4,089
12 2 4 8 16 $1,600 $2,154
13 3 9 27 81 $5,120 $4,011
14 4 16 64 256 $4,500 $5,609
15 5 25 125 625 $5,400 $4,846
16 6 36 216 1296 $1,460 $1,571

The values in cells A11:A16 are the month numbers copied from A2:A7.
The values in cells B11:D16 are the original x variables raised to the
second, third, and fourth powers respectively. To obtain these values,
do the following:

1. Select cell B11 and enter the formula:

=A11^2

2. Select cell C11 and enter the formula:

=A11^3

3. Select cell D11 and enter the formula:

=A11^4

4. Select cells B11:D16.

5. From the Edit menu, choose Fill Down.

The values in E11:E16 are the sales figures copied from B2:B7. To
derive the trend values, do the following:

1. Select cells F11:F16 and type the formula:

=TREND(E11:E16,A11:D16)

2. Enter the formula as an array by pressing CTRL+SHIFT+ENTER in
Microsoft Excel for Windows or COMMAND+RETURN in Microsoft Excel
for the Macintosh.

PLOTTING A POLYNOMIAL CURVE
===========================

To add the TREND() results to this chart, select cells F10:F16 and
follow the steps under "Plotting the Best Fit Line" on page 6 of this
Application Note. You will create a chart that resembles the
following:





USING REGRESSION STATISTICS
===========================

The LINEST() and LOGEST() functions can return additional regression
statistics that can be helpful in using and evaluating your regression
model. If you have linear data, you can use the Regression Tool in
Microsoft Excel version 4.0 to automatically return all the regression
statistics. If your data resembles an exponential curve, use LOGEST()
to return accurate regression statistics.

Using LINEST() or LOGEST() to Return Regression Statistics
----------------------------------------------------------

To return the additional statistics using LINEST() or LOGEST(), you
must select a range that includes five rows and a single column plus
an additional column for each x variable you have. In addition, the
stats argument, which is the fourth argument in both of these
functions, must be set to TRUE.

To return the additional regression statistics using the following
data

| A | B | C
-------------------------
1 | Age | Height | Weight
-------------------------
2 3 32 35
3 5 40 40
4 6 39 43
5 10 50 70

do the following:

1. Select cells D1:F5.

NOTE: This range consists of five rows and a single column plus
two additional columns, one for each x variable.

2. Type the following formula:

=LINEST(C2:C5,A2:B5,,TRUE)

3. Enter the formula as an array by pressing CTRL+SHIFT+ENTER in
Microsoft Excel for Windows or COMMAND+RETURN in Microsoft Excel
for the Macintosh.

The resulting data should resemble the following:

| D | E | F
------------------------------------
1 -0.32 5.98 24
2 2.243569 5.647619 57.68449
3 .950813 6.024948 #N/A
4 9.665289 1 #N/A
5 701.7 36.3 #N/A

In the first row of the statistics, you have the slope for the height,
the slope for the age, and the y intercept. The second row contains
the standard error of the slopes and of the y intercept. R2 and the
standard error for the y estimate are in the third row. The
statistic and degrees of freedom are in the fourth row, while the
regression sum of squares and the residual sum of squares are in the
fifth row.

Using R2 to Determine the Accuracy of the Regression Model
----------------------------------------------------------

A particularly useful statistic returned is the coefficient of
determination (R2). In Microsoft Excel 4.0, you can also use the RSQ()
function to find R2. This R2 indicator ranges in value from 0 to 1 and
tells you how closely the estimated y values correlate to your actual
y values. The closer R2 is to 1, the more perfect the correlation--
this correlation indicates that the regression equation is very useful
in accurately predicting a y value. On the other hand, the closer R2
is to 0, the less helpful it will be in predicting a y value. In the
previous example, the value for R2 returned by LINEST() is .95, a near
perfect correlation. This indicates that, based on the collected data,
the LINEST() model can be used to make extremely accurate predictions
of a child's weight given a specific age and height.

If you want to predict how accurately a child's age and height will
predict their weight, given the collected data, R2 will indicate the
accuracy of the predicted weight of a child with a different age and
height not included in the collected data.


 December 21, 2017  Add comments

Leave a Reply