# Category : Lotus and other Spreadsheets

Archive : XE0124.ZIP

Filename : XE0124.TXT

======================================================================

Microsoft(R) Product Support Services Application Note (Text File)

XE0124: REGRESSION ANALYSIS AND BEST FIT LINES

======================================================================

Revision Date: 8/93

No Disk Included

The information in this Application Note applies to Microsoft Excel

versions 3.0 and 4.0 for the Macintosh(R), and Microsoft Excel

versions 3.0, 4.0 and 4.0a for Windows.

----------------------------------------------------------------------

| INFORMATION PROVIDED IN THIS DOCUMENT AND ANY SOFTWARE THAT MAY |

| ACCOMPANY THIS DOCUMENT (collectively referred to as an Application |

| Note) IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER |

| EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED |

| WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR |

| PURPOSE. The user assumes the entire risk as to the accuracy and |

| the use of this Application Note. This Application Note may be |

| copied and distributed subject to the following conditions: 1) All |

| text must be copied without modification and all pages must be |

| included; 2) If software is included, all files on the disk(s) |

| must be copied without modification (the MS-DOS(R) utility |

| diskcopy is appropriate for this purpose); 3) All components of |

| this Application Note must be distributed together; and 4) This |

| Application Note may not be distributed for profit. |

| |

| Copyright (C) 1992-1993 Microsoft Corporation. All Rights Reserved.|

| Microsoft and MS-DOS are registered trademarks and Windows is a |

| trademark of Microsoft Corporation. |

| Macintosh is a registered trademark of Apple Computer, Inc. |

| --------------------------------------------------------------------|

OVERVIEW

========

This Application Note discusses how to use Microsoft Excel functions

to perform simple, multiple, and polynomial regression analysis. It

contains examples of how to use LINEST(), LOGEST(), TREND(), and

GROWTH() to describe a best fit line or curve and to make predictions

about your data. It also outlines some of the new statistical

functions and tools available with Microsoft Excel version 4.0.

GENERAL INFORMATION

===================

Regression is a statistical method used to predict values based on

relationships in existing data. By analyzing how a single dependent

variable (y) is affected by the values of one or more independent

variables (x), you can predict what y will be given x. You can use

this information to fit a line or a curve to your existing data and to

forecast future values. The LINEST(), TREND(), LOGEST(), and GROWTH()

functions are the primary functions you will use to perform

regression analysis in Microsoft Excel versions 3.0 and 4.0.

While this Application Note focuses primarily on the functions that

can be used in Microsoft Excel versions 3.0 and 4.0, Microsoft Excel

version 4.0 offers several new functions and tools that you can use to

perform regression analysis and to create best fit lines. When one of

these new functions can be used to perform a task described in this

Application Note, the function will be noted in the appropriate

section. Following are some of these new functions:

To do this Use this function

------------------------------------------------------------

Return the correlation coefficient CORREL()

for two arrays of cells

Return a single predicted y value FORECAST()

based on a linear regression of

known x and y ranges

Return the y intercept of the INTERCEPT()

linear regression line

Calculate R2, the coefficient of RSQ()

determination

Return the slope of the linear SLOPE()

regression line

Return the standard error of the STEYX()

regression

In addition, the Analysis ToolPak add-in macro provides a special set

of analysis tools, including tools to accomplish the following:

To do this Use this analysis tool

-----------------------------------------------------------------

Predict a value based on the forecast Exponential Smoothing

for the prior period, adjusted for the

error in that prior forecast

Project values in the forecast period Moving Average

based on the average value of the

variable over a specific number of

preceding periods

Perform linear regression analysis and Regression

return statistics and plots as specified

CHOOSING THE BEST FUNCTION

===========================

Whether you are performing simple regression (one x variable),

multiple regression (two or more x variables), or polynomial

regression (one x variable raised to different powers), you will get

the most accurate results if the function that you choose to regress

your data is based on the patterns in your existing data.

When Your Data Is Linear

------------------------

If the rate of change in your data is even to such an extent that when

you plot it in a chart the pattern in your data points resembles a

line, use the linear regression functions, LINEST() and TREND(). Both

functions use the "least squares" method to calculate a straight line

that best fits your data. LINEST() returns information about the line,

such as its slope and y intercept, and TREND() returns predicted

values along the line.

In Microsoft Excel version 4.0, the Regression tool (from the Options

menu, choose Analysis Tools) performs linear regression, returns

regression statistics, calculates best fit lines, and creates best fit

line charts. For additional information, see pages 41-45 in "User's

Guide 2."

When Your Data Is Exponential

-----------------------------

If the rate of change in your data is exponential, such that when you

plot it in a chart, the pattern resembles a curve that rises or falls

at an increasingly higher rate, use the logarithmic regression

functions LOGEST() and GROWTH(). LOGEST() calculates an exponential

curve that best fits your data and, like LINEST(), returns information

about the curve. Like TREND(), GROWTH() returns predicted values along

the curve.

When Your Data Is Curvilinear

-----------------------------

To most accurately predict values when the pattern in your data is

neither linear nor exponential, use polynomial regression in

conjunction with the TREND() function to calculate a best fit curve.

For example, use this method if, when you plot your data in a chart,

it resembles a curve for which the rate of change is not dramatic or

if your data fluctuates in such a way that no linear or curved pattern

can be identified.

SIMPLE REGRESSION ANALYSIS

==========================

Your regression analysis is "simple" if you have only one independent

x variable for each dependent y variable. For example, assume you are

analyzing the sales figures for the first six months of operation for

WidgeMaker, Inc., a company that specializes in the design and

manufacture of Widgets.

NOTE: The following examples primarily use the LINEST() and

TREND() functions. Wherever these two functions are discussed,

LOGEST() and GROWTH() can be substituted if your data is

exponentially curved and a curve fit would be more accurate than a

straight line.

In the following sample data, the values in the Month column are the

independent x variables and the values in the Sales column are the y

variables. Based on this data, you can describe, calculate, and plot a

best fit line, and then predict future sales figures. Because the data

is linear, you will use the LINEST() and TREND() functions to perform

the regression analysis.

The Regression tool in Microsoft Excel version 4.0 performs each of

these tasks automatically. For additional information on using this

tool and specifying the options you want, see pages 41-45 of "User's

Guide 2." Because this tool performs linear regression, if your data

resembles an exponential curve, use LOGEST() and GROWTH(). Following

are the sales figures for WidgeMaker and the corresponding months in

both table and chart form:

| A | B

--------------------

1 | Month | Sales

--------------------

2 1 $4,200

3 2 $6,100

4 3 $7,300

5 4 $7,300

6 5 $8,700

7 6 $10,500

NOTE: The sales figures are formatted using the built-in currency

format with no decimal places. To change your number format, choose

Number from the Format menu.

To create the chart as a separate document:

1. Select cells A1:B7.

2. From the File menu, choose New. Select Chart and choose OK.

3. In the New Chart dialog box, select X-Values For XY-Chart. Choose

OK.

4 From the Gallery menu, choose XY (Scatter). Select chart type

number 2 and choose OK.

5. Select the x-axis and choose Scale from the Format menu. Change the

Minimum value to 1 and choose OK.

6. From the Chart menu, choose Attach Text. Select Chart Title and

choose OK. In the Formula bar, type:

WidgeMaker, Inc.

7. With the title selected, choose Patterns from the Format menu.

Under Border, choose Automatic and select the Shadow check box.

Choose OK.

8. From the Chart menu, choose Attach text. Select Value (Y) Axis and

choose OK. In the Formula bar, type Sales.

9. From the Chart menu, choose Attach Text. Select Category (X) Axis

and choose OK. In the Formula bar, type Month and press ENTER

(press RETURN if you are using Microsoft Excel for the Macintosh).

NOTE: In Microsoft Excel version 4.0, an easier way to create

charts is to use the ChartWizard. Select cells A1:B7 and choose

the ChartWizard button. Follow the steps, selecting the formatting

options you want. Once the chart is finished, to add additional

formatting, double-click the chart to open it in its own chart

window.

DESCRIBING A BEST FIT LINE

==========================

The equation of a straight line is y=

**, where** is the slope and

**is the y intercept. LINEST() returns the m and b values that describe**

the line derived from your existing data.

Microsoft Excel version 4.0 provides individual functions for

calculating the slope and the y intercept when your data is linear:

SLOPE() and INTERCEPT(). For additional information, see pages 244-245

and 405-406 of the "Function Reference" version 4.0.

NOTE: If your data is exponentially curved, use LOGEST() to return

the and

the line derived from your existing data.

Microsoft Excel version 4.0 provides individual functions for

calculating the slope and the y intercept when your data is linear:

SLOPE() and INTERCEPT(). For additional information, see pages 244-245

and 405-406 of the "Function Reference" version 4.0.

NOTE: If your data is exponentially curved, use LOGEST() to return

the

**values that describe the curve. The equation used by**

LOGEST() is y=

LOGEST() is y=

*****^x.

Finding the Slope and the Y Intercept

-------------------------------------

To calculate the values of the slope and y intercept Finding the Slope and the Y Intercept

-------------------------------------

To calculate the values of the slope

**, do the**

following:

1. Select cells E2:F2.

2. Type the following formula:

=LINEST(B2:B7,A2:A7)

3. Enter the formula as an array by pressing CTRL+SHIFT+ENTER in

Microsoft Excel for Windows or COMMAND+RETURN in Microsoft Excel

for the Macintosh.

The first argument in the LINEST() function is the array containing

the known y values (which in this example are the Sales numbers). The

second argument is the array containing the known x values (in this

case, the Month numbers).

NOTE: LINEST() also takes other arguments but these aren't necessary

to this example.

The result 1122.857, in E2, is the slope and the result 3420, in cell

F2, is where the line crosses the y-axis.

| E | F

--------------------------

1 | Slope | Y intercept

--------------------------

2 1122.857 3420

CALCULATING A BEST FIT LINE

===========================

If your data is linear, use TREND() or LINEST() to calculate your best

fit line. In Microsoft Excel version 4.0, you can also use the

FORECAST() function. If your data is an exponential curve, use

LOGEST() or GROWTH().

Using TREND()

-------------

The TREND() function is the easiest and most efficient function for

calculating the points along a best fit line. To simultaneously

calculate all the values on the best fit line, do the following:

1. Select cells C2:C7 and type the following formula:

=TREND(B2:B7,A2:A7)

2. Enter the formula as an array by pressing CTRL+SHIFT+ENTER in

Microsoft Excel for Windows or COMMAND+RETURN in Microsoft Excel

for the Macintosh.

The first argument in the TREND() function is the array containing the

known y values and the second argument is the array containing the

known x values.

NOTE: TREND() also takes other arguments but these aren't necessary

to this example.

Using LINEST()

--------------

You can also use the slope and the y intercept

following:

1. Select cells E2:F2.

2. Type the following formula:

=LINEST(B2:B7,A2:A7)

3. Enter the formula as an array by pressing CTRL+SHIFT+ENTER in

Microsoft Excel for Windows or COMMAND+RETURN in Microsoft Excel

for the Macintosh.

The first argument in the LINEST() function is the array containing

the known y values (which in this example are the Sales numbers). The

second argument is the array containing the known x values (in this

case, the Month numbers).

NOTE: LINEST() also takes other arguments but these aren't necessary

to this example.

The result 1122.857, in E2, is the slope and the result 3420, in cell

F2, is where the line crosses the y-axis.

| E | F

--------------------------

1 | Slope | Y intercept

--------------------------

2 1122.857 3420

CALCULATING A BEST FIT LINE

===========================

If your data is linear, use TREND() or LINEST() to calculate your best

fit line. In Microsoft Excel version 4.0, you can also use the

FORECAST() function. If your data is an exponential curve, use

LOGEST() or GROWTH().

Using TREND()

-------------

The TREND() function is the easiest and most efficient function for

calculating the points along a best fit line. To simultaneously

calculate all the values on the best fit line, do the following:

1. Select cells C2:C7 and type the following formula:

=TREND(B2:B7,A2:A7)

2. Enter the formula as an array by pressing CTRL+SHIFT+ENTER in

Microsoft Excel for Windows or COMMAND+RETURN in Microsoft Excel

for the Macintosh.

The first argument in the TREND() function is the array containing the

known y values and the second argument is the array containing the

known x values.

NOTE: TREND() also takes other arguments but these aren't necessary

to this example.

Using LINEST()

--------------

You can also use the slope

**values returned**

by LINEST() to find data points on the best fit line by substituting

the y values or the x values into y=mx+b, the equation for a line. By

plugging each month number into this formula, you can calculate all

the data points for your best fit line. With the m value (1122.857) in

cell E2 and the b value (3420) in cell F2, do the following to

generate the points on your best fit line:

1. Select cell D2 and enter the following formula (it is not necessary

to enter this formula as an array):

=($E$2*A2)+$F$2

2. Select cells D2:D7.

3. From the Edit menu, choose Fill Down.

The values returned are the y values for your best fit line.

The following table shows the results of the values returned when you

use TREND() and LINEST():

| A | B | C | D | E | F

----------------------------------------------------------------------

1 | Month | Sales | Predicted Y | Predicted Y | Slope | Y Intercept

| | | TREND() | LINEST() | |

----------------------------------------------------------------------

2 1 $4,200 $4,543 $4,543 1122.857 3420

3 2 $6,100 $5,666 $5,666

4 3 $7,300 $6,789 $6,789

5 4 $7,300 $7,911 $7,911

6 5 $8,700 $9,031 $9,031

7 6 $10,500 $10,157 $10,157

NOTE: The returned values for TREND() and LINEST() are identical.

PLOTTING THE BEST FIT LINE

===========================

Once you've calculated the values on your best fit line, you can add

that line to your existing chart:

1. Select cells C1:C7. From the Edit menu, choose Copy.

2. Activate the chart document by selecting it from the Window menu.

3. From the Edit menu, choose Paste.

4. If you want different data point markers on your best fit line,

select the line and choose Pattern from the Format menu. Under

Marker, select Custom and choose the data point marker and color

you want. Choose OK.

NOTE: In Microsoft Excel 4.0, you can use the ChartWizard to

easily update your chart with the new trend data. Activate your

chart by clicking it once if it is a chart object or by selecting

the filename from the Window menu if it is a separate chart

document. Select the ChartWizard button and in the Step 1 Of 2

dialog box, change the reference from filename!$A$1:$B$7 to

filename!$A$1:$C$7. Choose Next and in the Step 2 Of 2 dialog box,

choose OK.

The resulting chart will have a straight line (best fit line) running

through your original data.

NOTE: When you display the points of a best fit line against the

original data in a chart, you will, in most cases, get the best

results by using an xy (scatter) chart. If you use a line chart, the

x values will be treated as labels rather than as values, and

crooked lines may result.

PREDICTING FUTURE VALUES

========================

In addition to returning values along the line fitted to your existing

data, you can use TREND() and LINEST() to predict future values. You

can also use the FORECAST() function in Microsoft Excel 4.0.

Using the WidgeMaker, Inc., example, suppose you want to calculate

sales figures for months 7, 8, and 9. The following examples show how

to accomplish this using the TREND() and LINEST() functions,

respectively.

Using TREND()

-------------

To predict values for months 7, 8, and 9, first enter the month

numbers for which you want predicted sales figures and then use the

TREND() function to calculate the values:

1. In cells A8:A10, type 7, 8, and 9 respectively.

2. Select cells B8:B10.

3. Type the following formula:

=TREND(B2:B7,A2:A7,A8:A10)

4. Enter the formula as an array by pressing CTRL+SHIFT+ENTER in

Microsoft Excel for Windows or COMMAND+RETURN in Microsoft Excel

for the Macintosh.

The third argument in the TREND() function is the array containing the

new x values for which you want to derive predicted y values.

The resulting values in cells B8:B10 are the predicted sales for the

next three months:

| A | B

---------------------

1 | Month | Sales

---------------------

2 1 $4,200

3 2 $6,100

4 3 $7,300

5 4 $7,300

6 5 $8,700

7 6 $10,500

8 7 $11,280

9 8 $12,403

10 9 $13,526

NOTE: In Microsoft Excel 4.0, you can use the AutoFill feature to

predict future values. Using the above data, if you wanted to

predict sales for months 7, 8, and 9, you would select cells B2:B7,

select the AutoFill handle in the lower-right corner of the selected

area and drag down three additional cells.

CAUTION: In addition to returning predicted values for months 7, 8,

and 9, the data in cells B2:B7 will be overwritten with the values

that represent the best fit line. If you do not want your original

data to be overwritten, copy it to a separate area on your worksheet

and then use AutoFill.

Using LINEST()

--------------

To obtain the new y values, you can also substitute the slope and y

intercept values that you derived with the LINEST() function and the

new x values (7, 8, and 9) into the formula, y=x+

by LINEST() to find data points on the best fit line by substituting

the y values or the x values into y=mx+b, the equation for a line. By

plugging each month number into this formula, you can calculate all

the data points for your best fit line. With the m value (1122.857) in

cell E2 and the b value (3420) in cell F2, do the following to

generate the points on your best fit line:

1. Select cell D2 and enter the following formula (it is not necessary

to enter this formula as an array):

=($E$2*A2)+$F$2

2. Select cells D2:D7.

3. From the Edit menu, choose Fill Down.

The values returned are the y values for your best fit line.

The following table shows the results of the values returned when you

use TREND() and LINEST():

| A | B | C | D | E | F

----------------------------------------------------------------------

1 | Month | Sales | Predicted Y | Predicted Y | Slope | Y Intercept

| | | TREND() | LINEST() | |

----------------------------------------------------------------------

2 1 $4,200 $4,543 $4,543 1122.857 3420

3 2 $6,100 $5,666 $5,666

4 3 $7,300 $6,789 $6,789

5 4 $7,300 $7,911 $7,911

6 5 $8,700 $9,031 $9,031

7 6 $10,500 $10,157 $10,157

NOTE: The returned values for TREND() and LINEST() are identical.

PLOTTING THE BEST FIT LINE

===========================

Once you've calculated the values on your best fit line, you can add

that line to your existing chart:

1. Select cells C1:C7. From the Edit menu, choose Copy.

2. Activate the chart document by selecting it from the Window menu.

3. From the Edit menu, choose Paste.

4. If you want different data point markers on your best fit line,

select the line and choose Pattern from the Format menu. Under

Marker, select Custom and choose the data point marker and color

you want. Choose OK.

NOTE: In Microsoft Excel 4.0, you can use the ChartWizard to

easily update your chart with the new trend data. Activate your

chart by clicking it once if it is a chart object or by selecting

the filename from the Window menu if it is a separate chart

document. Select the ChartWizard button and in the Step 1 Of 2

dialog box, change the reference from filename!$A$1:$B$7 to

filename!$A$1:$C$7. Choose Next and in the Step 2 Of 2 dialog box,

choose OK.

The resulting chart will have a straight line (best fit line) running

through your original data.

NOTE: When you display the points of a best fit line against the

original data in a chart, you will, in most cases, get the best

results by using an xy (scatter) chart. If you use a line chart, the

x values will be treated as labels rather than as values, and

crooked lines may result.

PREDICTING FUTURE VALUES

========================

In addition to returning values along the line fitted to your existing

data, you can use TREND() and LINEST() to predict future values. You

can also use the FORECAST() function in Microsoft Excel 4.0.

Using the WidgeMaker, Inc., example, suppose you want to calculate

sales figures for months 7, 8, and 9. The following examples show how

to accomplish this using the TREND() and LINEST() functions,

respectively.

Using TREND()

-------------

To predict values for months 7, 8, and 9, first enter the month

numbers for which you want predicted sales figures and then use the

TREND() function to calculate the values:

1. In cells A8:A10, type 7, 8, and 9 respectively.

2. Select cells B8:B10.

3. Type the following formula:

=TREND(B2:B7,A2:A7,A8:A10)

4. Enter the formula as an array by pressing CTRL+SHIFT+ENTER in

Microsoft Excel for Windows or COMMAND+RETURN in Microsoft Excel

for the Macintosh.

The third argument in the TREND() function is the array containing the

new x values for which you want to derive predicted y values.

The resulting values in cells B8:B10 are the predicted sales for the

next three months:

| A | B

---------------------

1 | Month | Sales

---------------------

2 1 $4,200

3 2 $6,100

4 3 $7,300

5 4 $7,300

6 5 $8,700

7 6 $10,500

8 7 $11,280

9 8 $12,403

10 9 $13,526

NOTE: In Microsoft Excel 4.0, you can use the AutoFill feature to

predict future values. Using the above data, if you wanted to

predict sales for months 7, 8, and 9, you would select cells B2:B7,

select the AutoFill handle in the lower-right corner of the selected

area and drag down three additional cells.

CAUTION: In addition to returning predicted values for months 7, 8,

and 9, the data in cells B2:B7 will be overwritten with the values

that represent the best fit line. If you do not want your original

data to be overwritten, copy it to a separate area on your worksheet

and then use AutoFill.

Using LINEST()

--------------

To obtain the new y values, you can also substitute the slope and y

intercept values that you derived with the LINEST() function and the

new x values (7, 8, and 9) into the formula, y=

**. See the**

"Calculating a Best Fit Line" section for step-by-step instructions

on how to do this.

MULTIPLE REGRESSION ANALYSIS

============================

When you have two or more independent x variables for each y variable,

the regression analysis is multiple. For example, you could predict a

child's weight given his or her age and height. Assume you've

collected the following data

| A | B | C

--------------------------

1 | Age | Height | Weight

--------------------------

2 3 32 35

3 5 40 40

4 6 39 43

5 10 50 70

where the values under Weight (C2:C5) represent the y variables and

the values under Age and Height (A2:B5) are the x variables.

PREDICTING Y VALUES

===================

You can use either the TREND() or the LINEST() function to analyze the

relationship of the age and height to weight, and you can make

predictions based on the results of this analysis.

In Microsoft Excel 4.0, the Regression tool can also be used to

predict y values in a multiple regression model. The FORECAST()

function only works for simple regression. For additional information,

see pages 41-45 in "User's Guide 2."

Using TREND()

-------------

To use TREND() to predict the weight of a 9-year-old, 45-inch child,

do the following:

1. In cells A6:B6, type 9 and 45 respectively.

2. Select cell C6 and type the following formula:

=TREND(C2:C5,A2:B5,A6:B6)

3. Enter the formula as an array by pressing CTRL+SHIFT+ENTER in

Microsoft Excel for Windows or COMMAND+RETURN in Microsoft Excel

for the Macintosh.

The predicted weight is 63.42.

Using LINEST()

--------------

To predict a y value with LINEST(), you must first calculate the

slopes for each x variable and find the y intercept. Because a slope

is returned for each x variable, when you use the LINEST() function,

you must first select a range of cells that consist of a single row

and a single column plus an additional column for each x variable you

have. In this example, because you have two x variables, you will need

to select a range of three cells, three columns wide by one row.

To calculate the slopes and the y intercepts, do the following:

1. Select cells A7:C7 and type the following formula:

=LINEST(C2:C5,A2:B5)

2. Enter the formula as an array by pressing CTRL+SHIFT+ENTER in

Microsoft Excel for Windows or COMMAND+RETURN in Microsoft Excel

for Macintosh.

The following values will be returned

| A | B | C

---------------------

7 | -0.32 5.98 24

where -.32 is the slope for the second x variable (height), 5.98 is

the slope for the first x variable (age), and 24 is the y intercept.

NOTE: The slopes are in reverse order: the first slope value

corresponds to the second x variable and the second slope value

corresponds to the first x variable. The LINEST() function always

returns the slopes in backwards order when more than one x

variable is involved.

You can use the slope values and the y intercept value to make

predictions based on your data. Using the formula,

y=(1*x1)+(2*x2)+(n*xn)+

"Calculating a Best Fit Line" section for step-by-step instructions

on how to do this.

MULTIPLE REGRESSION ANALYSIS

============================

When you have two or more independent x variables for each y variable,

the regression analysis is multiple. For example, you could predict a

child's weight given his or her age and height. Assume you've

collected the following data

| A | B | C

--------------------------

1 | Age | Height | Weight

--------------------------

2 3 32 35

3 5 40 40

4 6 39 43

5 10 50 70

where the values under Weight (C2:C5) represent the y variables and

the values under Age and Height (A2:B5) are the x variables.

PREDICTING Y VALUES

===================

You can use either the TREND() or the LINEST() function to analyze the

relationship of the age and height to weight, and you can make

predictions based on the results of this analysis.

In Microsoft Excel 4.0, the Regression tool can also be used to

predict y values in a multiple regression model. The FORECAST()

function only works for simple regression. For additional information,

see pages 41-45 in "User's Guide 2."

Using TREND()

-------------

To use TREND() to predict the weight of a 9-year-old, 45-inch child,

do the following:

1. In cells A6:B6, type 9 and 45 respectively.

2. Select cell C6 and type the following formula:

=TREND(C2:C5,A2:B5,A6:B6)

3. Enter the formula as an array by pressing CTRL+SHIFT+ENTER in

Microsoft Excel for Windows or COMMAND+RETURN in Microsoft Excel

for the Macintosh.

The predicted weight is 63.42.

Using LINEST()

--------------

To predict a y value with LINEST(), you must first calculate the

slopes for each x variable and find the y intercept. Because a slope

is returned for each x variable, when you use the LINEST() function,

you must first select a range of cells that consist of a single row

and a single column plus an additional column for each x variable you

have. In this example, because you have two x variables, you will need

to select a range of three cells, three columns wide by one row.

To calculate the slopes and the y intercepts, do the following:

1. Select cells A7:C7 and type the following formula:

=LINEST(C2:C5,A2:B5)

2. Enter the formula as an array by pressing CTRL+SHIFT+ENTER in

Microsoft Excel for Windows or COMMAND+RETURN in Microsoft Excel

for Macintosh.

The following values will be returned

| A | B | C

---------------------

7 | -0.32 5.98 24

where -.32 is the slope for the second x variable (height), 5.98 is

the slope for the first x variable (age), and 24 is the y intercept.

NOTE: The slopes are in reverse order: the first slope value

corresponds to the second x variable and the second slope value

corresponds to the first x variable. The LINEST() function always

returns the slopes in backwards order when more than one x

variable is involved.

You can use the slope values and the y intercept value to make

predictions based on your data. Using the formula,

y=(

**, you can predict the weight**

of a 45-inch 9-year old:

=(9*5.98)+(45*-0.32)+24

The result of the formula, 63.42, is the predicted weight.

Similar to LINEST(), the Regression tool in Microsoft Excel 4.0

returns the slope values and the y intercept value. As described

previously, you can plug these values into the formula, y=x+

of a 45-inch 9-year old:

=(9*5.98)+(45*-0.32)+24

The result of the formula, 63.42, is the predicted weight.

Similar to LINEST(), the Regression tool in Microsoft Excel 4.0

returns the slope values and the y intercept value. As described

previously, you can plug these values into the formula, y=

**,**

to predict y.

NOTE: Because the x variables are independent, there may not be a

good graphical representation for a multiple regression model. Each

x value can be plotted with its corresponding y value but the

individual lines may be completely unrelated and therefore may not

be useful.

POLYNOMIAL REGRESSION ANALYSIS

==============================

When your data is neither exponentially curved nor consistently

linear, use this method of regression. When you plot a best fit curve

calculated with polynomial regression, the curve will rise and fall

with the data.

CALCULATING A POLYNOMIAL CURVE

==============================

To calculate a polynomial curve, the y variable is regressed against

the independent x variable raised to different powers. To illustrate

this process, take the following example (assume WidgeMaker, Inc.'s

sales for the first six months of its operation are as follows):

| A | B

------------------

1 | Month | Sales

------------------

2 1 $4,200

3 2 $1,600

4 3 $5,120

5 4 $4,500

6 5 $5,400

7 6 $1,460

Fitting a straight line to this data would not accurately predict the

sales for any given month. In this case, you'll get the best results

by setting up the following polynomial regression model:

| A | B | C | D | E | F

---------------------------------------------

10 | X | X^2 | X^3 | X^4 | Sales | Trend

---------------------------------------------

11 1 1 1 1 $4,200 $4,089

12 2 4 8 16 $1,600 $2,154

13 3 9 27 81 $5,120 $4,011

14 4 16 64 256 $4,500 $5,609

15 5 25 125 625 $5,400 $4,846

16 6 36 216 1296 $1,460 $1,571

The values in cells A11:A16 are the month numbers copied from A2:A7.

The values in cells B11:D16 are the original x variables raised to the

second, third, and fourth powers respectively. To obtain these values,

do the following:

1. Select cell B11 and enter the formula:

=A11^2

2. Select cell C11 and enter the formula:

=A11^3

3. Select cell D11 and enter the formula:

=A11^4

4. Select cells B11:D16.

5. From the Edit menu, choose Fill Down.

The values in E11:E16 are the sales figures copied from B2:B7. To

derive the trend values, do the following:

1. Select cells F11:F16 and type the formula:

=TREND(E11:E16,A11:D16)

2. Enter the formula as an array by pressing CTRL+SHIFT+ENTER in

Microsoft Excel for Windows or COMMAND+RETURN in Microsoft Excel

for the Macintosh.

PLOTTING A POLYNOMIAL CURVE

===========================

To add the TREND() results to this chart, select cells F10:F16 and

follow the steps under "Plotting the Best Fit Line" on page 6 of this

Application Note. You will create a chart that resembles the

following:

USING REGRESSION STATISTICS

===========================

The LINEST() and LOGEST() functions can return additional regression

statistics that can be helpful in using and evaluating your regression

model. If you have linear data, you can use the Regression Tool in

Microsoft Excel version 4.0 to automatically return all the regression

statistics. If your data resembles an exponential curve, use LOGEST()

to return accurate regression statistics.

Using LINEST() or LOGEST() to Return Regression Statistics

----------------------------------------------------------

To return the additional statistics using LINEST() or LOGEST(), you

must select a range that includes five rows and a single column plus

an additional column for each x variable you have. In addition, the

stats argument, which is the fourth argument in both of these

functions, must be set to TRUE.

To return the additional regression statistics using the following

data

| A | B | C

-------------------------

1 | Age | Height | Weight

-------------------------

2 3 32 35

3 5 40 40

4 6 39 43

5 10 50 70

do the following:

1. Select cells D1:F5.

NOTE: This range consists of five rows and a single column plus

two additional columns, one for each x variable.

2. Type the following formula:

=LINEST(C2:C5,A2:B5,,TRUE)

3. Enter the formula as an array by pressing CTRL+SHIFT+ENTER in

Microsoft Excel for Windows or COMMAND+RETURN in Microsoft Excel

for the Macintosh.

The resulting data should resemble the following:

| D | E | F

------------------------------------

1 -0.32 5.98 24

2 2.243569 5.647619 57.68449

3 .950813 6.024948 #N/A

4 9.665289 1 #N/A

5 701.7 36.3 #N/A

In the first row of the statistics, you have the slope for the height,

the slope for the age, and the y intercept. The second row contains

the standard error of the slopes and of the y intercept. R2 and the

standard error for the y estimate are in the third row. The

statistic and degrees of freedom are in the fourth row, while the

regression sum of squares and the residual sum of squares are in the

fifth row.

Using R2 to Determine the Accuracy of the Regression Model

----------------------------------------------------------

A particularly useful statistic returned is the coefficient of

determination (R2). In Microsoft Excel 4.0, you can also use the RSQ()

function to find R2. This R2 indicator ranges in value from 0 to 1 and

tells you how closely the estimated y values correlate to your actual

y values. The closer R2 is to 1, the more perfect the correlation--

this correlation indicates that the regression equation is very useful

in accurately predicting a y value. On the other hand, the closer R2

is to 0, the less helpful it will be in predicting a y value. In the

previous example, the value for R2 returned by LINEST() is .95, a near

perfect correlation. This indicates that, based on the collected data,

the LINEST() model can be used to make extremely accurate predictions

of a child's weight given a specific age and height.

If you want to predict how accurately a child's age and height will

predict their weight, given the collected data, R2 will indicate the

accuracy of the predicted weight of a child with a different age and

height not included in the collected data.

to predict y.

NOTE: Because the x variables are independent, there may not be a

good graphical representation for a multiple regression model. Each

x value can be plotted with its corresponding y value but the

individual lines may be completely unrelated and therefore may not

be useful.

POLYNOMIAL REGRESSION ANALYSIS

==============================

When your data is neither exponentially curved nor consistently

linear, use this method of regression. When you plot a best fit curve

calculated with polynomial regression, the curve will rise and fall

with the data.

CALCULATING A POLYNOMIAL CURVE

==============================

To calculate a polynomial curve, the y variable is regressed against

the independent x variable raised to different powers. To illustrate

this process, take the following example (assume WidgeMaker, Inc.'s

sales for the first six months of its operation are as follows):

| A | B

------------------

1 | Month | Sales

------------------

2 1 $4,200

3 2 $1,600

4 3 $5,120

5 4 $4,500

6 5 $5,400

7 6 $1,460

Fitting a straight line to this data would not accurately predict the

sales for any given month. In this case, you'll get the best results

by setting up the following polynomial regression model:

| A | B | C | D | E | F

---------------------------------------------

10 | X | X^2 | X^3 | X^4 | Sales | Trend

---------------------------------------------

11 1 1 1 1 $4,200 $4,089

12 2 4 8 16 $1,600 $2,154

13 3 9 27 81 $5,120 $4,011

14 4 16 64 256 $4,500 $5,609

15 5 25 125 625 $5,400 $4,846

16 6 36 216 1296 $1,460 $1,571

The values in cells A11:A16 are the month numbers copied from A2:A7.

The values in cells B11:D16 are the original x variables raised to the

second, third, and fourth powers respectively. To obtain these values,

do the following:

1. Select cell B11 and enter the formula:

=A11^2

2. Select cell C11 and enter the formula:

=A11^3

3. Select cell D11 and enter the formula:

=A11^4

4. Select cells B11:D16.

5. From the Edit menu, choose Fill Down.

The values in E11:E16 are the sales figures copied from B2:B7. To

derive the trend values, do the following:

1. Select cells F11:F16 and type the formula:

=TREND(E11:E16,A11:D16)

2. Enter the formula as an array by pressing CTRL+SHIFT+ENTER in

Microsoft Excel for Windows or COMMAND+RETURN in Microsoft Excel

for the Macintosh.

PLOTTING A POLYNOMIAL CURVE

===========================

To add the TREND() results to this chart, select cells F10:F16 and

follow the steps under "Plotting the Best Fit Line" on page 6 of this

Application Note. You will create a chart that resembles the

following:

USING REGRESSION STATISTICS

===========================

The LINEST() and LOGEST() functions can return additional regression

statistics that can be helpful in using and evaluating your regression

model. If you have linear data, you can use the Regression Tool in

Microsoft Excel version 4.0 to automatically return all the regression

statistics. If your data resembles an exponential curve, use LOGEST()

to return accurate regression statistics.

Using LINEST() or LOGEST() to Return Regression Statistics

----------------------------------------------------------

To return the additional statistics using LINEST() or LOGEST(), you

must select a range that includes five rows and a single column plus

an additional column for each x variable you have. In addition, the

stats argument, which is the fourth argument in both of these

functions, must be set to TRUE.

To return the additional regression statistics using the following

data

| A | B | C

-------------------------

1 | Age | Height | Weight

-------------------------

2 3 32 35

3 5 40 40

4 6 39 43

5 10 50 70

do the following:

1. Select cells D1:F5.

NOTE: This range consists of five rows and a single column plus

two additional columns, one for each x variable.

2. Type the following formula:

=LINEST(C2:C5,A2:B5,,TRUE)

3. Enter the formula as an array by pressing CTRL+SHIFT+ENTER in

Microsoft Excel for Windows or COMMAND+RETURN in Microsoft Excel

for the Macintosh.

The resulting data should resemble the following:

| D | E | F

------------------------------------

1 -0.32 5.98 24

2 2.243569 5.647619 57.68449

3 .950813 6.024948 #N/A

4 9.665289 1 #N/A

5 701.7 36.3 #N/A

In the first row of the statistics, you have the slope for the height,

the slope for the age, and the y intercept. The second row contains

the standard error of the slopes and of the y intercept. R2 and the

standard error for the y estimate are in the third row. The

statistic and degrees of freedom are in the fourth row, while the

regression sum of squares and the residual sum of squares are in the

fifth row.

Using R2 to Determine the Accuracy of the Regression Model

----------------------------------------------------------

A particularly useful statistic returned is the coefficient of

determination (R2). In Microsoft Excel 4.0, you can also use the RSQ()

function to find R2. This R2 indicator ranges in value from 0 to 1 and

tells you how closely the estimated y values correlate to your actual

y values. The closer R2 is to 1, the more perfect the correlation--

this correlation indicates that the regression equation is very useful

in accurately predicting a y value. On the other hand, the closer R2

is to 0, the less helpful it will be in predicting a y value. In the

previous example, the value for R2 returned by LINEST() is .95, a near

perfect correlation. This indicates that, based on the collected data,

the LINEST() model can be used to make extremely accurate predictions

of a child's weight given a specific age and height.

If you want to predict how accurately a child's age and height will

predict their weight, given the collected data, R2 will indicate the

accuracy of the predicted weight of a child with a different age and

height not included in the collected data.