# Category : Lotus and other Spreadsheets

Archive : POWERWKS.ZIP

Filename : SREGR.DOC

SREGR - Simple Linear Regression

1. INTRODUCTION

Linear regression is a method of finding the "best" straight

line through a set of points. Each point is described by two

values, and if there is a linear relationship between the values,

the points will lie close to a straight line when plotted on a

graph. For example, we might look for such a relationship

between:

- sales of the IBM PC and sales of the PC version of Lotus

1-2-3

- sales of Lotus 1-2-3 and of books about 1-2-3

- prime interest rate and business bankruptcies

In each of the above examples, we might be interested in

estimating the value of the second item (Y, the dependent var-

iable) given a value for the first item (X, the independent

variable). The SREGR worksheet will calculate the equation of the

best-fit straight line through pairs of X-Y values, and will

calculate estimates of Y, given X values, from the line. This

worksheet is a subset of the LREGR worksheet (contained on the

Volume 1B disk). It does not calculate any confidence limits, or

the significance of the Student's t value for the regression. As

a result, it can accept more pairs of input values for a

particular main memory configuration, and it calculates faster

than LREGR.

The worksheet accepts pairs of values (labelled Y and X), and

calculates the values of a and b in the equation Y = a + b X of

the best-fit straight line. a is often called the intercept,

because it is the point at which the line intercepts the Y-axis on

a graph, and b is the slope.

In addition to calculating the intercept and slope, the

worksheet calculates two of the parameters a statistician uses to

appraise the quality of the fit. These are:

- the correlation coefficient (r), which measures the degree

of association between X and Y. The value of r always lies

between 1 and -1. A value of zero indicates no

relationship; a value of +1 or -1 indicates a perfect

relationship. The correlation coefficient is negative when

larger values of X are associated with smaller values of Y.

- the value of Student's t for the hypothesis that the slope

(b) is zero (ie. changing the value of X doesn't have a

significant effect on the value of Y). The larger the value

of t, the less we are inclined to accept the hypothesis.

Note the SREGR worksheet does not calculate the significance

of this t-value in probability terms (but you can look it up

in standard statistical tables).

08/21/84 SREGR - 1

Statistical tables exist for the correlation coefficient and

Student's t, which give the probability that such a value (or

better) could be obtained purely by chance from random data.

For the purpose of estimating the coefficients (a and b) of

the regression equation, the worksheet uses all X-values for which

Y-values are supplied. These X-Y pairs must be entered in

consecutive rows, beginning at the top of the input area.

However, you may enter more X-values below the ones paired with

Y-values. The worksheet estimates the Y-values corresponding to

all the X-values you enter. This is useful if you want to use the

regression line as a forecasting pr estimating tool.

2. USING SREGR

The main menu of SREGR offers you seven choices (plus Quit).

They are as follows:

Input

This choice allows you to enter data to feed the

calculations, and to change the limit on the number of data

items you may input. There are two types of data: the raw

X-Y pairs from which the regression line is derived, and

additional X values for which Y estimates are calculated.

Note that changing the maximum number of data items involves

erasing all input data. Therefore, if you think you might

need more space than the worksheet initially allows (300

pairs), you should change the limit before you enter any

data.

Blank

You should choose this option if you have finished with the

data in the worksheet and want to start again with an empty

worksheet. Note that this option wipes out all calculated

results as well as the raw data. (You must confirm this

choice to reduce the risk of accidental erasure.)

Calc

Make this choice when you want to calculate or recalculate

the results. If you have changed data or entered more data

since you last calculated, you must recalculate before

displaying new results or graphs.

Results

This option guides you through the various results areas in

the worksheet. The main results are all shown in the home

screen. But there is a screen full of intermediate results

(averages and sums of squares, for example). And alongside

the columns in which you entered the raw data are calculated

Y estimates. To print your results, select Print from the

main menu (see below).

Graph

One of the clearest ways to show a linear regression is

08/21/84 SREGR - 2

though a graph. This worksheet offers one graph - a simple

one showing the original data points and the regression line

through them.

The print option allows you to send selected areas of the

worksheet to your printer. You can select two different

areas: the final and intermediate results areas (as one

page), and the raw data and Y estimates.

Agenda

If you select this option (and confirm it when requested),

the current worksheet is lost and replaced by the main

agenda. Then you can load another worksheet from the disk.

Note confirmation is required before this option proceeds,

because it will overwrite the worksheet in memory.

2.1 Input Details

The input phase allows you to enter data in a variety of

ways. The input menu has five choices (plus Quit).

The first and second choices (XYPairs and YXPairs) are very

similar. If your source data is arranged as two columns of

numbers, these choices will handle all pointer movement as you

enter your data row-by-row. If the X column is on the left,

select XYPairs; if the Y column is on the left, select YXPairs.

Then enter your data one row at a time. Type the number on the

left first, then the number on the right. After each number,

press the ENTER key. Note that in the worksheet the Y column is

always on the left. Selecting XYPairs or YXPairs merely

determines which column receives the first number of each pair you

type. These two options (and Est-X) are convenient if you want to

use the numeric pad on your keyboard to enter the numbers, because

they perform all the necessary cursor movement for you. You can

press the Num Lock key to disable the cursor movement keys, and

use the numeric pad for numbers. (Remember to press the Num Lock

key again when you've finished entering your data, to re-activate

the cursor movement keys.) When you have entered all your data,

hold down the Ctrl key and press Break. This will stop the macro

and exit to 1-2-3 READY mode. Then hold down the Alt key and

press I to return to the input menu. If you choose either XYPairs

or YXPairs when you have already entered some data, the new data

will be added to the bottom of the existing pairs (but overlaying

any unpaired X values for which Y estimates were required). To

get rid of existing pairs, you must use the Blank option from the

main menu.

The third choice is Est-X. It allows you to add a column of

unpaired X values below the existing data. The worksheet will

calculate a Y estimate for each unpaired X value you enter. Type

each value for X, and press the ENTER key. When you have entered

all the values, hold down the Ctrl key and press Break. This will

stop the macro and exit to 1-2-3 READY mode. Then hold down the

Alt key and press I to return to the input menu.

08/21/84 SREGR - 3

The fourth choice uses the standard 1-2-3 Range Input facil

ity. In some respects this is convenient, because it confines the

cell pointer to the unprotected cells in the X and Y input

columns. And pressing the Esc key returns you directly to the

input menu. However, you are responsible for all cursor movement

within the input area. So you cannot easily use the numeric pad

to enter your numbers with this option. Remember to enter Y

values on the left (column B), and X values on the right (column

C).

The final choice (except Quit) in this menu is Number. It

allows you to set the maximum number of input values. When the

SREGR worksheet is first loaded, it is limited to 300 pairs of

input values. If your machine has less than 256K of memory, you

will not be able to enter even this many pairs without a 'Memory

Full' error. However, if your machine has more memory, you may

use this option to increase the amount of data you can input. You

must confirm your intention to use this option, because it will

erase any data in the input area.

There will come a time when you have entered data into the

worksheet, but you need to re-size the input area (for example,

because you underestimated the amount of data). When you re-size

the area, the worksheet will erase your input. But you can use

the /File Xtract option of 1-2-3 to save the data you have entered

and avoid re-typing it. Here's how.

To save the input data, first exit to 1-2-3 READY mode,

then press the Goto key (F5), type the word 'input', and

press the ENTER key. You will find yourself at the top left

corner of the input area. Then type the following 1-2-3

command (initial letters only):

/File Xtract Values

1-2-3 will prompt you for a file name. Choose a name for

the data, type it, and press ENTER. If a file with this

name already exists, 1-2-3 will ask you if it may be

replaced (ie. if 1-2-3 may overwrite it). When you have

chosen the file name, 1-2-3 asks you to specify the range to

be saved. Press the keys Right End Down in sequence,

followed by the ENTER key. Here, Right means the 6 key on

the numeric pad (in cursor mode), and Down means the 2 key.

1-2-3 will then save your data on disk, and you may proceed

to re-size the input area. (Hold down the Alt key and press

M to return to the main menu.)

When you have re-sized the input area, you can retrieve

your data from the disk by exiting to 1-2-3 READY mode and

moving to the top left corner of the input area (as

described above). Then type the following 1-2-3 command

(initial letters only):

/File Combine Copy Entire-file

1-2-3 will present a menu of the files on disk, from which

08/21/84 SREGR - 4

you should select the one you saved (or type its name) and

press the ENTER key. 1-2-3 will then retrieve your data

from the disk into the worksheet. Hold down the Alt key and

press M to return to the main menu.

There is one more option available to you for inputting data

to this worksheet. That is to use the 1-2-3 commands. For

example, you may already have the input data for your regression

analysis stored in a worksheet on disk. If the data occupies two

adjacent columns in the worksheet, with Y values in the left

column, you can use 1-2-3's /File Combine Copy command to copy the

data into the LREGR worksheet. To do this, simply choose Quit at

the main menu, press the Goto key (F5), type the word 'input', and

press ENTER. You will find yourself at the top left corner of the

data input area. Now you can type /fcc and follow the remaining

1-2-3 instructions to copy the data from the worksheet on disk.

When you have finished, hold down the Alt key and press I to

return to the input menu. Note, do not use /Move to move data

around in the worksheet. Moving data into a named range

invalidates the range name along with any formulas that depend on

it. You will probably destroy the worksheet's ability to

calculate properly if you use /Move commands in it.

2.2 Graph Details

The macro that displays the graph in this worksheet

deliberately avoids setting many of the graph parameters. For

example, the labels for the X- and Y- axes are initially set when

the worksheet is loaded, and are not changed. If you wish to use

your own labels, exit to 1-2-3 READY mode and change them. Then

hold down the Alt key and press M to return to the main menu. The

following is a list of the graph options that are set by the

macro. Attempts to reset these options will be futile.

- Graph subtitle

- Graph type and ranges

- Legends for ranges

- Range formats

To get a printed copy of any graph, first display it using

the graph menu. The last-displayed graph is called the current

graph. Then exit to 1-2-3 READY mode. Press the Graph key (F10),

and verify the graph. If you wish, you may use the 1-2-3 /Graph

commands to modify the graph before printing. Then type the

following 1-2-3 command (initial letters only):

/Graph Save

1-2-3 will ask you to specify a file name, and save the graph in

that file. You may then use the PrintGraph program to print the

graph. For instructions on how to use the PrintGraph program,

read the PrintGraph section of the 1-2-3 manual.

2.3 Print Details

The print macros assume you have a printer capable of 80-

08/21/84 SREGR - 5

column printing. Before printing each option, the macros allow

you to align the paper to the top of a new page. Press the ENTER

key when you are ready for printing to begin. At the end, the

macro will feed the paper to the top of a new page so that,

normally, you won't have to keep re-aligning the paper.

The final and intermediate results are printed as one,

because they will comfortably fit onto a 66-line page. The third

option can lead to several pages of output. The macro defines the

top border of the page to be the rows of the worksheet that

identify the columns being printed.

3. EXAMPLES

3.1 Radio or Television?

In the U.K., one must buy a television licence before one can

legally receive television programs. Prior to 1971, a radio

licence was required to receive radio signals, unless one bought a

television licence which covered both. With the rise of tele-

vision in the home, one might expect a fall in the number of radio

licences issued. Indeed, one might expect that for every

television licence bought, one less radio licence would be sold,

since most people owned radios before they bought televisions.

The numbers of radio and television licences issued for the years

1949 to 1968 are shown in the following table:

Radio and Television Licences

Issued in the United Kingdom.

YEAR RADIO TELEVISION

(Millions)

1949 11.7 0.1

1950 12.0 0.3

1951 11.6 0.7

1952 11.2 1.4

1953 10.7 2.2

1954 10.2 3.2

1955 9.4 4.4

1956 8.6 5.6

1957 7.6 6.8

1958 6.6 8.0

1959 5.4 9.2

1960 4.5 10.4

1961 3.9 11.2

1962 3.5 11.9

1963 3.2 12.5

1964 3.0 13.0

1965 2.8 13.3

1966 2.6 13.7

1967 2.5 14.3

1968 2.5 15.0

This table clearly shows the growth of television licences at

08/21/84 SREGR - 6

the expense of radio licences. The data in this table is

contained on your disk in a worksheet named TV. To use it to

illustrate the regression analysis, perform the following steps:

1. Load the SREGR worksheet and when the main menu appears,

Press Q to exit to 1-2-3 READY mode.

2. Press the Goto key (F5), then type the word 'input' and

press the ENTER key.

3. Type the following 1-2-3 command (initial letters only):

/File Combine Copy Entire-file TV

and press the ENTER key. This will bring the saved data

into the input area of your worksheet.

4. Hold down the Alt key and press M. This will return you to

the main menu.

5. Choose the Calc option. When calculations are complete, you

may use the results, graph, and print phases.

The results show a highly significant correlation. The

surprising part of the answer, however, is that the slope of the

regression line is -1.425, and not, as one might expect, -1.

Moreover the standard deviation of the slope - S(b) - is so small

(0.03) that it is inconceivable that the real slope is -1 given

the observations we have. This means the increase in the number

of television licences sold exceeds the decrease in the number of

radio licences by about 40%. Since anyone who bought or rented a

television almost certainly used a radio, one must look for an

explanation.

A possible explanation for the difference is that radio

licences were not treated seriously. It was common for people to

use unlicenced radio receivers, and few were prosecuted. But when

television licences were introduced, the need for a licence was

publicised, and users of unlicenced televisions were prosecuted

more actively. The 40% difference is likely due to people buying

a television licence when before they did not bother to buy a

radio licence. Another likely explanation is that the population

of viewers was growing in total.

The correlation coefficient (r) for this particular

regression is -0.9962, which is very close to -1 (meaning perfect

correlation). If you examine the graph, you will see that the

points do indeed lie very close to a straight line.

A note of warning is in order here. It can be very dangerous

to assume that a cause-and-effect relationship exists between two

factors, even when the fit is this good. One should take the

results of any statistical analysis with a large grain of salt,

unless there is a prior expectation of cause-and effect. In this

particular case, we do have such expectation because owning a

television licence removed the need to buy a radio licence.

Measurements such as these that are taken over a period of time

can be influenced by external factors which change over time.

This can lead to an apparent relationship when in fact none

exists. For example, television sales over some years were

positively correlated with admissions to mental institutions.

This statement could hardly justify an assertion that televisions

08/21/84 SREGR - 7

cause mental illness! Both simply happened to be growing over the

period the figures were observed.

3.2 Example 2 - Fictitious Data

This example leads you through a complete analysis of the

data given in the following table:

Linear Regression

Fictitious Data

X value Y value

1 3

3 5

5 6

7 8

9 12

11 12

Once you have loaded the SREGR worksheet, the following

keystrokes will perform the analysis:

Keystrokes Comments

{ENTER} Press the ENTER key to select the Input choice in

the main menu.

X Choose the XYPairs option in the input menu,

because this is how the source data is arranged.

Note that the macro will place the cell pointer in

the topmost cell of the X value input column.

1 {ENTER} This enters the first X value in the top cell.

The macro will move the cell pointer one column to

the left to receive the first Y value.

3 {ENTER} Enters the first Y value. The macro will now move

the cell pointer to the right and down so you can

enter the second X value.

3 {ENTER} Enter the remaining values.

5 {ENTER}

5 {ENTER}

6 {ENTER}

7 {ENTER}

8 {ENTER}

9 {ENTER}

12 {ENTER}

11 {ENTER}

12 {ENTER}

{Ctrl} {Break} Having entered all the pairs of values, hold down

the Ctrl key and press Break. This stops the

08/21/84 SREGR - 8

macro, and returns you to 1-2-3 READY mode.

{Alt} I Hold down the Alt key, and press I. This returns

you to the input menu.

E Select Est-X to obtain estimates of Y for

specified X values. The macro positions the cell

pointer at the first cell in the X input column

below the values you have already input.

0 {ENTER} Enter the X values for which estimates are

required.

2 {ENTER}

4 {ENTER}

6 {ENTER}

8 {ENTER}

10 {ENTER}

12 {ENTER}

{Ctrl} {Break} Return to the input menu.

{Alt} I

Q Return to main menu.

C Perform the calculations.

At this point, all the results have been obtained, and you

can display them by choosing Results in the main menu. Here's how

(starting at the main menu):

R Takes you to the Results menu from the main menu.

{ENTER} Displays the final results on the 'home' screen.

(Typically, you will already be on the home

screen, and this option will have no visible

effect.) On this screen, you can read the

calculated regression equation, the correlation

coefficient, and the Student's t value for testing

the significance of the regression.

I Displays the intermediate results. Here you can

see the number of (paired) observations, and the

mean of the (paired) X and Y values. The last

line on this screen shows the standard deviation

of the slope, which is used to derive the

Student's t statistic for the significance test of

the regression.

R Displays the raw data. The display starts at the

first screen-full of data, but you are presented

with a menu that allows you to scroll up and down

one screen-full at a time. Note the macro sets up

a window so the column headings are always

displayed above the data columns. You will not

need to scroll for this small set of data.

08/21/84 SREGR - 9

Q Returns you to the results menu.

Q Returns you to the main menu from the results

menu.

You can display the graph provided by this worksheet as

follows:

G Takes you from the main menu to the graphics menu.

Initially, the graphs are set up to display on a

color monitor. If you have a monochrome monitor

(with a graphics adaptor card), you should type

the following three commands:

O Selects the graphics options menu.

B Tells 1-2-3 to plot graphs in monochrome (B&W).

Q Returns you to the graphics menu.

R Displays the graph showing the raw data points,

and the regression line through them. From this

graph, you can see how well (or not) the points

follow the straight line. Press any key to return

to the graphics menu.

Q Returns you to the main menu.

If you wish, you may Quit all menus and set up the graph

title (not subtitle), and the X- and Y-axis legends to suit your

needs. Then use Alt-M to return to the menus and re-display the

graph. Since the graphics macro does not reset these parameters

each time, they will appear on the graph as you have set them.

To print the results, first make sure your printer is turned

on and loaded with paper. Then, from the main menu, type the

following:

P To go to the print menu.

R To print the final and intermediate results.

Press ENTER when you are ready for printing to

begin.

D To print the raw data and Y estimates. Press

ENTER when you are ready for printing to begin.

Q To return to the main menu.

Before starting each section of output, the macros pause for

you to align the paper. After each section of output, they space

to the top of the next page, so you should only have to align the

paper once - before the first section.

08/21/84 SREGR - 10

Menu Tree for SREGR Worksheet

Input - Input X-Y data.

. XYPairs - Input X-Y pairs of values (X first).

. YXPairs - Input Y-X pairs of values (Y first).

. Est-X - Enter X values to obtain Y estimates.

. Range - Use 1-2-3 standard Range Input facility.

. Number - Set maximum number of data pairs (erases data).

. . No - Do not erase or resize data area.

. . Yes - Erase and resize data area.

. Quit - Return to main menu.

Blank - Blank out data input area.

. No - Do not blank out input area; return to main menu.

. Yes - Blank out input area, then return to main menu.

Calc - Calculate all results.

Results - Display results on the screen.

. Final - Show final results (regression equation, etc.).

. Intermed - Show intermediate results (sums of squares, etc.).

. Rawdata - Show raw input data and Y estimates.

. . Down - Scroll down one screen.

. . Up - Scroll up one screen.

. . Quit - Return to results menu.

. Quit - Return to main menu.

Graph - Display graphs.

. Rawdata - Display graph of input data and regression line.

. Options - Select graph options.

. . Color - Display graphs in color.

. . B&W - Display graphs in monochrome.

. . Quit - Return to graph menu.

. View - Re-display the last-seen graph.

. Quit - Return to main menu.

Print - Send results to printer.

. Results - Print final and intermediate results.

. Data - Print raw data and Y estimates.

. Quit - Return to main menu.

Agenda - Return to worksheet agenda.

. No - Do not leave this worksheet; return to main menu.

. Yes - Clear this worksheet and load agenda.

Quit - Exit to 1-2-3 READY mode.

08/21/84 SREGR - 11

Very nice! Thank you for this wonderful archive. I wonder why I found it only now. Long live the BBS file archives!

This is so awesome! 😀 I’d be cool if you could download an entire archive of this at once, though.

But one thing that puzzles me is the “mtswslnkmcjklsdlsbdmMICROSOFT” string. There is an article about it here. It is definitely worth a read: http://www.os2museum.com/wp/mtswslnk/