Dec 29 2017

MAPSTAT is a very serious multivariate statistical analysis package capable of meeting most users’ analytical needs.

File Name | File Size | Zip Size | Zip Type |
---|---|---|---|
——- | 1 | 1 | stored |
——– | 1 | 1 | stored |
BEGIN.BAT | 915 | 381 | deflated |
MAPSTAT.COM | 12037 | 8516 | deflated |
MAPSTAT.DOC | 33024 | 11212 | deflated |
MAPSTCLS.CHN | 7050 | 3634 | deflated |
MAPSTCOR.CHN | 4928 | 2618 | deflated |
MAPSTDSC.CHN | 6778 | 3750 | deflated |
MAPSTFAC.CHN | 15023 | 8001 | deflated |
MAPSTFIX.CHN | 1505 | 1075 | deflated |
MAPSTHYP.CHN | 6606 | 3634 | deflated |
MAPSTMLR.CHN | 4903 | 2881 | deflated |
MAPSTMNV.CHN | 9256 | 4917 | deflated |
MAPSTPLT.CHN | 4932 | 2913 | deflated |
MAPSTPTL.CHN | 4858 | 2653 | deflated |
MAPSTSRT.COM | 11855 | 8557 | deflated |
MAPSTTRN.CHN | 9004 | 5010 | deflated |
MAPSTXTB.CHN | 8072 | 4262 | deflated |
~~~~~~~~.~~~ | 1 | 1 | stored |

# Download File MAPSTAT.ZIP Here

## Contents of the MAPSTAT.DOC file

MULTIVARIATE ANALYSIS PACKAGE 2.0

Copyright 1985, 86, 87, 88

Douglas L. Anderton

Department of Sociology

University of Chicago

1126 E. 59th Street

Chicago, IL 60637

These programs are released for distribution so long as 1) any

charges involved do not exceed costs of media and mailing, and 2)

no portion of programs is used for commercial resale without

written permission of the author.

INTRODUCTION:

MAPSTAT is a very serious multivariate statistical analysis

package capable of meeting 90% or more of most users' analytical

needs. The routines have, at this point, been well tested and

provide the most frequently used procedures of the relatively

expensive statistical packages without cost. Unlike any

commercial software package, Turbo Pascal (@Borland Int'l) source

code is included for modifications and elaborations at your own

risk.

If data are properly arranged (discussed below) MAPSTAT can

theoretically analyze an unlimited number of variables and cases.

It has been tested on data files containing over 200 variables

and 10,000 cases. It is highly recommended that you read the

entire documentation file before using or modifying MAPSTAT. It

is equally important that you have a knowledge of any statistical

procedure before you attempt to use it and interpret the results.

Thirteen subprograms are included in this seventh release of

MAPSTAT. All of the statistical programs are invoked from the

same common menu which is displayed when the command:

>MAPSTAT

is entered at the DOS prompt. The currently available

subprograms in MAPSTAT include:

1) DESCRPT - descriptive statistics and frequency histograms

2) CORREL - correlation and covariance matrices

3) REGRESS - multiple linear regression

4) CROSSTAB - n-way crosstabulation and association tests

5) TRANSFRM - data transformations

6) HYPOTHS - simple hypothesis tests on means and variances

7) PARTIAL - partial correlation coefficients

8) FACTOR - principal axis factoring with rotations

9) CLUSTER - kmeans clustering program

10) PLOT - simple 2 dimensional plots

11) MANOVA - multiple dependent variable analysis of variance

12) FIXFREE - utility for fixed to free format file conversion

13) SORT - utility to sort mapstat files with DOS SORT

Features and limitations of these programs are discussed below.

DESIGN PHILOSOPHY:

First, MAP is written as a sequential case processor to avoid

memory resident storage and achieve the greatest speed possible.

This has several consequences: 1) the package contains powerful

statistical analysis programs without horrendous memory

requirements; 2) however, the cost arises in that for redundant

functions such as histograms, regression residuals, etc., the

package currently requires multiple passes at the data. Even for

large data sets the programs are sufficiently fast to make such

passes reasonable.
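
The sequential, one-case-at-a-time design can be illustrated with a short Python sketch (Python is used here only for illustration; MAPSTAT itself is written in Turbo Pascal). The sketch keeps only a few running totals in memory while it reads the data once. It uses Welford's updating formulas, which is an assumption made for the example and not necessarily the algorithm MAPSTAT uses; the name running_stats is hypothetical.

```python
def running_stats(values):
    """Return (n, mean, sample variance) from a single pass over an iterable."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    variance = m2 / (n - 1) if n > 1 else 0.0
    return n, mean, variance


# Five illustrative values; only three numbers are held in memory at a time.
n, mean, variance = running_stats([100, 110, 104, 107, 97])
print(n, round(mean, 2), round(variance, 2))
```

Because nothing but the running totals is stored, the same loop would work on a file of any length, which is the trade-off the paragraph above describes.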

Major benefits of this strategy include the (A) ability to run

the package on a floppy disk system without a hard drive

(including many portable and laptop machines); (B) ability to run

the package with as little as 56K memory; (C) the analysis of a

virtually unlimited number of cases within these modest hardware

requirements; and (D) a blinding speed compared with many larger

and more cumbersome statistical packages. Unlike many of the memory

mammoths, MAPSTAT is ideal for running under multitasking

systems (e.g. DESQVIEW, DOSAMATIC) in that it requires so little

memory that it leaves room for many other tasks to be running.

Even if you currently use other statistical packages for

analysis, I suspect you will find MAPSTAT more useful and

accessible for many of your routine computing tasks.

INPUT DATA REQUIREMENTS:

MAP expects to find your data in a free format with at least one

blank separating each variable and a newline at the end of each

case. All variables for each case must be on a single line, i.e.

new lines separate records. It will not accept alphanumeric

data. Programs assume all data transformation has been performed

(e.g. CROSSTAB expects a finite number of values, not necessarily

integer value). Thus, it is very desirable to master the more

difficult TRANSFRM transformation program before making full use

of MAPSTAT. These are the only data requirements.

As with all statistical packages on all computers, extreme data

values are less precise. For example, means on variable values

such as 0.000001, 0.000002, 0.000003 or 1E+10, 2E+10, 3E+10 are

less accurate than the equivalent computations on 1, 2, and 3.

Thus, after an initial DESCRPT run, accuracy will be improved in

subsequent analyses if such values are transformed with TRANSFRM

by a simple scale shift to more reasonable ranges (e.g. in the

above examples multiply by 1,000,000 or divide by 1E+09).

A simple data file of four variables and five cases might then

look like:

100 50 90 -9

110 65 91.1 10

104 72 92.63 -9

107 70 92 11

97 36 99 14

Note there is no requirement that variables be aligned, only that

they be separated by a space with one case (or record) per line.
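
As a rough illustration of this layout, the following Python sketch reads free-format data one case per line. It is not MAPSTAT's own reader; the name read_cases is hypothetical, and skipping blank lines is an assumption made so the sketch tolerates stray empty lines.

```python
import io


def read_cases(lines):
    """Yield one list of floats per non-blank line (one case per line)."""
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()  # any run of blanks separates variables
        if not fields:
            continue  # assumption: blank lines are ignored
        try:
            yield [float(field) for field in fields]
        except ValueError as err:
            # Alphanumeric data are not accepted, so report the offending line.
            raise ValueError(f"line {line_number}: non-numeric value") from err


sample = io.StringIO(
    "100 50 90 -9\n"
    "110 65 91.1 10\n"
    "104 72 92.63 -9\n"
    "107 70 92 11\n"
    "97 36 99 14\n"
)
cases = list(read_cases(sample))
print(len(cases), cases[1])
```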

Of course, we will want to be able to identify variables by an

alphanumeric name, e.g. INCOME, EDUCTION, RESIDNCE, AGE. We will

also want to be able to identify the codes for each variable

which we use to indicate that the value for that variable is

missing, e.g. due to non-response on a survey item or non-

applicable items for this case. Codebook files containing

variable names and missing values are thus provided for.

Alternatively, if the user prefers, this data may be entered

interactively from the keyboard. For further information on

specifying variable names and missing values, see the detailed

instructions on 'running the programs' below.

RUNNING THE PROGRAMS:

1. Specifying Data Input and Output Files -

After invoking the programs they will ask for the name of an

input data file (or a file created from a prior MAP run - for

example, the output of CORREL is used by REGRESS), and the name

of an output file. For printer output specify the filename as

LST: and for screen output specify CON:. (An exception is

TRANSFRM, which uses buffered output routines; it will accept LST:

and CON: but will send output to LIST.TMP and CONSOLE.TMP disk

files respectively.) To send output to a disk file or obtain

input from a disk file simply enter the name of the file. This

file must reside on the current drive in the current

subdirectory.

For example, invoking MAPSTAT you should see the main menu:

***** MAPSTAT v1.6 MENU *****

(Copyright 1985,6,7 D. Anderton)

A. Data Transformation, Selection and Recoding

B. Descriptive Statistics and Histograms

C. Two Dimensional Plots or Scattergrams

D. Hypothesis Testing on Variables or Subgroups

E. Correlation and Covariance Matrices

F. Partial Correlation and Covariance Matrices

G. Multiple Linear Regression

H. Factor Analysis with Orthogonal Rotations

I. N-Way Crosstabs and Categorical Association

J. Clustering by KMeans Algorithm

K. Multiple Analysis of Variance

L. Convert Fixed Format to Free Format Mapstat File

M. Sort a Mapstat File with DOS SORT.EXE

X. Exit to OpSys

Enter Selection:

After selecting 'B' for descriptive statistics MAPSTAT will

display:

*** DESCRPT: DESCRIPTIVE STATISTICS ***

Name of the data file?

You must now enter the name of a data file on the current disk

and subdirectory, e.g. 'MYDAT.DAT'. If MAPSTAT cannot find this

file an error message will be written and you will be ejected

from the program. Otherwise you will see:

Name of the output file?

You may now enter either (a) the name of a new file to be created

to contain output on the current disk and subdirectory, (b)

'CON:' (no quotations) to send output to the screen, or (c)

'LST:' (again without quotations) to send output to the current

printer device.

If you do not make a drastic mistake (e.g. you have room for the

new disk file or the printer is turned on) you will then see the

prompt:

Name of the codebook file (or NONE)?

for appropriate MAPSTAT programs.

2. Specifying Codebook Variable Description Files -

If the input to the program is raw data (i.e. it is not one of

the procedures which input a prior CORREL matrix), then the

program will ask for a codebook file. The codebook file contains

three items of input for each variable in the data file (1) the

column number, (2) a variable name of eight characters, and (3) a

missing value code for missing values. Again, I repeat, one

line must be provided for each variable in the data file (whether

it is used in this particular analysis or not). All three items

must be provided for each variable on a new line and separated by

blanks. For the sample given above in the description of the data

file we might have:

1 INCOME -1E37

2 EDUCTION -1E37

3 RESIDNCE -1E37

4 AGE -9

Note that eight spaces must be allowed for variable names, leave

blanks if necessary to fill out the string. Note also that a

missing value code must be given for every variable. The example

above used MAPSTAT's default value of -1E37 for missing data in

the first three variables and a value -9 for missing in the

fourth variable. In the sample data set -9 was actually used to

indicate missing values while the first three variables had no

missing values. When no missing values exist the value -1E37 is

simply identical to the default missing value used by MAPSTAT

when it generates new variables in TRANSFRM, etc. This or

another equally implausible value may be given in the codebook.

For normal usage you should construct such a codebook file in the

same drive and directory as the data file and enter the name in

response to the prompt for a codebook file name.

Alternatively, if the user specifies 'none' in answer to the

codebook file query, variable names will default to variable

numbers and the default missing value will be assumed. This is

not a recommended option if you will return to your output

sometime in the future. It is, however, convenient for quick

oneshot runs on small data files.
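
The codebook layout described above can be sketched in Python as follows. This is only an illustration: the names parse_codebook and mask_missing are hypothetical, and the sketch splits each line on blanks (assuming names contain no embedded spaces) rather than reading a fixed eight-character name field as MAPSTAT may do.

```python
def parse_codebook(lines):
    """Return {column number: (name, missing code)} from codebook lines."""
    codebook = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        column, name, missing = int(parts[0]), parts[1], float(parts[2])
        codebook[column] = (name, missing)
    return codebook


def mask_missing(case, codebook):
    """Replace any value equal to its column's missing code with None."""
    return [
        None if value == codebook[column][1] else value
        for column, value in enumerate(case, start=1)
    ]


codebook = parse_codebook([
    "1 INCOME   -1E37",
    "2 EDUCTION -1E37",
    "3 RESIDNCE -1E37",
    "4 AGE      -9",
])
row = mask_missing([100.0, 50.0, 90.0, -9.0], codebook)
print(codebook[4], row)
```

Applied to the first case of the sample data file, the fourth value (-9) is treated as missing while the other three are kept.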

3. Variable Column Identification -

After file names the programs will typically request the number

of variables in the data file and then the number of variables to

be used in the present run. For example, a DESCRPT run might be

run on a file containing lines for 500 cases each with 12

variables, only 4 of which we desire to analyze in the present

run. The total number of variables would then be entered as 12

and the number for the present run as 4 in response to the

prompts:

How many variables in data file?

Number of variables to use in DESCRPT?

For each variable to be used the program will request information

on the column number of the variable (e.g. 1 for the first

variable, 2 for the second, etc.). These are column numbers in

the raw file not among the subset to be used. In the above

example, say the first, third, sixth, and eleventh of the 12

variables were to be used, the user would enter 1 3 6 11 as

responses to the prompts:

Column number for variable 1?

Column number for variable 2?

Column number for variable 3?

Column number for variable 4?
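
A small Python sketch of this selection step, assuming a case has already been read into a list: the column numbers are 1-based positions in the raw file, exactly as entered at the prompts. The name select_columns and the 12 placeholder values are hypothetical.

```python
def select_columns(case, columns):
    """Pick values by 1-based column number in the raw data file."""
    return [case[column - 1] for column in columns]


# A made-up case with 12 variables whose values are 101..112, so the value
# in column k is 100 + k and the selection is easy to check by eye.
case = list(range(101, 113))
chosen = select_columns(case, [1, 3, 6, 11])
print(chosen)
```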

4. Specification of Groups, Weights and Special variables -

Occasionally, the programs will ask you to identify one of the

variables for use in weighting data, grouping data, as a

dependent variable, etc. Again, reference is by original column

number of the input data set. For example, if the descriptives

in the example above were to be weighted by population which is

contained as the sixth variable, you would identify the weight as

column 6, its position in the raw data file. All of the

variables used as weights, groups, etc., must have been included

in the original number of variables to use and selection of the

columns for the analysis. That is, it would not be possible to

specify, for example, column 4 as a weight since it has not been

specified in the variable list above. DESCRPT, for example will

provide the prompts:

Of these Column numbers which is weight (0=none)?

Of these Column numbers which is grouping (0=none)?

To which we may respond with either the column number of a prior

included variable or '0' if no weighting or grouping

(respectively) is to be done.
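
The rule that a weight or grouping variable must be one of the columns already selected can be sketched in Python as below. This is an illustration of the rule only, not MAPSTAT's code; the name resolve_special_column is hypothetical.

```python
def resolve_special_column(answer, selected_columns):
    """Map a raw column number to its position among the selected columns.

    Returns None when the answer is 0 (no weighting or grouping).
    """
    if answer == 0:
        return None
    if answer not in selected_columns:
        raise ValueError(
            f"column {answer} was not included in the variable list"
        )
    return selected_columns.index(answer)


selected = [1, 3, 6, 11]
print(resolve_special_column(6, selected))  # position of the weight variable
print(resolve_special_column(0, selected))  # no weight requested
```

With the selection 1, 3, 6, 11, asking for column 4 as a weight would raise an error, matching the restriction described above.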

5. Hints on Further Documentation -

All other information necessary is prompted for with what I hope

are explicit prompts. If you have problems as to input queries,

or the interpretation of output, refer to a statistics book.

Some of the multivariate routines are recognizably influenced by

those in Fortran by Cooley and Lohnes in their Multivariate Data

Analysis book. The Kmeans clustering routine is found in almost

any book on cluster analysis. Some routines lifted from

numerical methods books, etc., have references in the source

code. The transformation options are relatively well elaborated

if you initially specify to input transformations for the CON:

file. Once you become familiar with the program you can input

transformations from files.

Because of the more difficult nature of TRANSFRM, a special

section on its usage is presented below.

6. Hints on Power Usage -

There are a number of features which the design philosophy of

MAPSTAT precludes. However, most of these features are readily

derivable through coupling TRANSFRM with the other subprograms.

For example, many regression packages output residuals from the

regression and plots of the standardized residuals, etc. MAPSTAT

does not force such a second pass through the data since it is

designed for large data sets without retention of the data in

memory. If the user desires such an analysis the residuals could

be readily computed using TRANSFRM and then plotted with PLOT.

Similarly, FACTOR produces score coefficients which could be used

to generate factor scores for further analysis, etc. Dummy

variables can be coded through use of the recoding facilities in

TRANSFRM and used to compute complicated general linear model

analyses of variance (e.g. GLM/ANOVA's) through REGRESS.

The list goes on, and on, and on. The more you know about

statistics and what you are doing the more you will find these

programs of use. At the same time, if you are a basic user you

will probably not require more than the basic output provided by

routines.

YOUR FIRST ENCOUNTER:

A recommended first experimentation is to begin with simple

descriptive statistics using program DESCRPT followed by

bivariate correlations using CORREL. Soon after this you should

attempt to learn the most difficult, and perhaps most useful,

program TRANSFRM for data transformations and sample selection.

Once you have mastered the TRANSFRM program all remaining

programs should come easily. As noted below, many of the

multivariate programs take a correlation matrix generated by

CORREL as input for further analysis. This allows one

correlation matrix to be generated for a large dataset and many

analyses to be computed without recomputing the correlations.

The CROSSTAB program for analysis of frequency tables is a

particularly useful program which will handle up to seven-way

tables and automatically generate all applicable statistics of

association.

PROGRAM LIMITATIONS:

The addition of codebooks and transformation files makes these

routines roughly competitive with other micro statistics

packages. Given you have received them free of cost and,

"omigosh," with the source code, they are extremely flexible and

useful tools for data analysis.

Both DESCRPT and CORREL now allow weighted data to be entered.

While the Spicer algorithm provides good accuracy on computations

in both these programs it is not as robust against weighted

data. The results are sufficient for most purposes but exercise

caution with heavily weighted data. You should keep your weights

in a reasonable scale range - e.g. if you are weighting by

population make sure you are weighting with something like 1.2 in

millions rather than using 1,200,000 as a weight - then you will

have little cause for concern.

While each of the programs handles virtually unlimited numbers of

cases, you should be cautious of any statistical computations

which require a statistical package to manipulate very large

sums. In the programs using the Spicer algorithm this problem is

minimized. However if any statistical package results in either

gigantic or minuscule numbers within the output take the time to

transform the scale of your variables to avoid such strains.

In addition, each MAPSTAT subprogram has some limits on the

number of variables which may be included in computations (not

the number in the data, only those included for analysis in a

particular run). The current settings (which may be modified)

are:

A. Data Transformation, Selection and Recoding - 100 variables incl created

B. Descriptive Statistics and Histograms - 100 variables

C. Two Dimensional Plots or Scattergrams - 30 variables

D. Hypothesis Testing on Variables or Subgroups- 2 variables

E. Correlation and Covariance Matrices - 50 variables

F. Partial Correlation and Covariance Matrices - 30 variables

G. Multiple Linear Regression - 30 variables

H. Factor Analysis with Orthogonal Rotations - 30 variables

I. N-Way Crosstabs and Categorical Association - 8 variables with up

to 25 codes for variables resulting in not more than

3500 cells in the table

J. Clustering by KMeans Algorithm - 10 variables with

max-min number of clusters to consider less than 25

(e.g. consider numbers of clusters between 95 and 80

so 95-80<=25)

K. Multiple Analysis of Variance - 30 variables

L. Convert Fixed Format to Free Format Mapstat File

- 255 variables

M. Sort a Mapstat File with DOS SORT.EXE - DOS SORT.EXE Required

These settings are easily altered at the beginning of any program

if you wish to recompile them. They have been limited to keep

the programs so they will run under 56K and so that users do not

make unreasonable demands on their own capacities and data.

It is easier to divide data into meaningful sets of variables and

work with them than to digest a correlation matrix of 500

variables. However, you are free to use and abuse the source

code as you please so long as you abide by the copyright above.

HARDWARE REQUIREMENTS AND RECOMPILING THE PROGRAMS:

MAP is written in version 2 (or 3) of Turbo Pascal (@Borland

Intl). It has been written to compile with less than 56k. If

you modify the programs and wish to recompile them you should be

familiar with the Borland compiler. First compile all *.PAS

files other than the main menu MAPSTAT.PAS using the 'cHain'

file option in the 'Options' menu of Turbo. For each .CHN file

which results make a note of the resulting code and data size.

Finally, compile MAPSTAT.PAS using the 'Com' option in the

'Options' menu of Turbo. Set the 'Code' and 'Data' segment sizes

to the largest recorded for any of the .CHN files. Failure to

follow these instructions will result in periodic program

crashes.

Rename all *.CHN files to the names given in the file

MAPSTAT.PAS. REMEMBER in MSDOS you must compile all of the .CHN

files first and keep track of the largest code and data segment

sizes. I think as of now Factor has the greatest code size and

Correl the greatest data size. These must be set with O and D

commands before compiling the main menu MAPSTAT.PAS.

Only several statements must be altered to run the programs on

CPM machines. Change HALT calls to BDOS(0) and try to compile.

As I recall only two or three other lines need to be changed out

of all the code herein. The initial versions of MAPSTAT were

written for a KayproII '83 and many such machines are currently

running MAPSTAT in countries both within and outside the States.

PLOT contains printer control codes for the EPSON MX80 in

procedure Openfiles, modify these codes to suit your printer and

recompile using the Turbo (@Borland Int'l) compiler if your

printer is not compatible with, or capable of emulating, the

Epson standard codes. If your printer is not compatible and you

do not have sufficient knowledge to recompile the programs you

may continue to use MAPSTAT and simply avoid the PLOT subprogram.

SUPPORT:

Users are encouraged to REPORT BUGS and make REQUESTS for future

versions. Do not release your own versions or modifications

using the copyrighted MAP or MAPSTAT logos - and abide by the

above copyright notice. No liabilities or guarantee of technical

support may be assumed given the cost free nature of the

programs. Telephone requests for support will not be responded

to, all questions and/or requests for assistance should be made

through the mail and addressed to the author. Responding to such

queries is an activity which my schedule demands be placed at a

very low priority. I will place priority on responding to clear

inquiries including printed output and self-addressed stamped

envelopes for reply.

If you choose to register your copy of MAPSTAT (no fee is

required), send one self addressed floppy disk mailer with

postage affixed and include a DSDD DOS FORMATTED disk in the

mailer. Include a note to tell me which version of the program

you have and where you obtained it. When a substantial new

release of MAPSTAT is available I will forward you a copy so long

as time and the number of users remains manageable. There are

currently 177 registered users of MAPSTAT in 21 countries around

the world.

SPECIAL SUPPLEMENT ON PROGRAM TRANSFRM:

In part because it is a very powerful utility, the data

transformation subprogram TRANSFRM is more complicated to use than

other routines in MAPSTAT. This is similar to other statistical

packages where data transformation languages are the most difficult

for the novice.

The transformation language in MAPSTAT is a RPN (reverse polish

notation) language similar to that in many scientific calculators

such as those made popular by Hewlett-Packard. If you are

experienced with RPN logic you will find the program easier to

master; if not, a small amount of perseverance will pay off.

TRANSFRM will prompt for a transformation file containing

statements to recode, compute, or otherwise transform data:

Name of the transformation file (or con:)?

The first few times you run TRANSFRM enter con: for console input

rather than attempting to create a transformation file. This

will then display a list of available transformations to the

screen:

*** TRANSFRM: DATA TRANSFORMATION ***

Valid Arithmetic Operators:

+ - * /

Turbo Pascal Functions Supported:

ABS ARCTAN COS EXP FRAC INT

LN SIN SQR SQRT ROUND TRUNC

RANDOM

Nonstandard MAP functions supported:

CASEN IF IFS LAG MOD NORMAL

POW REC

Number Entry:

Leading minus allowed (not plus) number must be less than or

equal to 11 digits, e.g. .001 12 -.0000005 etc.

Note: no check of statements is provided until runtime. [n]

refers to the nth variable read, not the nth column.

Comments may follow transformations on the same line

except END statement. Functions must be UPPERCASE.

This menu gives the names of all transformation functions known

to MAPSTAT. For example, LN is the standard Turbo Pascal natural

log function, NORMAL is a non-standard function to return random

numbers with a normal (0,1) distribution, REC is a recode

function, etc. These functions are described in greater detail

below.

Upon pressing a key to continue, you will get the second menu of

the TRANSFRM program which explains statement syntax with some

short examples:

*** TRANSFRM: DATA TRANSFORMATION ***

Data transformation statements are entered in RPN (reverse polish

notation) with blanks separating each operator, constant, or

variable. Statements are terminated by '=' to end the statement

and the variable number to receive the value. Variables are

referred to by column number in brackets '[n]'. New variables

created by transformations are added to the data file. Use

successive numbers for new variables (if you read four variables

the first you create should be '[5]' etc.) 'END' in the first

three columns will end input of transformations.

Examples:

To put the square root of 3.2 times the first variable into the

first-

->3.2 [1] * SQRT = [1]

To create a new sixth variable as the natural logarithm of the

second divided by the fifth -

->[5] [2] LN / = [6]

To recode second variable if between 10 and 50 to value 3 -

->[2] 10 50 3 REC = [2]

A summary of available operators is displayed during entry.

If you are not familiar with RPN note the order of the examples

in these menu samples. In the first example the number 3.2 is

placed on a 'stack' of variables. Then, the value of the first

variable read in is placed on top of the value 3.2 on the stack.

When the operator '*' (multiply) is encountered it gets the

needed data from the stack. That is, it gets a value of variable

[1] then multiplies by the next element left on the stack, 3.2.

Finally, it places the result back on the stack in place of the

two elements it removed. When the next operator is encountered

SQRT (square root) it gets the needed data (in this case one

value) from the number placed on the stack last (by the result of

the multiply), takes the square root of the number and places it

back on the stack in place of that removed. Finally, the '='

operator says remove the last value placed on the stack and

assign it as the new value for variable [1].

Again, if you are not familiar with RPN, work your way through

the other two examples to see how it works and try experimenting

with a few simple transformations on a small test dataset before

relying on TRANSFRM.
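
To make the stack walk-through concrete, here is a minimal Python sketch that traces the first example. It is not TRANSFRM's implementation: the name trace_rpn is hypothetical, it supports only numbers, '[n]' references, '*', 'SQRT' and '= [n]', and the starting value 5.0 for variable 1 is made up.

```python
import math


def trace_rpn(statement, variables):
    """Evaluate one statement; return (stack snapshots, updated variables)."""
    variables = dict(variables)  # work on a copy
    stack, snapshots = [], []
    tokens = statement.split()
    position = 0
    while position < len(tokens):
        token = tokens[position]
        if token == "=":
            # '=' removes the top of the stack and stores it in the target.
            target = int(tokens[position + 1].strip("[]"))
            variables[target] = stack.pop()
            break  # anything after the target would be a comment
        if token.startswith("["):
            stack.append(variables[int(token.strip("[]"))])
        elif token == "*":
            last, preceding = stack.pop(), stack.pop()
            stack.append(last * preceding)
        elif token == "SQRT":
            stack.append(math.sqrt(stack.pop()))
        else:
            stack.append(float(token))
        snapshots.append(list(stack))
        position += 1
    return snapshots, variables


snapshots, result = trace_rpn("3.2 [1] * SQRT = [1]", {1: 5.0})
for snapshot in snapshots:
    print(snapshot)  # the stack after each token
print(result)
```

Each printed list is the stack after one token, so the sequence mirrors the description above: the constant, then the variable on top of it, then their product, then its square root.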

A brief summary of how operators work with the stack will aid you

in writing TRANSFRM statements:

OPERATOR ACTION

Valid Arithmetic Operators:

+ Adds the last two values on the stack

- If not attached to the front of a negative

number (with no spaces) subtracts from the

last value on the stack the preceding value

on the stack

* Multiplies last two values on the stack

/ Divides last value on the stack by the

immediately preceding value on the stack

= Assigns current value of the stack to the

following variable.

Turbo Pascal Functions Supported:

ABS Replaces last number placed on the stack with

its absolute value

ARCTAN Replaces last number placed on the stack with

its arctangent function

COS Replaces last number placed on the stack with

its cosine function

EXP Replaces last number placed on the stack with

its natural exponent (i.e. e raised to that

power)

FRAC Replaces last number placed on the stack with

the fractional part of the number only (i.e.

'4.2 FRAC = [2]' will place .2 in variable 2)

INT Replaces last number placed on the stack with

the greatest integer number less than or

equal to it

LN Replaces last number placed on the stack with

its natural logarithm

SIN Replaces last number placed on the stack with

its sine function

SQR Replaces last number placed on the stack with

its squared value

SQRT Replaces last number placed on the stack with

its square root

ROUND Replaces last number placed on the stack with

the value rounded to the nearest integer

TRUNC Effectively identical to INT above

RANDOM Places a uniformly distributed random number

between 0 and 1 on the stack

Nonstandard MAP Functions Supported:

CASEN Places the observation or case number of the

current case onto the stack

IF Operations on the same line continue only if

the top value on the stack is greater than or

equal to zero, i.e. '[2] IF = [3]' will

assign the value of variable 2 to variable 3

only if it is greater than or equal to zero,

'5 [2] - IF 1 = [3]' will assign the value 1

to variable 3 only if variable 2 is greater

than or equal to the value 5.

IFS The case will be included in the output data

file only if the last value placed on the

stack is greater than zero, i.e. '[2] IFS'

will select the subsample of cases in which

variable 2 is greater than zero.

LAG Replaces the current value of the stack with

the similar value from the previous

observation, i.e. '[2] LAG = [3]' will set

variable 3 equal to the value of variable 2

lagged by one case

MOD Places the modulus of the last number placed

on the stack divided by the previous number

on the stack back onto the stack, i.e. '10

123 MOD = [3]' places the remainder 3 of

dividing 123 by 10 into variable 3

NORMAL Places a random number following a standard

normal distribution (with mean 0 and standard

deviation of 1) on the stack

POW Raises the last value placed on the stack to

the power of the immediately prior value

placed on the stack, i.e. '5 [2] POW = [3]'

will set the third variable to the fifth

power of the second

REC If the value placed 4 deep on the stack is

less than or equal to the value 2 deep and

also is greater than or equal to the value 3

deep then the last value on the stack is

returned to the stack, otherwise the value 4

deep is returned, i.e. '[5] 0 10 1 REC = [5]'

will recode variable five to the value 1 if

it is greater than or equal to 0 and less

than or equal to 10, otherwise it is left

unchanged.

Terminator:

END Signals the end of transformation statements
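
The operand order in this summary differs from what some RPN calculator users may expect, so a small Python sketch of the order-sensitive operators may help. It follows the descriptions above literally ("last" is the top of the stack, "preceding" the value beneath it). The names apply_operator, if_continues and ifs_keeps_case are hypothetical, and Python's % operator is assumed to stand in for MOD, which may differ from Turbo Pascal for negative values.

```python
def apply_operator(stack, op):
    """Apply one documented operator to a list used as a stack (top is last)."""
    if op == "REC":
        new_value = stack.pop()  # 1 deep: value to assign
        upper = stack.pop()      # 2 deep: upper bound
        lower = stack.pop()      # 3 deep: lower bound
        value = stack.pop()      # 4 deep: value being tested
        stack.append(new_value if lower <= value <= upper else value)
        return stack
    last = stack.pop()
    preceding = stack.pop()
    if op == "-":
        stack.append(last - preceding)
    elif op == "/":
        stack.append(last / preceding)
    elif op == "MOD":
        stack.append(last % preceding)
    elif op == "POW":
        stack.append(last ** preceding)
    else:
        raise ValueError(f"unsupported operator: {op}")
    return stack


def if_continues(top):
    """IF: the rest of the line runs only when the top value is >= 0."""
    return top >= 0


def ifs_keeps_case(top):
    """IFS: the case is written to the output only when the value is > 0."""
    return top > 0


print(apply_operator([10, 123], "MOD"))      # '10 123 MOD'
print(apply_operator([5, 2], "POW"))         # '5 [2] POW' with variable 2 = 2
print(apply_operator([7, 0, 10, 1], "REC"))  # '[5] 0 10 1 REC' with variable 5 = 7
print(if_continues(apply_operator([5, 8], "-")[-1]))  # '5 [2] - IF', variable 2 = 8
```

Note the asymmetry between IF (greater than or equal to zero) and IFS (strictly greater than zero), which the two small predicate functions make explicit.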

If you enter your transformations from the console, each line is

prompted for by a '->'. Enter the transformation statement

followed by a carriage return. To end transformations enter the

function 'END' as the first and last item on the line followed by

a carriage return i.e. '->END'. When you become proficient with

transformation statements, you may enter these statements in a

file (followed by comments) and simply give the name of this file

to TRANSFRM at the prompt for a transformations file discussed

above. A simple transformations file might look something like:

NORMAL [1] + = [3] set var 3 equal to var 1 plus random error

[2] 0 10 1 REC = [2] recode var 2 to 1 if 0<=var 2<=10

[2] 10 25 2 REC = [2] recode var 2 to 2 if 10<=var 2<=25

[2] 25 50 3 REC = [2] recode var 2 to 3 if 25<=var 2<=50

50 [2] - IF 4 = [2] recode var 2 to 4 if var 2 >=50

[2] .999 - IF 0 = [2] recode var 2 to 0 if var 2 < 1 (i.e.<=.999)
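
Because each statement runs in order on the result of the previous one, the recodes of variable 2 in this sample file interact. The Python sketch below mirrors the five recode lines under the operator semantics described earlier; it is an illustration only, and the names rec and recode_var2 are hypothetical.

```python
def rec(value, lower, upper, new_value):
    """REC: return new_value when lower <= value <= upper, else value."""
    return new_value if lower <= value <= upper else value


def recode_var2(x):
    """Apply the sample file's recodes of variable 2 in file order."""
    x = rec(x, 0, 10, 1)    # [2] 0 10 1 REC = [2]
    x = rec(x, 10, 25, 2)   # [2] 10 25 2 REC = [2]
    x = rec(x, 25, 50, 3)   # [2] 25 50 3 REC = [2]
    if x - 50 >= 0:         # 50 [2] - IF 4 = [2]
        x = 4
    if 0.999 - x >= 0:      # [2] .999 - IF 0 = [2]
        x = 0
    return x


for raw in (10, 25, 50, 60, -5, 0.5):
    print(raw, "->", recode_var2(raw))
```

Boundary values go to the first rule that matches (10 becomes 1, 25 becomes 2, 50 becomes 3), and by the time the last line runs, values between 0 and 1 have already been recoded to 1, so in this sketch only negative values end up as 0.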

Copyright 1985, 86, 87, 88

Douglas L. Anderton

Department of Sociology

University of Chicago

1126 E. 59th Street

Chicago, IL 60637

These programs are released for distribution so long as 1) any

charges involved do not exceed costs of media and mailing, and 2)

no portion of programs is used for commercial resale without

written permission of the author.

INTRODUCTION:

MAPSTAT is a very serious multivariate statistical analysis

package capable of meeting 90% or more of most users' analytical

needs. The routines have, at this point, been well tested and

provide the most frequently used procedures of the relatively

expensive statistical packages without cost. Unlike any

commercial software package, Turbo Pascal (@Borland Int'l) source

code is included for modifications and elaborations at your own

risk.

If data are properly arranged (discussed below) MAPSTAT can

theoretically analyze an unlimited number of variables and cases.

It has been tested on data files containing over 200 variables

and 10,000 cases. It is highly recommended that you read the

entire documentation file before using or modifying MAPSTAT. It

is equally important that you have a knowledge of any statistical

procedure before you attempt to use it and interpret the results.

Thirteen subprograms are included in this seventh release of

MAPSTAT. All of the statistical programs are invoked from the

same common menu which is displayed when the command:

>MAPSTAT

is entered at the DOS prompt. The currently available

subprograms in MAPSTAT include:

 1) DESCRPT  - descriptive statistics and frequency histograms

 2) CORREL   - correlation and covariance matrices

 3) REGRESS  - multiple linear regression

 4) CROSSTAB - n-way crosstabulation and association tests

 5) TRANSFRM - data transformations

 6) HYPOTHS  - simple hypothesis tests on means and variances

 7) PARTIAL  - partial correlation coefficients

 8) FACTOR   - principal axis factoring with rotations

 9) CLUSTER  - k-means clustering program

10) PLOT     - simple 2 dimensional plots

11) MANOVA   - multiple dependent variable analysis of variance

12) FIXFREE  - utility for fixed to free format file conversion

13) SORT     - utility to sort MAPSTAT files with DOS SORT

Features and limitations of these programs are discussed below.

DESIGN PHILOSOPHY:

MAP is written as a sequential case processor to avoid

memory-resident storage and achieve the greatest speed possible.

This has two consequences: 1) the package contains powerful

statistical analysis programs without horrendous memory

requirements; 2) the cost is that functions which need more than

one look at the data, such as histograms and regression

residuals, currently require multiple passes through the data.

Even for large data sets the programs are sufficiently fast to

make such passes reasonable.

Major benefits of this strategy include the (A) ability to run

the package on a floppy disk system without a hard drive

(including many portable and laptop machines); (B) ability to run

the package with as little as 56K memory; (C) the analysis of a

virtually unlimited number of cases within these modest hardware

requirements; and (D) a blinding speed compared with many larger

and more cumbersome statistical packages. Unlike many of the memory

mammoths, MAPSTAT is ideal for running under multitasking

systems (e.g. DESQVIEW, DOSAMATIC) in that it requires so little

memory that it leaves room for many other tasks to be running.

Even if you currently use other statistical packages for

analysis, I suspect you will find MAPSTAT more useful and

accessible for many of your routine computing tasks.
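The one-case-at-a-time design described above can be illustrated with a short sketch. The Python below is not MAPSTAT's Turbo Pascal source. It assumes Welford's well-known running update for the mean and variance as a stand-in, and the class name `RunningStats` and the sample values are purely illustrative. The point is only that each case can be folded into a few running totals and then discarded, so memory use does not grow with the number of cases.

```python
class RunningStats:
    """Single-pass mean and variance using Welford's update (an assumed stand-in)."""

    def __init__(self):
        self.n = 0        # number of cases seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def add(self, x):
        """Fold one case into the running totals; the case itself is not stored."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        """Sample variance (n - 1 denominator); needs at least two cases."""
        return self.m2 / (self.n - 1)


stats = RunningStats()
for value in (100, 110, 104, 107, 97):   # illustrative values only
    stats.add(value)
```

Statistics that need the finished mean before they can be computed, such as histogram bins centred on the mean, are the ones that call for a further pass through the file.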

INPUT DATA REQUIREMENTS:

MAP expects to find your data in a free format with at least one

blank separating each variable and a new line at the end of each

case. All variables for each case must be on a single line, i.e.

new lines separate records. It will not accept alphanumeric

data. Programs assume all data transformation has been performed

(e.g. CROSSTAB expects a finite number of values, not necessarily

integer value). Thus, it is very desirable to master the more

difficult TRANSFRM transformation program before making full use

of MAPSTAT. These are the only data requirements.

As with all statistical packages on all computers, extreme data

values are less precise. For example, means on variable values

such as 0.000001, 0.000002, 0.000003 or 1E+10, 2E+10, 3E+10 are

less accurate than the equivalent computations on 1, 2, and 3.

Thus, after an initial DESCRPT run, accuracy will be improved in

subsequent analyses if such values are transformed with TRANSFRM

by a simple scale shift to more reasonable ranges (e.g. in the

above examples multiply by 1,000,000 or divide by 1E+09).

A simple data file of four variables and five cases might then

look like:

100 50 90 -9

110 65 91.1 10

104 72 92.63 -9

107 70 92 11

97 36 99 14

Note there is no requirement that variables be aligned, only that

they be separated by a space with one case (or record) per line.
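As a rough illustration of the format just described, the following Python sketch reads free-format cases. The function name `read_cases`, the decision to skip empty lines, and the error message are assumptions made for this example and are not taken from MAPSTAT itself.

```python
def read_cases(lines, n_vars):
    """Yield one list of floats per case from free-format text lines."""
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()              # any run of blanks separates values
        if not fields:
            continue                       # skipping empty lines is an assumption
        if len(fields) != n_vars:
            raise ValueError(
                f"line {line_no}: expected {n_vars} values, got {len(fields)}"
            )
        yield [float(f) for f in fields]   # float() rejects alphanumeric data


# The sample file from the text; the uneven spacing is deliberate, since
# alignment is not required.
sample = """100 50 90 -9
110 65   91.1 10
104 72 92.63 -9
107  70 92 11
97 36 99 14"""

cases = list(read_cases(sample.splitlines(), n_vars=4))
```

Because the function is a generator, a caller can process one case at a time instead of building the full list as the last line does.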

Of course, we will want to be able to identify variables by an

alphanumeric name, e.g. INCOME, EDUCTION, RESIDNCE, AGE. We will

also want to be able to identify the codes for each variable

which we use to indicate that the value for that variable is

missing, e.g. due to non-response on a survey item or non-

applicable items for this case. Codebook files containing

variable names and missing values are thus provided for.

Alternatively, if the user prefers, this information may be entered

interactively from the keyboard. For further information on

specifying variable names and missing values, see the detailed

instructions on 'running the programs' below.

RUNNING THE PROGRAMS:

1. Specifying Data Input and Output Files -

After invoking the programs they will ask for the name of an

input data file (or a file created from a prior MAP run - for

example, the output of CORREL is used by REGRESS), and the name

of an output file. For printer output specify the filename as

LST: and for screen output specify CON:. (An exception is

TRANSFRM, which uses buffered output routines: it will accept LST:

and CON: but will send output to LIST.TMP and CONSOLE.TMP disk

files respectively.) To send output to a disk file or obtain

input from a disk file simply enter the name of the file. This

file must reside on the current drive in the current

subdirectory.

For example, invoking MAPSTAT you should see the main menu:

***** MAPSTAT v1.6 MENU *****

(Copyright 1985,6,7 D. Anderton)

A. Data Transformation, Selection and Recoding

B. Descriptive Statistics and Histograms

C. Two Dimensional Plots or Scattergrams

D. Hypothesis Testing on Variables or Subgroups

E. Correlation and Covariance Matrices

F. Partial Correlation and Covariance Matrices

G. Multiple Linear Regression

H. Factor Analysis with Orthogonal Rotations

I. N-Way Crosstabs and Categorical Association

J. Clustering by KMeans Algorithm

K. Multiple Analysis of Variance

L. Convert Fixed Format to Free Format Mapstat File

M. Sort a Mapstat File with DOS SORT.EXE

X. Exit to OpSys

Enter Selection:

After selecting 'B' for descriptive statistics MAPSTAT will

display:

*** DESCRPT: DESCRIPTIVE STATISTICS ***

Name of the data file?

You must now enter the name of a data file on the current disk

and subdirectory, e.g. 'MYDAT.DAT'. If MAPSTAT cannot find this

file an error message will be written and you will be ejected

from the program. Otherwise you will see:

Name of the output file?

You may now enter either (a) the name of a new file to be created

to contain output on the current disk and subdirectory, (b)

'CON:' (no quotations) to send output to the screen, or (c)

'LST:' (again without quotations) to send output to the current

printer device.

Provided nothing has gone drastically wrong (e.g. there is no room

for the new disk file or the printer is turned off) you will then

see the prompt:

Name of the codebook file (or NONE)?

for appropriate MAPSTAT programs.

2. Specifying Codebook Variable Description Files -

If the input to the program is raw data (i.e. it is not one of

the procedures which input a prior CORREL matrix), then the

program will ask for a codebook file. The codebook file contains

three items of input for each variable in the data file: (1) the

column number, (2) a variable name of eight characters, and (3) a

missing value code. One line must be provided for each variable

in the data file (whether it is used in this particular analysis

or not). All three items must be provided for each variable, on

its own line and separated by blanks. For the sample data file

given above we might have:

1 INCOME   -1E37

2 EDUCTION -1E37

3 RESIDNCE -1E37

4 AGE      -9

Note that eight spaces must be allowed for variable names; leave

blanks if necessary to fill out the string. Note also that a

missing value code must be given for every variable. The example

above used MAPSTAT's default value of -1E37 for missing data in

the first three variables and a value -9 for missing in the

fourth variable. In the sample data set -9 was actually used to

indicate missing values in the fourth variable, while the first

three variables had no missing values. When a variable has no

missing values, -1E37 (the default missing value MAPSTAT uses

when it generates new variables in TRANSFRM, etc.) or another

equally implausible value may be given in the codebook.

For normal usage you should construct such a codebook file in the

same drive and directory as the data file and enter the name in

response to the prompt for a codebook file name.

Alternatively, if the user specifies 'none' in answer to the

codebook file query, variable names will default to variable

numbers and the default missing value will be assumed. This is

not a recommended option if you will return to your output

sometime in the future. It is, however, convenient for quick

one-shot runs on small data files.
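The codebook rules above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and splitting each line on blanks assumes that variable names contain no embedded blanks, which the text does not state. The value -1E37 is the default missing code given in the text.

```python
DEFAULT_MISSING = -1e37   # default missing value, as given in the text


def read_codebook(lines):
    """Return {column: (name, missing_code)} from codebook lines."""
    book = {}
    for line in lines:
        fields = line.split()      # assumes names contain no embedded blanks
        if not fields:
            continue
        column, name, missing = int(fields[0]), fields[1], float(fields[2])
        book[column] = (name, missing)
    return book


def default_codebook(n_vars):
    """Rough equivalent of answering NONE: numbered names, default missing code."""
    return {col: (str(col), DEFAULT_MISSING) for col in range(1, n_vars + 1)}


def valid_values(cases, column, book):
    """Values of one column with that column's missing code dropped."""
    missing = book[column][1]
    return [case[column - 1] for case in cases if case[column - 1] != missing]


codebook = read_codebook([
    "1 INCOME   -1E37",
    "2 EDUCTION -1E37",
    "3 RESIDNCE -1E37",
    "4 AGE      -9",
])
cases = [[100, 50, 90, -9], [110, 65, 91.1, 10], [104, 72, 92.63, -9],
         [107, 70, 92, 11], [97, 36, 99, 14]]
ages = valid_values(cases, 4, codebook)   # the two cases coded -9 are dropped
```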

3. Variable Column Identification -

After file names the programs will typically request the number

of variables in the data file and then the number of variables to

be used in the present run. For example, a DESCRPT run might be

made on a file containing lines for 500 cases each with 12

variables, only 4 of which we desire to analyze in the present

run. The total number of variables would then be entered as 12

and the number for the present run as 4 in response to the

prompts:

How many variables in data file?

Number of variables to use in DESCRPT?

For each variable to be used the program will request information

on the column number of the variable (e.g. 1 for the first

variable, 2 for the second, etc.). These are column numbers in

the raw file, not positions within the subset to be used. In the

above example, say the first, third, sixth, and eleventh of the 12

variables were to be used; the user would enter 1, 3, 6, and 11

in response to the prompts:

Column number for variable 1? 1

Column number for variable 2? 3

Column number for variable 3? 6

Column number for variable 4? 11

4. Specification of Groups, Weights and Special variables -

Occasionally, the programs will ask you to identify one of the

variables for use in weighting data, grouping data, as a

dependent variable, etc. Again, reference is by original column

number of the input data set. For example, if the descriptives

in the example above were to be weighted by population which is

contained as the sixth variable, you would identify the weight as

column 6, its position in the raw data file. All of the

variables used as weights, groups, etc., must have been included

in the original number of variables to use and selection of the

columns for the analysis. That is, it would not be possible to

specify, for example, column 4 as a weight since it has not been

specified in the variable list above. DESCRPT, for example, will

provide the prompts:

Of these Column numbers which is weight (0=none)?

Of these Column numbers which is grouping (0=none)?

To which we may respond with either the column number of a

previously included variable or '0' if no weighting or grouping

(respectively) is to be done.
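The rule that a weight or grouping variable must be one of the columns already selected, and is always named by its raw column number, can be sketched as follows. The function names and the tiny two-column data set are hypothetical, and the weighted mean is the ordinary textbook formula rather than anything taken from the DESCRPT source.

```python
def check_special_column(column, selected):
    """A weight or grouping column must be 0 (none) or one of the selected columns."""
    return column == 0 or column in selected


def weighted_mean(cases, value_col, weight_col):
    """Mean of one column weighted by another; columns are 1-based raw positions."""
    total = sum(case[weight_col - 1] * case[value_col - 1] for case in cases)
    weights = sum(case[weight_col - 1] for case in cases)
    return total / weights


selected = [1, 3, 6, 11]                        # columns chosen for the run in the text
weight_ok = check_special_column(6, selected)   # column 6 was selected
weight_bad = check_special_column(4, selected)  # column 4 was not selected
```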

5. Hints on Further Documentation -

All other information necessary is prompted for with what I hope

are explicit prompts. If you have problems with input queries,

or the interpretation of output, refer to a statistics book.

Some of the multivariate routines are recognizably influenced by

those in Fortran by Cooley and Lohnes in their Multivariate Data

Analysis book. The Kmeans clustering routine is found in almost

any book on cluster analysis. Some routines lifted from

numerical methods books, etc., have references in the source

code. The transformation options are relatively well elaborated

if you initially choose to input transformations from the CON:

file. Once you become familiar with the program you can input

transformations from files.

Because of the more difficult nature of TRANSFRM, a special

section on its usage is presented below.

6. Hints on Power Usage -

There are a number of features which the design philosophy of

MAPSTAT precludes. However, most of these features are readily

derivable through coupling TRANSFRM with the other subprograms.

For example, many regression packages output residuals from the

regression and plots of the standardized residuals, etc. MAPSTAT

does not force such a second pass through the data since it is

designed for large data sets without retention of the data in

memory. If the user desires such an analysis the residuals could

be readily computed using TRANSFRM and then plotted with PLOT.

Similarly, FACTOR produces score coefficients which could be used

to generate factor scores for further analysis, etc. Dummy

variables can be coded through use of the recoding facilities in

TRANSFRM and used to compute complicated general linear model

analyses of variance (e.g. GLM/ANOVA's) through REGRESS.

The list goes on, and on, and on. The more you know about

statistics and what you are doing the more you will find these

programs of use. At the same time, if you are a basic user you

will probably not require more than the basic output provided by

routines.

YOUR FIRST ENCOUNTER:

A recommended first experiment is to begin with simple

descriptive statistics using program DESCRPT followed by

bivariate correlations using CORREL. Soon after this you should

attempt to learn the most difficult, and perhaps most useful,

program TRANSFRM for data transformations and sample selection.

Once you have mastered the TRANSFRM program all remaining

programs should come easily. As noted below, many of the

multivariate programs take a correlation matrix generated by

CORREL as input for further analysis. This allows one

correlation matrix to be generated for a large dataset and many

analyses to be computed without recomputing the correlations.

The CROSSTAB program for analysis of frequency tables is a

particularly useful program which will handle up to seven-way

tables and automatically generate all applicable statistics of

association.

PROGRAM LIMITATIONS:

The addition of codebooks and transformation files makes these

routines roughly competitive with other micro statistics

packages. Given you have received them free of cost and,

"omigosh," with the source code, they are extremely flexible and

useful tools for data analysis.

Both DESCRPT and CORREL now allow weighted data to be entered.

While the Spicer algorithm provides good accuracy on computations

in both these programs it is not as robust against weighted

data. The results are sufficient for most purposes but exercise

caution with heavily weighted data. Keep your weights in a

reasonable scale range (e.g. if you are weighting by population,

weight with something like 1.2, in millions, rather than using

1,200,000 as a weight) and you will have little cause for

concern.

While each of the programs handles virtually unlimited numbers of

cases, you should be cautious of any statistical computations

which require a statistical package to manipulate very large

sums. In the programs using the Spicer algorithm this problem is

minimized. However, if any statistical package results in either

gigantic or minuscule numbers within the output, take the time to

transform the scale of your variables to avoid such strains.

In addition, each MAPSTAT subprogram has some limits on the

number of variables which may be included in computations (not

the number in the data, only those included for analysis in a

particular run). The current settings (which may be modified)

are:

A. Data Transformation, Selection and Recoding  - 100 variables incl. created

B. Descriptive Statistics and Histograms        - 100 variables

C. Two Dimensional Plots or Scattergrams        - 30 variables

D. Hypothesis Testing on Variables or Subgroups - 2 variables

E. Correlation and Covariance Matrices          - 50 variables

F. Partial Correlation and Covariance Matrices  - 30 variables

G. Multiple Linear Regression                   - 30 variables

H. Factor Analysis with Orthogonal Rotations    - 30 variables

I. N-Way Crosstabs and Categorical Association  - 8 variables with up to 25
                                                  codes per variable, resulting
                                                  in not more than 3500 cells
                                                  in the table

J. Clustering by KMeans Algorithm               - 10 variables, with max-min
                                                  number of clusters to consider
                                                  less than 25 (e.g. consider
                                                  numbers of clusters between 95
                                                  and 80 so 95-80<=25)

K. Multiple Analysis of Variance                - 30 variables

L. Convert Fixed Format to Free Format Mapstat File
                                                - 255 variables

M. Sort a Mapstat File with DOS SORT.EXE        - DOS SORT.EXE required

These settings are easily altered at the beginning of any program

if you wish to recompile them. They have been limited so that

the programs will run under 56K and so that users do not

make unreasonable demands on their own capacities and data.

It is easier to divide data into meaningful sets of variables and

work with them than to digest a correlation matrix of 500

variables. However, you are free to use and abuse the source

code as you please so long as you abide by the copyright above.

HARDWARE REQUIREMENTS AND RECOMPILING THE PROGRAMS:

MAP is written in version 2 (or 3) of Turbo Pascal (@Borland

Intl). It has been written to compile with less than 56k. If

you modify the programs and wish to recompile them you should be

familiar with the Borland compiler. First compile all *.PAS

files other than the main menu MAPSTAT.PAS using the 'cHain'

file option in the 'Options' menu of Turbo. For each .CHN file

which results make a note of the resulting code and data size.

Finally, compile MAPSTAT.PAS using the 'Com' option in the

'Options' menu of Turbo. Set the 'Code' and 'Data' segment sizes

to the largest recorded for any of the .CHN files. Failure to

follow these instructions will result in periodic program

crashes.

Rename all *.CHN files to the names given in the file

MAPSTAT.PAS. REMEMBER: in MSDOS you must compile all of the .CHN

files first and keep track of the largest code and data segment

sizes. I think as of now Factor has the greatest code size and

Correl the greatest data size. These must be set with O and D

commands before compiling the main menu MAPSTAT.PAS.

Only a few statements must be altered to run the programs on

CP/M machines. Change HALT calls to BDOS(0) and try to compile.

As I recall only two or three other lines need to be changed out

of all the code herein. The initial versions of MAPSTAT were

written for a Kaypro II '83 and many such machines are currently

running MAPSTAT both within and outside the United States.

PLOT contains printer control codes for the EPSON MX80 in

procedure Openfiles. Modify these codes to suit your printer and

recompile using the Turbo (@Borland Int'l) compiler if your

printer is not compatible with, or capable of emulating, the

Epson standard codes. If your printer is not compatible and you

do not have sufficient knowledge to recompile the programs you

may continue to use MAPSTAT and simply avoid the PLOT subprogram.

SUPPORT:

Users are encouraged to REPORT BUGS and make REQUESTS for future

versions. Do not release your own versions or modifications

using the copyrighted MAP or MAPSTAT logos - and abide by the

above copyright notice. No liabilities or guarantee of technical

support may be assumed given the cost-free nature of the

programs. Telephone requests for support will not be answered;

all questions and/or requests for assistance should be made

through the mail and addressed to the author. Responding to such

queries is an activity which my schedule demands be placed at a

very low priority. I will place priority on responding to clear

inquiries including printed output and self-addressed stamped

envelopes for reply.

If you choose to register your copy of MAPSTAT (no fee is

required), send one self addressed floppy disk mailer with

postage affixed and include a DSDD DOS FORMATTED disk in the

mailer. Include a note to tell me which version of the program

you have and where you obtained it. When a substantial new

release of MAPSTAT is available I will forward you a copy so long

as time and the number of users remain manageable. There are

currently 177 registered users of MAPSTAT in 21 countries around

the world.

SPECIAL SUPPLEMENT ON PROGRAM TRANSFRM:

In part because it is a very powerful utility, the data

transformation subprogram TRANSFRM is more complicated to use than

other routines in MAPSTAT. This is similar to other statistical

packages where data transformation languages are the most difficult

for the novice.

The transformation language in MAPSTAT is an RPN (Reverse Polish

Notation) language similar to that in many scientific calculators

such as those made popular by Hewlett-Packard. If you are

experienced with RPN logic you will find the program easier to

master; if not, a small amount of perseverance will pay off.

TRANSFRM will prompt for a transformation file containing

statements to recode, compute, or otherwise transform data:

Name of the transformation file (or con:)?

The first few times you run TRANSFRM enter con: for console input

rather than attempting to create a transformation file. This

will then display a list of available transformations to the

screen:

*** TRANSFRM: DATA TRANSFORMATION ***

Valid Arithmetic Operators:

+ - * / =

Turbo Pascal Functions Supported:

ABS ARCTAN COS EXP FRAC INT

LN SIN SQR SQRT ROUND TRUNC

RANDOM

Nonstandard MAP functions supported:

CASEN IF IFS LAG MOD NORMAL

POW REC

Number Entry:

Leading minus allowed (not plus) number must be less than or

equal to 11 digits, e.g. .001 12 -.0000005 etc.

Note: no check of statements is provided until runtime. [n]

refers to the nth variable read, not the nth column.

Comments may follow transformations on the same line

except END statement. Functions must be UPPERCASE.

This menu gives the names of all transformation functions known

to MAPSTAT. For example, LN is the standard Turbo Pascal natural

log function, NORMAL is a non-standard function to return random

numbers with a normal (0,1) distribution, REC is a recode

function, etc. These functions are described in greater detail

below.

Upon pressing a key to continue, you will get the second menu of

the TRANSFRM program which explains statement syntax with some

short examples:

*** TRANSFRM: DATA TRANSFORMATION ***

Data transformation statements are entered in RPN (reverse polish

notation) with blanks separating each operator, constant, or

variable. Statements are terminated by '=' to end the statement

and the variable number to receive the value. Variables are

referred to by column number in brackets '[n]'. New variables

created by transformations are added to the data file. Use

successive numbers for new variables (if you read four variables

the first you create should be '[5]' etc.) 'END' in the first

three columns will end input of transformations.

Examples:

To put the square root of 3.2 times the first variable into the

first -

->3.2 [1] * SQRT = [1]

To create a new sixth variable as the natural logarithm of the

second divided by the fifth -

->[5] [2] LN / = [6]

To recode second variable if between 10 and 50 to value 3 -

->[2] 10 50 3 REC = [2]

A summary of available operators is displayed during entry.

If you are not familiar with RPN note the order of the examples

in these menu samples. In the first example the number 3.2 is

placed on a 'stack' of variables. Then, the value of the first

variable read in is placed on top of the value 3.2 on the stack.

When the operator '*' (multiply) is encountered it gets the

needed data from the stack. That is, it gets a value of variable

[1] then multiplies by the next element left on the stack, 3.2.

Finally, it places the result back on the stack in place of the

two elements it removed. When the next operator is encountered

SQRT (square root) it gets the needed data (in this case one

value) from the number placed on the stack last (by the result of

the multiply), takes the square root of the number and places it

back on the stack in place of that removed. Finally, the '='

operator says remove the last value placed on the stack and

assign it as the new value for variable [1].

Again, if you are not familiar with RPN, work your way through

the other two examples to see how it works and try experimenting

with a few simple transformations on a small test dataset before

relying on TRANSFRM.
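For readers who find a stack easier to follow in code, here is a minimal Python sketch of how such statements could be evaluated. It is not the TRANSFRM source: the function name `run_statement`, the handful of operators covered, and the use of a Python list to hold one case are all assumptions made for illustration. The operand order for '-' and '/' follows the operator descriptions in this document (the last value on the stack is the left operand), which may differ from what users of some RPN calculators expect.

```python
import math

UNARY = {"SQRT": math.sqrt, "LN": math.log, "ABS": abs, "EXP": math.exp}
BINARY = {
    "+": lambda top, prev: top + prev,
    "-": lambda top, prev: top - prev,   # last value minus the preceding value
    "*": lambda top, prev: top * prev,
    "/": lambda top, prev: top / prev,   # last value divided by the preceding value
}


def run_statement(statement, variables):
    """Evaluate one statement against a list of values (index 0 holds [1])."""
    stack = []
    tokens = statement.split()
    for i, tok in enumerate(tokens):
        if tok == "=":
            target = int(tokens[i + 1].strip("[]"))
            while len(variables) < target:     # new variables extend the case
                variables.append(None)
            variables[target - 1] = stack.pop()
            return variables                   # text after '= [n]' is a comment
        if tok in BINARY:
            top, prev = stack.pop(), stack.pop()
            stack.append(BINARY[tok](top, prev))
        elif tok in UNARY:
            stack.append(UNARY[tok](stack.pop()))
        elif tok.startswith("["):
            stack.append(variables[int(tok.strip("[]")) - 1])
        else:
            stack.append(float(tok))           # a numeric constant
    raise ValueError("statement has no '= [n]' terminator")


case = [5.0, 20.0, 0.0, 0.0, 4.0]              # an illustrative five-variable case
run_statement("3.2 [1] * SQRT = [1]", case)    # square root of 3.2 times var 1
run_statement("[5] [2] LN / = [6]", case)      # ln(var 2) / var 5 into a new var 6
```

Tracing the second statement by hand: [5] and [2] are pushed, LN replaces the top value with its logarithm, and '/' divides that logarithm by the value beneath it.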

A brief summary of how operators work with the stack will aid you

in writing TRANSFRM statements:

OPERATOR ACTION

Valid Arithmetic Operators:

+ Adds the last two values on the stack

- If not attached to the front of a negative

number (with no spaces) subtracts from the

last value on the stack the preceding value

on the stack

* Multiplies last two values on the stack

/ Divides last value on the stack by the

immediately preceding value on the stack

= Assigns current value of the stack to the

following variable.

Turbo Pascal Functions Supported:

ABS Replaces last number placed on the stack with

its absolute value

ARCTAN Replaces last number placed on the stack with

its arctangent function

COS Replaces last number placed on the stack with

its cosine function

EXP Replaces last number placed on the stack with

its natural exponent (i.e. e raised to that

power)

FRAC Replaces last number placed on the stack with

the fractional part of the number only (i.e.

'4.2 FRAC = [2]' will place .2 in variable 2)

INT Replaces last number placed on the stack with

its integer part (for non-negative numbers, the

greatest integer less than or equal to it)

LN Replaces last number placed on the stack with

its natural logarithm

SIN Replaces last number placed on the stack with

its sine function

SQR Replaces last number placed on the stack with

its squared value

SQRT Replaces last number placed on the stack with

its square root

ROUND Replaces last number placed on the stack with

the value rounded to the nearest integer

TRUNC Effectively identical to INT above

RANDOM Places a uniformly distributed random number

between 0 and 1 on the stack

Nonstandard MAP Functions Supported:

CASEN Places the observation or case number of the

current case onto the stack

IF Operations on the same line continue only if

the top value on the stack is greater than or

equal to zero, i.e. '[2] IF = [3]' will

assign the value of variable 2 to variable 3

only if it is greater than or equal to zero,

'5 [2] - IF 1 = [3]' will assign the value 1

to variable 3 only if variable 2 is greater

than or equal to the value 5.

IFS The case will be included in the output data

file only if the last value placed on the

stack is greater than zero, i.e. '[2] IFS'

will select the subsample of cases in which

variable 2 is greater than zero.

LAG Replaces the current value of the stack with

the similar value from the previous

observation, i.e. '[2] LAG = [3]' will set

variable 3 equal to the value of variable 2

lagged by one case

MOD Places the modulus of the last number placed

on the stack divided by the previous number

on the stack back onto the stack, i.e. '10

123 MOD = [3]' places the remainder 3 of

dividing 123 by 10 into variable 3

NORMAL Places a random number following a standard

normal distribution (with mean 0 and standard

deviation of 1) on the stack

POW Raises the last value placed on the stack to

the power of the immediately prior value

placed on the stack, i.e. '5 [2] POW = [3]'

will set the third variable to the fifth

power of the second

REC If the value placed 4 deep on the stack is

less than or equal to the value 2 deep and

also is greater than or equal to the value 3

deep then the last value on the stack is

returned to the stack, otherwise the value 4

deep is returned, i.e. '[5] 0 10 1 REC = [5]'

will recode variable five to the value 1 if

it is greater than or equal to 0 and less

than or equal to 10, otherwise it is left

unchanged.

Terminator:

END Signals the end of transformation statements
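The nonstandard functions are the easiest to misread, so the sketch below restates a few of them as plain Python functions. The argument order mirrors the order in which the values are written in a statement. The function names are hypothetical and the bodies are a reading of the descriptions above, not a translation of the MAPSTAT source. In particular, Python's `%` operator is used for MOD and may not match Turbo Pascal for negative or fractional operands.

```python
def rec(value, low, high, new_value):
    """'value low high new REC': new_value if low <= value <= high, else value."""
    return new_value if low <= value <= high else value


def if_continues(top):
    """IF: the rest of the line runs only when the top of the stack is >= 0."""
    return top >= 0


def ifs_keeps_case(top):
    """IFS: the case is kept in the output file only when the value is > 0."""
    return top > 0


def mod(prev, top):
    """'prev top MOD': remainder of top divided by prev (Python % semantics)."""
    return top % prev


def power(prev, top):
    """'prev top POW': top raised to the power prev."""
    return top ** prev


def if_example(var2, var3):
    """'5 [2] - IF 1 = [3]': var 3 becomes 1 only when var 2 >= 5."""
    return 1 if if_continues(var2 - 5) else var3
```

Note the difference between the two tests: IF passes a value of exactly zero, while IFS does not.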

If you enter your transformations from the console, each line is

prompted for by a '->'. Enter the transformation statement

followed by a carriage return. To end transformations, enter

'END' as the only item on the line, followed by a carriage

return, i.e. '->END'. When you become proficient with

transformation statements, you may enter these statements in a

file (followed by comments) and simply give the name of this file

to TRANSFRM at the prompt for a transformations file discussed

above. A simple transformations file might look something like:

NORMAL [1] + = [3]      set var 3 equal to var 1 plus random error

[2] 0 10 1 REC = [2]    recode var 2 to 1 if 0<=var 2<=10

[2] 10 25 2 REC = [2]   recode var 2 to 2 if 10<=var 2<=25

[2] 25 50 3 REC = [2]   recode var 2 to 3 if 25<=var 2<=50

50 [2] - IF 4 = [2]     recode var 2 to 4 if var 2 >=50

[2] .999 - IF 0 = [2]   recode var 2 to 0 if var 2 < 1 (i.e.<=.999)
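To show how a file like this separates statements from their trailing comments, here is a small Python sketch. It assumes that a statement ends either at the '[n]' following '=' or at an IFS selection, and that a line beginning with END stops the input. The function name `read_transformations` and this splitting rule are illustrative guesses based on the description above, not the actual TRANSFRM parser.

```python
def read_transformations(lines):
    """Split each line into (statement, comment); stop at a line starting with END."""
    statements = []
    for line in lines:
        if line.startswith("END"):       # 'END' in the first three columns
            break
        tokens = line.split()
        if not tokens:
            continue
        end = len(tokens)                # default: the whole line is the statement
        for i, tok in enumerate(tokens):
            if tok == "=":
                end = i + 2              # keep '=' and the '[n]' that follows it
                break
            if tok == "IFS":
                end = i + 1              # a selection has no assignment
                break
        statements.append((" ".join(tokens[:end]), " ".join(tokens[end:])))
    return statements


parsed = read_transformations([
    "NORMAL [1] + = [3] set var 3 equal to var 1 plus random error",
    "[2] 0 10 1 REC = [2] recode var 2 to 1 if 0<=var 2<=10",
    "[2] IFS keep cases with var 2 > 0",
    "END",
    "[1] = [9] never read",
])
```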

December 29, 2017