David C. Young
For use with MSDOS computers.
Copyright 1991 David C. Young
This program finds both coefficients and exponents for a curve of best
fit by the least squares definition of best fit.
The program contains functions for creating and revising its data
input files. There is also a function titled "Display any ASCII file", which
can be used to display the output files.
Each data field is read from a data file. Each data point in the file
is accompanied by a key piece of information, which uniquely identifies that
data point.
The program will read in any number of data files and only use those
data points containing information for every field.
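The selection rule above (use only the data points that have a value in
every field) can be sketched as a key intersection. This is an illustration
only, not the program's own code: the dictionaries stand in for keyed data
files, and the name complete_rows and all of the numbers are hypothetical.

```python
# Hypothetical in-memory stand-ins for three data files keyed by month.
stock = {"1990-01": 12.5, "1990-02": 13.0, "1990-03": 12.8}
gnp = {"1990-01": 5.4, "1990-02": 5.5, "1990-03": 5.6}
tea = {"1990-01": 1.10, "1990-03": 1.15}  # February is missing

def complete_rows(*fields):
    """Return (key, values) pairs for the keys present in every field."""
    shared = set(fields[0])
    for field in fields[1:]:
        shared &= set(field)
    return [(key, tuple(field[key] for field in fields))
            for key in sorted(shared)]

rows = complete_rows(stock, gnp, tea)
for key, values in rows:
    print(key, values)
```

In this sketch February is dropped without comment because the tea file has
no entry for it, which mirrors the behavior described above.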
STARTING THE PROGRAM:
Start the program by changing to the directory or drive containing the
program and typing CURVE.
The program can be installed on a hard drive by using the DOS copy
command to copy all of the files to the desired location on the hard drive.
USING THE PROGRAM:
The basic process of using this program is:
1. Create data files. The data in the data files is keyed.
For example, suppose you want to find an equation to predict the price
of your favorite stock. You might enter various items that could be used
as predictors, such as the Gross National Product or the price of tea in
China. You would key these values by the date to which they apply. Thus
you would have a file with the price of tea in China during various
months, with the month as the key value. The price of your stock would
also be in a file keyed by month. Once you had several files, all keyed
by month, you might guess that the price of your stock varies in the same
way as the GNP times the price of tea in China.
If you don't have all of the necessary data (due to irregular
delivery of the Beijing Times), the program will still work. It will
simply take into account only those months for which you have a complete
set of data, without bothering you with the details.
2. Type in an equation. For the example above, the equation
might be S=A*G*T+B or S=A*G^C+B*T^D+E where S is the stock price, G is
the GNP, T is the price of tea in China and A through E are numbers that
you want the computer to calculate.
3. Specify which letters represent known data points to be
read from data files. S, G & T in our example are read from files.
C & D are computed exponents. A, B & E are computed coefficients.
4. Input file names for summary and analysis files.
A summary file contains the results shown on screen at the end of the
calculation: the values of the computed numbers and the average & maximum
deviations (measures of how well the equation fits the data). An analysis
file contains the key values, actual values (for your stock) and computed
values. Analysis files can be read into many popular spreadsheets and
graphing programs,
so that you can graph the data to better see how well the calculation
5. For computed exponents, input starting values and the
number of decimal places to calculate.
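As a rough illustration of the average and maximum deviations mentioned in
step 4, the sketch below assumes that a deviation is the absolute difference
between an actual value and the corresponding computed value. These
instructions do not give the exact definition used by the program, so both
that assumption and the function name deviations are hypothetical.

```python
def deviations(actual, computed):
    """Average and maximum absolute deviation of two equal-length lists."""
    if len(actual) != len(computed) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    # Assumed definition: deviation = |actual - computed| for each point.
    diffs = [abs(a - c) for a, c in zip(actual, computed)]
    return sum(diffs) / len(diffs), max(diffs)

actual = [10.0, 12.0, 15.0]      # made-up known values
computed = [11.0, 12.0, 13.0]    # made-up values from a fitted equation
average_dev, maximum_dev = deviations(actual, computed)
print("Average Deviation =", average_dev)
print("Maximum Deviation =", maximum_dev)
```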
The equation to be input consists of single upper or lower case
letters to represent both known pieces of data and numbers to be calculated,
along with an equals sign "=" and four mathematical operators:
"+" - addition
"*" - multiplication
"/" - division
"^" - exponentiation
Some examples of valid equations are:
a = b * c + d
a * b^c + d = e
a = b * C^d * E^e + f * g^h + i
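The rules above can be checked mechanically. The following sketch is not
taken from the program; it assumes that spaces are ignored, that exactly one
equals sign is required, and that each side must alternate between single
letters and operators. The name is_valid_equation is hypothetical.

```python
OPERATORS = set("+*/^")

def is_valid_equation(text):
    """Check an equation against the version 1.1 rules described above."""
    compact = text.replace(" ", "")
    if compact.count("=") != 1:
        return False
    for side in compact.split("="):
        # A valid side alternates letter, operator, letter, ... and so
        # always has an odd number of characters.
        if len(side) % 2 == 0:
            return False
        for index, char in enumerate(side):
            if index % 2 == 0:
                if not (char.isascii() and char.isalpha()):
                    return False
            elif char not in OPERATORS:
                return False
    return True

for equation in ("a = b * c + d", "a * b^c + d = e", "a = (b + c) * d"):
    print(equation, "->", is_valid_equation(equation))
```

Under these assumptions parentheses, numerical constants and identifiers of
more than one letter are all rejected, which matches the limitations listed
later in these instructions.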
DATA FILE FORMAT:
The data file is an ASCII file consisting of a header and up to 65000
key and data values.
The numbers identifying what type of key and data values are present
are as follows:
1 - string
2 - real
3 - character
4 - integer
The key field can be of any of these types. The data field can be
used for curve fitting only if it is of type real or integer. Integer data
fields are treated by the curve fitter as real values.
The data file format is:
SUMMARY FILE FORMAT:
The summary file is an ASCII file that contains exactly the same
information that is displayed on the screen at the end of a calculation.
The summary file format is:
Average Deviation =
Maximum Deviation =
Known data points
Known taken from
(same as line above for each data file used)
Computed Exponents (if any)
(same as line above for each exponent computed)
(same as line above for each coefficient computed)
ANALYSIS FILE FORMAT:
The analysis file is an ASCII file containing information for
comparing the known values to the calculated values for the property being
The analysis file format is:
LIMITATIONS OF THE PROGRAM:
Some known deficiencies of the program that I hope to get around to
fixing in the future are:
1. Parentheses are not allowed in the equations.
2. Constants are not allowed, although you can create a data
file with the same number for every data point if necessary.
3. The list of functions which are not supported is massive.
It starts with the trigonometric functions.
4. The exponents that are found represent local minima
only, so pick your initial values wisely, or try a few that you think might
be in the right range.
However, to the best of my knowledge, what this program does do,
it does correctly.
ABOUT CURVE FITTING:
The program generally finds the coefficients of best fit for each term
in an equation and finds the exponent of best fit for any variable desired.
The exponents are obtained through a multivariable simplex minimization
routine. The coefficients are obtained at every step of the way through the
matrix algebra least squares method (mathematically equivalent to linear
regression).
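The two-level scheme described above can be sketched for the simplest case,
an equation of the form S = A*G^C + B with one computed exponent. This is
not the program's own code. To stay short it swaps in a simple step-halving
search for the simplex routine and the closed-form straight-line regression
for the matrix method. The names fit_coefficients and fit_curve, the
parameters start, step and places, and the data are all hypothetical.

```python
def fit_coefficients(xs, ys):
    """Least squares fit of y = a*x + b; returns (a, b, squared error)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0  # all x equal: only a constant can be fit
    b = mean_y - a * mean_x
    sse = sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

def fit_curve(g, s, start=1.0, step=0.5, places=4):
    """Fit s = a * g**c + b, searching for c from the starting value."""
    def error(c):
        # The coefficients are refitted for every trial exponent.
        return fit_coefficients([v ** c for v in g], s)[2]

    c = start
    best = error(c)
    tolerance = 10.0 ** -places
    while step > tolerance:
        for trial in (c - step, c + step):
            trial_error = error(trial)
            if trial_error < best:
                c, best = trial, trial_error
                break
        else:
            step /= 2.0  # neither neighbour improved, so shrink the step
    a, b, _ = fit_coefficients([v ** c for v in g], s)
    return a, b, c

g = [1.0, 2.0, 3.0, 4.0, 5.0]
s = [3.0 * v ** 2 + 1.0 for v in g]   # made-up data with A=3, C=2, B=1
a, b, c = fit_curve(g, s)
print(a, b, c)
```

Like the routine described above, this search only moves downhill from its
starting value, so it can stop at a local minimum. That is why the choice of
starting exponent matters.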
If your theory shows that an equation should have a particular form,
it is best to work with that form. However, if you don't know what form to
use and want to fit your data by a brute force method, here are some
1. Have the program find a coefficient for every term in the
2. Have the program find the constant term.
3. The most powerful generic fit is one in which every
term consists of a fitted coefficient and a single variable with a fitted
exponent, with the constant term then added on. This is often the best
fit because the most parameters are being fitted.
4. Remember that you can fit anything if you are fitting more
parameters than there are items in your data set. However, this fit may be
useless when applied to new data points falling between or past the original
data points. For best results always make sure that you have considerably
more data points than the number of parameters that are being fitted,
otherwise you may be fooling yourself.
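The warning in the last point can be seen in a small case. In the sketch
below, two parameters (a and b in y = a*x + b) are fitted to only two data
points. The points are taken from y = x^2, yet the line reproduces them
exactly. The function name line_through and the numbers are made up for
illustration.

```python
def line_through(p, q):
    """Return (a, b) so that y = a*x + b passes through points p and q."""
    (x1, y1), (x2, y2) = p, q
    a = (y2 - y1) / (x2 - x1)
    return a, y1 - a * x1

# Two data points taken from y = x**2, fitted with two parameters.
a, b = line_through((1.0, 1.0), (3.0, 9.0))

# Deviations at the fitted points, then at one point between them and
# one point past them.
fitted_errors = [abs(a * x + b - x ** 2) for x in (1.0, 3.0)]
new_errors = [abs(a * x + b - x ** 2) for x in (2.0, 5.0)]
print("deviations at fitted points:", fitted_errors)
print("deviations at new points:   ", new_errors)
```

The deviations at the fitted points are zero, which looks like a perfect
fit, while the deviations at the new points are not. With considerably more
data points than parameters the poor choice of equation would show up in the
average and maximum deviations.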
Version 1.1 of this program is being offered FREE to the world with no
guarantees expressed or implied ... etc, etc, etc. Version 1.1 may be freely
distributed to anyone and everyone, as long as these instructions are
kept with it.
I have completed version 2.1 of this program. The new features
present in version 2.1 are:
1. It allows multiple character identifiers.
2. It allows user inserted parentheses.
3. It allows the use of numerical constants.
4. It has twenty new functions covering trigonometry,
logarithms and a few other items.
If you would like to buy version 2.1, send $15.00 to:
Okemos, MI 48864
Please specify what size and density of disk I should send. I will send
you one disk containing version 2.1 (or whatever the most recent version is)
with the program and documentation files. This price does not include
any updates past the version that I send you, but I will try to keep you
informed of future versions. Please send cash, a check or a money order. I
cannot accept credit card orders. Version 2.1 is sold as is with no
guarantees expressed or implied.
If you have any questions, you can also reach me by e-mail at:
internet: [email protected]
bitnet: [email protected]