CHAPTER 2 - DATA

Before you start using data, you need to know what data looks
like. It is not necessary for the data to have a name. For
instance, the following definition is perfectly legal:

db "Mary had a little lamb."

Unfortunately, the assembler has no way to find it. The normal
thing is to start the line with a name, and then give the
definition of the data. The assembler processes the data line by
line, so a definition on one line does not carry over to another
line. We can have:

poem db "Mary had a little lamb,"

Notice that names for data don't have colons after them. What if
we wanted to continue the poem? It isn't going to fit all on one
line. No problem. All we need to do is define the following lines
without a name.

poem db "Mary had a little lamb,"
db "Its fleas were white as snow,"
db "And everywhere that Mary went,"
db "She scratched and scratched and scratched."

The assembler still can't find lines 2-4, but starting at the
first byte of "poem", it can go all the way through the poem one
byte after the other. By the way, there are no carriage returns
in the poem right now. They will come later.
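
As a small sketch of that idea (the name "msg" and the two
strings are made up for this illustration), here are two
definitions where only the first one has a name:

msg db "AB"
db "CD"          ; no name, but it sits right after "AB" in memory

The assembler lays the bytes down one after the other, so
counting from "msg" we get: msg + 0 = 'A', msg + 1 = 'B',
msg + 2 = 'C', msg + 3 = 'D'. The second line has no name of its
own, but you reach it by walking past the end of the first.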

So we have the name part, the db part, and the data part. What is
that db anyway? It stands for Define Byte. Whenever you give the
name "poem" to the assembler, it knows that you want to deal with
the data one byte at a time. If you try working a word at a time,
you will get an assembler error. The legal definitions are:

DB    define byte          [ 1 byte ]
DW    define word          [ 2 bytes ]
DD    define doubleword    [ 2 X 2 bytes = 4 bytes ]
DQ    define quadword      [ 4 X 2 bytes = 8 bytes ]
DT    define ten-byte      [ 10 bytes ]
DF    define farword       [ 6 bytes - used for 80386 only ]
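
Here is a minimal sketch with one variable of each size. The
names are invented for the example, and DF is left out because it
is for the 80386 only:

var_b db 1       ; 1 byte
var_w dw 1       ; 2 bytes
var_d dd 1       ; 4 bytes
var_q dq 1       ; 8 bytes
var_t dt 1       ; 10 bytes

The only difference between these lines is how much room the
assembler sets aside. Together they take 1 + 2 + 4 + 8 + 10 = 25
bytes.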

Every time you use one of these directives, the assembler
allocates the number of bytes in brackets for EACH variable. For
instance in:

db "Mary had a little lamb,"

each character inside the quotes is a variable. That's 23
variables X 1 byte = 23 bytes. In:

______________________

The PC Assembler Tutor - Copyright (C) 1989 Chuck Nelson

dq 0, 1, 2, 3, 4

each number is a variable. 5 variables X 8 bytes = 40 bytes.
Notice from these examples that you can have more than one
variable on a line but they all share the same defining type.

What do you do if you have an uninitialized variable, i.e. you
don't know its starting value? Easy as pie. Here's a four byte
variable:

some_data dd ?

The question mark lets the assembler know that you didn't forget
the number but rather you didn't know the number.
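
To tie the last two ideas together, here is a sketch with
made-up names. One line has several initialized variables and the
other has several uninitialized ones:

counts dw 10, 20, 30     ; 3 variables X 2 bytes = 6 bytes
scratch dw ?, ?          ; 2 variables X 2 bytes = 4 bytes

In "counts", 10 is at counts + 0, 20 is at counts + 2, and 30 is
at counts + 4. In "scratch" the assembler sets aside the same
kind of space but makes no promise about what is in it.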

The commas are separators. When you write a comma, the assembler
expects another piece of data on the line. If it doesn't get the
number, it is an error. That means there can be no commas inside
a number.

dw 32,421

is two variables: 32 and 421.
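
Side by side, the difference looks like this:

dw 32421         ; one variable, 2 bytes
dw 32,421        ; two variables (32 and 421), 4 bytes

If you meant thirty-two thousand four hundred twenty-one, leave
the comma out.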

What if you want to make an array? The assembler has a directive
for that too:

dw 150 dup ( 400 )

The 'dup' is for duplicate. This makes 150 two byte copies and
puts the number 400 in each one.

db 273 dup ( 'c' )

This makes 273 one byte copies and puts the letter 'c' in each
one.

dd 459 dup ( 1, 2, 3, 4, 5 )

This makes 459 copies of what is inside the parentheses. That
means ( 5 variables X 4 bytes ) X 459 for a total of 9180 bytes.
Starting from the beginning of the array, we will have the
sequence: 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3,...
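
A smaller version of the same thing is easier to check by hand.
These two lines should produce exactly the same 6 bytes:

db 3 dup ( 1, 2 )        ; 3 copies of ( 2 variables X 1 byte )
db 1, 2, 1, 2, 1, 2      ; the same thing written out

The group inside the parentheses is copied as a unit, so the
pattern repeats 1, 2, 1, 2, 1, 2 rather than 1, 1, 1, 2, 2, 2.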

dq 20000 dup ( 455 )

This makes 20000 eight byte copies and causes an assembler error
because there is a limit of 65,536 bytes for the data and you
have used 160,000 bytes (20,000 X 8).
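
To stay legal you have to do the multiplication yourself before
the assembler does it for you. As a sketch, assuming there is
little else competing for the same 65,536 bytes:

dq 8000 dup ( 455 )      ; 8000 X 8 = 64,000 bytes, fits
dq 20000 dup ( 455 )     ; 20000 X 8 = 160,000 bytes, error

The rule of thumb is (number of copies) X (bytes per variable),
and the answer has to stay under the limit.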

db 7 dup ( 'Mary had a little lamb,')

This makes 7 copies of 'Mary had ..' which is 23 bytes, for a
total of 161 bytes.

dw 39 dup ( 28 dup ( 0 ) )

The assembler even supports nesting, so you can make a
multi-dimensional array. This is a 39 X 28 array initialized to
zero. 39 copies of 28 two byte numbers is 2184 bytes.
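
A tiny nested example is easier to follow. In this sketch all
three lines should produce the same 6 bytes of zeros:

db 2 dup ( 3 dup ( 0 ) ) ; 2 copies of ( 3 copies of 0 )
db 6 dup ( 0 )           ; the same 6 bytes
db 0, 0, 0, 0, 0, 0      ; the same 6 bytes again

The "2 X 3" shape exists only in your head. In memory it is just
6 bytes in a row. You can also mix a plain value with a nested
dup:

db 2 dup ( 1, 3 dup ( 0 ) )      ; 1, 0, 0, 0, 1, 0, 0, 0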

The standard form for arrays is (1) first define the data type,
(2) then say how long the array is followed by the keyword "dup"
and (3) put the initial value inside the parentheses. What if you
don't know the initial value? Simple:

dw 347 dup ( ? )

The question mark lets the assembler know that you don't know.

DEFINING NUMBERS

What kinds of data can you have?

1. A single character inside single or double quotes:
'a' , "&" , '|'

2. A string inside single or double quotes:

"Mary had a little lamb,"

Each character is stored as a byte, and the bytes are stored
consecutively. If the array starts at address 2743, 2743 = 'M',
2744 = 'a', 2745 = 'r', 2746 = 'y', 2747 = ' ', etc. As usual in
these instances, if you want a double quote inside a double
quoted string or a single quote inside a single quoted string,
you need to use a pair:

"Mary asked her fleas ""Why don't you join the circus?"""
'Mary asked her fleas "Why don''t you join the circus?"'
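
A shorter case shows what actually gets stored. In this sketch
both lines define the same 5 bytes: d, o, n, ', t.

db 'don''t'      ; the pair '' stores one single quote
db "don't"       ; no pair needed inside double quotes

The doubled quote is only a way of writing it. It takes up one
byte in memory, not two.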

3. A decimal number. Decimal is the default:

27, 44, 641, 89

4. A hex number. A hex number must start with a number, so if the
highest digit is A - F, there must be a 0 in front.{1} b77h is
illegal, 0b77h is legal. All hex numbers must be followed by an
'h':

0a162H , 0329H , 0DDDh , 7h

____________________

1 When the assembler looks at something it needs to know
whether it is a name or a number. Is 'A7' a name or a hex number?
Is '3D' a name or a number? To solve this problem, all assemblers
and all compilers insist that if the first character is a
number, it's a number; if the first character is not a number, it
is not a number. That is why you can't start a variable name with
a number.
____________________

5. An octal (base 8) number. An octal number is followed either
by the letter q or the letter o:

641q , 2345o , 1472o

6. A binary number. A binary number is followed by a b:

0100100b , 1b , 01001000111010b
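
To see that these are just different ways of writing the same
thing, here is a sketch where every line stores the identical
byte:

db 65            ; decimal
db 41h           ; hex: 4 X 16 + 1 = 65
db 101q          ; octal: 1 X 64 + 0 X 8 + 1 = 65
db 01000001b     ; binary: 64 + 1 = 65
db 'A'           ; the character 'A' has the code 65

Which one you use is a matter of what is easiest to read at that
spot in the program.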

Any of these types can be mixed on a line. For instance:

db "Mary had a little lamb," , 13 , 10

13 followed by 10 is CRLF (carriage return, line feed), the PC
signal for the end of a line. A string in the C language ends
with the number 0. If we wanted a C string with CRLF, we would
have:

db "Mary had a little lamb," , 13 , 10 , 0
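
The same trick puts the missing line endings into a poem that
runs over several lines. This is only a sketch of one way to lay
it out, with the 0 on a line of its own at the very end:

poem db "Mary had a little lamb," , 13 , 10      ; 23 + 2 = 25 bytes
db "And everywhere that Mary went," , 13 , 10
db 0                     ; one 0 ends the whole C string

Each line of text carries its own CRLF, and there is only one 0
because it marks the end of the entire string, not the end of
each line.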

Another mixed example:

dw 7 , 010010b , 0FFFFh , 037q

is dopey but legal.

You can also have an equation, as long as it resolves to a number.
This calculation is done by the assembler, so the values of
variables are not allowed:

dw ( ( 19 * 7 * 25 ) + 6 ) / ( 9 + 7 )

is legal, but:

data1 dw 25
data2 dw 7
dw ( ( 19 * 7 * data1 ) + 6 ) / ( 9 + data2 )

is illegal. Everything must be a constant. Remember that the
assembler calculates with whole numbers, so a division throws
away the remainder and the partial answers can get truncated
along the way. Don't get too fancy.
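
Working the legal example by hand shows what happens. This
assumes whole number arithmetic, with * and / done left to right:

19 * 7 * 25 = 3325
3325 + 6    = 3331
9 + 7       = 16
3331 / 16   = 208 with 3 left over, and the 3 is thrown away

so the line is the same as writing:

dw 208

The order of the operations matters too:

dw 7 / 2 * 2     ; 7 / 2 = 3, then 3 * 2 = 6
dw 7 * 2 / 2     ; 7 * 2 = 14, then 14 / 2 = 7

If you have a choice, do the multiplications before the
divisions.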

SUMMARY

The assembler works one line at a time. Each line with data must
have its own data type (db, dw, etc.) on that line.

DATA TYPES

DB    define byte          ( 1 byte )
DW    define word          ( 2 bytes )
DD    define doubleword    ( 2 X 2 bytes = 4 bytes )
DQ    define quadword      ( 4 X 2 bytes = 8 bytes )
DT    define ten-byte      ( 10 bytes )
DF    define farword       ( 6 bytes - used for 80386 only )

COMMON INTEGER TYPES

TYPE          SIGNED RANGE                  MAX UNSIGNED

byte          -128 / +127                   255
word          -32768 / +32767               65535
doubleword    -2147483648 / +2147483647     4294967295

Note that the size of the most negative integer is 1 larger than
the largest positive integer.
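
As a sketch of what those limits mean for a byte (the error on
the last line is what a MASM-style assembler is expected to
report):

db 255           ; largest unsigned byte
db -128          ; most negative signed byte
db -1            ; stored as 0FFh, the same bits as 255
db 256           ; too big for one byte: assembler error

The byte itself doesn't know whether it is signed or unsigned.
255 and -1 are two ways of reading the same 8 bits.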

POSSIBLE BASES FOR CONSTANTS

b       binary data
o,q     octal data
d       decimal data (default)
h       hex data (must start with a digit 0 - 9)

ARRAY DEFINITIONS

d* num1 dup ( data1 )

Using the d* data type (db, dw, dd, dq, etc.) make num1 copies of
data1 (data1 may be either a single piece of data or a group of
data.)

MULTIPLE DATA ON ONE LINE

Different data elements on the same line are separated by commas.
All elements on the same line have the same data type.