Dec 23 2017
Info on digital sound formats.
File AUDIO.ZIP from The Programmer’s Corner in
Category Music and Digitized Voice
File Name   File Size   Zip Size   Zip Type
AUDIO.TXT   60867       22482      deflated


Contents of the AUDIO.TXT file

X-NEWS: pbs alt.binaries.sounds.d: 1624
Relay-Version: VMS News - V6.0-3 14/03/90 VAX/VMS V5.4; site
Newsgroups: alt.binaries.sounds.misc,alt.binaries.sounds.d,comp.dsp,news.answers
Subject: FAQ: Audio File Formats (version 2.6)
From: [email protected] (Guido van Rossum)
Date: 28 Sep 92 20:51:29 GMT
Reply-To: [email protected]
Sender: [email protected]
Followup-To: alt.binaries.sounds.d,comp.dsp
Expires: 26 Oct 92 20:51:24 GMT
Approved: [email protected]
Lines: 1408
Xref: pbs alt.binaries.sounds.misc:3298 alt.binaries.sounds.d:1624 comp.dsp:3048 news.answers:3028

Archive-name: audio-fmts/part1
Submitted-by: Guido van Rossum
Version: 2.6
Last-modified: 28-Sep-1992

FAQ: Audio File Formats (version 2.6)

Table of contents

Device characteristics
Popular sampling rates
Compression schemes
Current hardware
File formats
File conversions
Playing audio files on UNIX
Playing audio files on micros
The Sound Site Newsletter
Posting sounds


FTP access for non-internet sites
AIFF Format (Audio IFF)
The NeXT/Sun audio file format
IFF/8SVX Format
Playing sound on a PC
The EA-IFF-85 documentation
US Federal Standard 1016 availability
Creative Voice (VOC) file format
RIFF WAVE (.WAV) file format


This is version 2 of this FAQ, which I started in November 1991 under
the name "The audio formats guide". I bumped the major version number
since the Subject and Newsgroups headers have changed to make the
subject more informative and give the guide a wider audience. I also
added a Table of contents section at the top.

I am posting this about once a fortnight, either unchanged (just to
inform new readers), or updated (if I learn more or when new hardware
or software becomes popular). I post to alt.binaries.sounds.{misc,d}
and to comp.dsp, for maximal coverage of people interested in audio,
and to news.answers, for easy reference.

A companion posting with subject "Change to: ..." is occasionally
posted listing the diffs between a new version and the last. This is
not reposted, and it is suppressed when the diffs are bigger than the
new version.

Send updates, comments and questions to ; flames to

I'd like to thank everyone who sent me mail with updates for previous
versions. The list of names is really too long to list you all...

--Guido van Rossum, CWI, Amsterdam
"Lobster thermidor aux crevettes with a mornay sauce garnished with
truffle pate, brandy and a fried egg on top and spam"

Device characteristics

In this text, I will only use the term "sample" to refer to a single
output value from an A/D converter, i.e., a small integer number
(usually 8 or 16 bits).

Audio data is characterized by the following parameters, which
correspond to settings of the A/D converter when the data was
recorded. Naturally, the same settings must be used to play the data.

- sampling rate (in samples per second), e.g. 8000 or 44100

- number of bits per sample, e.g. 8 or 16

- number of channels (1 for mono, 2 for stereo, etc.)

Approximate sampling rates are often quoted in Hz or kHz ([kilo-]
Hertz); however, the politically correct term is samples per second
(samples/sec). Sampling rates are always measured per channel, so for
stereo data recorded at 8000 samples/sec, there are actually 16000
samples in a second. I will sometimes write 8 k as a shorthand for
8000 samples/sec.

Multi-channel samples are generally interleaved on a frame-by-frame
basis: if there are N channels, the data is a sequence of frames,
where each frame contains N samples, one from each channel. (Thus,
the sampling rate is really the number of *frames* per second.) For
stereo, the left channel usually comes first.
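
As a rough illustration of the frame layout described above, here is a minimal C sketch. The function names (bytes_per_second, deinterleave_stereo) are made up for this example, and it assumes 16-bit samples already in host byte order with the left channel first.

```c
#include <stddef.h>
#include <stdint.h>

/* Raw data rate: frames per second times channels times bytes per sample. */
unsigned long bytes_per_second(unsigned long rate, unsigned bits, unsigned channels)
{
    return rate * channels * ((bits + 7) / 8);
}

/* Split interleaved stereo frames (left sample first, an assumption that
 * matches the usual convention) into two separate channel buffers. */
void deinterleave_stereo(const int16_t *frames, size_t nframes,
                         int16_t *left, int16_t *right)
{
    for (size_t i = 0; i < nframes; i++) {
        left[i]  = frames[2 * i];
        right[i] = frames[2 * i + 1];
    }
}
```

For CD-style data (44100 frames/sec, 16 bits, 2 channels) the first function gives 176400 bytes per second.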

The specification of the number of bits for U-LAW (pronounced mu-law
-- the u really stands for the Greek letter mu) samples is somewhat
problematic. These samples are logarithmically encoded in 8 bits,
like a tiny floating point number; however, their dynamic range is
that of 14 bit linear data. Source for converting to/from U-LAW
(written by Jef Poskanzer) is distributed as part of the SOX package
mentioned below; it can easily be ripped apart to serve in other
applications. The official definition is the CCITT standard G.711.
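
To make the "tiny floating point number" idea concrete, here is a minimal sketch of the widely circulated bias-and-segment method for U-LAW. It is not the SOX source mentioned above; the names linear_to_ulaw and ulaw_to_linear and the two constants are chosen for this example, and input is assumed to be 16-bit signed linear.

```c
#define ULAW_BIAS 0x84   /* 132, added before the segment search */
#define ULAW_CLIP 32635  /* largest magnitude that survives the bias */

/* Encode one 16-bit signed linear sample (passed as int) to 8-bit U-LAW.
 * The byte holds sign (1 bit), exponent (3 bits), mantissa (4 bits),
 * all inverted. */
unsigned char linear_to_ulaw(int sample)
{
    int sign = 0;
    if (sample < 0) { sign = 0x80; sample = -sample; }
    if (sample > ULAW_CLIP) sample = ULAW_CLIP;
    sample += ULAW_BIAS;

    /* The exponent is the position of the highest set bit, counted from bit 7. */
    int exponent = 7;
    for (int mask = 0x4000; exponent > 0 && (sample & mask) == 0; mask >>= 1)
        exponent--;

    int mantissa = (sample >> (exponent + 3)) & 0x0F;
    return (unsigned char)~(sign | (exponent << 4) | mantissa);
}

/* Decode one U-LAW byte back to a signed linear value. */
int ulaw_to_linear(unsigned char u)
{
    u = (unsigned char)~u;
    int exponent = (u >> 4) & 0x07;
    int mantissa = u & 0x0F;
    int sample = (((mantissa << 3) + ULAW_BIAS) << exponent) - ULAW_BIAS;
    return (u & 0x80) ? -sample : sample;
}
```

Decoding never produces a magnitude above 32124, which is one way to see why the dynamic range is quoted as less than a full 16 bits.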

(There exists another encoding similar to U-LAW, called A-LAW, which
is used as a European telephony standard. I don't know how it differs
from U-LAW. There is less support for it in UNIX workstations.)

Popular sampling rates

Some sampling rates are more popular than others, for various reasons.
Some recording hardware is restricted to (approximations of) some of
these rates, some playback hardware has direct support for some. The
popularity of divisors of common rates can be explained by the
simplicity of clock frequency dividing circuits :-).

Samples/sec  Description

5500         One fourth of the Mac sampling rate (rarely seen).

7333         One third of the Mac sampling rate (rarely seen).

8000         Exactly 8000 samples/sec is a telephony standard that
             goes together with U-LAW (and also A-LAW) encoding.
             Some systems use a slightly different rate; in
             particular, the NeXT workstation uses 8012.8210513,
             apparently the rate used by Telco CODECs.

11 k         Either 11025, a quarter of the CD sampling rate,
             or half the Mac sampling rate (perhaps the most
             popular rate on the Mac).

16000        Used by, e.g. the G.722 compression standard.

18.9 k       CD-ROM/XA standard.

22 k         Either 22050, half the CD sampling rate, or the Mac
             rate; the latter is precisely 22254.545454545454 but
             usually misquoted as 22000.

32000        Used in digital radio, NICAM (Nearly-Instantaneous
             Companded Audio Multiplex [IBA/BREMA/BBC]) and other
             TV work, at least in the UK; also long play DAT and
             Japanese HDTV.

37.8 k       CD-ROM/XA standard for higher quality.

44056        This weird rate is used by professional audio
             equipment to fit an integral number of samples in a
             video frame.

44100        The CD sampling rate. (Professional DAT also supports
             this rate.)

48000        The DAT (Digital Audio Tape) sampling rate for
             domestic use.

Files sampled on SoundBlaster hardware have sampling rates that are
divisors of 1000000.
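
One common explanation, which is not stated in the text above and should be read as an assumption, is that the card is programmed with a one-byte time constant tc and takes one sample every (256 - tc) microseconds. Under that assumption, the sketch below (function names invented for this example) finds the nearest achievable rate to a requested one.

```c
/* Assumed model: rate = 1000000 / (256 - tc), with tc in 0..255.
 * Returns the time constant whose rate is closest to the requested rate.
 * Requests below about 3922 samples/sec cannot be represented; 0 is returned. */
unsigned sb_time_constant(unsigned rate)
{
    unsigned divisor = (1000000u + rate / 2) / rate;  /* rounded 1000000/rate */
    if (divisor > 256) divisor = 256;
    if (divisor < 1) divisor = 1;
    return 256 - divisor;
}

/* Rate actually produced by a given time constant (integer part). */
unsigned sb_actual_rate(unsigned tc)
{
    return 1000000u / (256 - tc);
}
```

With this model a request for 8000 is met exactly, while a request for 11025 lands on 10989.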

While professional musicians disagree, most people don't have a problem
if recorded sound is played at a slightly different rate, say, 1-2%.
On the other hand, if recorded data is being fed into a playback
device in real time (say, over a network), even the smallest
difference in sampling rate can frustrate the buffering scheme used...

There may be an emerging tendency to standardize on only a few
sampling rates and encoding styles, even if the file formats may
differ. The suggested rates and styles are:

rate (samp/sec)  style                  mono/stereo

8000             8-bit U-LAW            mono
22050            8-bit linear unsigned  mono and stereo
44100            16-bit linear signed   mono and stereo

Compression schemes

Strange though it seems, audio data is remarkably hard to compress
effectively. For 8-bit data, a Huffman encoding of the deltas between
successive samples is relatively successful. For 16-bit data,
companies like Sony and Philips have spent millions to develop
proprietary schemes.
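
The delta step mentioned above for 8-bit data can be sketched as follows. This shows only the conversion to and from differences; the Huffman coding of the resulting bytes is left out. The function names are invented for this example, and differences are taken modulo 256 so that the transform is exactly reversible.

```c
#include <stddef.h>
#include <stdint.h>

/* Replace each sample by its difference from the previous sample (mod 256).
 * The first sample is stored as its difference from 0. */
void delta_encode(uint8_t *buf, size_t n)
{
    uint8_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t cur = buf[i];
        buf[i] = (uint8_t)(cur - prev);
        prev = cur;
    }
}

/* Undo delta_encode by keeping a running sum (mod 256). */
void delta_decode(uint8_t *buf, size_t n)
{
    uint8_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        prev = (uint8_t)(prev + buf[i]);
        buf[i] = prev;
    }
}
```

For slowly varying sound most deltas cluster around 0 and 255, which is what gives a Huffman coder something to exploit.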

Public standards for voice compression are slowly gaining popularity,
e.g. CCITT G.721 and G.723 (ADPCM at 32 and 24 kbits/sec). (ADPCM ==
Adaptive Delta Pulse Code Modulation.) Free source code for a *fast*
32 kbits/sec ADPCM algorithm is available by ftp from as

There are also two US federal standards, 1016 (Code excited linear
prediction (CELP), 4800 bits/s) and 1015 (LPC-10E, 2400 bits/s). See
also the appendix for 1016.

(Note that U-LAW and silence detection can also be considered
compression schemes.)

Here's a note about audio codings by Van Jacobson:
Several people used the words "LPC" and "CELP" interchangeably. They
are very different. An LPC (Linear Predictive Coding) coder fits
speech to a simple, analytic model of the vocal tract, then throws
away the speech & ships the parameters of the best-fit model. An LPC
decoder uses those parameters to generate synthetic speech that is
usually more-or-less similar to the original. The result is
intelligible but sounds like a machine is talking. A CELP (Code
Excited Linear Predictor) coder does the same LPC modeling but then
computes the errors between the original speech & the synthetic model
and transmits both model parameters and a very compressed
representation of the errors (the compressed representation is an
index into a 'code book' shared between coders & decoders -- this is
why it's called "Code Excited"). A CELP coder does much more work
than an LPC coder (usually about an order of magnitude more) but the
result is much higher quality speech: The FIPS-1016 CELP we're working
on is essentially the same quality as the 32Kb/s ADPCM coder but uses
only 4.8Kb/s (the same as the LPC coder).

Finally, the comp.compression FAQ has some text on the 6:1 audio
compression scheme used by MPEG (a video compression standard-to-be).
It's interesting to note that video compression reaches much higher
ratios (like 26:1). This FAQ is ftp'able from
[] in directory /pub/usenet/news.answers/compression-faq,
files part1 and part2.

Comp.compression also carries a regular posting "How to uncompress
anything" by David Lemson, which (tersely) hints at which program you
need to uncompress a file, based on its filename extension, for almost
any conceivable extension. Ftp'able from
( in the directory /doc/pcnet as the file compression.

Current hardware

I am aware of the following computer systems that can play back and
(sometimes) record audio data, with their characteristics. Note that
for most systems you can also buy "professional" sampling hardware,
which supports much better quality, e.g. >= 44.1 k 16 bits stereo.
The characteristics listed here are a rough estimate of the
capabilities of the basic hardware only (and even here I am on thin
ice, with systems becoming ever more powerful).

machine              bits         max sampling rate  #output channels

Mac                  8            22k                1
Apple IIgs           8            32k / >70k         8(st)
PC/Soundblaster v1   8            13k / 22k          1
PC/Soundblaster v2   8            15k / 44.1k        1
PC/PAS-16            16           44.1k              ?(st)
Atari ST             8            22k                1
Atari STe,TT         8            50k                2
Amiga                8            ~29k               4(st)
Sun Sparc            U-LAW        8k                 1
Sun Sparcst. 10      16           48k                1(st)
NeXT                 U-LAW,8,16   44.1k              1(st)
SGI Indigo           8,16         48k                4(st)
Acorn Archimedes     ~U-LAW       ~180k              8(st)
Sony RISC-NEWS       8, 16        37.8k              ?(st)
VAXstation 4000      U-LAW        8k                 1
Tandy 1000/[TS]L     8-bit        22k                3

4(st) means "four voices, stereo"; sampling rates xx/yy are
different recording/playback rates.

All these machines can play back sound without additional hardware,
although the needed software is not always standard; only the Sun,
NeXT and SGI come with standard sampling hardware (the NeXT only
samples U-LAW at 8000 samples/sec from the built-in microphone port;
you need a separate board for other rates).

The new VAXstation 4000 series lets you PLAY audio (.au) files, and
the as-of-yet-unreleased package, DECsound, will let you do the

The SGI Personal IRIS 4D/30 and 4D/35 have the same capabilities as
the Indigo.

The new Apple Macs have more powerful audio hardware; the latest
models have built-in microphones.

Software exists for the PC that can play sound on its 1-bit speaker
using pulse width modulation (see appendix); the Soundblaster board
records at rates up to 13 k and plays back up to 22 k (weird
combination, but that's the way it is).

On the NeXT, the Motorola 56001 DSP chip is programmable and you can
(in principle) do what you want. The SGI uses the same DSP chip but
it can't be programmed by users -- SGI prefers to offer it as a shared
system resource to multiple applications, thus enabling developers to
program audio with their Audio Library and avoid code modifications
for execution on future machines with different audio hardware, i.e. a
different DSP.

The Amiga also has a 6-bit volume, which can be used to produce
something like a 14-bit output for each voice. The hardware can also
use one of each voice-pair to modulate the other in FM (period) or AM
(volume, 6-bits).

The Acorn Archimedes uses a variation on U-LAW with the bit order
reversed and the sign bit in bit 0. Being a 'minority' architecture,
Arc owners are quite adept at converting sound/image formats from
other machines, and it is unlikely that you'll ever encounter sound in
one of the Arc's own formats (there are several).

CD-I machines form a special category. The following formats are used:

- PCM 44.1 kHz standard CD format
- ADPCM - Adaptive Delta PCM
  - Level A 37.8 kHz 8-bit
  - Level B 37.8 kHz 4-bit
  - Level C 18.9 kHz 4-bit

File formats

Historically, almost every type of machine used its own file format
for audio data, but some file formats are more generally applicable,
and in general it is possible to define conversions between almost any
pair of file formats -- sometimes losing information, however.

File formats are a separate issue from device characteristics. There
are two types of file formats: self-describing formats, where the
device parameters and encoding are made explicit in some form of
header, and "raw" formats, where the device parameters and encoding
are fixed.

Self-describing file formats generally define a family of data
encodings, where a header field indicates the particular encoding
variant used. Headerless formats define a single encoding and usually
allow no variation in device parameters (except sometimes sampling
rate, which can be a pain to figure out other than by listening to the

The header of self-describing formats contains the parameters of the
sampling device and sometimes other information (e.g. a
human-readable description of the sound, or a copyright notice). Most
headers begin with a simple "magic word". (Some formats do not simply
define a header format, but may contain chunks of data intermingled
with chunks of encoding info.) The data encoding defines how the
actual samples are stored in the file, e.g. signed or unsigned, as
bytes or short integers, in little-endian or big-endian byte order,
etc. Strictly speaking, channel interleaving is also part of the
encoding, although so far I have seen little variation in this area.

Some file formats apply some kind of compression to the data, e.g.
Huffman encoding, or simple silence deletion.

Here's an overview of popular file formats.

Self-describing file formats

extension, name   origin        variable parameters (fixed; comments)

.au or .snd       NeXT, Sun     rate, #channels, encoding, info string
.aif(f), AIFF     Apple, SGI    rate, #channels, sample width, lots of info
.aif(f), AIFC     Apple, SGI    same (extension of AIFF with compression)
.iff, IFF/8SVX    Amiga         rate, #channels, instrument info (8 bits)
.voc              Soundblaster  rate (8 bits/1 ch; can use silence deletion)
.wav, WAVE        Microsoft     rate, #channels, sample width, lots of info
.sf               IRCAM         rate, #channels, encoding, info
none, HCOM        Mac           rate (8 bits/1 ch; uses Huffman compression)
none, MIME        Internet      (see below)
.mod or .nst      Amiga         (see below)

Note that the filename extension ".snd" is ambiguous: it can be either
the self-describing NeXT format or the headerless Mac/PC format, or
even a headerless Amiga format.

I know nothing for sure about the origin of HCOM files, only that
there are a lot of them floating around on our system and probably at
FTP sites over the world. The filenames usually don't have a ".hcom"
extension, but this is what SOX (see below) uses. The file format
recognized by SOX includes a MacBinary header, where the file
type field is "FSSD". The data fork begins with the magic word "HCOM"
and contains Huffman compressed data; after decompression it is 8-bit
unsigned data.

IFF/8SVX allows for amplitude contours for sounds (attack/decay/etc).
Compression is optional (and extensible); volume is variable; author,
notes and copyright properties; etc.

AIFF, AIFC and WAVE are similar in spirit but allow more freedom in
encoding style (other than 8 bit/sample), amongst others.

There are other sound formats in use on Amiga by digitizers and music
programs, such as IFF/SMUS.

Appendices describe the NeXT and VOC formats; pointers to more info
about AIFF, AIFC, 8SVX and WAVE (which are too complex to describe
here) are also in appendices.

DEC systems (e.g. DECstation 5000) use a variant of the NeXT format
that uses little-endian encoding and has a different magic number
(0x0064732E in little-endian encoding).

Standard file formats used in the CD-I world are IFF but on the disc
they're in realtime files.

An interesting "interchange format" for audio data is described in the
proposed Internet Standard "MIME", which describes a family of
transport encodings and structuring devices for electronic mail. This
is an extensible format, and initially standardizes a type of audio
data dubbed "audio/basic", which is 8-bit U-LAW data sampled at 8000
samples/sec.

Finally, a format that doesn't really belong here is the "MOD" file,
usually with extension ".mod" or ".nst" (on PCs, that is -- on Amigas
they have a *prefix* of "mod."). These files contain short clips of
sounds with sequencing information. This makes for fairly compact
files but is limited to making music with samples of a piano and
trumpet, etc.

Headerless file formats

extension     origin         parameters
or name

.snd, .fssd   Mac, PC        variable rate, 1 channel, 8 bits unsigned
.ul           US telephony   8 k, 1 channel, 8 bit "U-LAW" encoding
.snd?         Amiga          variable rate, 1 channel, 8 bits signed

It is usually easy to distinguish 8-bit signed formats from unsigned
by looking at the beginning of the data with 'od -b'; since most
sounds start with a little bit of silence containing small
amounts of background noise, the signed formats will have an abundance
of bytes with values 0376, 0377, 0, 1, 2, while the unsigned formats
will have 0176, 0177, 0200, 0201, 0202 instead. (Using "od -c" will
also show any headers that are tacked in front of the file.)
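
The same eyeball test can be automated. The sketch below is only a heuristic built from the byte values listed above; the function name and the return convention are invented for this example, and it will be fooled by files that do not start with near-silence.

```c
#include <stddef.h>

/* Guess whether headerless 8-bit data is signed or unsigned by counting
 * bytes near the two possible "silence" levels.
 * Returns 1 for signed, 0 for unsigned, -1 if the counts are tied. */
int guess_signed_8bit(const unsigned char *data, size_t n)
{
    size_t near_zero = 0;   /* 0376, 0377, 0, 1, 2: silence if signed */
    size_t near_mid = 0;    /* 0176 .. 0202: silence if unsigned */

    for (size_t i = 0; i < n; i++) {
        unsigned char b = data[i];
        if (b <= 2 || b >= 0376)
            near_zero++;
        else if (b >= 0176 && b <= 0202)
            near_mid++;
    }
    if (near_zero > near_mid) return 1;
    if (near_mid > near_zero) return 0;
    return -1;
}
```

Feeding it only the first few hundred bytes of a file keeps the guess focused on the quiet lead-in.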

The Apple IIgs records raw data in the same format as the Mac, but
uses a 0 byte as a terminator; samples with value 0 are replaced by 1.
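
A conversion from raw Mac data to this Apple IIgs convention could look like the sketch below. The function name is made up for this example, and the caller is assumed to supply an output buffer with room for one extra byte.

```c
#include <stddef.h>

/* Copy n raw 8-bit unsigned samples, replacing any 0 sample by 1 and
 * appending a 0 byte as terminator. out must hold n + 1 bytes.
 * Returns the number of bytes written. */
size_t mac_to_iigs(const unsigned char *in, size_t n, unsigned char *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (in[i] == 0) ? 1 : in[i];
    out[n] = 0;
    return n + 1;
}
```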

File conversions


The most versatile tool for converting between various audio formats
is SOX ("Sound Exchange"). It can read and write various types of
audio files, and optionally applies some special effects (e.g. echo,
channel averaging, or rate conversion).

SOX recognizes all filename extensions listed above except ".snd",
which would be ambiguous anyway, and ".wav" (but there's a patch, see
below). Use type ".au" for NeXT ".snd" files. Mac and PC ".snd"
files are completely described by these parameters:

-t raw -b -u -r 11000

(or -r 22000 or -r 7333 or -r 5500; 11000 seems to be the most common
rate).

The source for SOX, version 5, was posted to alt.sources, and should
be widely archived. To save you the trouble of hunting it down, it
can be gotten by anonymous ftp from, in the
directory usenet/alt.sources/articles, files 5581.Z through 5585.Z.
(These files are compressed news articles containing shar files, if
you hadn't guessed.) I am sure many sites have similar archives, I'm
just listing one that I know of and which carries a lot of this kind
of stuff. (Also see the appendix if you don't have Internet access.)

A compressed tar file containing the same version of SOX is available
by anonymous ftp from [], in /pub/sox*.tar.Z.
You may be able to locate a nearer version using archie!

Ports of SOX:

- The source as posted should compile on any UNIX system with 4-byte

- A PC version is available by ftp from (see above) as
pub/sox4*.zip; also available from the garbo mail server.

- The latest Amiga SOX (corresponding to version 5) is available via
anonymous ftp to, files
systems/amiga/audio/utils/amisox*. (See below for a non-SOX

- Work is currently in progress to get SOX ported to VMS (watch
comp.os.vms for announcements).

SOX usage hints:

- Often, the filename extension of sound files posted on the net is
wrong. Don't give up; try a few other possibilities using the
"-t" option. Remember that the most common file type is
unsigned bytes, which can be indicated with "-t ub". You'll have to
guess the proper sampling rate, but often it's 11k or 22k.

- In particular, with SOX version 4 (or earlier), you have to
specify "-t 8svx" for files with an .iff extension.

- When converting linear samples to U-LAW using the .au type for the
output file, you must specify "-U" for the output file, otherwise
you will end up with a file containing a NeXT/Sun header but linear
samples -- only the NeXT will play such files correctly. Also, you
must explicitly specify an output sampling rate with "-r 8000".
(This may seem fixed for most cases in version 5, but it is still
occasionally necessary, so I'm keeping this warning in.)

Sun Sparc

On Sun Sparcs, starting at SunOS 4.1, a program "raw2audio" is
provided by Sun (in /usr/demo/SOUND -- see below) which takes a raw
U-LAW file and turns it into a ".au" file by prefixing it with an
appropriate header.


On NeXTs, you can usually rename .au files to .snd and it'll work like
a charm, but some .au files lack header info that the NeXT needs.
This can be fixed by using sndconvert:

sndconvert -c 1 -f 1 -s 8012.8210513 -o nextfile.snd

SGI Indigo and Personal IRIS

SGI supports a program sfconvert, similar in spirit to SOX (in
/usr/sbin in IRIX version 4.0). Also note that the sfplay program
(see the next section) can do on-the-fly conversion for several
popular formats.


Mike Cramer's SoundZAP can do no effects except rate change and it
only does conversions to IFF, but it is generally much faster than
SOX. (Ftp'able from the same directory as amisox above.)


The Tandy 1000 uses a (proprietary?) compressed format. There is a
PD Mac to Tandy conversion program called CONVERT.

Playing audio files on UNIX

The commands needed to play an audio file depend on the file format
and the available hardware and software. Most systems can only
directly play sound in their native format; use a conversion program
(see above) to play other formats.

Sun Sparc

Raw U-LAW files can be played using "cat file >/dev/audio".

A whole package for dealing with ".au" files is provided by Sun on an
experimental basis, in /usr/demo/SOUND. You may have to compile the
programs first. (If you can't find this directory, either you are not
running SunOS 4.1 yet, or your system administrator hasn't installed
it -- go ask him for it, not me!) The program "play" in this
directory recognizes all files in Sun/NeXT format, but can play only
those using U-LAW encoding at 8 k.

You can also cat a ".au" file to /dev/audio, if it uses U-LAW; the
header will sound like a short burst of noise but the rest of the data
will sound OK (really, the only difference in this case between raw
U-LAW and ".au" files is the header; the U-LAW data is exactly the

Finally, OpenWindows 3.0 has a full-fledged audio tool. You can drop
audio file icons into it, edit them, etc.


On NeXT machines, the standard "sndplay" program can play all NeXT
format files (this includes Sun ".au" files). It supports at least
U-LAW at 8 k and 16-bit samples at 22 or 44.1 k. It attempts
on-the-fly conversions for other formats.

Sound files are also played if you double-click on them in the file

SGI Indigo and Personal IRIS

On SGI Indigo and the 4D/30 and /35 Personal IRIS workstations, the
program "sfplay" (in /usr/sbin) plays AIFF files, if the sampling rate
is one of 8000, 11025, 16000, 22050, 32000, 44100, or 48000. On the
Personal IRIS, you need to have the audio board installed (check the
output from hinv) and you must run IRIX 3.3.2 or 4.0 or higher.
"Workspace" plays audio files if you double click on them.

There is no simple /dev/audio interface on these SGI machines. (There
was one on 4D/25 machines, reading and writing signed linear 8-bit
samples at rates of 8, 16 and 32 k.)

A program "playulaw" was posted as part of the "radio 2.0" release
that I posted to several source groups recently; it plays raw U-LAW
files on the Indigo or Personal IRIS audio hardware.


The Sony RISC-NEWS line (NWS-3250 laptop, NWS-37xx desktop, NWS-38xx
desktop w/ IOP) also has builtin sound capabilities. You can also buy
external boards for the older NEWS machines or to add extra channels
to the new machines. In the default mode (8k/8-bit), Sun .au files
are directly supported (you can 'cat' .au files to /dev/sb and have
them play).

Vaxstation 4000

".au" files can be played by COPYING them to device "SOA0:". This
device is set up by enabling the driver SODRIVER, as described below:

DEC's sound stuff is like most other new toys. Hardware first, THEN the
software. DEC will soon be releasing a layered product called DECsound,
which will let you record, play, and (possibly) manipulate sound files.
Third party product(s) have ALREADY hit the market.

Enabling SODRIVER: (you can use the following command file)

$!---------------- cut here -------------------------------
$! enable SOUND driver
$ run sys$system:sysgen
connect soa0 /adapter=0 /csr=%x0e00 /vector=%o304 /driver=sodriver
$ exit
$!----------------- cut here ------------------------------------

The external audio port comes with a telephone-jack-like port. For
starters, you can plug a telephone RECEIVER right into this port to
hear your first sound files. After that, you can use the adapter
(that came with the VaxStation), and plug in a small set of stereo
speakers (the kind you'd plug into a WALKMAN, for example), for more


Most other UNIX boxes don't have audio hardware and thus can't play
audio data.

Playing audio files on micros

Most micros have at least a speaker built in, so theoretically all you
need is the right software. Unfortunately most systems don't come
bundled with sound-playing software, so there are many public domain
or shareware software packages, each with their own bugs and features.
Most separate sound recording hardware also comes with playing
software, most of which can play sound (in the file format used by
that hardware) even on machines that don't have that hardware

Chris S. Craig announces the following software for PCs:

ScopeTrax This is a complete PC sound player/editor package. Sounds
can be played back at ANY rate between 1kHz to 65kHz through
the PC speaker or the Sound Blaster. It supports several
file formats including VOC, IFF/8SVX, raw signed and raw
unsigned. A separate executable is provided to convert
.au and mu-law to raw format. ScopeTrax requires EGA/VGA
graphics for editing and displaying sounds on a REALTIME
oscilloscope. The package also includes:
* An expanded memory player which can play sounds
larger than 640K in size.
* Basic (rough) sound compression/uncompression
* Complete documentation.
The package is FREEWARE! It is available on SIMTEL in the
PD1:[MSDOS.SOUND] directory.

One of the appendices below contains a list of more programs to play
sound on the PC.

For sounds on Atari STs - programs are in the atari/sound/players
directory on (

Malcolm Slaney from Apple writes:

"We do have tools to play sound back on most of our Unix hosts. We wrote
a program called TcpPlay that lets us read a sound file on a Unix host,
open a TCP/IP connection to the Mac on my desk, and plays the file. We
think of it as X windows for sound (at least a step in that direction.)

This software is available for anonymous FTP from
Look for ~ftp/pub/TcpPlay/TcpPlay.sit.hqx.

Finally, there are MANY tools for working with sound on the Macintosh. Three
applications that come to mind immediately are SoundEdit (formerly by
Farallon and now by MacroMind/Paracomp), Alchemy and Eric Keller's Signalyze.
There are lots of other tools available for sound editing (including some
of the QuickTime Movie tools.)"

On a Tandy 1000, sounds can be played and recorded with DeskMate Sound
(SOUND.PDM), or if they are not stored in compressed format, they can
also be played by a program called PLAYSND. There is no indication of
whether PLAYSND is PD or not. It hasn't been updated since March of 89.

The Sound Site Newsletter

An electronic publication with lots of info about digitised sound and
sound formats, albeit mostly on micros, is "The Sound Site
Newsletter". So far, 8 issues have appeared, the last in January
1992. Issues can be ftp'ed from, directory
pub/rogue/newsletters, or from,

Posting sounds

The newsgroup alt.binaries.sounds.misc is dedicated to postings
containing sound. (Discussions related to such postings belong in
alt.binaries.sounds.d.)

There is no set standard for posting sounds; uuencoded files in most
popular formats are welcome, if split in parts under 50 kBytes. To
accommodate automatic decoding software (such as the ":decode" command
of the nn newsreader), please place a part indicator of the form
(mm/nn) at the end of your subject, meaning this is part number mm of
a total of nn parts.

It is recommended to post sounds in the format that was used for the
original recording; conversions to other formats often lose
information and would do people with identical hardware as the poster
no favor. For instance, converting 8-bit linear sound to U-LAW loses
the lower few bits of the data, and rate changing conversions almost
always add noise. Converting from U-LAW to linear requires expansion
to 16 bit samples if no information loss is allowed!

U-LAW data is best posted with a NeXT/Sun header.

If you have to post a file in a headerless format (usually 8-bit
linear, like ".snd"), please add a description giving at least the
sampling rate and whether the bytes are signed (zero at 0) or unsigned
(zero at 0200). However, it is highly recommended to add a header
that indicates the sampling rate and encoding scheme; if necessary you
can use SOX to add a header of your choice to raw data.

Compression of sound files usually isn't worth it; the standard
"compress" algorithm doesn't save much when applied to sound data
(typically at most 10-20 percent), and compression algorithms
specifically designed for sound (e.g. NeXT's) are usually
proprietary. (See also the section "Compression schemes" earlier.)


Here are some more detailed pieces of info that I received by e-mail.
They are reproduced here virtually without editing.

FTP access for non-internet sites

From the FAQ:

Sites not connected to the Internet cannot use FTP directly, but
there are a few automated FTP servers which operate via email.
Send mail containing only the word HELP to [email protected]
or [email protected], and the servers will send you
instructions on how to make requests.


FAQ lists are available by anonymous FTP from
( and by email from [email protected] (send
a message containing "help" for instructions about the mail server).

AIFF Format (Audio IFF) and AIFC

This format was developed by Apple for storing high-quality sampled
sound and musical instrument info; it is also used by SGI and several
professional audio packages (sorry, I know no names). An extension,
called AIFC or AIFF-C, supports compression (see the last item below).

I've made a BinHex'ed MacWrite version of the AIFF spec (no idea if
it's the same text as mentioned below) available by anonymous ftp from []; the file is /pub/AudioIFF1.2.hqx. But
you may be better off with the AIFF-C specs, see below.

Mike Brindley ([email protected]) writes:

"The complete AIFF spec by Steve Milne, Matt Deatherage (Apple) is
available in 'AMIGA ROM Kernel Reference Manual: Devices (3rd Edition)'
1991 by Commodore-Amiga, Inc.; Addison-Wesley Publishing Co.;
ISBN 0-201-56775-X, starting on page 435 (this edition has a charcoal
grey cover). It is available in most bookstores, and soon in many
good libraries."

Finally, Mark Callow writes (in comp.sys.sgi):

"I have placed a PostScript version of the AIFF-C specification on for public ftp. It is in the file sgi/'s internet host number is (I think)"

The NeXT/Sun audio file format

Here's the complete story on the file format, from the NeXT
documentation. (Note that the "magic" number is ((int)0x2e736e64),
which equals ".snd".) Also, at the end, I've added a little document
that someone posted to the net a couple of years ago, that describes
the format in a bit-by-bit fashion rather than from C.

I received this from Doug Keislar, NeXT Computer. This is also the
Sun format, except that Sun doesn't recognize as many format codes. I
added the numeric codes to the table of formats and sorted it.

SNDSoundStruct: How a NeXT Computer Represents Sound

The NeXT sound software defines the SNDSoundStruct structure to
represent sound. This structure defines the soundfile and Mach-O
sound segment formats and the sound pasteboard type. It's also used
to describe sounds in Interface Builder. In addition, each instance
of the Sound Kit's Sound class encapsulates a SNDSoundStruct and
provides methods to access and modify its attributes.

Basic sound operations, such as playing, recording, and cut-and-paste
editing, are most easily performed by a Sound object. In many cases,
the Sound Kit obviates the need for in-depth understanding of the
SNDSoundStruct architecture. For example, if you simply want to
incorporate sound effects into an application, or to provide a simple
graphic sound editor (such as the one in the Mail application), you
needn't be aware of the details of the SNDSoundStruct. However, if
you want to closely examine or manipulate sound data you should be
familiar with this structure.

The SNDSoundStruct contains a header, information that describes the
attributes of a sound, followed by the data (usually samples) that
represents the sound. The structure is defined (in
sound/soundstruct.h) as:

typedef struct {
    int magic;          /* magic number SND_MAGIC */
    int dataLocation;   /* offset or pointer to the data */
    int dataSize;       /* number of bytes of data */
    int dataFormat;     /* the data format code */
    int samplingRate;   /* the sampling rate */
    int channelCount;   /* the number of channels */
    char info[4];       /* optional text information */
} SNDSoundStruct;

SNDSoundStruct Fields


magic

magic is a magic number that's used to identify the structure as a
SNDSoundStruct. Keep in mind that the structure also defines the
soundfile and Mach-O sound segment formats, so the magic number is
also used to identify these entities as containing a sound.


dataLocation

It was mentioned above that the SNDSoundStruct contains a header
followed by sound data. In reality, the structure only contains the
header; the data itself is external to, although usually contiguous
with, the structure. (Nonetheless, it's often useful to speak of the
SNDSoundStruct as the header and the data.) dataLocation is used to
point to the data. Usually, this value is an offset (in bytes) from
the beginning of the SNDSoundStruct to the first byte of sound data.
The data, in this case, immediately follows the structure, so
dataLocation can also be thought of as the size of the structure's
header. The other use of dataLocation, as an address that locates
data that isn't contiguous with the structure, is described in
"Format Codes," below.

dataSize, dataFormat, samplingRate, and channelCount

These fields describe the sound data.

dataSize is its size in bytes (not including the size of the
SNDSoundStruct).

dataFormat is a code that identifies the type of sound. For sampled
sounds, this is the quantization format. However, the data can also
be instructions for synthesizing a sound on the DSP. The codes are
listed and explained in "Format Codes," below.

samplingRate is the sampling rate (if the data is samples). Three
sampling rates, represented as integer constants, are supported by
the hardware:

    Constant          Sampling Rate (samples/sec)

    SND_RATE_CODEC     8012.821   (CODEC input)
    SND_RATE_LOW      22050.0     (low sampling rate output)
    SND_RATE_HIGH     44100.0     (high sampling rate output)

channelCount is the number of channels of sampled sound.


info

info is a NULL-terminated string that you can supply to provide a
textual description of the sound. The size of the info field is set
when the structure is created and thereafter can't be enlarged. It's
at least four bytes long (even if it's unused).
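
As an illustration of the fields described above, here is a minimal Python
sketch that decodes such a header from the start of a sound file. It is not
part of the NeXT documentation. It assumes the six integer fields are stored
as 32-bit big-endian values (consistent with the ".snd" magic number) and that
dataLocation is a plain byte offset; the function name parse_snd_header is
made up for this example.

```python
import struct

SND_MAGIC = 0x2E736E64  # the bytes ".snd" read as one big-endian 32-bit integer


def parse_snd_header(data):
    """Decode the six fixed header fields and the info string (sketch only)."""
    if len(data) < 24:
        raise ValueError("too short to hold a sound header")
    # Assumption: all six fields are unsigned 32-bit big-endian integers.
    (magic, data_location, data_size,
     data_format, sampling_rate, channel_count) = struct.unpack(">6I", data[:24])
    if magic != SND_MAGIC:
        raise ValueError("magic number mismatch")
    # The info string sits between the fixed fields and the sound data.
    info = data[24:data_location].split(b"\0", 1)[0]
    return {
        "dataLocation": data_location,
        "dataSize": data_size,
        "dataFormat": data_format,
        "samplingRate": sampling_rate,
        "channelCount": channel_count,
        "info": info,
    }
```

Calling parse_snd_header() on the first bytes of a file returns a dictionary
keyed by the field names used in the structure definition.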

Format Codes

A sound's format is represented as a positive 32-bit integer. NeXT
reserves the integers 0 through 255; you can define your own format
and represent it with an integer greater than 255. Most of the
formats defined by NeXT describe the amplitude quantization of
sampled sound data:

    Value  Code                              Format

     0     SND_FORMAT_UNSPECIFIED            unspecified format
     1     SND_FORMAT_MULAW_8                8-bit mu-law samples
     2     SND_FORMAT_LINEAR_8               8-bit linear samples
     3     SND_FORMAT_LINEAR_16              16-bit linear samples
     4     SND_FORMAT_LINEAR_24              24-bit linear samples
     5     SND_FORMAT_LINEAR_32              32-bit linear samples
     6     SND_FORMAT_FLOAT                  floating-point samples
     7     SND_FORMAT_DOUBLE                 double-precision float samples
     8     SND_FORMAT_INDIRECT               fragmented sampled data
    11     SND_FORMAT_DSP_DATA_8             8-bit fixed-point samples
    12     SND_FORMAT_DSP_DATA_16            16-bit fixed-point samples
    13     SND_FORMAT_DSP_DATA_24            24-bit fixed-point samples
    14     SND_FORMAT_DSP_DATA_32            32-bit fixed-point samples
    15     ?
    16     SND_FORMAT_DISPLAY                non-audio display data
    18     SND_FORMAT_EMPHASIZED             16-bit linear with emphasis
    19     SND_FORMAT_COMPRESSED             16-bit linear with compression
    20     SND_FORMAT_COMPRESSED_EMPHASIZED  A combination of the two above

Most formats identify different sizes and types of
sampled data. Some deserve special note:

-- SND_FORMAT_DSP_CORE format contains data that represents a
loadable DSP core program. Sounds in this format are required by the
SNDBootDSP() and SNDRunDSP() functions. You create a
SND_FORMAT_DSP_CORE sound by reading a DSP load file (extension
".lod") with the SNDReadDSPfile() function.

-- SND_FORMAT_DSP_COMMANDS is used to distinguish sounds that
contain DSP commands created by the Music Kit. Sounds in this format
can only be created through the Music Kit's Orchestra class, but can
be played back through the SNDStartPlaying() function.

-- SND_FORMAT_DISPLAY format is used by the Sound Kit's
SoundView class. Such sounds can't be played.

-- SND_FORMAT_INDIRECT indicates data that has become
fragmented, as described in a separate section, below.

-- SND_FORMAT_UNSPECIFIED is used for unrecognized formats.
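
For the plain sampled formats, the header fields are enough to work out how
long a sound plays. The following Python sketch shows the arithmetic. The
table of sample widths is derived from the format names above, and the widths
for the float and double formats (4 and 8 bytes) are an assumption; the
function name is invented for this example.

```python
# Bytes per sample for format codes 1..7 (codes taken from the table above).
BYTES_PER_SAMPLE = {
    1: 1,  # 8-bit mu-law
    2: 1,  # 8-bit linear
    3: 2,  # 16-bit linear
    4: 3,  # 24-bit linear
    5: 4,  # 32-bit linear
    6: 4,  # floating-point (assumed 4-byte float)
    7: 8,  # double-precision (assumed 8-byte double)
}


def duration_seconds(data_size, data_format, sampling_rate, channel_count):
    """Playing time = sample frames / sampling rate (sketch only)."""
    width = BYTES_PER_SAMPLE.get(data_format)
    if width is None:
        raise ValueError("not a plain sampled format: %d" % data_format)
    # One frame holds one sample for every channel.
    frames = data_size // (width * channel_count)
    return frames / sampling_rate
```

For example, 176400 bytes of 16-bit linear stereo data at 44100 samples per
second is 44100 frames, or one second of sound.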

Fragmented Sound Data

Sound data is usually stored in a contiguous block of memory.
However, when sampled sound data is edited (such that a portion of
the sound is deleted or a portion inserted), the data may become
discontiguous, or fragmented. Each fragment of data is given its own
SNDSoundStruct header; thus, each fragment becomes a separate
SNDSoundStruct structure. The addresses of these new structures are
collected into a contiguous, NULL-terminated block; the dataLocation
field of the original SNDSoundStruct is set to the address of this
block, while the original format, sampling rate, and channel count
are copied into the new SNDSoundStructs.

Fragmentation serves one purpose: It avoids the high cost of moving
data when the sound is edited. Playback of a fragmented sound is
transparent; you never need to know whether the sound is fragmented
before playing it. However, playback of a heavily fragmented sound
is less efficient than that of a contiguous sound. The
SNDCompactSamples() C function can be used to compact fragmented
sound data.

Sampled sound data is naturally unfragmented. A sound that's freshly
recorded or retrieved from a soundfile, the Mach-O segment, or the
pasteboard won't be fragmented. Keep in mind that only sampled data
can become fragmented.
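
The trade-off described above (cheap edits, occasional compaction) can be
illustrated with a small Python model. This is only a sketch of the idea and
says nothing about how the NeXT library implements it: fragments are modelled
as memoryview slices over the original samples, and the names delete_range and
compact are hypothetical.

```python
def delete_range(fragments, start, end):
    """Drop samples [start, end) by re-slicing; no sample data is copied."""
    kept = []
    pos = 0
    for frag in fragments:
        frag_start, frag_end = pos, pos + len(frag)
        pos = frag_end
        # Part of this fragment that lies before the deleted range.
        if frag_start < start:
            kept.append(frag[:min(len(frag), start - frag_start)])
        # Part of this fragment that lies after the deleted range.
        if frag_end > end:
            kept.append(frag[max(0, end - frag_start):])
    return [f for f in kept if len(f)]


def compact(fragments):
    """Copy all fragments into one contiguous block (the expensive step)."""
    return b"".join(fragments)
```

Deleting from the middle of a one-fragment sound leaves two fragments that
still refer to the original buffer; only compact() actually moves data.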

>From!purdue!decwrl!ucbvax!ziploc!eps Wed Apr 4
23:56:23 EST 1990
Article 5779 of
>From: [email protected] (Eric P. Scott)
Subject: Re: Format of NeXT sndfile headers?
Message-ID: <[email protected]>
Date: 31 Mar 90 21:36:17 GMT
References: <[email protected]>
Reply-To: [email protected] (Eric P. Scott)
Organization: San Francisco State University
Lines: 42

In article <[email protected]>
[email protected] (Brian Kendig) writes:
>I'd like to take a program I have that converts Macintosh sound
>to NeXT sndfiles and polish it up a bit to go the other direction as
>well.

Two people have already submitted programs that do this
(Christopher Lane and Robert Hood); check the various
NeXT archive sites.

> Could someone please give me the format of a NeXT sndfile

          0      1      2      3
     0 | 0x2e | 0x73 | 0x6e | 0x64 |   "magic" number
     4 |                           |   data location
     8 |                           |   data size
    12 |                           |   data format (enum)
    16 |                           |   sampling rate (int)
    20 |                           |   channel count
    24 |      |      |      |      |   (optional) info

    28 = minimum value for data location

data format values can be found in /usr/include/sound/soundstruct.h

Most common combinations:

                    sampling   channel   data
                    rate       count     format
    voice file       8012      1         1 =  8-bit mu-law
    system beep     22050      2         3 = 16-bit linear
    CD-quality      44100      2         3 = 16-bit linear
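
Going the other way, a header with this layout can be produced with a few
lines of Python. This is a sketch rather than a reference writer: it assumes
big-endian 32-bit fields, and padding the info string to a multiple of four
bytes is a choice made here, not something the layout above requires. The
function name make_snd_header is invented.

```python
import struct


def make_snd_header(data_size, data_format, sampling_rate, channel_count,
                    info=b""):
    """Build a sound file header followed by a NUL-terminated info string."""
    info = info + b"\0"
    # Pad to a multiple of 4 bytes, so an empty info string occupies the
    # minimum of four bytes (assumption: rounding up is acceptable).
    info += b"\0" * (-len(info) % 4)
    data_location = 24 + len(info)
    fixed = struct.pack(">6I", 0x2E736E64, data_location, data_size,
                        data_format, sampling_rate, channel_count)
    return fixed + info


# A one-second "voice file" header: 8012 samples/sec, 1 channel, 8-bit mu-law.
voice_header = make_snd_header(8012, 1, 8012, 1)
```

With an empty info string the header is 28 bytes long, which matches the
minimum value for the data location given above.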

IFF/8SVX Format

Newsgroups: alt.binaries.sounds.d,
Subject: Format of the IFF header (Amiga sounds)
Message-ID: <[email protected]>
From: [email protected] (Joe Smith)
Date: 23 Oct 91 23:54:38 GMT
Followup-To: alt.binaries.sounds.d
Organization: BT North America (Tymnet)

The first 12 bytes of an IFF file are used to distinguish between an Amiga
picture (FORM-ILBM), an Amiga sound sample (FORM-8SVX), or other file
conforming to the IFF specification. The middle 4 bytes is the count of
bytes that follow the "FORM" and byte count longwords. (Numbers are stored
in M68000 form, high order byte first.)


FutureSound audio file, 15000 samples at 10.000KHz, file is 15048 bytes long.

0000: 464F524D 00003AC0 38535658 56484452   FORM..:.8SVXVHDR
      F O R M     15040 8 S V X  V H D R
0010: 00000014 00003A98 00000000 00000000   ......:.........
            20    15000        0        0
0020: 27100100 00010000 424F4459 00003A98   '.......BODY..:.
     10000 1 0      1.0 B O D Y     15000

0000000..03         = "FORM", identifies this as an IFF format file.
FORM+00..03 (ULONG) = number of bytes that follow. (Unsigned long int.)
FORM+04..07         = "8SVX", identifies this as an 8-bit sampled voice.

????+00..03         = "VHDR", Voice8Header, describes the parameters for the BODY.
VHDR+00..03 (ULONG) = number of bytes to follow.
VHDR+04..07 (ULONG) = samples in the high octave 1-shot part.
VHDR+08..0B (ULONG) = samples in the high octave repeat part.
VHDR+0C..0F (ULONG) = samples per cycle in high octave (if repeating), else 0.
VHDR+10..11 (UWORD) = samples per second. (Unsigned 16-bit quantity.)
VHDR+12     (UBYTE) = number of octaves of waveforms in sample.
VHDR+13     (UBYTE) = data compression (0=none, 1=Fibonacci-delta encoding).
VHDR+14..17 (FIXED) = volume. (The number 65536 means 1.0 or full volume.)

????+00..03         = "BODY", identifies the start of the audio data.
BODY+00..03 (ULONG) = number of bytes to follow.
BODY+04..NNNNN      = Data, signed bytes, from -128 to +127.
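
The VHDR layout listed above maps directly onto a struct format string. The
Python sketch below decodes the 20 bytes that follow the VHDR byte count; it
is an illustration only, and the function name and dictionary keys are made
up for it.

```python
import struct


def parse_vhdr(payload):
    """Decode the 20-byte Voice8Header body (high-order byte first)."""
    (one_shot, repeat, per_cycle,
     rate, octaves, compression, volume) = struct.unpack(">3IH2BI",
                                                         payload[:20])
    return {
        "one_shot_samples": one_shot,
        "repeat_samples": repeat,
        "samples_per_cycle": per_cycle,
        "samples_per_second": rate,
        "octaves": octaves,
        "compression": compression,   # 0 = none, 1 = Fibonacci-delta
        "volume": volume / 65536.0,   # FIXED: 65536 means 1.0
    }


# The VHDR body from the hex dump above (file offsets 0014 through 0027).
example = bytes.fromhex("00003A98 00000000 00000000 2710 01 00 00010000")
fields = parse_vhdr(example)
```

For the FutureSound example this gives 15000 one-shot samples at 10000
samples per second, one octave, no compression, and a volume of 1.0.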

0030: 04030201 02030303 04050605 05060605
0040: 06080806 07060505 04020202 01FF0000
0060: FDFDFF00 00FFFFFF 00000000 00FFFF00
0070: 00000000 00FF0000 00FFFEFF 00000000
0080: 00010000 000101FF FF0000FE FEFFFFFE

This small section of the audio sample shows values ranging from -3 (0xFD)
to +8 (0x08). Warning: Do not assume that the BODY starts 48 bytes into the
file. In addition to "VHDR", chunks labeled "NAME", "AUTH", "ANNO", or
"(c) " may be present, and may be in any order. You will have to check the
byte count in each chunk to determine how many bytes to skip.
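
The advice about skipping chunks by their byte counts could look roughly like
this in Python. It is a sketch, not a complete IFF reader: it handles a single
top-level FORM, and it relies on the EA-IFF-85 rule (not stated above) that a
chunk with an odd byte count is followed by one pad byte. The function names
are invented for this example.

```python
import struct


def read_iff(data):
    """Return (form_type, [(chunk_id, payload), ...]) for one IFF FORM."""
    tag, form_size, form_type = struct.unpack(">4sI4s", data[:12])
    if tag != b"FORM":
        raise ValueError("not an IFF FORM")
    end = min(len(data), 8 + form_size)
    pos = 12
    chunks = []
    while pos + 8 <= end:
        chunk_id, size = struct.unpack(">4sI", data[pos:pos + 8])
        chunks.append((chunk_id, data[pos + 8:pos + 8 + size]))
        # Skip the chunk header, its data, and the pad byte after odd sizes.
        pos += 8 + size + (size & 1)
    return form_type, chunks


def signed_samples(body):
    """Interpret BODY bytes as signed 8-bit values (-128 to +127)."""
    return [b - 256 if b > 127 else b for b in body]
```

To find the audio data, look up the chunk whose id is b"BODY" in the returned
list instead of assuming a fixed offset.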

Playing sound on a PC

From: Eric A Rasmussen

Any turbo PC (8088 at 8 MHz or greater)/286/386/486/etc. can produce a quality
playback of single channel 8 bit sounds on the internal (1 bit, 1 channel)
speaker by utilizing Pulse-Width-Modulation, which toggles the speaker faster
than it can physically move to simulate positions between fully on and fully
off. There are several PD programs of this nature that I know of:

REMAC  - Plays MAC format sound files. Files on the Macintosh, at least the
         sound files that I've ripped apart, seem to contain 3 parts. The
         first two are info like what the file icon looks like and other
         header type info. The third part contains the raw sample data, and
         it is this portion of the file which is saved to a separate file,
         often named with the .snd extension by PC users. Personally, I like
         to name the files .s1, .s2, .s3, or .s4 to indicate the sampling rate
         of the file. (-s# is how to specify the playback rate in REMAC.)
         REMAC provides playback rates of 5550 Hz, 7333 Hz, 11 kHz, & 22 kHz.
REMAC2 - Same as REMAC, but sounds better on higher speed machines.
REPLAY - Basically same as REMAC, but for playback of Atari ST sounds.
         Apparently, the Atari has two sound formats, one of which sounds like
         garbage if played by REMAC or REPLAY in the incorrect mode. The
         other file format works fine with REMAC and so appears to be 'normal'
         unsigned 8-bit data. REPLAY provides playback rates of 11.5 kHz,
         12.5 kHz, 14 kHz, 16 kHz, 18.5 kHz, 22 kHz, & 27 kHz.

These three programs are all by the same author, Richard E. Zobell who does
not have an internet mail address to my knowledge, but does have a GEnie email
address of R.ZOBELL.
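
The pulse-width-modulation idea mentioned above can be sketched in a few lines
of Python. This only shows how an 8-bit sample value becomes a proportion of
"on" time; it does not touch any hardware, and it is not taken from any of the
programs listed. The function name and the steps parameter (on/off slots per
sample) are invented for the illustration.

```python
def pwm_bits(samples, steps=16):
    """Turn unsigned 8-bit samples into a stream of 1-bit speaker states."""
    bits = []
    for s in samples:
        # Number of "on" slots, rounded to the nearest of 0..steps.
        on = (s * steps + 128) // 256
        bits.extend([1] * on + [0] * (steps - on))
    return bits


# Silence-level, mid-level and full-level samples with 4 slots each.
stream = pwm_bits([0, 128, 255], steps=4)
```

A mid-scale sample (128) keeps the speaker on for half of its slots, so a
cone that cannot follow the individual pulses settles near the middle of its
travel.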

Additionally, there are various stand-alone demos which use the internal
speaker, of which there is one called mushroom which plays a 30 second
advertising jingle for magic mushroom room deodorizers which is pretty
humorous. I've used this player to play back samples that I ripped out of the
commercial game program Mean Streets, which uses something they call RealSound
(tm) to play back digital samples on the internal speaker. (Of course, I only do
this on my own system, and since I own the game, I see no problems with it.)

For owners of 8 MHz 286's and above, the option to play 4 channel 8 bit sounds
(with decent quality) on the internal speaker is also a reality. Quite a
number of PD programs exist to do this, including, but not limited to:

ModEdit, ModPlay, ScreamTracker, STM, Star Trekker, Tetra, and probably a few
more.

All these programs basically make use of various sound formats used by the
Amiga line of computers. These include .stm files, .mod files
[a.k.a. mod. files], and .nst files [really the same thing]. Also,
these programs pretty much all have the option to play back the
sound to add-on hardware such as the SoundBlaster card, the Covox series of
devices, and also to direct the data to either one or two (for stereo)
parallel ports, which you could attach your own D/A's to. (From what I have
seen, the Covox is basically a small amplified speaker with a D/A which plugs
into the parallel port. This sounds very similar to the Disney Sound System
(DSS) which people have been talking about recently.)

The EA-IFF-85 documentation

From: [email protected]edu

As promised, here's an ftp location for the EA-IFF-85 documentation. It's
the November 1988 release as revised by Commodore (the last public release),
with specifications for IFF FORMs for graphics, sound, formatted text, and
more. IFF FORMS now exist for other media, including structured drawing, and
new documentation is now available only from Commodore.

The documentation is at [], in the
directory /amiga/f1/ff185. The complete file list is as follows:


All files except DOCUMENTS.zoo are Amiga-specific, but may be used as a basis
for conversion to other platforms. Well, I tentatively take that back. I
don't know what TP_IFF_Specs.zoo contains, so it might be non-Amiga-specific.

US Federal Standard 1016 availability

From: Joe Campbell N3JBC [email protected] [email protected]

The U.S. DoD's Federal-Standard-1016 4800 bps code excited linear prediction
voice coder version 3.2 (CELP 3.2) Fortran and C simulation source codes are
now available for worldwide distribution at no charge (on DOS diskettes,
but configured to compile on Sun SPARC stations) from:

Bob Fenichel
National Communications System
Washington, D.C. 20305
1-703-746-4960 (fax)

In addition to the source codes, example input and processed speech files
are included along with a technical information bulletin to assist in
implementation of FS-1016 CELP. (An anonymous ftp site is being considered
for future releases.)

Copies of the FS-1016 document are available for $2.50 each from:

GSA Rm 6654
7th & D St SW
Washington, D.C. 20407


The following articles describe the Federal-Standard-1016 4.8-kbps CELP
coder (it's unnecessary to read more than one):

Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch,
"The Federal Standard 1016 4800 bps CELP Voice Coder," Digital Signal
Processing, Academic Press, 1991, Vol. 1, No. 3, p. 145-155.

Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch,
"The DoD 4.8 kbps Standard (Proposed Federal Standard 1016),"
in Advances in Speech Coding, ed. Atal, Cuperman and Gersho,
Kluwer Academic Publishers, 1991, Chapter 12, p. 121-133.

Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, "The
Proposed Federal Standard 1016 4800 bps Voice Coder: CELP," Speech
Technology Magazine, April/May 1990, p. 58-64.

For U.S. FED-STD-1016 (4800 bps CELP) _realtime_ DSP code
and information about products using this code, contact:

John DellaMorte
DSP Software Engineering
165 Middlesex Tpk, Suite 206
Bedford, MA 01730
1-617-275-4323 (fax)
[email protected]

DSP Software Engineering's code can run on DSP Research's Tiger 30 board
(a PC board with a TMS320C3x and analog interface suited to development work)
or on Intellibit's AE2000 TMS320C31 based 3" by 2.5" card.

DSP Research                Intellibit
1095 E. Duane Ave.          P.O. Box 9785
Sunnyvale, CA 94086         McLean, VA 22102-0785
(408)773-1042               (703)442-4781
(408)736-3451 (fax)         (703)442-4784 (fax)

From: [email protected] (Richard Tobias )

For U.S. FED-STD-1016 (4800 bps CELP) _realtime_ DSP code for the AT&T
DSP32C and AT&T DSP3210, and information about products using this code,
contact:

White Eagle Systems Technology, Inc.
1123 Queensbridge Way
San Jose, CA 95120
(408) 997-2706
(408) 997-3584 (fax)
[email protected]

Creative Voice (VOC) file format

From: [email protected]

(byte numbers are hex!)

HEADER (bytes 00-19)
Series of DATA BLOCKS (bytes 1A+) [Must end w/ Terminator Block]

---------------------------------------------------------------

byte #    Description
------    ------------------------------------------
00-12     "Creative Voice File"
13        1A (eof to abort printing of file)
14-15     Offset of first datablock in .voc file (std 1A 00
          in Intel Notation)
16-17     Version number (minor,major) (VOC-HDR puts 0A 01)
18-19     1's Comp of Ver. # + 1234h (VOC-HDR puts 29 11)

---------------------------------------------------------------
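
Here is a minimal Python sketch of reading this header. It is not from the
Creative Labs documentation. The check word is computed as the bitwise
complement of the version number plus 1234h, which is what the example bytes
0A 01 and 29 11 work out to; the function name and the returned keys are
invented for the illustration.

```python
import struct

VOC_SIGNATURE = b"Creative Voice File\x1a"  # bytes 00-13 (hex)


def parse_voc_header(data):
    """Decode the 1A-byte VOC header (all words in Intel byte order)."""
    if data[:0x14] != VOC_SIGNATURE:
        raise ValueError("not a Creative Voice File")
    data_offset, version, check = struct.unpack("<3H", data[0x14:0x1A])
    # Bitwise complement of the version plus 1234h, kept to 16 bits.
    expected = (~version + 0x1234) & 0xFFFF
    return {
        "data_offset": data_offset,
        "version": (version >> 8, version & 0xFF),  # (major, minor)
        "check_ok": check == expected,
    }


# The standard header bytes quoted in the table above.
example = VOC_SIGNATURE + bytes.fromhex("1A00 0A01 2911")
header = parse_voc_header(example)
```

For the standard header this yields a data offset of 1Ah (26), version 1.10,
and a matching check word.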


Data Block: TYPE(1-byte), SIZE(3-bytes), INFO(0+ bytes)
NOTE: Terminator Block is an exception -- it has only the TYPE byte.

TYPE   Description      Size (3-byte int)   Info
----   -----------      -----------------   -----------------------
00     Terminator       (NONE)              (NONE)
01     Sound data       2+length of data    *
02     Sound continue   length of data      Voice Data
03     Silence          3                   **
04     Marker           2                   Marker# (2 bytes)
05     ASCII            length of string    null terminated string
06     Repeat           2                   Count# (2 bytes)
07     End repeat       0                   (NONE)

*Sound Info Format:         **Silence Info Format:
---------------------       ----------------------------
00   Sample Rate            00-01  Length of silence - 1
01   Compression Type       02     Sample Rate
02+  Voice Data

Marker#           -- Driver keeps the most recent marker in a status byte
Count#            -- Number of repetitions + 1
                     Count# may be 1 to FFFE for 0 - FFFD repetitions
                     or FFFF for endless repetitions
Sample Rate       -- SR byte = 256-(1000000/sample_rate)
Length of silence -- in units of sampling cycle
Compression Type  -- of voice data
                     8-bits    = 0
                     4-bits    = 1
                     2.6-bits  = 2
                     2-bits    = 3
                     Multi DAC = 3+(# of channels) [interesting--
                     this isn't in the developer's manual]
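
The block structure and the sample rate formula above can be tried out with
the following Python sketch. It is an illustration under assumptions: the
3-byte size is taken to be in Intel (low byte first) order like the header
words, and the truncating division in voc_sr_byte is a guess at how the SR
byte gets rounded. All function names are invented.

```python
def voc_blocks(data, offset=0x1A):
    """Yield (type, info_bytes) for each data block, ending at type 00."""
    pos = offset
    while pos < len(data):
        block_type = data[pos]
        if block_type == 0:
            # The terminator has only the TYPE byte, no SIZE field.
            yield 0, b""
            return
        # Assumption: the 3-byte size is stored low byte first.
        size = int.from_bytes(data[pos + 1:pos + 4], "little")
        yield block_type, data[pos + 4:pos + 4 + size]
        pos += 4 + size


def voc_sample_rate(sr_byte):
    """Invert SR byte = 256 - (1000000 / sample_rate)."""
    return 1000000 / (256 - sr_byte)


def voc_sr_byte(sample_rate):
    """Apply the formula above, truncating the division (an assumption)."""
    return 256 - 1000000 // sample_rate
```

For a type 01 block, the first two info bytes are the SR byte and the
compression type, and the voice data follows; an SR byte of 156 corresponds
to 10000 samples per second.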

RIFF WAVE (.WAV) file format

RIFF is a format by Microsoft and IBM which is similar in spirit and
functionality to EA-IFF-85, but not compatible (and it's in
little-endian byte order, of course :-). WAVE is RIFF's equivalent of
AIFF, and its inclusion in Microsoft Windows 3.1 has suddenly made it
important to know about.
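
To make the byte order remark concrete, here is a small Python sketch that
reads a chunk header in either convention. It assumes that a RIFF chunk, like
an IFF chunk, starts with a 4-character id followed by a 32-bit byte count;
the function name and the little_endian flag are invented for the example.

```python
import struct


def chunk_header(data, little_endian):
    """Return (chunk_id, byte_count) from the first 8 bytes of a chunk."""
    # RIFF stores the count low byte first; EA-IFF-85 stores it high byte first.
    fmt = "<4sI" if little_endian else ">4sI"
    return struct.unpack(fmt, data[:8])


riff = chunk_header(b"RIFF\x24\x00\x00\x00", little_endian=True)
iff = chunk_header(b"FORM\x00\x00\x3a\xc0", little_endian=False)
```

The same four count bytes read with the wrong byte order give a wildly
different length, which is one reason the two formats cannot share a reader
without modification.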

Rob Ryan was kind enough to send me a description of the RIFF format.
Unfortunately, it is too big to include here (27 k), but I've made it
available for anonymous ftp as

And here's a pointer to the official description from Matt Saettler,
Microsoft Multimedia:

"The complete definition of the WAVE file format as defined by
IBM/Microsoft is available for anon. FTP from in the
vendor/microsoft/multimedia directory."

(Rob Ryan's version may actually be an extract from one of the files
stored there.)


