WHERE DID ZIP FILES COME FROM ANYWAY?
In order to answer this question, we must first turn our
thoughts backward in time, back to the days when computer
resources were at a real premium.
Home computers were something you saw in "Popular
Electronics". They were relatively expensive, slow, kludgey
and you had to put them together yourself. But once you had
one of the little beasties together, it shone like a silver
knight; it was beautiful; it had LEDs, buttons and a
video monitor that did double duty with late night
television. It was connected to your trusty $39.95 "tape
drive". My friend Chuck and I bought a Netronics ELF II. It
had an 1802 processor, 256 bytes of static ram, a HEX keypad
and RS232C ports. WE had the DELUXE model with the HEX
display! The video display was fashioned from a miniature
RCA portable TV. Removing the tuning section, along with the
audio section, improved the bandwidth dramatically. We were
the gleaming daddies of this little creature.
The computer came complete with a HEX listing of a program
that would display the Starship Enterprise in big block
graphics. It took an hour to key in, but when you saw that
ship appear on the display, the earth almost moved! That is,
until we boastfully presented this goliath fabrication to a
neighbor, who glibly replied, "pretty neat, now make it
move". My world crushed; reality landing hard; it became
painfully apparent that there was not much that you could do
without some sort of operating system. Those were not the
days when you could trot on down to the local computer store
(there were none) and pick up a fresh copy of DOS.
We wrote a little monitoring program that allowed us to do
marvelous things: we could move bytes in memory, load and
save programs to and from the tape, and execute those
programs once loaded. This was all done by entering cryptic
commands like 04,03,00,00,FF,FF. In English, this means load
the program from tape into locations 0000-FFFFh. Not quite as
easy as DOS, but at least the computer DID something. We
waited until the price came down and bought a 4K memory board
for about $300; we were sailing then! Later, we built another
8K card, some digital I/O ports, an ASCII keyboard and a
music board. It was a regular digital wonder, cables
everywhere, flashing lights; but it worked.
What does all of this have to do with ZIP files? I have been
writing data compression software since 1981 and, with the
above as background, have come to develop a great respect for
system resources (or more properly, the lack thereof). Other
individuals have similar backgrounds and similar regard for
resources. Consequently, in the CP/M days, we saw programs
like SQ.COM, which would SQUEEZE (Huffman coding) files into
a smaller space. We saw compression ratios of around 30-40%
and were thrilled, because we could save some disk space
(which we had very little of). Soon another program appeared,
LU.COM, which built library files (.LBR); that is, the program
gathered a group of files into a single file on disk. This
action alone could yield phenomenal savings in disk space,
because each file on disk occupies whole allocation units and
a library wastes only one partial unit instead of many. To
store your files on disk and really take advantage of this, you
first had to SQUEEZE the files, then insert them into a
library file. It was a lot of trouble, but it was worth it.
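As a rough illustration of the allocation unit effect, the
sketch below compares the disk space used by several small
files stored separately with the space used when they are
gathered into one library. The 2K allocation unit and the file
sizes are assumptions chosen for the example, and the small
overhead of the library's own directory is ignored.

```python
import math

def disk_usage(file_sizes, unit=2048):
    """Bytes consumed on disk when every file is rounded up to a
    whole number of allocation units (unit size is an assumption)."""
    return sum(math.ceil(size / unit) * unit for size in file_sizes)

# Hypothetical file sizes in bytes, for illustration only.
sizes = [300, 1200, 2500, 700, 90]

separate = disk_usage(sizes)         # each file stored on its own
library = disk_usage([sum(sizes)])   # all files gathered into one library

print(separate, library)
```

Squeezing the files before librarying them shrinks the total
further, which is why the two steps were worth combining.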
Other programs emerged, such as B29 (from IDC), which was both
a CP/M "shell" and a decompression/delibrarying program.
As the DOS machines began to increase in popularity (and
availability), more and more of the CP/M software was
converted over for DOS. SQ.COM and LU.COM were both among
those in the first wave of conversions.
A few years down the line, ARC.EXE appeared. This program
merged the two-step process of squeezing and librarying into
a single step. Quickly, .ARC files gained prominence over .LBR
files. The same compression algorithm was used (SQUEEZING),
but ARC.EXE added the further step of analyzing the files
before putting them into the archive in order to determine
if data compression would provide any savings. This brought
to light the need for more compression methods, since not
every method will bring about disk savings on every file; a
fact that remains unchanged today. Each compression algorithm has
its own strengths and weaknesses.
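The idea of analyzing a file before archiving it can be
sketched as follows: try each available method, keep the
smallest result, and store the file unchanged if nothing helps.
The three methods here are stand-ins from the Python standard
library, not the methods ARC.EXE used, and the name pick_method
is hypothetical.

```python
import bz2
import lzma
import os
import zlib

# Stand-in methods from the Python standard library; these are NOT
# the methods ARC.EXE used, they only illustrate the selection step.
METHODS = {
    "deflate": zlib.compress,
    "bzip2": bz2.compress,
    "lzma": lzma.compress,
}

def pick_method(data):
    """Return (name, payload) for the smallest result, or
    ("stored", data) when no method makes the data smaller."""
    best_name, best = "stored", data
    for name, compress in METHODS.items():
        packed = compress(data)
        if len(packed) < len(best):
            best_name, best = name, packed
    return best_name, best

text = b"the quick brown fox " * 200   # highly repetitive input
noise = os.urandom(4096)               # random bytes rarely compress

print(pick_method(text)[0])
print(pick_method(noise)[0])
```

Repetitive text should come back compressed by one of the
methods, while random data should come back as "stored", since
every method tends to make it slightly larger.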
SEA, the company that developed ARC.EXE, threatened to sue IDC,
claiming that our work was a derivative of theirs. However,
after lawyer-to-lawyer conferences and some research on their
part, SEA came to accept our claim that THEIR work was actually
a derivative of ours and several others', and they decided not
to pursue legal action against us. They did, however, continue
with a lawsuit against PKWARE, again claiming that PKWARE's work
was a derivative of SEA's work. Unfortunately, PKWARE was
a relative newcomer at the time and didn't have the product
history that IDC did, and the lawsuit proceeded. SEA then
attempted to get IDC to testify as an expert witness against
PKWARE in the proceedings; we refused.
It was during this time that IDC (Gary Conway) and PKWARE (Phil Katz)
became friends. We saw the shortcomings of the current compression formats
and decided that a new format was needed. We also agreed that the format
would be placed in the public domain so that no one entity could again
falsely lay claim to it and attempt to license it to anyone else. I think
that it's likely due to this simple fact that ZIP files have
enjoyed the longevity that they have. We are approaching 20 years
now, a life-cycle relatively unheard of in the software industry.
There is usually a single compression method that will do
best on a given file. Despite all of this, ARC.EXE was
laboriously slow; thus emerged PKARC.EXE by Phil Katz. PKARC
was very fast and offered other improvements. Good
compression and speed were attractive offerings. New and
better compression methods were added, along with file
encryption (password protection). Then along came NARC.EXE
and IDCshell, the first menu-driven archive/dearchive
utilities (and still the only ones with built-in compression
routines). NARC and IDCshell relieved the user of typing a
lot of filenames and offered easy random access to .ARC
files.
The real downsides to .ARC files (and .ZOO and .PAK and LHARC
files) are twofold. First, there is no expandability. As
compression needs and compression technology grow, so do the
demands upon the format of the compressed files. Second, data
integrity guarantees were less than desirable in light of
current software technology. It was thus time for another
step in the librarying evolution. This next step was
developed jointly by IDC and PKWARE and released into the
public domain on Feb. 14, 1989, via a joint press release. Why
public domain? Because the ZIP format was built on the work
of many others (as were .ARC files).
The ZIP format takes advantage of a 32-bit CRC algorithm that
catches many more errors than the older 16-bit CRCs. The ZIP
format offers greater protection from disaster by
implementing both the distributed directory (as was found in
.ARC files) and a new central directory. This adds a great
measure of protection against data loss by providing the
directory information in two places instead of one. The
central directory also makes listing the directory of a ZIP
file much faster, since you no longer have to read the entire
file to do so. The ZIP format offers the advantage of
expandability, that is, as our collective needs grow, we can
stick with the same format because it can grow with us. There
are enough non-standards in the computer industry and,
hopefully, the ZIP format will bring about a new, longer-lasting
standard for compressed data files.
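Both features can be observed with Python's standard zipfile
module, used here only as a convenient modern reader of the
format. The sketch builds a tiny archive in memory, lists it
from the central directory, checks the 32-bit CRC, and then
damages one byte to show the CRC catching the error. The member
name "hello.txt" and its contents are made up for the example.

```python
import io
import zipfile
import zlib

payload = b"hello, ZIP" * 100

# Build a small archive in memory, stored without compression so
# the payload bytes appear verbatim inside the archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("hello.txt", payload)

# infolist() is filled in from the central directory at the end of
# the archive, so listing does not require reading the file data.
with zipfile.ZipFile(buf) as zf:
    info = zf.infolist()[0]
    listed = (info.filename, info.file_size, info.CRC)

# The recorded 32-bit CRC matches one computed over the original bytes.
crc_matches = info.CRC == zlib.crc32(payload)

# Flip one byte of the stored data and let the CRC check catch it.
raw = bytearray(buf.getvalue())
offset = raw.find(payload)
raw[offset] ^= 0xFF
with zipfile.ZipFile(io.BytesIO(bytes(raw))) as zf:
    damaged = zf.testzip()   # name of the first bad member, or None

print(listed, crc_matches, damaged)
```

The listing step reads only the directory, which is the speed
advantage described above; the final step reads the data and
reports the member whose CRC no longer matches.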
Gary Conway, President
Infinity Design Concepts, Inc. (IDC)
1143 Johnson Road
Louisville, Kentucky 40245
In fond memory of Phil Katz, my friend.