mirror of
https://github.com/aseprite/aseprite.git
synced 2025-01-14 03:40:55 +00:00
124 lines
6.3 KiB
Plaintext
124 lines
6.3 KiB
Plaintext
LZW compression used to encode/decode a GIF file by Bob Montgomery [73357,3140]
|
|
|
|
ENCODER
|
|
Consider the following input data stream in a 4 color (A, B, C, D) system.
|
|
We will build a table of codes which represent strings of colors. Each time
|
|
we find a new string, we will give it the next code, and break it into a
|
|
prefix string and a suffix color. The symbols \ or --- represent the prefix
|
|
string, and / represents the suffix color. The first 4 entries in the table
|
|
are the 4 colors with codes 0 thru 3, each of which represents a single
|
|
color. The next 2 codes (4 and 5) are the clear code and the end of image
|
|
code. The first available code to represent a string of colors is 6. Each
|
|
time we find a new code, we will send the prefix for that code to the
|
|
output data stream.
|
|
|
|
A B A B A B A B B B A B A B A A C D A C D A D C A B A A A B A B .....
|
|
\/\/---/-----/\/---/-------/\/ \/ \ /\/---/---/\ /\/-----/---/---/
|
|
6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 Code
|
|
6 8 10 8 14 16 8 13 7 Prefix
|
|
|
|
The encoder table is built from input data stream. Always start with the
|
|
suffix of last code, and keep getting colors until you have a new
|
|
combination.
|
|
|
|
Color Code Prefix Suffix String Output
|
|
A 0 -
|
|
B 1 -
|
|
C 2 -
|
|
D 3 -
|
|
Clear 4 -
|
|
End 5 -
|
|
A \ A A First color is a special case.
|
|
B / \ 6 A B AB A
|
|
A | / 7 B A BA B
|
|
B |
|
|
A / | 8 6 A ABA 6
|
|
B |
|
|
A |
|
|
B \ / 9 8 B ABAB 8
|
|
B / | 10 B B BB B
|
|
B |
|
|
A | / 11 10 A BBA 10
|
|
B |
|
|
A |
|
|
B |
|
|
A / \ 12 9 A ABABA 9
|
|
A \ / 13 A A AA A
|
|
C / \ 14 A C AC A
|
|
D \ / 15 C D CD C
|
|
A / | 16 D A DA D
|
|
C |
|
|
D | / 17 14 D ACD 14
|
|
A |
|
|
D / \ 18 16 D DAD 16
|
|
C \ / 19 D C DC D
|
|
A / | 20 C A CA C
|
|
B |
|
|
A |
|
|
A | / 21 8 A ABAA 8
|
|
A |
|
|
B / | 22 13 B AAB 13
|
|
A |
|
|
B / 23 7 B BAB 7
|
|
|
|
The resultant output stream is: A B 6 8 B 10 9 A A C D 14 16 D C 8 ....
|
|
The GIF encoder starts with a code length of 2+1=3 bits for 4 colors, so
|
|
when the code reaches 8 we will have to increase the code size to 4 bits.
|
|
Similarly, when the code gets to 16 we will have to increse the code size
|
|
to 5 bits, etc. If the code gets to 13 bits, we send a clear code and start
|
|
over. See GIFENCOD.GIF for a flow diagram of the encoding process. This
|
|
uses a tree method to search if a new string is already in the table, which
|
|
is much simpler, faster, and easier to understand than hashing.
|
|
|
|
===========================================================================
|
|
|
|
DECODER
|
|
|
|
We will now see if we can regenerate the original data stream and duplicate
|
|
the table looking only at the output data stream generated by the encoder
|
|
on the previous page. The output data stream is:
|
|
|
|
A B 6 8 B 10 9 A A C D 14 16 D C 8 ....
|
|
|
|
The docoding process is harder to see, but easier to implement, than the
|
|
encoding process. The data is taken in pairs, and a new code is assigned to
|
|
each pair. The prefix is the left side of the pair, and the suffix is the
|
|
color that the right side of the pair decomposes to from the table. The
|
|
decomposition is done by outputing the suffix of the code, and using the
|
|
prefix as the new code. The process repeats until the prefix is a single
|
|
color, and it is output too. The output of the decomposition is pushed onto
|
|
a stack, and then poped off the stack to the display, which restores the
|
|
original order that the colors were seen by the encoder. We will go thru
|
|
the first few entries in detail, which will hopefully make the process
|
|
clearer.
|
|
The first pair is (A B), so the prefix of code 6 is A and the suffix
|
|
is B, and 6 represents the string AB. The color A is sent to the display.
|
|
The 2nd pair is (B 6), so the prefix of code 7 is B and the suffix is
|
|
the the last color in the decomposition of code 6. Code 6 decomposes into
|
|
BA, so code 7 = BA, and has a suffix A. The color B is sent to the display.
|
|
The 3rd pair is (6 8) and the next code is 8. How can we decompose 8.
|
|
We know that the prefix of code 8 is 6, but we don't know the suffix. The
|
|
answer is that we use the suffix of the prefix code; A in this case since
|
|
the suffix of 6 is A. Thus, code 8 = ABA and has a suffix A. We decompose 6
|
|
to get BA, which becomes AB when we pop it off the stack to the display.
|
|
The 4th pair is (8 B), so code 9 has a prefix of 8 and a suffix of B,
|
|
and code 9 = ABAB. We output ABA to the stack, and pop it off to the
|
|
display as ABA.
|
|
The 5th pair is (B 10) and the next code is 10. The prefix of code 10
|
|
is B and the suffix is B (since the prefix is B). Code 10 = BB, and we
|
|
output the prefix B to the display.
|
|
The 6th pair is (10 9) and the next code is 11. Thus the prefix of
|
|
code 11 is 10 and the suffix is the last color in the decomposition of 9,
|
|
which is A. Thus code 11 = BBA, And we output BB to the display.
|
|
So far, we have output the correct colors stream A B AB ABA B BB to
|
|
the display, and have duplicated the codes 6 thru 11 in the encoder table.
|
|
This process is repeated for the whole data stream to reconstruct the
|
|
original color stream and build a table identical to the one built by the
|
|
encoder. We start the table with codes 0-5 representing the 4 colors, the
|
|
clear code, and the end code. When we get to code 8, we must increse the
|
|
code size to 5 bits, etc. See GIFDECOD.GIF for a flow diagram of the
|
|
decoding process.
|
|
|
|
I Hope this helps take some of the mystery out of LZW compression, which is
|
|
really quite easy once you 'see' it. Bob Montgomery
|