LZ4 Streaming API Example : Line by Line Text Compression

by Takayuki Matsuoka

blockStreaming_lineByLine.c is an LZ4 Streaming API example that implements line-by-line incremental (de)compression.

Please note the following:

  • Read "LZ4 Streaming API Basics" first.
  • This is a relatively advanced example.
  • The output file is not compatible with lz4frame and is platform dependent.

What's the point of this example?

  • Line-by-line incremental (de)compression
  • Handles huge files with a small amount of memory
  • Generally better compression ratio than the Block API
  • Non-uniform block size

How the compression works

First of all, allocate a "Ring Buffer" for input and an LZ4 compressed data buffer for output.

(1)
    Ring Buffer

    +--------+
    | Line#1 |
    +---+----+
        |
        v
     {Out#1}

(2)
    Prefix Mode Dependency
          +----+
          |    |
          v    |
    +--------+-+------+
    | Line#1 | Line#2 |
    +--------+---+----+
                 |
                 v
              {Out#2}

(3)
          Prefix   Prefix
          +----+   +----+
          |    |   |    |
          v    |   v    |
    +--------+-+------+-+------+
    | Line#1 | Line#2 | Line#3 |
    +--------+--------+---+----+
                          |
                          v
                       {Out#3}

(4)
                External Dictionary Mode
                +----+   +----+
                |    |   |    |
                v    |   v    |
    ------+--------+-+------+-+--------+
          |  ....  | Line#X | Line#X+1 |
    ------+--------+--------+-----+----+
    ^                             |
    |                             v
    |                         {Out#X+1}
    |
    Reset

(5)
                                    Prefix
                                    +-----+
                                    |     |
                                    v     |
    ------+--------+--------+----------+--+-------+
          |  ....  | Line#X | Line#X+1 | Line#X+2 |
    ------+--------+--------+----------+-----+----+
    ^                                        |
    |                                        v
    |                                    {Out#X+2}
    |
    Reset

Next (see (1)), read the first line into the ring buffer and compress it with LZ4_compress_continue() (deprecated in current LZ4 releases in favor of LZ4_compress_fast_continue()). On this first call, LZ4 has no previous data to depend on, so it compresses the line on its own and writes the compressed line {Out#1} to the LZ4 compressed data buffer. After that, write {Out#1} to the file and advance the ring buffer offset.

Do the same for the second line (see (2)). This time, LZ4 can reference Line#1 to improve the compression ratio. This dependency is called "Prefix mode".

Eventually, we'll reach the end of the ring buffer at Line#X (see (4)). At this point, we should reset the ring buffer offset. After the reset, Line#X+1 is no longer adjacent in memory to the preceding data, but LZ4 still keeps track of that data. This is called "External Dictionary Mode".

At Line#X+2 (see (5)), LZ4 finally forgets almost all of the earlier data but still remembers Line#X+1. This is the same situation as Line#2.

Repeat this procedure until the end of the text file.
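The offset handling described above can be sketched in plain C without calling LZ4 at all. This is only an illustration under stated assumptions: the type LineRing, the helpers ring_slot() and ring_advance(), and both size constants are hypothetical names and values chosen here, not part of the LZ4 API or necessarily what blockStreaming_lineByLine.c uses. The ring is sized as "history plus one maximum-length line", so a free slot for a full line always exists before the reset point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes, chosen for illustration only. */
enum {
    LINE_MAX_BYTES    = 1024,                        /* longest line accepted */
    RING_BUFFER_BYTES = 64 * 1024 + LINE_MAX_BYTES   /* history + one line    */
};

typedef struct {
    char   data[RING_BUFFER_BYTES];
    size_t offset;              /* where the next line will be stored */
} LineRing;

/* Address at which the next line should be read in. Earlier lines stay at
 * their old addresses, which is what lets a streaming compressor keep
 * referring to them. */
static char *ring_slot(LineRing *ring) {
    return ring->data + ring->offset;
}

/* Advance past a line that has been compressed and written out. When a
 * maximum-length line might no longer fit, reset the offset to 0; this is
 * the "Reset" step in diagrams (4) and (5). */
static void ring_advance(LineRing *ring, size_t lineBytes) {
    assert(lineBytes <= (size_t)LINE_MAX_BYTES);
    ring->offset += lineBytes;
    if (ring->offset >= (size_t)(RING_BUFFER_BYTES - LINE_MAX_BYTES)) {
        ring->offset = 0;
    }
}
```

In a compression loop, each line would be read into ring_slot(), that pointer would be passed to the streaming compression call, and ring_advance() would be called once the compressed block has been written to the file.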

How the decompression works

Decompression mirrors the compression steps:

  • Read a compressed line from the file into a buffer.
  • Decompress it into the ring buffer.
  • Write the decompressed plain text line to the output file.
  • Advance the ring buffer offset. If the offset exceeds the end of the ring buffer, reset it.

Repeat this procedure until the end of the compressed file.
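Because block sizes are non-uniform, the reader has to know how many bytes belong to each compressed line before it can load one into the buffer. One possible framing is sketched below, assuming a 16-bit byte count stored in host byte order in front of every block, with a count of 0 as an end marker. The layout and the helper names write_block() and read_block() are assumptions made for illustration; they are not LZ4 functions. Storing the count in host byte order is one way a file of this kind ends up platform dependent.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical framing: [uint16_t byte count, host byte order][payload].
 * A byte count of 0 marks the end of the stream. */

/* Write one compressed block. Returns 0 on success, -1 on I/O error. */
static int write_block(FILE *fp, const void *block, uint16_t blockBytes) {
    if (fwrite(&blockBytes, sizeof blockBytes, 1, fp) != 1) {
        return -1;
    }
    if (blockBytes != 0 && fwrite(block, 1, blockBytes, fp) != blockBytes) {
        return -1;
    }
    return 0;
}

/* Read one compressed block into dst.
 * Returns the block size in bytes, 0 at the end marker or end of file,
 * and -1 if the block is truncated or does not fit into dst. */
static int read_block(FILE *fp, void *dst, size_t dstCapacity) {
    uint16_t blockBytes = 0;
    if (fread(&blockBytes, sizeof blockBytes, 1, fp) != 1) {
        return 0;                       /* end of file */
    }
    if (blockBytes == 0) {
        return 0;                       /* explicit end marker */
    }
    if (blockBytes > dstCapacity) {
        return -1;                      /* caller's buffer is too small */
    }
    if (fread(dst, 1, blockBytes, fp) != blockBytes) {
        return -1;                      /* truncated block */
    }
    return (int)blockBytes;
}
```

A decompression loop would call read_block() until it returns 0, pass each block to the streaming decompression call with the current ring buffer position as the destination, write the resulting line out, and then advance or reset the offset.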