The order-1 ISSE adjusts the order-0 ICM prediction by mixing it in the logistic domain with a constant; the pair of mixing weights is selected by an 8-bit bit history, which in turn is selected by an order-1 context of the BWT output. Option -m2 selects the better BWT mode (bwt2), which drops the RLE step and uses an order 0-1 ISSE chain. Memory usage for -m3 and -m4 is not affected by block size.
Jan 15 2013 - added lzwc 0.1, lzwc 0.3, lzwc_bitwise 0.7, lzip 1.14-rc3.
Jan 18 2013 - updated plzma_v3b (not v3p), plzma_v3c.
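The logistic-domain mixing used by the ISSE can be sketched as follows. This is illustrative only: the function names `stretch` and `squash`, the weight table layout, and `isse_predict` are assumptions for exposition, not zpaq's actual code.

```python
import math

def stretch(p):
    """Map a probability to the logistic domain: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def squash(x):
    """Inverse of stretch: 1 / (1 + e^-x), mapping back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical weight table: one (w1, w2) pair per 8-bit bit history.
# Initializing to (1, 0) makes the ISSE start by passing the ICM
# prediction through unchanged.
weights = [[1.0, 0.0] for _ in range(256)]

def isse_predict(p_icm, history):
    """Adjust the ICM prediction in the logistic domain: scale its
    stretched value by w1 and mix in the constant w2, where the
    weight pair is selected by the 8-bit bit history."""
    w1, w2 = weights[history]
    return squash(w1 * stretch(p_icm) + w2)
```

With the initial weights, the ISSE output equals the ICM input; training would then nudge w1 and w2 per bit history to correct the ICM's systematic errors.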
Memory usage per thread for the two BWT modes is 5 times the block size after rounding up to a power of 2.
Nov 26 2012 - added comprox 0.10.0, comprolz 0.10.0.
Dec 17 2012 - added comprox 0.11.0, comprolz 0.11.0.
The option -b N selects a block size of N*2^20 - 256 bytes.
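The block-size and per-thread memory arithmetic can be sketched as below; the helper names are illustrative, not part of zpaq.

```python
def block_size(n):
    """-b N selects a block size of N*2^20 - 256 bytes."""
    return n * 2**20 - 256

def next_pow2(x):
    """Round x up to the next power of 2."""
    p = 1
    while p < x:
        p *= 2
    return p

def bwt_memory(n):
    """Per-thread memory for the two BWT modes: 5 times the block
    size after rounding up to a power of 2."""
    return 5 * next_pow2(block_size(n))
```

For example, -b 16 gives a block just under 16 MiB, which rounds up to 2^24 bytes, so a BWT mode uses 5 x 16 MiB = 80 MiB per thread.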
Other changes: there is no longer an option to limit memory.
Compression time 61480 ns/byte timed on a 2 x dual core (only one core active) Intel Woodcrest 2 GHz with 1333 MHz FSB and 4 GB 667 MHz CL5 memory under SiSoftware Sandra Lite 2007: Dhrystone ALU 37,014 MIPS, Whetstone iSSE3 25,393 MFLOPS, Integer x8 iSSE4 220,008 it/s, Floating-point x4 iSSE2 119,227 it/s. Reported by Giorgio Tani (author of PeaZip) on Nov. Tested on a MacBook Pro, Intel T2500 Core Duo CPU (one core used), with 512 MB memory under Win XP SP2. Usage is 111 MB and 246 MB per thread for -m3 and -m4 respectively.
Compressed sizes are based on the unzp source code (37,967 bytes). Unlike the earlier version, it correctly handles all legal ZPAQL, including jumps into the middle of a 2-byte instruction, as occurs in max_enwik9.
There is a separate decompressor, unzp, which is optimized for the fast, mid, max, bwtrle1, and bwt2 modes, and can be configured to optimize for other models by generating, compiling, linking, and running C code for an optimized version of itself. It uses libzpaq v4.00, which internally translates ZPAQL into just-in-time (JIT) x86-32 or x86-64 code that runs about as fast as the previous version, which translated ZPAQL to C and compiled it.
bytes of the XML text dump of the English version of Wikipedia on Mar. A fundamental problem in both NLP and text compression is modeling: the ability to distinguish between high-probability strings like "recognize speech" and low-probability strings like "reckon eyes peach". Symbols may be arithmetic coded (fractional bit length for best compression), Huffman coded (bit aligned for speed), or byte aligned as a preprocessing step. WinRK compression options: Model size 800 MB, Audio model order: 255, Bit-stream model order: 27, Use text dictionary: Enabled, Fast analyses: Disabled, Fast executable code compression: Disabled. Like earlier versions of zpaq, it also accepts configuration files and external preprocessors. The journaling format is not compatible with zpaq versions prior to 6.00. Converted decmprs8, decomp8, decomp8b, all_HKCC, lpaq9* to
Mar 05 2012 - added crook v0.1.
The goal of this benchmark is not to find the best overall compression program, but to encourage research in artificial intelligence and natural language processing (NLP). Alg: compression algorithm, referring to the method of parsing the input into symbols (strings, bytes, or bits) and estimating their probabilities (modeling) for choosing code lengths. Decompression verified on enwik8 only (not timed, about 2.5 hours). The threshold for compressing a block is 1/16, 1/32, 1/64, and 1/128 of bytes predicted by the order 1 model, respectively.
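The threshold test above might be sketched as follows. Both the definition of a "predicted" byte (the byte equals the most frequent successor of its predecessor seen so far) and the mapping of the four thresholds to levels are assumptions for illustration, not zpaq's exact rule.

```python
from collections import defaultdict, Counter

# Assumed mapping of the four thresholds to levels.
THRESHOLDS = {1: 1/16, 2: 1/32, 3: 1/64, 4: 1/128}

def predicted_fraction(data):
    """Fraction of bytes 'predicted' by a simple order-1 model:
    a byte counts as predicted if it matches the most frequent
    successor of the previous byte seen so far."""
    counts = defaultdict(Counter)
    hits = 0
    prev = 0
    for b in data:
        if counts[prev] and counts[prev].most_common(1)[0][0] == b:
            hits += 1
        counts[prev][b] += 1
        prev = b
    return hits / len(data) if data else 0.0

def should_compress(data, level):
    """Compress the block only if the order-1 model predicts at
    least the threshold fraction of its bytes."""
    return predicted_fraction(data) >= THRESHOLDS[level]
```

Highly repetitive data passes even the strictest threshold, while near-random data falls below all of them and would be stored rather than compressed.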
There is no solid-mode compression because BWT requires that each block contain only one file, whole or in part.
Oct 14 2012 - updated Tiny LZP 0.1, added zcm 0.70b.
The default number of threads (-t option) is the number of cores.
Input is deduplicated before compression by dividing input files into fragments averaging 64 KB on content-dependent boundaries that move when data is inserted or removed.
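Content-dependent fragmentation can be sketched with a simple rolling hash (illustrative only; zpaq's actual hash differs): a boundary is declared when the low 16 bits of the hash are zero, so fragments average about 2^16 = 64 KB and boundaries track the content rather than fixed byte offsets.

```python
def fragments(data, avg_bits=16):
    """Split data at content-dependent boundaries. A boundary is
    declared after any byte where the low avg_bits bits of a
    simple rolling hash are zero, giving fragments averaging
    about 2^avg_bits bytes."""
    mask = (1 << avg_bits) - 1
    h = 0
    start = 0
    out = []
    for i, b in enumerate(data):
        # Toy rolling hash; older bytes shift out of the 32-bit state.
        h = ((h << 1) + b + 1) & 0xFFFFFFFF
        if (h & mask) == 0:
            out.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        out.append(data[start:])
    return out
```

Because a boundary depends only on nearby bytes, inserting or removing data shifts boundaries locally; fragments elsewhere are unchanged and can deduplicate against earlier copies.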