4 LZMA SDK provides the documentation, samples, header files, libraries,
5 and tools you need to develop applications that use LZMA compression.
7 LZMA is the default and general compression method of the 7z format
8 in the 7-Zip compression program (www.7-zip.org). LZMA provides a high
9 compression ratio and very fast decompression.
11 LZMA is an improved version of the famous LZ77 compression algorithm.
12 It was improved to maximize the compression ratio while
13 keeping high decompression speed and low memory requirements for
21 LZMA SDK is written and placed in the public domain by Igor Pavlov.
29 - ANSI-C/C++/C#/Java source code for LZMA compressing and decompressing
30 - Compiled file->file LZMA compressing/decompressing program for Windows
35 To compile the C++ version of the file->file LZMA encoder, go to the directory
36 C++/7zip/Compress/LZMA_Alone
37 and call make to recompile it:
38 make -f makefile.gcc clean all
40 In some UNIX/Linux versions you must compile LZMA with static libraries.
41 To compile with static libraries, you can use
47 lzma.txt - LZMA SDK description (this file)
48 7zFormat.txt - 7z Format description
49 7zC.txt - 7z ANSI-C Decoder description
50 methods.txt - Compression method IDs for .7z
51 lzma.exe - Compiled file->file LZMA encoder/decoder for Windows
52 history.txt - history of the LZMA SDK
60 Alloc.* - Memory allocation functions
61 Bra*.* - Filters for x86, IA-64, ARM, ARM-Thumb, PowerPC and SPARC code
62 LzFind.* - Match finder for LZ (LZMA) encoders
63 LzFindMt.* - Match finder for LZ (LZMA) encoders for multithreading encoding
64 LzHash.h - Additional file for LZ match finder
65 LzmaDec.* - LZMA decoding
66 LzmaEnc.* - LZMA encoding
67 LzmaLib.* - LZMA Library for DLL calling
68 Types.h - Basic types for other .c files
69 Threads.* - The code for multithreading.
71 LzmaLib - LZMA Library (.DLL for Windows)
73 LzmaUtil - LZMA Utility (file->file LZMA encoder/decoder).
75 Archive - files related to archiving
76 7z - 7z ANSI-C Decoder
80 Common - common files for C++ projects
81 Windows - common files for Windows related code
83 7zip - files related to 7-Zip Project
85 Common - common files for 7-Zip
87 Compress - files related to compression/decompression
90 RangeCoder - Range Coder (special code of compression/decompression)
91 LZMA - LZMA compression/decompression in C++
92 LZMA_Alone - file->file LZMA compression/decompression
93 Branch - Filters for x86, IA-64, ARM, ARM-Thumb, PowerPC and SPARC code
95 Archive - files related to archiving
97 Common - common files for archive handling
98 7z - 7z C++ Encoder/Decoder
100 Bundles - Modules that are bundles of other modules
102 Alone7z - 7zr.exe: Standalone version of 7z.exe that supports only 7z/LZMA/BCJ/BCJ2
103 Format7zR - 7zr.dll: Reduced version of 7za.dll: extracting/compressing to 7z/LZMA/BCJ/BCJ2
104 Format7zExtractR - 7zxr.dll: Reduced version of 7zxa.dll: extracting from 7z/LZMA/BCJ/BCJ2.
106 UI - User Interface files
108 Client7z - Test application for 7za.dll, 7zr.dll, 7zxr.dll
109 Common - Common UI files
110 Console - Code for console archiver
116 Common - some common files for 7-Zip
117 Compress - files related to compression/decompression
118 LZ - files related to LZ (Lempel-Ziv) compression algorithm
119 LZMA - LZMA compression/decompression
120 LzmaAlone - file->file LZMA compression/decompression
121 RangeCoder - Range Coder (special code of compression/decompression)
125 Compression - files related to compression/decompression
126 LZ - files related to LZ (Lempel-Ziv) compression algorithm
127 LZMA - LZMA compression/decompression
128 RangeCoder - Range Coder (special code of compression/decompression)
131 The C/C++ source code of the LZMA SDK is part of the 7-Zip project.
132 The 7-Zip source code can be downloaded from 7-Zip's SourceForge page:
134 http://sourceforge.net/projects/sevenzip/
140 - Variable dictionary size (up to 1 GB)
141 - Estimated compressing speed: about 2 MB/s on 2 GHz CPU
142 - Estimated decompressing speed:
143 - 20-30 MB/s on 2 GHz Core 2 or AMD Athlon 64
144 - 1-2 MB/s on 200 MHz ARM, MIPS, PowerPC or other simple RISC
145 - Small memory requirements for decompressing (16 KB + DictionarySize)
146 - Small code size for decompressing: 5-8 KB
148 The LZMA decoder uses only integer operations and can be
149 implemented on any modern 32-bit CPU (or on a 16-bit CPU with some conditions).
151 Some critical operations that affect the speed of LZMA decompression:
152 1) 32*16 bit integer multiply
153 2) Mispredicted branches (the penalty mostly depends on pipeline length)
154 3) 32-bit shift and arithmetic operations
156 The speed of LZMA decompression depends mostly on CPU speed.
157 Memory speed matters little. But if your CPU has a small data cache,
158 the overall weight of memory speed will slightly increase.
164 Using LZMA encoder/decoder executable
165 --------------------------------------
167 Usage: LZMA <e|d> inputFile outputFile [<switches>...]
173 b: Benchmark. There are two tests: compressing and decompressing
174 with the LZMA method. The benchmark shows a rating in MIPS (million
175 instructions per second). The rating value is calculated from the
176 measured speed and is normalized with Intel's Core 2 results.
177 The benchmark also checks for possible hardware errors (RAM
178 errors in most cases). The benchmark uses these settings:
179 (-a1, -d21, -fb32, -mfbt4). You can change only the -d parameter.
180 You can also change the number of iterations. Example for 30 iterations:
182 The default number of iterations is 10.
187 -a{N}: set compression mode 0 = fast, 1 = normal
190 -d{N}: set dictionary size - [0, 30], default: 23 (8MB)
191 The maximum value for dictionary size is 1 GB = 2^30 bytes.
192 Dictionary size is calculated as DictionarySize = 2^N bytes.
193 To decompress a file compressed by the LZMA method with dictionary
194 size D = 2^N, you need about D bytes of memory (RAM).
196 -fb{N}: set number of fast bytes - [5, 273], default: 128
197 Usually a larger number gives a slightly better compression ratio
198 and a slower compression process.
200 -lc{N}: set number of literal context bits - [0, 8], default: 3
201 Sometimes lc=4 improves the compression ratio for big files.
203 -lp{N}: set number of literal pos bits - [0, 4], default: 0
204 The lp switch is intended for periodic data whose period is
205 equal to 2^N. For example, for 32-bit (4 bytes)
206 periodic data you can use lp=2. It's often better to set lc0
207 if you change the lp switch.
209 -pb{N}: set number of pos bits - [0, 4], default: 2
210 The pb switch is intended for periodic data
211 whose period is equal to 2^N.
213 -mf{MF_ID}: set Match Finder. Default: bt4.
214 Algorithms from the hc* group don't provide a good compression
215 ratio, but they often work pretty fast in combination with
218 Memory requirements depend on the dictionary size
219 (parameter "d" in table below).
221 MF_ID    Memory              Description
223 bt2      d *  9.5 + 4MB      Binary Tree with 2 bytes hashing.
224 bt3      d * 11.5 + 4MB      Binary Tree with 3 bytes hashing.
225 bt4      d * 11.5 + 4MB      Binary Tree with 4 bytes hashing.
226 hc4      d *  7.5 + 4MB      Hash Chain with 4 bytes hashing.
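
As a rough illustration of the memory table above, the following sketch
computes the match finder estimate for a given -d value. The function
name and the enum are hypothetical and not part of the SDK; only the
multipliers and the 4 MB constant come from the table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical identifiers for the match finders listed in the table. */
typedef enum { MF_BT2, MF_BT3, MF_BT4, MF_HC4 } MatchFinderId;

/* Estimate match finder memory in bytes for a dictionary of 2^dictBits bytes.
   Multipliers are stored doubled (19 = 9.5 * 2, 23 = 11.5 * 2, 15 = 7.5 * 2)
   so that the calculation stays in integer arithmetic. */
uint64_t EstimateMatchFinderMemory(MatchFinderId mf, unsigned dictBits)
{
  static const uint64_t kTwiceMultiplier[] = { 19, 23, 23, 15 };
  const uint64_t k4MB = (uint64_t)4 << 20;
  uint64_t dictSize;

  assert(dictBits <= 30);                 /* the -d range is [0, 30] */
  dictSize = (uint64_t)1 << dictBits;     /* DictionarySize = 2^N bytes */
  return dictSize * kTwiceMultiplier[mf] / 2 + k4MB;
}
```

For the default settings (bt4 with -d23), this gives 8 MB * 11.5 + 4 MB = 96 MB.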
228 -eos: write End Of Stream marker. By default LZMA doesn't write
229 the eos marker, since the LZMA decoder knows the uncompressed size
230 stored in the .lzma file header.
232 -si: Read data from stdin (it will write End Of Stream marker).
233 -so: Write data to stdout
238 1) LZMA e file.bin file.lzma -d16 -lc0
240 compresses file.bin to file.lzma with a 64 KB dictionary (2^16=64K)
241 and 0 literal context bits. -lc0 reduces memory requirements
245 2) LZMA e file.bin file.lzma -lc0 -lp2
247 compresses file.bin to file.lzma with settings suitable
248 for 32-bit periodic data (for example, ARM or MIPS code).
250 3) LZMA d file.lzma file.bin
252 decompresses file.lzma to file.bin.
255 Compression ratio hints
256 -----------------------
261 To increase the compression ratio of LZMA, it's desirable
262 to have aligned data (if possible), and also to arrange the
263 data so that code is grouped in one place and data is
264 grouped in another (this is better than mixing them: code, data, code,
270 You can increase the compression ratio for some data types by using
271 special filters before compressing. For example, it's possible to
272 increase the compression ratio by 5-10% for code for these CPU ISAs:
273 x86, IA-64, ARM, ARM-Thumb, PowerPC, SPARC.
275 You can find the C source code of these filters in the C/Bra*.* files.
277 You can check the compression ratio gain of these filters with the
278 following 7-Zip commands (example for ARM code):
280 7z a a1.7z a.bin -m0=lzma
282 With filter for little-endian ARM code:
283 7z a a2.7z a.bin -m0=arm -m1=lzma
285 It works as follows:
286 Compressing = Filter_encoding + LZMA_encoding
287 Decompressing = LZMA_decoding + Filter_decoding
289 The compressing and decompressing speed of these filters is very high,
290 so they will not increase decompression time too much.
291 Moreover, filtering reduces the decompression time for LZMA_decoding,
292 since the compression ratio with filtering is higher.
294 These filters convert CALL (calling procedure) instructions
295 from relative offsets to absolute addresses, so such data becomes more
298 For some ISAs (for example, for MIPS) it's impossible to get any gain from such a filter.
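
To make the relative-to-absolute conversion concrete, here is a
deliberately simplified sketch for x86 CALL instructions (opcode 0xE8
followed by a 32-bit little-endian relative offset). It is not the code
from C/Bra*.*, which applies additional checks; the function name
SimpleCallFilter is hypothetical. The point it shows is that two calls
to one target get identical operand bytes after encoding, and that
decoding restores the original bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified illustration only, not the SDK filter.
   encoding != 0: relative offset -> absolute address
   encoding == 0: absolute address -> relative offset */
void SimpleCallFilter(uint8_t *data, size_t size, int encoding)
{
  size_t i = 0;
  assert(data != NULL || size == 0);
  while (size >= 5 && i <= size - 5)
  {
    uint32_t v, next;
    if (data[i] != 0xE8)          /* 0xE8 = x86 CALL rel32 opcode */
    {
      i++;
      continue;
    }
    v = (uint32_t)data[i + 1] | ((uint32_t)data[i + 2] << 8) |
        ((uint32_t)data[i + 3] << 16) | ((uint32_t)data[i + 4] << 24);
    next = (uint32_t)(i + 5);     /* position of the following instruction */
    v = encoding ? v + next : v - next;
    data[i + 1] = (uint8_t)v;
    data[i + 2] = (uint8_t)(v >> 8);
    data[i + 3] = (uint8_t)(v >> 16);
    data[i + 4] = (uint8_t)(v >> 24);
    i += 5;                       /* skip the operand; it is never rescanned */
  }
}
```

Because the opcode byte is never modified and the operand bytes are always
skipped, the encoder and decoder visit exactly the same positions, which
is what makes this simplified transform reversible.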
301 LZMA compressed file format
302 ---------------------------
303 Offset  Size  Description
304   0      1    Special LZMA properties (lc, lp, pb in encoded form)
305   1      4    Dictionary size (little endian)
306   5      8    Uncompressed size (little endian). -1 means unknown size
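
The 13-byte header above can be read with a few lines of C. The sketch
below is illustrative: the LzmaHeaderInfo type and ParseLzmaHeader name
are hypothetical, and the packing of the first byte as
(pb * 5 + lp) * 9 + lc is stated here as an assumption, since the table
only says "encoded form".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the decoded header fields (not an SDK type). */
typedef struct
{
  unsigned lc, lp, pb;
  uint32_t dictSize;
  uint64_t unpackSize;   /* all bits set (-1) means unknown size */
} LzmaHeaderInfo;

/* Parse a 13-byte .lzma header. Returns 0 on success,
   -1 if the properties byte is out of range. */
int ParseLzmaHeader(const uint8_t *h, LzmaHeaderInfo *info)
{
  unsigned d, i;
  assert(h != NULL && info != NULL);

  d = h[0];
  if (d >= 9 * 5 * 5)
    return -1;
  /* Assumed packing: d = (pb * 5 + lp) * 9 + lc */
  info->lc = d % 9;
  d /= 9;
  info->lp = d % 5;
  info->pb = d / 5;

  info->dictSize = 0;
  for (i = 0; i < 4; i++)       /* offset 1, 4 bytes, little endian */
    info->dictSize |= (uint32_t)h[1 + i] << (8 * i);

  info->unpackSize = 0;
  for (i = 0; i < 8; i++)       /* offset 5, 8 bytes, little endian */
    info->unpackSize |= (uint64_t)h[5 + i] << (8 * i);
  return 0;
}
```

Under that assumption, the default settings (lc=3, lp=0, pb=2) correspond
to a first byte of 0x5D.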
313 Please note that the interfaces for ANSI-C code were changed in LZMA SDK 4.58.
314 If you want to use the old interfaces, you can download a previous version of LZMA SDK
315 from the sourceforge.net site.
317 To use ANSI-C LZMA Decoder you need the following files:
318 1) LzmaDec.h + LzmaDec.c + Types.h
319 LzmaUtil/LzmaUtil.c is an example application that uses these files.
322 Memory requirements for LZMA decoding
323 -------------------------------------
325 The stack usage of the LZMA decoding function for local variables is not
326 larger than 200-400 bytes.
328 The LZMA Decoder uses a dictionary buffer and an internal state structure.
329 The internal state structure consumes
330 state_size = (4 + (1.5 << (lc + lp))) KB
331 By default (lc=3, lp=0), state_size = 16 KB.
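
The state_size formula can be evaluated in whole bytes, which avoids the
fractional 1.5 factor. This is a small sketch; the function name
LzmaStateSizeBytes is hypothetical and not an SDK function.

```c
#include <assert.h>
#include <stddef.h>

/* state_size = (4 + (1.5 << (lc + lp))) KB, computed in bytes:
   4 KB = 4096 bytes and 1.5 KB = 1536 bytes. */
size_t LzmaStateSizeBytes(unsigned lc, unsigned lp)
{
  assert(lc <= 8 && lp <= 4);   /* ranges of the -lc and -lp switches */
  return (size_t)4096 + ((size_t)1536 << (lc + lp));
}
```

With lc=3 and lp=0 this returns 16384 bytes (16 KB); with lc=0 and lp=0
it returns 5632 bytes (5.5 KB).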
334 How To decompress data
335 ----------------------
337 LZMA Decoder (ANSI-C version) now supports 2 interfaces:
338 1) Single-call Decompressing
339 2) Multi-call State Decompressing (zlib-like interface)
341 You must use an external allocator:
343 void *SzAlloc(void *p, size_t size) { p = p; return malloc(size); }
344 void SzFree(void *p, void *address) { p = p; free(address); }
345 ISzAlloc alloc = { SzAlloc, SzFree };
347 You can use the p = p; statement to suppress compiler warnings about the unused parameter.
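
For reference, here is a self-contained variant of the allocator above.
The ISzAlloc type is declared locally as a stand-in so that the sketch
compiles without the SDK; its two-callback layout is inferred from the
{ SzAlloc, SzFree } initializer, and the field names Alloc and Free are
assumptions. With the SDK, include Types.h instead of redeclaring it.
The sketch uses (void)p; which is a common alternative to p = p; for
marking a parameter as intentionally unused.

```c
#include <stddef.h>
#include <stdlib.h>

/* Local stand-in for the SDK's ISzAlloc (assumed layout, see above). */
typedef struct
{
  void *(*Alloc)(void *p, size_t size);
  void (*Free)(void *p, void *address);
} ISzAlloc;

void *SzAlloc(void *p, size_t size) { (void)p; return malloc(size); }
void SzFree(void *p, void *address) { (void)p; free(address); }

ISzAlloc g_Alloc = { SzAlloc, SzFree };

/* Allocate and release a scratch buffer through the callback table.
   Returns 1 on success, 0 if the allocation failed. */
int AllocatorSmokeTest(void)
{
  void *mem = g_Alloc.Alloc(&g_Alloc, 64);
  if (mem == NULL)
    return 0;
  g_Alloc.Free(&g_Alloc, mem);
  return 1;
}
```

The first argument of each callback receives a pointer to the allocator
object itself, so a custom allocator could embed ISzAlloc in a larger
structure and recover its own context from p.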
350 Single-call Decompressing
351 -------------------------
352 When to use: RAM->RAM decompressing
353 Compile files: LzmaDec.h + LzmaDec.c + Types.h
354 Compile defines: no defines
356 - Input buffer: compressed size
357 - Output buffer: uncompressed size
358 - LZMA Internal Structures: state_size (16 KB for default settings)
361 int LzmaDecode(Byte *dest, SizeT *destLen, const Byte *src, SizeT *srcLen,
362 const Byte *propData, unsigned propSize, ELzmaFinishMode finishMode,
363 ELzmaStatus *status, ISzAlloc *alloc);
366 destLen - output data size
368 srcLen - input data size
369 propData - LZMA properties (5 bytes)
370 propSize - size of propData buffer (5 bytes)
371 finishMode - It has meaning only if the decoding reaches the output limit (*destLen).
372 LZMA_FINISH_ANY - Decode just destLen bytes.
373 LZMA_FINISH_END - Stream must be finished after (*destLen).
374 You can use LZMA_FINISH_END when you know that the
375 current output buffer covers the last bytes of the stream.
376 alloc - Memory allocator.
379 destLen - processed output size
380 srcLen - processed input size
385 LZMA_STATUS_FINISHED_WITH_MARK
386 LZMA_STATUS_NOT_FINISHED
387 LZMA_STATUS_MAYBE_FINISHED_WITHOUT_MARK
388 SZ_ERROR_DATA - Data error
389 SZ_ERROR_MEM - Memory allocation error
390 SZ_ERROR_UNSUPPORTED - Unsupported properties
391 SZ_ERROR_INPUT_EOF - It needs more bytes in input buffer (src).
393 If the LZMA decoder sees end_marker before reaching the output limit, it returns an OK result,
394 and the output value of destLen will be less than the output buffer size limit.
396 You can use multiple checks to test data integrity after full decompression:
397 1) Check Result and "status" variable.
398 2) Check that output(destLen) = uncompressedSize, if you know the real uncompressedSize.
399 3) Check that output(srcLen) = compressedSize, if you know the real compressedSize.
400 You must use the correct finish mode in that case.
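
The three checks above could be combined into one helper along these
lines. This is only a sketch: the helper name DecodeLooksComplete is
hypothetical, the constants are declared locally as stand-ins with
placeholder values so the code compiles without the SDK headers, and
treating both "finished" statuses as acceptable is an interpretation of
the status list above rather than something the SDK prescribes.

```c
#include <stddef.h>

/* Local stand-ins; with the SDK, use the definitions from Types.h and
   LzmaDec.h instead. The numeric values here are placeholders. */
enum { SZ_OK = 0 };
typedef enum
{
  LZMA_STATUS_FINISHED_WITH_MARK = 1,
  LZMA_STATUS_NOT_FINISHED,
  LZMA_STATUS_MAYBE_FINISHED_WITHOUT_MARK
} ELzmaStatus;

/* Returns 1 if all checks pass, 0 otherwise.
   destLen and srcLen are the processed sizes reported by the decoder. */
int DecodeLooksComplete(int res, ELzmaStatus status,
    size_t destLen, size_t uncompressedSize,
    size_t srcLen, size_t compressedSize)
{
  if (res != SZ_OK)                         /* check 1: result code */
    return 0;
  if (status != LZMA_STATUS_FINISHED_WITH_MARK &&
      status != LZMA_STATUS_MAYBE_FINISHED_WITHOUT_MARK)
    return 0;                               /* check 1: status */
  if (destLen != uncompressedSize)          /* check 2: output size */
    return 0;
  if (srcLen != compressedSize)             /* check 3: input size */
    return 0;
  return 1;
}
```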
403 Multi-call State Decompressing (zlib-like interface)
404 ----------------------------------------------------
406 When to use: file->file decompressing
407 Compile files: LzmaDec.h + LzmaDec.c + Types.h
410 - Buffer for input stream: any size (for example, 16 KB)
411 - Buffer for output stream: any size (for example, 16 KB)
412 - LZMA Internal Structures: state_size (16 KB for default settings)
413 - LZMA dictionary (dictionary size is encoded in LZMA properties header)
415 1) read LZMA properties (5 bytes) and uncompressed size (8 bytes, little-endian) to header:
416 unsigned char header[LZMA_PROPS_SIZE + 8];
417 ReadFile(inFile, header, sizeof(header));
419 2) Allocate CLzmaDec structures (state + dictionary) using LZMA properties
422 LzmaDec_Constr(&state);
423 res = LzmaDec_Allocate(&state, header, LZMA_PROPS_SIZE, &g_Alloc);
427 3) Initialize the LzmaDec structure before each new LZMA stream, then call LzmaDec_DecodeToBuf in a loop
429 LzmaDec_Init(&state);
433 int res = LzmaDec_DecodeToBuf(CLzmaDec *p, Byte *dest, SizeT *destLen,
434 const Byte *src, SizeT *srcLen, ELzmaFinishMode finishMode);
439 4) Free all allocated structures
440 LzmaDec_Free(&state, &g_Alloc);
442 For a full code example, see C/LzmaUtil/LzmaUtil.c.
448 Compile files: LzmaEnc.h + LzmaEnc.c + Types.h +
449 LzFind.c + LzFind.h + LzFindMt.c + LzFindMt.h + LzHash.h
452 - (dictSize * 11.5 + 6 MB) + state_size
454 The LZMA Encoder can use two memory allocators:
455 1) alloc - for small arrays.
456 2) allocBig - for big arrays.
458 For example, you can use Large RAM Pages (2 MB) in the allocBig allocator for
459 better compression speed. Note that Windows has a bad implementation for
461 It's OK to use the same allocator for alloc and allocBig.
464 Single-call Compression with callbacks
465 --------------------------------------
467 See C/LzmaUtil/LzmaUtil.c for an example.
469 When to use: file->file compressing
471 1) You must implement callback structures for interfaces:
477 static void *SzAlloc(void *p, size_t size) { p = p; return MyAlloc(size); }
478 static void SzFree(void *p, void *address) { p = p; MyFree(address); }
479 static ISzAlloc g_Alloc = { SzAlloc, SzFree };
481 CFileSeqInStream inStream;
482 CFileSeqOutStream outStream;
484 inStream.funcTable.Read = MyRead;
485 inStream.file = inFile;
486 outStream.funcTable.Write = MyWrite;
487 outStream.file = outFile;
490 2) Create a CLzmaEncHandle object:
494 enc = LzmaEnc_Create(&g_Alloc);
499 3) Initialize CLzmaEncProps properties:
501 LzmaEncProps_Init(&props);
503 Then you can change some properties in that structure.
505 4) Send LZMA properties to LZMA Encoder
507 res = LzmaEnc_SetProps(enc, &props);
509 5) Write encoded properties to header
511 Byte header[LZMA_PROPS_SIZE + 8];
512 size_t headerSize = LZMA_PROPS_SIZE;
516 res = LzmaEnc_WriteProperties(enc, header, &headerSize);
517 fileSize = MyGetFileLength(inFile);
518 for (i = 0; i < 8; i++)
519 header[headerSize++] = (Byte)(fileSize >> (8 * i));
520 MyWriteFileAndCheck(outFile, header, headerSize);
522 6) Call encoding function:
523 res = LzmaEnc_Encode(enc, &outStream.funcTable, &inStream.funcTable,
524 NULL, &g_Alloc, &g_Alloc);
526 7) Destroy LZMA Encoder Object
527 LzmaEnc_Destroy(enc, &g_Alloc, &g_Alloc);
530 If a callback function returns an error code, LzmaEnc_Encode also returns that code.
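
The size-writing loop in step 5 can be isolated as a small helper. This
sketch assumes the 5 encoded property bytes are already stored at the
start of the buffer (the text above gives 5 bytes for the LZMA
properties); the names PROPS_SIZE and AppendUnpackSize are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROPS_SIZE 5   /* size of the encoded LZMA properties, in bytes */

/* Append the 8-byte little-endian uncompressed size after the encoded
   properties in header[0..4]. The buffer must hold at least 13 bytes.
   Returns the total header length. */
size_t AppendUnpackSize(uint8_t *header, uint64_t fileSize)
{
  size_t n = PROPS_SIZE;
  int i;
  assert(header != NULL);
  for (i = 0; i < 8; i++)
    header[n++] = (uint8_t)(fileSize >> (8 * i));   /* low byte first */
  return n;
}
```

Passing UINT64_MAX writes eight 0xFF bytes, which is the "-1 means unknown
size" value from the file format description.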
533 Single-call RAM->RAM Compression
534 --------------------------------
536 Single-call RAM->RAM Compression is similar to Compression with callbacks,
537 but you provide pointers to buffers instead of pointers to stream callbacks:
539 HRes LzmaEncode(Byte *dest, SizeT *destLen, const Byte *src, SizeT srcLen,
540 CLzmaEncProps *props, Byte *propsEncoded, SizeT *propsSize, int writeEndMark,
541 ICompressProgress *progress, ISzAlloc *alloc, ISzAlloc *allocBig);
545 SZ_ERROR_MEM - Memory allocation error
546 SZ_ERROR_PARAM - Incorrect parameter
547 SZ_ERROR_OUTPUT_EOF - output buffer overflow
548 SZ_ERROR_THREAD - errors in multithreading functions (only for Mt version)
555 _LZMA_SIZE_OPT - Enable some optimizations in LZMA Decoder to get smaller executable code.
557 _LZMA_PROB32 - It can increase the speed on some 32-bit CPUs, but memory usage for
558 some structures will be doubled in that case.
560 _LZMA_UINT32_IS_ULONG - Define it if int is 16-bit on your compiler and long is 32-bit.
562 _LZMA_NO_SYSTEM_SIZE_T - Define it if you don't want to use size_t type.
565 C++ LZMA Encoder/Decoder
566 ~~~~~~~~~~~~~~~~~~~~~~~~
567 The C++ LZMA code uses COM-like interfaces. So if you want to use it,
568 you can study the basics of COM/OLE.
569 The C++ LZMA code is just a wrapper over the ANSI-C code.
573 ~~~~~~~~~~~~~~~~~~~~~~~~
574 If you use some C++ code folders in 7-Zip (for example, C++ code for .7z handling),
575 you must check that you work correctly with the "new" operator.
576 7-Zip can be compiled with MSVC 6.0, which doesn't throw an "exception" from the "new" operator.
577 So 7-Zip uses "CPP\Common\NewHandler.cpp", which redefines the "new" operator:
578 operator new(size_t size)
579 {
580 void *p = ::malloc(size);
581 if (p == 0)
582 throw CNewException();
583 return p;
584 }
585 If you use an MSVC version that throws an exception from the "new" operator, you can compile without
586 "NewHandler.cpp", so the standard exception will be used. Actually, some code in
587 7-Zip catches any exception in internal code and converts it to an HRESULT code.
588 So you don't need to catch CNewException if you call COM interfaces of 7-Zip.
593 http://www.7-zip.org/sdk.html
594 http://www.7-zip.org/support.html