The bzip2 command in Linux: compress with a higher ratio than gzip

Introduction

In Linux system administration, file compression is a daily task that saves disk space and speeds up transfers. gzip is the best-known tool, largely thanks to its speed, but there is an alternative that usually achieves a better compression ratio: bzip2. This post explores in detail how bzip2 works, its syntax, its advantages and disadvantages compared to gzip, and when it is the best option for your projects.

What is bzip2?

bzip2 is a file compressor based on the Burrows-Wheeler transform combined with Huffman coding. It was developed by Julian Seward in the late 1990s and is distributed under a BSD-style license. Its main feature is a noticeably better compression ratio than gzip on most data, at the cost of more CPU time and memory.

Installing bzip2

Most modern Linux distributions include bzip2 in their default repositories. To install it, simply use the corresponding package manager:

  • On Debian/Ubuntu: sudo apt-get install bzip2
  • On Red Hat/CentOS: sudo yum install bzip2 or sudo dnf install bzip2
  • On Arch Linux: sudo pacman -S bzip2

After installation, the bzip2 command will be available in any terminal.
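
To confirm that the installation worked, a check along these lines can help. It is only a minimal sketch that assumes a POSIX shell; the variable name and message wording are illustrative.

```shell
# Check that the bzip2 binary is on the PATH (command -v is a POSIX shell builtin).
if command -v bzip2 >/dev/null 2>&1; then
    bzip2_status="installed at $(command -v bzip2)"
else
    bzip2_status="not found"
fi
echo "bzip2: $bzip2_status"
```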

Basic Syntax

The use of bzip2 is very similar to that of gzip. The simplest way to compress a file is:

bzip2 filename

This command creates a compressed file with the extension .bz2 and removes the original (unless the -k option is specified to keep it). To decompress, use:

bzip2 -d filename.bz2

or the alias bunzip2:

bunzip2 filename.bz2
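
The full cycle described above can be tried safely in a scratch directory. The sketch below uses a made-up file name (notes.txt) and content; it also uses the -t option, which checks the integrity of a compressed file without extracting it.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create a small sample file (hypothetical name and content).
printf 'line %s\n' 1 2 3 4 5 > notes.txt
original_sum=$(cksum < notes.txt)

# Compress: notes.txt is replaced by notes.txt.bz2.
bzip2 notes.txt
ls

# -t tests the compressed file without writing any output.
if bzip2 -t notes.txt.bz2; then integrity=ok; else integrity=failed; fi
echo "integrity check: $integrity"

# Decompress: notes.txt.bz2 is replaced by notes.txt again.
bunzip2 notes.txt.bz2
restored_sum=$(cksum < notes.txt)

# The checksums match if the round trip preserved the content.
[ "$original_sum" = "$restored_sum" ] && echo "round trip OK"
```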

Practical Examples

Imagine we want to compress a large log file named app.log:

bzip2 -9 app.log

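The -9 level selects the largest block size (900 kB) and gives the best compression. It is also the level bzip2 uses when none is given, so plain bzip2 app.log produces the same result. The output file will be app.log.bz2.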

If we want to keep the original, we add -k:

bzip2 -9 -k app.log

To compress multiple files at once, we can use a wildcard:

bzip2 *.log

And to decompress every .bz2 file in the current directory:

bzip2 -d *.bz2
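
In bzip2, the levels -1 to -9 select the block size (100 kB to 900 kB), and -9 is what bzip2 uses when no level is given. A quick way to see the effect of the level is to write to standard output with -c, which leaves the input file in place, and compare the results. The file name sample.log and its generated content are invented for this sketch.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Generate roughly 2 MB of repetitive, log-like text (hypothetical content).
awk 'BEGIN { for (i = 0; i < 40000; i++)
    printf "2024-01-01 12:00:00 INFO request %d handled in %d ms\n", i, i % 97 }' > sample.log

# -c writes to standard output and leaves sample.log untouched.
bzip2 -c sample.log    > default.bz2
bzip2 -9 -c sample.log > level9.bz2
bzip2 -1 -c sample.log > level1.bz2

# The default and -9 outputs should be byte-for-byte identical.
if cmp -s default.bz2 level9.bz2; then same_as_default=yes; else same_as_default=no; fi
echo "default identical to -9: $same_as_default"

# Show sizes in bytes for the original and the two levels.
wc -c sample.log level1.bz2 level9.bz2
```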

Advantages and Disadvantages Compared to gzip

  • Advantages of bzip2:
    • Output is typically 10% to 30% smaller than with gzip, especially for text files and source code, although the exact gain depends on the data.
    • Compression is deterministic: the same input compressed at the same level always yields the same output.
    • Free of patents and with a permissive license.
  • Disadvantages of bzip2:
    • Compression and decompression speed is considerably lower than that of gzip, which can be a bottleneck in scripts requiring speed.
    • Higher memory consumption: at the -9 level, compression needs roughly 7.6 MB and decompression about 3.7 MB (the -s option reduces this at the cost of speed).
    • Less availability on very minimal embedded systems.
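
The size difference and the reproducibility claim are easy to check on your own data. The following sketch generates a text-heavy sample file (the name sample.txt and its content are made up) and compresses it with both tools; the numbers it prints only describe this synthetic input, so treat them as an illustration rather than a benchmark.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Build a text-heavy sample file (hypothetical content); real data will differ.
awk 'BEGIN { for (i = 0; i < 50000; i++)
    printf "user%d logged in from 10.0.%d.%d at step %d\n", i % 500, i % 256, (i * 7) % 256, i }' > sample.txt

# Compress the same input with both tools at their highest level.
gzip  -9 -c sample.txt > sample.txt.gz
bzip2 -9 -c sample.txt > sample.txt.bz2

orig_size=$(wc -c < sample.txt)
gz_size=$(wc -c < sample.txt.gz)
bz_size=$(wc -c < sample.txt.bz2)
echo "original: $orig_size bytes"
echo "gzip -9:  $gz_size bytes"
echo "bzip2 -9: $bz_size bytes"

# Reproducibility: compressing the same input twice should give identical bytes.
bzip2 -9 -c sample.txt > again.bz2
if cmp -s sample.txt.bz2 again.bz2; then reproducible=yes; else reproducible=no; fi
echo "reproducible: $reproducible"
```

Prefixing the gzip and bzip2 lines with time gives a rough idea of the speed difference on the same input.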

When to Use bzip2?

bzip2 shines in scenarios where space is more critical than time. Some common use cases include:

  • Archiving source code or documentation that will be stored for long periods.
  • Creating backups where size reduction is prioritized over restore speed.
  • Distributing software packages where you want to offer the smallest possible file to users.
  • Log files that are compressed for long-term retention and are rarely read.

Conversely, if you need to compress and decompress on the fly, for example in processing pipelines or in real time, gzip, or more modern tools such as zstd or xz, may be more suitable.
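
For the rarely read logs mentioned above, the file does not have to be decompressed on disk to be searched. bzcat (equivalent to bzip2 -dc) streams the decompressed content to standard output, and bzip2 compresses standard input when no file name is given. The file name and log lines below are invented for this sketch.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Compress a stream directly from standard input (no intermediate file).
printf '%s\n' "INFO start" "ERROR disk full" "INFO retry" "ERROR timeout" | bzip2 > app.log.bz2

# Search the compressed log without creating an uncompressed copy on disk.
error_count=$(bzcat app.log.bz2 | grep -c 'ERROR')
echo "errors found: $error_count"

# The compressed file itself is left in place.
ls app.log.bz2
```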

Tips for Optimizing Compression with bzip2

  • Use the -9 level


This work by Francesc Roig (francesc@vivaldi.net) is licensed under a Creative Commons Attribution 4.0 International License.