DjVu, the image compression format developed at AT&T Labs between 1996 and 2001, stands as a forgotten precursor to modern deep learning-based image compression. Created by a team that included Yann LeCun, Léon Bottou, and Yoshua Bengio—researchers who would go on to foundational work in deep learning—the format pioneered multi-layer encoding, wavelet-based compression, and learned dictionary methods that anticipated neural compression techniques by nearly two decades12.

What is DjVu?

DjVu (pronounced “déjà vu”) is a computer file format designed primarily to store scanned documents, especially those containing combinations of text, line drawings, indexed color images, and photographs1. The format emerged from a simple observation: traditional image compression methods like JPEG were optimized for continuous-tone photographs, but performed poorly on document images containing sharp text, white backgrounds, and mixed content.

The DjVu technology was developed from 1996 to 2001 by Yann LeCun, Léon Bottou, Patrick Haffner, Paul G. Howard, Patrice Simard, and Yoshua Bengio at AT&T Labs in Red Bank, New Jersey1. This team composition is historically significant—two of these researchers (LeCun and Bengio) would later share the 2018 Turing Award with Geoffrey Hinton for their foundational work on deep learning23.

DjVu employs several key technologies to achieve superior compression:

  • Image layer separation: Text and background/images are separated into distinct layers
  • Progressive loading: Images render progressively as data arrives
  • Arithmetic coding: Advanced entropy encoding for maximum compression
  • Lossy compression for bitonal images: Specialized handling of monochrome content1

The developers reported impressive compression ratios: color magazine pages compressed to 40–70 KB, black-and-white technical papers to 15–40 KB, and ancient manuscripts to around 100 KB. A comparable JPEG image typically required 500 KB1.

How Does DjVu Work?

At its core, DjVu relies on sophisticated signal processing techniques that share fundamental principles with modern neural networks—particularly the concepts of feature extraction, dictionary learning, and hierarchical encoding.

Multi-Layer Architecture

DjVu divides documents into three distinct layers, each processed with specialized algorithms (a decoding sketch follows the list):

  1. Background layer: Typically a low-resolution color image (100 DPI) compressed using wavelet-based IW44 encoding
  2. Foreground layer: A higher-resolution image containing color information for text and drawings
  3. Mask layer: A high-resolution (300+ DPI) bi-level image that determines whether each pixel belongs to the foreground or background1
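
To make the three-layer model concrete, here is a minimal decoding sketch (NumPy assumed; the function name and the nearest-neighbor upsampling are illustrative choices, not taken from the DjVu codebase). The mask selects, pixel by pixel, whether the full-resolution output shows the upsampled foreground or the upsampled background:

```python
# Minimal sketch of how a DjVu-style decoder composites the three layers.
import numpy as np

def composite_page(background, foreground, mask):
    """Reconstruct a page from the three DjVu-style layers.

    background: (H_bg, W_bg, 3) low-resolution color image (e.g. 100 DPI)
    foreground: (H_fg, W_fg, 3) color image for text and drawings
    mask:       (H, W) boolean array at full resolution (e.g. 300 DPI);
                True selects the foreground layer for that pixel.
    """
    H, W = mask.shape

    def upsample(img):
        # Nearest-neighbor upsampling to full resolution; real decoders use
        # smoother interpolation, but this keeps the sketch dependency-free.
        ys = np.arange(H) * img.shape[0] // H
        xs = np.arange(W) * img.shape[1] // W
        return img[ys][:, xs]

    bg = upsample(background)
    fg = upsample(foreground)
    # Per-pixel selection: the mask decides which layer each pixel comes from.
    return np.where(mask[..., None], fg, bg)
```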

This layer separation is conceptually similar to the way modern convolutional neural networks (CNNs) build layer-wise, multi-scale feature representations4. Both approaches decompose complex visual information into semantically meaningful components that can be modeled separately.

The JB2 Algorithm: Dictionary Learning Before Deep Learning

The most technically significant component of DjVu is its JB2 (Jewish Book 2) algorithm for compressing bi-level images. JB2 is a pattern matching and substitution encoder that creates a dictionary of symbol shapes—typically characters—and encodes each occurrence as a reference to this dictionary plus position information15.

This approach closely parallels learned dictionary compression, a concept that would later become fundamental to autoencoder-based neural compression. As sketched in code after this list, JB2:

  • Identifies repeated patterns in the input
  • Stores unique patterns in a shared dictionary
  • Encodes positions relative to previously encoded symbols
  • Uses context-dependent adaptive arithmetic coding (DjVu’s ZP-coder; the related JBIG2 standard uses the MQ coder)5
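
The sketch below shows the pattern-matching idea in toy form. The helper names and the matching criterion are hypothetical and far simpler than the real JB2 encoder, but the dictionary-plus-references structure is the same:

```python
# Toy sketch of JB2-style pattern matching and substitution.
import numpy as np

def encode_symbols(symbols, positions, max_mismatch=0.05):
    """symbols:   list of (h, w) boolean bitmaps, in reading order.
    positions: list of (x, y) page coordinates for each symbol.
    Returns a dictionary of unique shapes plus per-symbol references."""
    dictionary = []        # unique bitmaps shared across the page/document
    encoded = []           # (dictionary index, dx, dy) triples
    prev_x, prev_y = 0, 0
    for bitmap, (x, y) in zip(symbols, positions):
        match = None
        for idx, entry in enumerate(dictionary):
            if entry.shape == bitmap.shape:
                # Fraction of differing pixels; tolerating small differences
                # is what makes JB2 (slightly) lossy for bitonal content.
                if np.mean(entry != bitmap) <= max_mismatch:
                    match = idx
                    break
        if match is None:
            dictionary.append(bitmap)
            match = len(dictionary) - 1
        # Positions are stored relative to the previous symbol, which makes
        # them cheap to entropy-code.
        encoded.append((match, x - prev_x, y - prev_y))
        prev_x, prev_y = x, y
    return dictionary, encoded
```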

JBIG2, a later standard published in 2000, was explicitly designed to be similar to DjVu’s JB2 compression scheme5.

Wavelet-Based Encoding

For continuous-tone images, DjVu employs the IW44 wavelet encoder—a discrete wavelet transform (DWT) implementation optimized for progressive transmission16. The wavelet transform decomposes images into multiple resolution levels, capturing both spatial and frequency information—similar to how CNNs use multi-scale feature pyramids4.

The discrete wavelet transform operates by passing signals through low-pass and high-pass filters, then subsampling the results7. This creates approximation coefficients (low-frequency information) and detail coefficients (high-frequency edges and textures). DjVu’s progressive encoding transmits the coarse approximation first, then progressively refines it with detail coefficients6.
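
As a simplified illustration of the filter-and-subsample idea, here is a one-level Haar transform with a progressive-decoding example. IW44 uses a different wavelet and operates in two dimensions, so treat this as a sketch of the principle only:

```python
# One-level Haar wavelet decomposition (1-D, even-length signals).
import numpy as np

def haar_1d(signal):
    """Split a signal into approximation and detail coefficients."""
    signal = np.asarray(signal, dtype=float)
    even, odd = signal[0::2], signal[1::2]
    approx = (even + odd) / np.sqrt(2)   # low-pass filter + subsample
    detail = (even - odd) / np.sqrt(2)   # high-pass filter + subsample
    return approx, detail

def inverse_haar_1d(approx, detail):
    """Perfectly reconstruct the signal from both coefficient bands."""
    even = (approx + detail) / np.sqrt(2)
    odd = (approx - detail) / np.sqrt(2)
    out = np.empty(even.size + odd.size)
    out[0::2], out[1::2] = even, odd
    return out

# Progressive transmission in miniature: decoding with the detail band
# zeroed out yields a coarse preview, refined once the details arrive.
x = np.array([10, 12, 80, 82, 81, 79, 11, 13], dtype=float)
a, d = haar_1d(x)
preview = inverse_haar_1d(a, np.zeros_like(d))   # coarse approximation
exact = inverse_haar_1d(a, d)                    # full reconstruction
```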

Why Does DjVu Matter for Deep Learning?

DjVu’s significance extends beyond its practical utility as a document format. It represents a bridge between classical signal processing and the learned representations that define modern deep learning.

The Same Minds, Different Tools

The DjVu team at AT&T Labs was simultaneously developing convolutional neural networks. Yann LeCun created LeNet—the seminal CNN architecture for handwritten digit recognition—while working at AT&T Bell Laboratories28. Léon Bottou collaborated on neural network simulators and would later become known for his work on stochastic gradient descent9. Yoshua Bengio pioneered neural probabilistic language models3.

These researchers approached image compression with the same conceptual toolkit they applied to neural networks:

  • Hierarchical feature extraction: Both CNNs and DjVu process information at multiple scales
  • Dictionary learning: JB2’s symbol dictionary parallels the learned codebooks in vector-quantized neural compression
  • Probabilistic modeling: Arithmetic coding in DjVu uses estimated symbol probabilities, similar to how variational autoencoders model latent distributions10 (see the sketch below)
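
The probability estimation behind arithmetic coding can be illustrated in a few lines. The sketch below is a generic adaptive binary model, not DjVu’s ZP-coder (which uses a table-driven approximation rather than explicit counts):

```python
# Minimal adaptive binary probability model of the kind that drives a
# context-based arithmetic coder.
import math

class AdaptiveBitModel:
    def __init__(self):
        self.counts = [1, 1]          # Laplace-smoothed counts for 0 and 1

    def p(self, bit):
        return self.counts[bit] / sum(self.counts)

    def code_length(self, bit):
        """Ideal arithmetic-coding cost of `bit` in bits: -log2 p(bit)."""
        return -math.log2(self.p(bit))

    def update(self, bit):
        self.counts[bit] += 1

# Coding a run of mostly-white pixels gets cheaper as the model adapts.
model = AdaptiveBitModel()
bits = [0, 0, 0, 0, 1, 0, 0, 0]
total = 0.0
for b in bits:
    total += model.code_length(b)
    model.update(b)
print(f"approximately {total:.2f} bits for {len(bits)} symbols")
```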

From Hand-Crafted to Learned

Modern neural image compression methods, exemplified by papers like “End-to-End Optimized Image Compression” (2017) and “Variational Image Compression with a Scale Hyperprior” (2018), directly extend the principles pioneered in DjVu1112:

| Feature | DjVu (1998) | Modern Neural Compression (2017+) |
| --- | --- | --- |
| Layer separation | Hand-crafted background/mask/foreground | Learned encoder/decoder transforms |
| Dictionary | JB2 symbol dictionary | Learned vector quantization |
| Entropy coding | Arithmetic coding | Learned entropy models |
| Multi-resolution | Wavelet pyramid | Multi-scale neural networks |
| Training | Hand-tuned parameters | End-to-end gradient descent |
| PSNR at 0.5 bpp | ~30 dB | ~35+ dB11 |
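
For reference, the two metrics in the table’s last row can be computed as follows. This is a generic sketch (NumPy assumed), not tied to any particular codec:

```python
# PSNR (quality) and bits per pixel (rate), the two axes of rate-distortion.
import numpy as np

def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    a = np.asarray(original, dtype=float)
    b = np.asarray(reconstructed, dtype=float)
    mse = np.mean((a - b) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(max_value ** 2 / mse)

def bits_per_pixel(compressed_size_bytes, height, width):
    """Rate in bits per pixel for a compressed image of given dimensions."""
    return compressed_size_bytes * 8 / (height * width)
```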

The key difference is that DjVu’s components were hand-engineered by compression experts, while modern methods use gradient descent to learn optimal transforms from data—a capability that wasn’t computationally feasible in the 1990s.

The Autoencoder Connection

Autoencoders—neural networks designed to learn compressed representations—are the foundational architecture for neural image compression10. An autoencoder consists of:

  • Encoder: Maps input to a latent (compressed) representation
  • Code: The compressed representation (typically lower-dimensional)
  • Decoder: Reconstructs the input from the code

DjVu’s architecture maps precisely onto this structure:

  • The foreground/background separation acts as a content-aware encoder
  • The symbol dictionary and wavelet coefficients form the compressed code
  • The renderer reconstructs the document from these components

The critical innovation of modern neural compression is replacing hand-designed transforms with learned convolutional networks, allowing the system to discover optimal representations for specific data distributions1011.
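
A minimal convolutional autoencoder makes the correspondence concrete. The sketch below assumes PyTorch is available and is an illustrative stand-in for the learned analysis/synthesis transforms, not the architecture from the cited papers:

```python
# Tiny convolutional autoencoder as a stand-in for learned compression
# transforms (assumes PyTorch).
import torch
import torch.nn as nn

class TinyCompressionAutoencoder(nn.Module):
    def __init__(self, latent_channels: int = 32):
        super().__init__()
        # Encoder: downsample 4x per dimension into a compact latent "code".
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, latent_channels, kernel_size=5, stride=2, padding=2),
        )
        # Decoder: mirror the encoder to reconstruct the image.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_channels, 64, kernel_size=5, stride=2,
                               padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, kernel_size=5, stride=2,
                               padding=2, output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        code = self.encoder(x)
        # Real codecs quantize the code and entropy-code it; rounding is the
        # simplest stand-in here (training would use additive noise or a
        # straight-through estimator, since round() has zero gradient).
        code = torch.round(code * 255) / 255
        return self.decoder(code)
```

Training such a model would minimize a rate-distortion objective, e.g. reconstruction error plus a weighted estimate of the bits needed to store the code.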

Historical Context: The Compression Wars

DjVu emerged during a pivotal period in image compression history. The JPEG standard, based on the discrete cosine transform (DCT), dominated continuous-tone image compression but produced artifacts around text and sharp edges13. JPEG 2000, developed from 1997 to 2000, adopted wavelet transforms similar to those in DjVu, but suffered from higher computational complexity13.

For document compression, the primary alternatives were:

  • Fax Group 4: The standard for bi-level fax transmission, offering modest compression
  • JBIG: The predecessor to JBIG2, using arithmetic coding but without symbol dictionaries
  • PDF: Primarily a document container format, with compression as a secondary concern15

DjVu achieved superior compression by combining the strengths of multiple approaches: wavelet transforms for photographs, pattern matching for text, and arithmetic coding for entropy reduction. This hybrid approach anticipated the current trend in neural compression toward end-to-end optimized systems that replace multiple hand-designed components with unified learned models1112.

Technical Legacy: What DjVu Got Right

Several DjVu innovations remain relevant to modern compression research:

Content-Adaptive Encoding

DjVu’s automatic separation of documents into text and image regions—implemented through sophisticated segmentation algorithms—prefigured the content-aware approaches in modern variable-rate neural compression. The system adapted its encoding strategy based on local image characteristics, allocating bits where they mattered most perceptually1.
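
The sketch below illustrates the general idea of content-adaptive bit allocation: classify blocks by a crude detail measure and spend more bits where detail is high. It is a hypothetical illustration, not DjVu’s actual segmentation algorithm:

```python
# Hypothetical content-adaptive bit allocation by block edge energy.
import numpy as np

def allocate_bits(image, block=32, budget_bpp=0.5):
    """Return a per-block bit budget, weighted toward high-detail blocks.

    image: (H, W) or (H, W, 3) array, at least `block` pixels per side.
    """
    h, w = image.shape[:2]
    gray = image.mean(axis=2) if image.ndim == 3 else image.astype(float)
    weights = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            tile = gray[y:y + block, x:x + block]
            # Mean absolute gradient as a crude proxy for "text-like" detail.
            energy = (np.abs(np.diff(tile, axis=0)).mean()
                      + np.abs(np.diff(tile, axis=1)).mean())
            weights.append(energy + 1e-6)
    weights = np.array(weights)
    total_bits = budget_bpp * h * w
    return total_bits * weights / weights.sum()
```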

Progressive Transmission

The format’s progressive loading capability—rendering a low-quality preview that refines as more data arrives—directly influenced later standards. This property is now standard in neural compression systems that optimize for rate-distortion trade-offs at multiple quality levels11.

Open Implementation

The DjVuLibre open-source implementation, maintained by Léon Bottou since 2002, ensured the format’s longevity and influenced subsequent open compression efforts19. The Internet Archive adopted DjVu for distributing scanned documents, demonstrating real-world viability at scale1.

💡 Technical Insight: DjVu’s JB2 algorithm achieves compression ratios of 10:1 or better for text documents by recognizing that printed characters are instances of a finite alphabet, even if pixel-level differences exist between occurrences. This insight—that semantic equivalence matters more than pixel identity—mirrors the probabilistic approach of modern generative models.

The Forgotten Format, Foundational Ideas

Despite its technical merits, DjVu largely failed to achieve mainstream adoption outside specialized applications. Several factors contributed:

  1. Browser support: DjVu required plugins that major browsers eventually discontinued
  2. PDF dominance: Adobe’s format achieved ubiquity through bundling and marketing
  3. NPAPI deprecation: Around 2015, major browsers stopped supporting the NPAPI framework that DjVu plugins required1

However, the ideas pioneered in DjVu lived on. When Ballé and colleagues published “End-to-End Optimized Image Compression” in 2017, demonstrating that neural networks could outperform JPEG and JPEG 2000, they were extending principles that the DjVu team had explored two decades earlier—now with the computational resources and algorithmic advances to fully realize the vision of learned compression11.

⚠️ Historical Note: The connection between DjVu and modern deep learning is not merely coincidental—it reflects the continuity of research at AT&T Labs, where the same researchers who developed CNNs and backpropagation algorithms simultaneously applied machine learning insights to signal processing problems.

Modern Neural Compression: DjVu’s Successors

Contemporary neural image compression has achieved remarkable results by fully embracing the learning-based paradigm that DjVu could only partially implement:

  • 2016: Ballé et al. introduced end-to-end optimized compression using nonlinear transforms and uniform quantization11
  • 2018: The scale hyperprior model incorporated side information to capture spatial dependencies, achieving state-of-the-art results12
  • 2020+: Generative adversarial networks and diffusion models enabled perceptually optimized compression that prioritizes human visual perception over pixel-level accuracy13

These systems routinely achieve 20-40% bitrate reductions compared to JPEG 2000, with subjective quality improvements that are even more dramatic1112. The key enabler has been the availability of large-scale datasets and GPU computing—resources unavailable to the DjVu team in the late 1990s.

Frequently Asked Questions

Q: What compression ratio does DjVu achieve compared to JPEG? A: DjVu typically achieves 5–10× smaller file sizes than JPEG for scanned documents. For example, a color magazine page compresses to 40–70 KB in DjVu compared to 500 KB for comparable JPEG quality1.

Q: Who created DjVu and what else did they work on? A: DjVu was created by Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, Paul Howard, and Patrice Simard at AT&T Labs. LeCun and Bengio later shared the 2018 Turing Award with Geoffrey Hinton for their work on deep learning. Bottou, while not a Turing Award recipient, made foundational contributions to stochastic gradient descent and machine learning optimization123.

Q: Is DjVu still used today? A: While browser plugin support ended around 2015, DjVu remains in use for digital libraries and document archives, particularly through the Internet Archive. The DjVuLibre open-source library continues to be maintained1.

Q: How does modern neural compression differ from DjVu? A: Modern neural compression uses gradient descent to learn encoding and decoding transforms from data, while DjVu used hand-engineered algorithms. Both approaches share concepts like multi-scale processing and dictionary-based encoding, but neural methods achieve superior performance through end-to-end optimization1112.

Q: What is the JB2 algorithm? A: JB2 (Jewish Book 2) is DjVu’s bi-level image compression algorithm. It creates a dictionary of symbol shapes and encodes document pages as references to these shapes plus position information—similar to learned dictionary methods in modern compression15.


Sources

Footnotes

  1. Wikipedia contributors. “DjVu.” Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/wiki/DjVu

  2. Wikipedia contributors. “Yann LeCun.” Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/wiki/Yann_LeCun

  3. Wikipedia contributors. “Yoshua Bengio.” Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/wiki/Yoshua_Bengio

  4. Wikipedia contributors. “Convolutional neural network.” Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/wiki/Convolutional_neural_network

  5. Wikipedia contributors. “JBIG2.” Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/wiki/JBIG2

  6. Wikipedia contributors. “Wavelet transform.” Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/wiki/Wavelet_transform

  7. Wikipedia contributors. “Discrete wavelet transform.” Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/wiki/Discrete_wavelet_transform

  8. LeCun, Y., et al. (1998). “Gradient-Based Learning Applied to Document Recognition.” Proceedings of the IEEE, 86(11), 2278–2324.

  9. Wikipedia contributors. “Léon Bottou.” Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/wiki/L%C3%A9on_Bottou

  10. Wikipedia contributors. “Autoencoder.” Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/wiki/Autoencoder

  11. Ballé, J., Laparra, V., & Simoncelli, E. P. (2017). “End-to-end Optimized Image Compression.” International Conference on Learning Representations (ICLR). arXiv:1611.01704.

  12. Ballé, J., et al. (2018). “Variational Image Compression with a Scale Hyperprior.” International Conference on Learning Representations (ICLR). arXiv:1802.01436.

  13. Wikipedia contributors. “Image compression.” Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/wiki/Image_compression
