DjVu, the image compression format developed at AT&T Labs between 1996 and 2001, stands as a forgotten precursor to modern deep learning-based image compression. Created by a team that included Yann LeCun, Léon Bottou, and Yoshua Bengio—researchers who would go on to foundational work in deep learning—the format pioneered multi-layer encoding, wavelet-based compression, and learned dictionary methods that anticipated neural compression techniques by nearly two decades (Wikipedia contributors. “DjVu.” Wikipedia, The Free Encyclopedia, Wikipedia contributors. “Yann LeCun.” Wikipedia, The Free Encyclopedia).

What is DjVu?

DjVu (pronounced “déjà vu”) is a computer file format designed primarily to store scanned documents, especially those containing combinations of text, line drawings, indexed color images, and photographs (Wikipedia contributors. “DjVu.” Wikipedia, The Free Encyclopedia). The format emerged from a simple observation: traditional image compression methods like JPEG were optimized for continuous-tone photographs, but performed poorly on document images containing sharp text, white backgrounds, and mixed content.

The DjVu technology was developed from 1996 to 2001 by Yann LeCun, Léon Bottou, Patrick Haffner, Paul G. Howard, Patrice Simard, and Yoshua Bengio at AT&T Labs in Red Bank, New Jersey (Wikipedia contributors. “DjVu.” Wikipedia, The Free Encyclopedia). This team composition is historically significant—two of these researchers (LeCun and Bengio) would later share the 2018 Turing Award with Geoffrey Hinton for their foundational work on deep learning (Wikipedia contributors. “Yann LeCun.” Wikipedia, The Free Encyclopedia, Wikipedia contributors. “Yoshua Bengio.” Wikipedia, The Free Encyclopedia).

DjVu employs several key technologies to achieve superior compression:

  • Image layer separation: Text and background/images are separated into distinct layers
  • Progressive loading: Images render progressively as data arrives
  • Arithmetic coding: Advanced entropy encoding for maximum compression
  • Lossy compression for bitonal images: Specialized handling of monochrome content (Wikipedia contributors. “DjVu.” Wikipedia, The Free Encyclopedia)

The developers reported impressive compression ratios: color magazine pages compressed to 40–70 KB, black-and-white technical papers to 15–40 KB, and ancient manuscripts to around 100 KB. A comparable JPEG image typically required 500 KB (Wikipedia contributors. “DjVu.” Wikipedia, The Free Encyclopedia).

How Does DjVu Work?

At its core, DjVu relies on sophisticated signal processing techniques that share fundamental principles with modern neural networks—particularly the concepts of feature extraction, dictionary learning, and hierarchical encoding.

Multi-Layer Architecture

DjVu divides documents into three distinct layers, each processed with specialized algorithms:

  1. Background layer: Typically a low-resolution color image (100 DPI) compressed using wavelet-based IW44 encoding
  2. Foreground layer: A higher-resolution image containing color information for text and drawings
  3. Mask layer: A high-resolution (300+ DPI) bi-level image that determines whether each pixel belongs to the foreground or background (Wikipedia contributors. “DjVu.” Wikipedia, The Free Encyclopedia)
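Reduced to its essence, the decoder's job is a per-pixel selection between the two color layers under the mask. A minimal sketch (toy resolutions and nearest-neighbour upsampling; not DjVu's actual rendering code):

```python
import numpy as np

def composite(mask, foreground, background, scale=3):
    """Composite a DjVu-style document: the bi-level mask selects, per
    high-resolution pixel, between the upsampled foreground and
    background color layers."""
    # Nearest-neighbour upsample the low-resolution layers to mask resolution.
    fg = np.kron(foreground, np.ones((scale, scale, 1)))
    bg = np.kron(background, np.ones((scale, scale, 1)))
    m = mask[..., None].astype(bool)   # broadcast the mask over RGB channels
    return np.where(m, fg, bg)

# Toy page: 2x2 low-res color layers, 6x6 mask (300 vs 100 DPI -> scale 3).
background = np.full((2, 2, 3), 255)   # white paper
foreground = np.zeros((2, 2, 3))       # black ink
mask = np.zeros((6, 6), dtype=np.uint8)
mask[2, 1:5] = 1                       # a stroke of "text"
page = composite(mask, foreground, background)
```

The 3:1 resolution ratio mirrors the 300 DPI mask over a 100 DPI background described above.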

This layer separation is functionally similar to how modern convolutional neural networks (CNNs) use attention mechanisms and layer-wise feature extraction (Wikipedia contributors. “Convolutional neural network.” Wikipedia, The Free Encyclopedia). Both approaches decompose complex visual information into semantically meaningful components.

The JB2 Algorithm: Dictionary Learning Before Deep Learning

The most technically significant component of DjVu is its JB2 (Jewish Book 2) algorithm for compressing bi-level images. JB2 is a pattern matching and substitution encoder that creates a dictionary of symbol shapes—typically characters—and encodes each occurrence as a reference to this dictionary plus position information (Wikipedia contributors. “DjVu.” Wikipedia, The Free Encyclopedia, Wikipedia contributors. “JBIG2.” Wikipedia, The Free Encyclopedia).

This approach closely parallels learned dictionary compression, a concept that would later become fundamental to autoencoder-based neural compression.

JBIG2, a later standard published in 2000, was explicitly designed to be similar to DjVu’s JB2 compression scheme (Wikipedia contributors. “JBIG2.” Wikipedia, The Free Encyclopedia).
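A toy version of the pattern-matching-and-substitution idea, using exact shape matching into a growing dictionary (real JB2 matches connected components approximately and arithmetic-codes the result):

```python
def jb2_encode(symbols):
    """Toy JB2-style encoder. `symbols` is a list of (bitmap, x, y)
    tuples. New shapes are added to a dictionary; each occurrence is
    emitted as (dictionary index, position)."""
    dictionary = []   # unique shapes, stored as hashable tuples
    stream = []       # (dict_index, x, y) triples
    for bitmap, x, y in symbols:
        key = tuple(map(tuple, bitmap))
        if key not in dictionary:
            dictionary.append(key)
        stream.append((dictionary.index(key), x, y))
    return dictionary, stream

# A page with three glyphs, two of which share the same shape.
e = [[1, 1], [1, 0]]
x = [[1, 0], [0, 1]]
dictionary, stream = jb2_encode([(e, 0, 0), (x, 10, 0), (e, 20, 0)])
# Two dictionary entries cover three glyph occurrences.
```

Repeated characters cost only an index and an offset, which is why text-heavy pages compress so well.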

Wavelet-Based Encoding

For continuous-tone images, DjVu employs the IW44 wavelet encoder—a discrete wavelet transform (DWT) implementation optimized for progressive transmission (Wikipedia contributors. “DjVu.” Wikipedia, The Free Encyclopedia, Wikipedia contributors. “Wavelet transform.” Wikipedia, The Free Encyclopedia). The wavelet transform decomposes images into multiple resolution levels, capturing both spatial and frequency information—similar to how CNNs use multi-scale feature pyramids (Wikipedia contributors. “Convolutional neural network.” Wikipedia, The Free Encyclopedia).

The discrete wavelet transform operates by passing signals through low-pass and high-pass filters, then subsampling the results (Wikipedia contributors. “Discrete wavelet transform.” Wikipedia, The Free Encyclopedia). This creates approximation coefficients (low-frequency information) and detail coefficients (high-frequency edges and textures). DjVu’s progressive encoding transmits the coarse approximation first, then progressively refines it with detail coefficients (Wikipedia contributors. “Wavelet transform.” Wikipedia, The Free Encyclopedia).
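The filter-and-subsample step is easiest to see with the simplest wavelet, the Haar transform (IW44 uses different filters; this is purely illustrative):

```python
def haar_dwt(signal):
    """One level of a 1-D Haar discrete wavelet transform: low-pass
    (pairwise averages) and high-pass (pairwise differences), each
    implicitly subsampled by two."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Perfect reconstruction: sum and difference recover each pair."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])
    return out

signal = [9, 7, 3, 5]
approx, detail = haar_dwt(signal)   # approx=[8.0, 4.0], detail=[1.0, -1.0]
assert haar_idwt(approx, detail) == signal
```

A progressive transmitter would send `approx` first (a half-resolution preview), then `detail` to refine it.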

Why Does DjVu Matter for Deep Learning?

DjVu’s significance extends beyond its practical utility as a document format. It represents a bridge between classical signal processing and the learned representations that define modern deep learning.

The Same Minds, Different Tools

The DjVu team at AT&T Labs was simultaneously developing convolutional neural networks. Yann LeCun created LeNet—the seminal CNN architecture for handwritten digit recognition—while working at AT&T Bell Laboratories [1] (Wikipedia contributors. “Yann LeCun.” Wikipedia, The Free Encyclopedia). Léon Bottou collaborated on neural network simulators and would later become known for his work on stochastic gradient descent (Wikipedia contributors. “Léon Bottou.” Wikipedia, The Free Encyclopedia). Yoshua Bengio pioneered neural probabilistic language models (Wikipedia contributors. “Yoshua Bengio.” Wikipedia, The Free Encyclopedia).

These researchers approached image compression with the same conceptual toolkit they applied to neural networks:

  • Hierarchical feature extraction: Both CNNs and DjVu process information at multiple scales
  • Dictionary learning: JB2’s symbol dictionary parallels the learned codebooks in vector-quantized neural compression
  • Probabilistic modeling: Arithmetic coding in DjVu uses estimated symbol probabilities, similar to how variational autoencoders model latent distributions (Wikipedia contributors. “Autoencoder.” Wikipedia, The Free Encyclopedia)
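The payoff of arithmetic coding depends on how skewed the symbol probabilities are; Shannon entropy gives the per-symbol bound an ideal arithmetic coder approaches. A quick illustration:

```python
from math import log2

def entropy_bits_per_symbol(counts):
    """Shannon entropy of an empirical symbol distribution: the
    per-symbol lower bound an ideal arithmetic coder approaches."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

# Document pages are dominated by a few symbols; skew means fewer bits.
skewed  = {"e": 70, "t": 20, "z": 10}
uniform = {"e": 1, "t": 1, "z": 1}
assert entropy_bits_per_symbol(skewed) < entropy_bits_per_symbol(uniform)
```

The more sharply the model concentrates probability on the symbols that actually occur, the closer the coder gets to this bound.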

From Hand-Crafted to Learned

Modern neural image compression methods, exemplified by papers like “End-to-End Optimized Image Compression” (2017) and “Variational Image Compression with a Scale Hyperprior” (2018), directly extend the principles pioneered in DjVu [2][3]:

Feature             DjVu (1998)                               Modern Neural Compression (2017+)
Layer separation    Hand-crafted background/mask/foreground   Learned encoder/decoder transforms
Dictionary          JB2 symbol dictionary                     Learned vector quantization
Entropy coding      Arithmetic coding                         Learned entropy models
Multi-resolution    Wavelet pyramid                           Multi-scale neural networks
Training            Hand-tuned parameters                     End-to-end gradient descent
PSNR at 0.5 bpp     ~30 dB                                    ~35+ dB [2]

The key difference is that DjVu’s components were hand-engineered by compression experts, while modern methods use gradient descent to learn optimal transforms from data—a capability that wasn’t computationally feasible in the 1990s.

The Autoencoder Connection

Autoencoders—neural networks designed to learn compressed representations—are the foundational architecture for neural image compression (Wikipedia contributors. “Autoencoder.” Wikipedia, The Free Encyclopedia). An autoencoder consists of:

  • Encoder: Maps input to a latent (compressed) representation
  • Code: The compressed representation (typically lower-dimensional)
  • Decoder: Reconstructs the input from the code
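That structure is just two maps composed through a low-dimensional bottleneck. A minimal linear sketch with illustrative dimensions (untrained; a real autoencoder would fit the weights by gradient descent):

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions are illustrative: 16-pixel patches squeezed through a 4-D code.
W_enc = rng.normal(size=(4, 16)) * 0.1   # encoder weights
W_dec = rng.normal(size=(16, 4)) * 0.1   # decoder weights

def encode(x):
    return W_enc @ x   # 16-D input -> 4-D latent code

def decode(z):
    return W_dec @ z   # 4-D code -> 16-D reconstruction

x = rng.normal(size=16)
z = encode(x)          # the compressed representation
x_hat = decode(z)      # the reconstruction
assert z.shape == (4,) and x_hat.shape == (16,)
```

Everything that must survive transmission lives in `z`; training tunes the two maps so the reconstruction error at a given code size is minimized.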

DjVu’s architecture maps precisely onto this structure:

  • The foreground/background separation acts as a content-aware encoder
  • The symbol dictionary and wavelet coefficients form the compressed code
  • The renderer reconstructs the document from these components

The critical innovation of modern neural compression is replacing hand-designed transforms with learned convolutional networks, allowing the system to discover optimal representations for specific data distributions [2] (Wikipedia contributors. “Autoencoder.” Wikipedia, The Free Encyclopedia).

Historical Context: The Compression Wars

DjVu emerged during a pivotal period in image compression history. The JPEG standard, based on the discrete cosine transform (DCT), dominated continuous-tone image compression but produced artifacts around text and sharp edges (Wikipedia contributors. “Image compression.” Wikipedia, The Free Encyclopedia). JPEG 2000, developed from 1997 to 2000, adopted wavelet transforms similar to those in DjVu, but suffered from higher computational complexity (Wikipedia contributors. “Image compression.” Wikipedia, The Free Encyclopedia).

For document compression, the primary alternatives were bitonal fax-era standards such as CCITT Group 4 and JBIG, which compressed text efficiently but could not represent color or photographic content.

DjVu achieved superior compression by combining the strengths of multiple approaches: wavelet transforms for photographs, pattern matching for text, and arithmetic coding for entropy reduction. This hybrid approach anticipated the current trend in neural compression toward end-to-end optimized systems that replace multiple hand-designed components with unified learned models [2][3].

Technical Legacy: What DjVu Got Right

Several DjVu innovations remain relevant to modern compression research:

Content-Adaptive Encoding

DjVu’s automatic separation of documents into text and image regions—implemented through sophisticated segmentation algorithms—prefigured the content-aware approaches in modern variable-rate neural compression. The system adapted its encoding strategy based on local image characteristics, allocating bits where they mattered most perceptually (Wikipedia contributors. “DjVu.” Wikipedia, The Free Encyclopedia).
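A crude stand-in for such segmentation, classifying blocks by local contrast, might look like this (DjVu's actual segmenter is far more sophisticated):

```python
import numpy as np

def classify_block(block, contrast_threshold=128):
    """Toy content classifier: blocks whose extreme pixel values are far
    apart are treated as text/line art (destined for the mask layer);
    smooth blocks go to the wavelet-coded background."""
    contrast = int(block.max()) - int(block.min())
    return "mask" if contrast > contrast_threshold else "background"

text_block  = np.array([[0, 255], [255, 0]], dtype=np.uint8)      # sharp glyph edge
photo_block = np.array([[120, 130], [125, 128]], dtype=np.uint8)  # smooth gradient
assert classify_block(text_block) == "mask"
assert classify_block(photo_block) == "background"
```

The routing decision is what lets each region get the encoder best suited to it.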

Progressive Transmission

The format’s progressive loading capability—rendering a low-quality preview that refines as more data arrives—directly influenced later standards. This property is now standard in neural compression systems that optimize for rate-distortion trade-offs at multiple quality levels [2].
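The coarse-then-refine idea can be sketched with block averages standing in for wavelet approximation coefficients (illustrative only):

```python
import numpy as np

def progressive_stages(image, scale=2):
    """Two-stage progressive transmission: a coarse preview (block
    averages) is sent first, then the residual that restores the
    image exactly."""
    h, w = image.shape
    coarse = image.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))
    preview = np.kron(coarse, np.ones((scale, scale)))   # shown to the viewer first
    residual = image - preview                           # the refinement stage
    return preview, residual

image = np.array([[10., 12.], [14., 16.]])
preview, residual = progressive_stages(image)
assert np.allclose(preview + residual, image)   # refinement is exact
```

A viewer can display `preview` as soon as the first bytes arrive and apply `residual` when the rest of the stream lands.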

Open Implementation

The DjVuLibre open-source implementation, maintained by Léon Bottou since 2002, ensured the format’s longevity and influenced subsequent open compression efforts (Wikipedia contributors. “DjVu.” Wikipedia, The Free Encyclopedia, Wikipedia contributors. “Léon Bottou.” Wikipedia, The Free Encyclopedia). The Internet Archive adopted DjVu for distributing scanned documents, demonstrating real-world viability at scale (Wikipedia contributors. “DjVu.” Wikipedia, The Free Encyclopedia).

The Forgotten Format, Foundational Ideas

Despite its technical merits, DjVu largely failed to achieve mainstream adoption outside specialized applications. Several factors contributed:

  1. Browser support: DjVu required plugins that major browsers eventually discontinued
  2. PDF dominance: Adobe’s format achieved ubiquity through bundling and marketing
  3. NPAPI deprecation: Around 2015, major browsers stopped supporting the NPAPI framework that DjVu plugins required (Wikipedia contributors. “DjVu.” Wikipedia, The Free Encyclopedia)

However, the ideas pioneered in DjVu lived on. When researchers published “End-to-End Optimized Image Compression” in 2017, demonstrating that neural networks could outperform JPEG and JPEG 2000, they were extending principles that the DjVu team had explored two decades earlier—now with the computational resources and algorithmic advances to fully realize the vision of learned compression [2].

Modern Neural Compression: DjVu’s Successors

Contemporary neural image compression has achieved remarkable results by fully embracing the learning-based paradigm that DjVu could only partially implement:

  • 2016: Ballé et al. introduced end-to-end optimized compression using nonlinear transforms and uniform quantization [2]
  • 2018: The scale hyperprior model incorporated side information to capture spatial dependencies, achieving state-of-the-art results [3]
  • 2020+: Generative adversarial networks and diffusion models enabled perceptually optimized compression that prioritizes human visual perception over pixel-level accuracy (Wikipedia contributors. “Image compression.” Wikipedia, The Free Encyclopedia)

These systems routinely achieve 20–40% bitrate reductions compared to JPEG 2000, with subjective quality improvements that are even more dramatic [2][3]. The key enabler has been the availability of large-scale datasets and GPU computing—resources unavailable to the DjVu team in the late 1990s.

Frequently Asked Questions

Q: What compression ratio does DjVu achieve compared to JPEG? A: DjVu typically achieves 5–10× smaller file sizes than JPEG for scanned documents. For example, a color magazine page compresses to 40–70 KB in DjVu compared to 500 KB for comparable JPEG quality (Wikipedia contributors. “DjVu.” Wikipedia, The Free Encyclopedia).

Q: Who created DjVu and what else did they work on? A: DjVu was created by Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, Paul Howard, and Patrice Simard at AT&T Labs. LeCun and Bengio later shared the 2018 Turing Award with Geoffrey Hinton for their work on deep learning. Bottou, while not a Turing Award recipient, made foundational contributions to stochastic gradient descent and machine learning optimization (Wikipedia contributors. “DjVu.” Wikipedia, The Free Encyclopedia, Wikipedia contributors. “Yann LeCun.” Wikipedia, The Free Encyclopedia, Wikipedia contributors. “Yoshua Bengio.” Wikipedia, The Free Encyclopedia).

Q: Is DjVu still used today? A: While browser plugin support ended around 2015, DjVu remains in use for digital libraries and document archives, particularly through the Internet Archive. The DjVuLibre open-source library continues to be maintained (Wikipedia contributors. “DjVu.” Wikipedia, The Free Encyclopedia).

Q: How does modern neural compression differ from DjVu? A: Modern neural compression uses gradient descent to learn encoding and decoding transforms from data, while DjVu used hand-engineered algorithms. Both approaches share concepts like multi-scale processing and dictionary-based encoding, but neural methods achieve superior performance through end-to-end optimization [2][3].

Q: What is the JB2 algorithm? A: JB2 (Jewish Book 2) is DjVu’s bi-level image compression algorithm. It creates a dictionary of symbol shapes and encodes document pages as references to these shapes plus position information—similar to learned dictionary methods in modern compression (Wikipedia contributors. “DjVu.” Wikipedia, The Free Encyclopedia, Wikipedia contributors. “JBIG2.” Wikipedia, The Free Encyclopedia).

Footnotes

  1. LeCun, Y., et al. (1998). “Gradient-Based Learning Applied to Document Recognition.” Proceedings of the IEEE, 86(11), 2278–2324.

  2. Ballé, J., Laparra, V., & Simoncelli, E.P. (2017). “End-to-end Optimized Image Compression.” International Conference on Learning Representations (ICLR). arXiv:…01704.

  3. Ballé, J., et al. (2018). “Variational Image Compression with a Scale Hyperprior.” International Conference on Learning Representations (ICLR). arXiv:…01436.

Sources

  1. Wikipedia contributors. "DjVu." *Wikipedia, The Free Encyclopedia*. Accessed 2026-04-24.
  2. Wikipedia contributors. "Wavelet transform." *Wikipedia, The Free Encyclopedia*. Accessed 2026-04-24.
  3. Wikipedia contributors. "Discrete wavelet transform." *Wikipedia, The Free Encyclopedia*. Accessed 2026-04-24.
  4. Wikipedia contributors. "Convolutional neural network." *Wikipedia, The Free Encyclopedia*. Accessed 2026-04-24.
  5. Wikipedia contributors. "Yann LeCun." *Wikipedia, The Free Encyclopedia*. Accessed 2026-04-24.
  6. Wikipedia contributors. "Yoshua Bengio." *Wikipedia, The Free Encyclopedia*. Accessed 2026-04-24.
  7. Wikipedia contributors. "Léon Bottou." *Wikipedia, The Free Encyclopedia*. Accessed 2026-04-24.
  8. Wikipedia contributors. "Autoencoder." *Wikipedia, The Free Encyclopedia*. Accessed 2026-04-24.
  9. Wikipedia contributors. "Image compression." *Wikipedia, The Free Encyclopedia*. Accessed 2026-04-24.
  10. Wikipedia contributors. "JBIG2." *Wikipedia, The Free Encyclopedia*. Accessed 2026-04-24.
