3axap4eHko/cb2pdf

CB2PDF - Comic Book Archive to PDF Converter

A fast, memory-efficient Rust tool that converts comic book archives (CBR, CBZ, CBT) to PDF files with parallel processing.

Features

  • Multi-format support: CBR (RAR), CBZ (ZIP), and CBT (TAR.GZ) archives
  • Intelligent file detection: Uses magic bytes with extension fallback
  • Directory grouping: Automatically groups files by directory and processes only the first file (others treated as parts)
  • Parallel processing: Converts volumes concurrently on a configurable number of threads (default: 4 or the number of CPU cores, whichever is smaller)
  • Memory efficient: Minimal memory footprint with streaming image processing
  • Quality preservation:
    • No lossy re-encoding of JPEG files
    • Preserves PNG transparency (alpha channels)
    • Direct embedding of original image data
  • Automatic PDF naming: Creates PDFs named after the source directory
  • macOS compatibility: Filters out ._ and __MACOSX metadata files

Installation

Prerequisites

  • Rust 1.70+ with Cargo
  • System dependencies for archive formats:
    • unrar library for CBR files
    • zlib for ZIP/CBZ files

Build from source

git clone <repository-url>
cd cb2pdf
cargo build --release

CI/CD

The project includes the following GitHub Actions workflows:

Continuous Integration (.github/workflows/ci.yml)

  • Multi-platform builds: Linux, Windows, macOS
  • Code quality: cargo fmt, cargo clippy
  • Security audit: cargo audit
  • Code coverage: Using cargo-tarpaulin
  • Artifact uploads: Build binaries for each platform

Release Automation (.github/workflows/release.yml)

  • Tagged releases: Triggered on version tags (e.g., v1.0.0)
  • Cross-platform binaries: Automatic GitHub releases with binaries
  • Installation instructions: Included in release notes

Dependency Management

  • Dependabot: Weekly dependency updates
  • Security monitoring: Automated vulnerability scanning

Usage

cb2pdf [OPTIONS] <PATTERN>

Options

  • -t, --threads <NUM>: Number of threads to use, from 1 to the number of CPU cores (default: 4 or the number of CPU cores, whichever is smaller)
  • -h, --help: Show help information

Examples

Process all CBR archives one directory level below comics/:

cb2pdf "comics/*/*.cbr"

Process specific volume directories:

cb2pdf "files/Volume*/*.cbr"

Use 8 threads for faster processing:

cb2pdf -t 8 "files/*/*.cbr"

Single-threaded processing:

cb2pdf --threads 1 "files/*/*.cbr"

The program will:

  1. Find all CBR/CBZ/CBT files matching the pattern
  2. Group them by directory name
  3. Process only the first file from each directory (treating others as parts)
  4. Create PDFs named after each directory (e.g., Volume 50.pdf)
  5. Use parallel threads for processing (default: 4 threads or number of CPU cores, whichever is smaller)
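
Steps 2 to 4 above can be sketched as follows. This is a minimal illustration using only the standard library, not the code from main.rs. The function names group_by_directory and plan_jobs, the "output" fallback name, and the choice of lexicographic order to decide which file is "first" are all assumptions.

```rust
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};

/// Group archive paths by the name of their parent directory.
/// Hypothetical helper; "output" is an assumed fallback for paths without a parent name.
pub fn group_by_directory(paths: &[PathBuf]) -> BTreeMap<String, Vec<PathBuf>> {
    let mut groups: BTreeMap<String, Vec<PathBuf>> = BTreeMap::new();
    for path in paths {
        let dir_name = path
            .parent()
            .and_then(Path::file_name)
            .map(|n| n.to_string_lossy().into_owned())
            .unwrap_or_else(|| String::from("output"));
        groups.entry(dir_name).or_default().push(path.clone());
    }
    // Assumption: "first file" means first in lexicographic order.
    for files in groups.values_mut() {
        files.sort();
    }
    groups
}

/// For each directory, pair the first archive with a PDF named after the directory.
pub fn plan_jobs(paths: &[PathBuf]) -> Vec<(PathBuf, PathBuf)> {
    group_by_directory(paths)
        .into_iter()
        .filter_map(|(dir_name, files)| {
            let first = files.into_iter().next()?;
            Some((first, PathBuf::from(format!("{dir_name}.pdf"))))
        })
        .collect()
}
```

With this sketch, files/Volume 50/464.cbr and files/Volume 50/465.cbr produce a single job that reads 464.cbr and targets Volume 50.pdf.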

Architecture

Archive Format Detection

  • Magic bytes: Primary detection using file headers
    • RAR: Rar!
    • ZIP: PK
    • TAR.GZ: \x1f\x8b
  • Extension fallback: Falls back to file extension if magic bytes fail
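
The detection order described above could look roughly like this. It is a sketch under assumptions, not the project's actual code: the ArchiveFormat enum and the function names are hypothetical, and only the three signatures listed in this README are checked.

```rust
use std::path::Path;

/// Archive formats named in this README (hypothetical type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ArchiveFormat {
    Rar,
    Zip,
    TarGz,
}

/// Primary detection: compare the first bytes of the file with known signatures.
pub fn detect_by_magic(header: &[u8]) -> Option<ArchiveFormat> {
    if header.starts_with(b"Rar!") {
        Some(ArchiveFormat::Rar)
    } else if header.starts_with(b"PK") {
        Some(ArchiveFormat::Zip)
    } else if header.starts_with(&[0x1f, 0x8b]) {
        Some(ArchiveFormat::TarGz)
    } else {
        None
    }
}

/// Fallback: map the file extension (case-insensitive) to a format.
pub fn detect_by_extension(path: &Path) -> Option<ArchiveFormat> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "cbr" => Some(ArchiveFormat::Rar),
        "cbz" => Some(ArchiveFormat::Zip),
        "cbt" => Some(ArchiveFormat::TarGz),
        _ => None,
    }
}

/// Magic bytes win; the extension is consulted only when no signature matches.
pub fn detect_format(path: &Path, header: &[u8]) -> Option<ArchiveFormat> {
    detect_by_magic(header).or_else(|| detect_by_extension(path))
}
```

Because magic bytes take priority, a ZIP archive saved with a .cbr extension is still reported as Zip.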

Memory Efficiency

  • Uses ImageReader::into_dimensions() to get image size without full decoding
  • Processes images one at a time rather than loading all into memory
  • Direct file data embedding with RawImage::decode_from_bytes()

Quality Preservation

  • JPEG files: Uses original file bytes directly (no re-compression)
  • PNG files: Preserves transparency and original quality
  • Other formats: Converts to high-quality JPEG (95% quality) only when necessary
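
The three rules above amount to a per-image decision, sketched below. The EmbedStrategy enum and choose_strategy function are hypothetical names, and sniffing the image type from its leading bytes is an assumption about how the decision could be made. The JPEG and PNG signatures themselves are standard.

```rust
/// How an image would be placed into the PDF (hypothetical type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum EmbedStrategy {
    /// Embed the original JPEG bytes without re-compression.
    PassThroughJpeg,
    /// Keep PNG data so transparency and original quality survive.
    KeepPng,
    /// Convert any other format to JPEG at the given quality.
    ReencodeJpeg { quality: u8 },
}

/// Standard 8-byte PNG signature.
const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

/// Pick a strategy from the first bytes of the image file.
pub fn choose_strategy(data: &[u8]) -> EmbedStrategy {
    if data.starts_with(&[0xFF, 0xD8, 0xFF]) {
        EmbedStrategy::PassThroughJpeg
    } else if data.starts_with(&PNG_SIGNATURE) {
        EmbedStrategy::KeepPng
    } else {
        // 95 matches the quality figure quoted in this README.
        EmbedStrategy::ReencodeJpeg { quality: 95 }
    }
}
```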

PDF Creation

  • Creates pages sized to match image dimensions (96 DPI)
  • Uses XObjectTransform for proper image scaling and positioning
  • Embeds images as XObjects for efficient PDF structure
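
Sizing a page to an image at 96 DPI is a unit conversion: one inch is 72 PDF points or 25.4 mm, so one pixel at 96 DPI is 0.75 pt. The sketch below shows only that arithmetic. The function names are hypothetical, and the printpdf calls that consume these values are deliberately left out.

```rust
/// Resolution assumed for page sizing, as stated in this README.
pub const DPI: f64 = 96.0;

/// Convert a pixel length to PDF points (72 points per inch).
pub fn px_to_pt(px: u32) -> f64 {
    px as f64 * 72.0 / DPI
}

/// Convert a pixel length to millimetres (25.4 mm per inch).
pub fn px_to_mm(px: u32) -> f64 {
    px as f64 * 25.4 / DPI
}

/// Page size in points for an image of the given pixel dimensions.
pub fn page_size_pt(width_px: u32, height_px: u32) -> (f64, f64) {
    (px_to_pt(width_px), px_to_pt(height_px))
}
```

For example, a 1920x1080 image maps to a 1440x810 pt page.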

Performance

  • Parallel processing: Configurable threads (1 to number of CPU cores)
  • Memory usage: ~50-100MB per thread regardless of archive size
  • Speed: Processes typical 20-page volumes in 10-30 seconds
  • Output size: Reasonable expansion (2.7MB CBR → 7.2MB PDF is typical)
  • Thread count validation: Checks the requested thread count and warns about excessive values
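
The thread-count rules can be sketched as below. Several points are assumptions: the README says excessive values trigger a warning but does not say whether they are also clamped, so the clamping here is a guess. The sketch also uses the standard library's available_parallelism in place of the num_cpus crate listed under Dependencies, and resolve_thread_count is a hypothetical name.

```rust
use std::thread;

/// Upper bound for the default thread count, per this README.
pub const DEFAULT_MAX_THREADS: usize = 4;

/// Number of CPU cores as reported by the standard library (1 if unknown).
pub fn detected_cores() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

/// Resolve the effective thread count from an optional -t/--threads value.
pub fn resolve_thread_count(requested: Option<usize>, cpu_cores: usize) -> usize {
    let cores = cpu_cores.max(1);
    match requested {
        // Default: 4 threads or the number of cores, whichever is smaller.
        None => DEFAULT_MAX_THREADS.min(cores),
        Some(0) => {
            eprintln!("Warning: thread count must be at least 1, using 1");
            1
        }
        // Assumption: excessive values are clamped after warning.
        Some(n) if n > cores => {
            eprintln!("Warning: {n} threads requested but only {cores} cores available");
            cores
        }
        Some(n) => n,
    }
}
```

A caller would pass detected_cores() as the second argument. Keeping it as a parameter makes the rule easy to test on any machine.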

Dependencies

[dependencies]
glob = "0.3.2"                                    # File pattern matching
image = "0.25.6"                                  # Image processing
printpdf = { version = "0.8.2", features = ["jpeg"] } # PDF generation
tempfile = "3.20.0"                               # Temporary directories
unrar = "0.5.8"                                   # RAR extraction
zip = "2.2.0"                                     # ZIP extraction  
tar = "0.4.41"                                    # TAR extraction
flate2 = "1.0.34"                                 # GZIP decompression
rayon = "1.7.0"                                   # Parallel processing
clap = { version = "4.4.0", features = ["derive"] } # Command-line argument parsing
num_cpus = "1.16.0"                               # CPU core detection

File Structure

cb2pdf/
├── Cargo.toml              # Dependencies and build config
└── src/
    └── main.rs             # Main application logic
        ├── Archive format detection
        ├── Multi-format extraction (CBR/CBZ/CBT)
        ├── Directory grouping and parallel processing
        └── Memory-efficient PDF creation

Error Handling

  • Graceful failures: Individual volume failures don't stop batch processing
  • Detailed logging: Shows progress and warnings for each volume
  • Format validation: Validates archive formats before processing
  • Image filtering: Skips invalid images and macOS metadata
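
The metadata filtering could be implemented along these lines. This is a sketch, not the project's code: the function names are hypothetical, and the list of accepted image extensions is an assumption, since the README does not enumerate them.

```rust
use std::path::{Component, Path};

/// True if any path component is a macOS metadata entry:
/// the __MACOSX directory or an AppleDouble file starting with "._".
pub fn is_macos_metadata(path: &Path) -> bool {
    path.components().any(|component| match component {
        Component::Normal(name) => {
            let name = name.to_string_lossy();
            let name: &str = &name;
            name == "__MACOSX" || name.starts_with("._")
        }
        _ => false,
    })
}

/// True if the entry should be treated as a page image.
/// The extension list is an assumed example, not taken from the project.
pub fn is_image_candidate(path: &Path) -> bool {
    if is_macos_metadata(path) {
        return false;
    }
    let ext = match path.extension().and_then(|e| e.to_str()) {
        Some(e) => e.to_ascii_lowercase(),
        None => return false,
    };
    matches!(ext.as_str(), "jpg" | "jpeg" | "png" | "gif" | "webp" | "bmp")
}
```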

Limitations

  • RAR dependency: Requires unrar library for CBR files
  • Memory usage: Still loads full image data during PDF creation (printpdf limitation)
  • Single archive per directory: Assumes first file contains all content
  • No streaming PDF: PDF must be fully constructed before writing (printpdf limitation)

Example Output

$ cb2pdf -t 8 "files/*/*.cbr"
Using 8 threads for parallel processing (CPU cores: 32)
Found 44 volumes to process
  Volume 28: "files/Volume 28/245.cbr"
  Volume 29: "files/Volume 29/254.cbr"
  ...
Processing Volume 50: "files/Volume 50/464.cbr"
  Detected format: Zip
  Found 17 images
Creating PDF with 17 images, memory-efficient processing
✅ Successfully processed Volume 50
🎉 All volumes processed!

License

This project is licensed under the MIT License - see the LICENSE file for details.
