Customizing Your Output

Control what Capcat collects and how it's organized.

What You'll Learn

  • Control article counts
  • Generate HTML for browser reading
  • Download additional media files
  • Customize output locations
  • Update existing articles

Article Count Control

Set Count Per Fetch

# Fetch 10 articles
capcat fetch hn --count 10

# Fetch 50 articles
capcat bundle tech --count 50

Guidelines:

  • Daily routine: 10-20 articles
  • Weekly collection: 30-50 articles
  • Research deep dive: 50+ articles

Default Counts

Sources and bundles have defaults (usually 30).

Use defaults:

capcat fetch hn  # Uses 30

Override:

capcat fetch hn --count 15  # Uses 15

HTML Generation

Why Generate HTML?

Markdown is great for archiving, but HTML is better for reading:

  • Browse in web browser
  • Click between articles
  • Professional styling
  • Navigate comments easily

Generate HTML

Add the --html flag:

capcat fetch hn --count 10 --html
capcat bundle tech --html
capcat single URL --html

HTML Output Structure

With the --html flag:

Article_Folder/
├── article.md          # Markdown version
├── html/
│   ├── article.html    # Browsable HTML
│   └── comments.html   # Comments (if available)
└── images/             # Downloaded images

Open in browser:

open Article_Folder/html/article.html
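The open command is the macOS launcher; on most Linux desktops the usual equivalent is xdg-open. A minimal sketch of a wrapper that picks one from the kernel name reported by uname -s. The helper name opener_for is hypothetical and not part of Capcat:

```shell
# Hypothetical helper, not part of Capcat: choose a file-opener command.
# Takes the kernel name as reported by `uname -s`.
opener_for() {
  case "$1" in
    Darwin) echo "open" ;;      # macOS
    *)      echo "xdg-open" ;;  # assumption: a freedesktop-style Linux desktop
  esac
}

# Print the command that would open the generated page on this machine.
echo "$(opener_for "$(uname -s)") Article_Folder/html/article.html"
```

The sketch only prints the command, so it is safe to run before any articles exist.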

Media Download Control

Images (Default)

Images are downloaded and embedded by default. No flag required:

capcat fetch hn --count 10
# Images automatically downloaded to images/

Disable image downloads globally in Config/Global-settings.yaml:

processing:
  download_images: false

PDF Downloads

Use --pdfs to download PDF files linked in articles, independently of other media:

capcat fetch nature --count 10 --pdfs

PDFs are downloaded asynchronously in the background. A progress line appears only when the count changes:

Downloading PDFs: 3 active, 2 queued, 8 completed

PDF size and count limits:

pdf:
  max_pdf_size_bytes: 20971520   # Skip PDFs larger than 20 MB
  max_pdf_per_article: 10        # Cap per article
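
The max_pdf_size_bytes value above is 20 MB written out in bytes (20 × 1024 × 1024). A small sketch for working out other limits; the helper name mb_to_bytes is hypothetical:

```shell
# Hypothetical helper: convert a size in MB to bytes (1 MB = 1024 * 1024 bytes).
mb_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

mb_to_bytes 20   # prints 20971520, the value used above
```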

All Media (Videos, Audio, Documents)

Use --media to download everything — images, PDFs, videos, audio, and documents:

capcat fetch nature --count 10 --media

--media is a superset of --pdfs. Using --media enables PDF downloads automatically.

Per-Source PDF Defaults

Some sources (Nature, IEEE, Scientific American) have PDF-heavy content. Their default PDF behaviour is configured in their config.yaml via a media: block:

media:
  download_pdfs: true          # Enable PDFs for this source by default
  max_pdf_size_mb: 10          # Source-level size cap (overrides global)

The resolution order from highest to lowest priority:

  1. CLI flag (--pdfs / --media / --no-pdfs)
  2. TUI prompt answer (Yes / No / Source defaults)
  3. Per-source media: block in config.yaml
  4. Global media.download_pdfs setting in Global-settings.yaml
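
The priority order above amounts to "the first level that specifies a value wins". A rough sketch of that rule, purely for illustration: the function name, the use of empty strings for "not specified", and the final fallback to false are all assumptions, not Capcat's actual code:

```shell
# Hypothetical illustration of the priority order; not Capcat's actual code.
# Arguments, highest priority first: CLI flag, TUI answer,
# per-source setting, global setting.
# An empty string means "not specified at this level"
# (for example, a TUI answer of "Source defaults").
resolve_download_pdfs() {
  for value in "$@"; do
    if [ -n "$value" ]; then
      echo "$value"
      return 0
    fi
  done
  echo "false"   # assumption: PDFs stay off when nothing is set anywhere
}

resolve_download_pdfs "" "" "true" "false"      # per-source default wins
resolve_download_pdfs "false" "" "true" "true"  # CLI flag wins
```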

Media Storage

Article_Folder/
├── article.md
├── images/             # Images (default)
└── files/              # PDFs, videos, audio (--pdfs or --media)

Output Location

Default Locations

Bundle and Fetch (multiple articles):

News/news_DD-MM-YYYY/Source_DD-MM-YYYY/

Single article:

Capcats/DD-MM-YYYY-Article-Title/
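
To locate today's output from a script, the two patterns above can be filled in with date +%d-%m-%Y. A minimal sketch; the helper names are hypothetical, and the exact spelling Capcat uses for source and title folder names is an assumption:

```shell
# Hypothetical helpers that mirror the folder patterns above.
# How Capcat spells the source name and article title is an assumption.
batch_dir() {   # usage: batch_dir SOURCE DD-MM-YYYY
  printf 'News/news_%s/%s_%s/\n' "$2" "$1" "$2"
}
single_dir() {  # usage: single_dir TITLE DD-MM-YYYY
  printf 'Capcats/%s-%s/\n' "$2" "$1"
}

today=$(date +%d-%m-%Y)
batch_dir "Source" "$today"
single_dir "Article-Title" "$today"
```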

Custom Output Location

Use the --output flag:

# Custom directory
capcat fetch hn --count 10 --output /path/to/output

# Path under your home directory
capcat single URL --output ~/Articles

# Current directory
capcat fetch bbc --output .

Updating Existing Articles

Update Mode

Re-fetch and overwrite existing articles:

capcat fetch hn --count 10 --update
capcat bundle tech --update
capcat single URL --update

Use when:

  • Article content was updated
  • You want newer comments
  • A previous fetch failed or was incomplete

Behavior:

  • Overwrites existing article.md
  • Re-downloads media
  • Updates timestamps

Configuration File Customization

Global Settings

Global settings live in Config/Global-settings.yaml inside your vault. Generate it with:

capcat settings --force

Then edit directly:

vim Config/Global-settings.yaml

Common customizations:

processing:
  max_workers: 16           # Parallel downloads (faster)
  download_images: true     # On by default

network:
  connect_timeout: 15       # Connection timeout
  read_timeout: 45          # Read timeout

logging:
  console_level: "INFO"     # Log verbosity (DEBUG, INFO, WARNING, ERROR)

Save the file. The settings apply to all future fetches, with no restart required.

Source Selection

capcat.yml at the vault root controls which sources run and how many articles each fetches:

sources:
  - name: hn
    article_count: 10
  - name: bbc
    article_count: 5
bundles: {}
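
To check how many articles a run would request in total, the article_count values can be summed with awk. A minimal sketch, assuming the simple one-entry-per-line layout shown above; the helper name total_articles is hypothetical:

```shell
# Hypothetical helper: sum the article_count values in a capcat.yml-style file.
# Assumes the simple "article_count: N" layout shown above, one per line.
total_articles() {
  awk '$1 == "article_count:" { sum += $2 } END { print sum + 0 }' "$1"
}

# Demo on a temporary copy of the example configuration.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
sources:
  - name: hn
    article_count: 10
  - name: bbc
    article_count: 5
bundles: {}
EOF
total_articles "$cfg"   # prints 15
rm -f "$cfg"
```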

Per-Source Customization

Edit source config:

vim Config/sources/active/config_driven/configs/sourcename.yaml

Customize:

timeout: 15.0              # Source-specific timeout
rate_limit: 2.0            # Slower rate limiting

Common Customization Scenarios

Quick Daily Collection

Fast, no frills:

capcat bundle tech --count 15
# No --html (faster)
# No --media (smaller)
# Default output location

Weekend Reading Preparation

Full featured:

capcat bundle science --count 40 --html --media
# HTML for browser reading
# Media for offline viewing
# Large count for weekend

Targeted Research

Custom location, specific sources:

capcat fetch nature,scientificamerican --count 30 \
  --html --media --output ~/Research/Climate

Archive Update

Refresh existing collection:

capcat bundle news --count 20 --update --html
# Updates existing articles
# Regenerates HTML
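
If you reuse these scenarios often, a small shell function can map a short name to the full command. A sketch only: the function and profile names are hypothetical, and the function prints the command rather than running it:

```shell
# Hypothetical wrapper: map a profile name to one of the commands above.
# It only prints the command; pipe the output to `sh` to run it.
capcat_profile() {
  case "$1" in
    daily)   echo "capcat bundle tech --count 15" ;;
    weekend) echo "capcat bundle science --count 40 --html --media" ;;
    refresh) echo "capcat bundle news --count 20 --update --html" ;;
    *)       echo "unknown profile: $1" >&2; return 1 ;;
  esac
}

capcat_profile daily
```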

Logging Control

File Logging

Save detailed logs:

capcat --log-file capcat.log fetch hn --count 10

Log includes:

  • Detailed processing steps
  • Network requests
  • Errors and warnings
  • Timing information

Verbosity Levels

Normal (default):

capcat fetch hn --count 10

Verbose:

capcat --verbose fetch hn --count 10
capcat -V fetch hn --count 10

Quiet (errors only):

capcat --quiet fetch hn --count 10
capcat -q fetch hn --count 10

Output Organization Tips

By Date:

  • Default organization by date
  • Easy to find recent articles
  • Easy cleanup: delete old date folders

By Source:

  • Each source gets its own folder
  • Browse by source preference
  • Easy to see source volume

By Topic:

  • Use custom --output for topics
  • Create topic directories manually
  • Organize by project/research area
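
Because the date folders are named DD-MM-YYYY, they do not sort chronologically by name. A minimal sketch that lists news_DD-MM-YYYY folders dated before a cutoff; the helper name older_than is hypothetical, folder paths are assumed to have no trailing slash, and nothing is deleted:

```shell
# Hypothetical helper: list news_DD-MM-YYYY folders dated before a cutoff.
# Prints candidates only; nothing is deleted.
older_than() {  # usage: older_than YYYYMMDD FOLDER...
  cutoff=$1
  shift
  for dir in "$@"; do
    name=${dir##*/}       # strip any leading path
    stamp=${name#news_}   # leaves DD-MM-YYYY
    # Reorder to YYYYMMDD so a numeric comparison matches date order.
    ymd=$(echo "$stamp" | awk -F- '{ print $3 $2 $1 }')
    if [ "$ymd" -lt "$cutoff" ]; then
      echo "$dir"
    fi
  done
}

# Only the January folder is dated before 1 February 2025.
older_than 20250201 News/news_15-01-2025 News/news_03-02-2025
```

Review the printed list before removing anything by hand.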

Performance vs Quality

Fast Collection (no frills):

capcat bundle tech --count 10
# ~1-2 minutes

Balanced:

capcat bundle tech --count 20 --html
# ~3-5 minutes

Full Archive:

capcat bundle tech --count 50 --html --media
# ~10-15 minutes

Environment Variables

Override defaults without editing config:

# More parallel workers
export CAPCAT_PROCESSING_MAX_WORKERS=16
capcat bundle tech

# Custom timeout
export CAPCAT_NETWORK_CONNECT_TIMEOUT=20
capcat fetch bbc --count 30
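
Both variables above follow a CAPCAT_<SECTION>_<KEY> pattern that matches the section and key names in Global-settings.yaml. A small sketch that builds a name the same way; the helper is hypothetical, and whether every setting can be overridden like this is an assumption the source does not confirm:

```shell
# Hypothetical helper: build the variable name for a config section and key,
# assuming the CAPCAT_<SECTION>_<KEY> pattern seen in the two examples above.
capcat_env_name() {
  printf 'CAPCAT_%s_%s\n' "$1" "$2" | tr '[:lower:]' '[:upper:]'
}

capcat_env_name processing max_workers   # prints CAPCAT_PROCESSING_MAX_WORKERS
capcat_env_name network connect_timeout  # prints CAPCAT_NETWORK_CONNECT_TIMEOUT
```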

Quick Reference

# Article count
--count N

# Generate HTML
--html

# Download all media
--media

# Custom output
--output DIR

# Update existing
--update

# File logging
--log-file FILE

# Verbosity
--verbose / -V
--quiet / -q

# Combined example
capcat bundle tech --count 20 --html --output ~/News