Customizing Your Output

Control what Capcat collects and how it's organized.

What You'll Learn

Control article counts
Generate HTML for browser reading
Download additional media files
Customize output locations
Update existing articles

Article Count Control

Set Count Per Fetch

# Fetch 10 articles
capcat fetch hn --count 10

# Fetch 50 articles
capcat bundle tech --count 50

Guidelines:

Daily routine: 10-20 articles
Weekly collection: 30-50 articles
Research deep dive: 50+ articles

Default Counts

Sources and bundles have defaults (usually 30).

Use defaults:

capcat fetch hn  # Uses the source default (usually 30)

Override:

capcat fetch hn --count 15  # Uses 15

HTML Generation

Why Generate HTML?

Markdown is great for archiving, but HTML is better for reading:

Browse in web browser
Click between articles
Professional styling
Navigate comments easily

Generate HTML

Add --html flag:

capcat fetch hn --count 10 --html
capcat bundle tech --html
capcat single URL --html

HTML Output Structure

With --html flag:

Article_Folder/
├── article.md          # Markdown version
├── html/
│   ├── article.html    # Browsable HTML
│   └── comments.html   # Comments (if available)
└── images/             # Downloaded images

Open in browser:

open Article_Folder/html/article.html
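
The `open` command above is the macOS launcher; most Linux desktops provide `xdg-open` instead. A minimal sketch that picks one by platform, assuming the folder layout shown above (the helper name `open_article` is hypothetical and not part of Capcat):

```shell
# Hypothetical helper: open an article's HTML page with the platform launcher.
open_article() {
  page="$1/html/article.html"
  if [ ! -f "$page" ]; then
    echo "no HTML at $page (was --html used?)" >&2
    return 1
  fi
  case "$(uname -s)" in
    Darwin) open "$page" ;;            # macOS
    *)
      if command -v xdg-open >/dev/null 2>&1; then
        xdg-open "$page"               # most Linux desktops
      else
        echo "$page"                   # no launcher found: print the path instead
      fi
      ;;
  esac
}
```

Call it with the article folder, for example `open_article Article_Folder`.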

Media Download Control

Images (Always Downloaded)

Images are always downloaded and embedded:

capcat fetch hn --count 10
# Images automatically downloaded

Additional Media (Videos, Audio, PDFs)

Use --media flag for additional files:

capcat fetch nature --count 10 --media

Downloads:

Videos (MP4, WebM, etc.)
Audio files (MP3, WAV, etc.)
Documents (PDF, DOCX, etc.)

Warning:

--media significantly increases:

Download time
Disk space usage
Network bandwidth

Media Storage

Article_Folder/
├── article.md
├── images/             # Always downloaded
└── files/              # Videos, PDFs, audio (with --media)
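
To see how much disk space --media is actually costing, the files/ folders can be summed up. This is a rough sketch that assumes the layout above, with media stored in a folder literally named files; the helper name `media_usage_kb` is hypothetical:

```shell
# Hypothetical helper: total size in KB of every files/ folder under a root.
# -prune stops find from descending into a files/ folder it has already counted.
media_usage_kb() {
  find "$1" -type d -name files -prune -exec du -sk {} + 2>/dev/null |
    awk '{ total += $1 } END { print total + 0 }'
}
```

For example, `media_usage_kb ~/Research/Climate` prints a single number; a result of 0 means no files/ folders were found under that root.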

Output Location

Default Locations

Bundle and Fetch (multiple articles):

../News/news_DD-MM-YYYY/Source_DD-MM-YYYY/

Single article:

../Capcats/cc_DD-MM-YYYY-Article-Title/
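
The DD-MM-YYYY pattern matches the `date` format string `%d-%m-%Y`, so a script can compute today's batch folder. This sketch assumes the stamp is the local date on which the fetch ran, which the guide does not state explicitly:

```shell
# Assumption: the DD-MM-YYYY stamp is the local date of the fetch.
stamp=$(date +%d-%m-%Y)
news_dir="../News/news_${stamp}"

echo "Today's batch folder: $news_dir"

# List the per-source folders if a fetch has already run today.
if [ -d "$news_dir" ]; then
  ls "$news_dir"
fi
```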

Custom Output Location

Use --output flag:

# Custom directory
capcat fetch hn --count 10 --output /path/to/output

# Home directory path
capcat single URL --output ~/Articles

# Current directory
capcat fetch bbc --output .

Updating Existing Articles

Update Mode

Re-fetch and overwrite existing articles:

capcat fetch hn --count 10 --update
capcat bundle tech --update
capcat single URL --update

Use when:

The article content was updated
You want newer comments
A previous fetch failed or was incomplete

Behavior:

Overwrites existing article.md
Re-downloads media
Updates timestamps

Configuration File Customization

Global Settings

Edit capcat.yml:

vim capcat.yml

Common customizations:

processing:
  max_workers: 16           # Parallel downloads (faster)
  download_images: true     # Always on
  download_videos: false    # Off unless --media

network:
  connect_timeout: 15       # Connection timeout
  read_timeout: 45          # Read timeout

logging:
  default_level: "INFO"     # Log verbosity

Save the file; the settings apply to all future fetches.

Per-Source Customization

Edit source config:

vim sources/active/config_driven/configs/sourcename.yaml

Customize:

timeout: 15.0              # Source-specific timeout
rate_limit: 2.0            # Slower rate limiting

Common Customization Scenarios

Quick Daily Collection

Fast, no frills:

capcat bundle tech --count 15
# No --html (faster)
# No --media (smaller)
# Default output location

Weekend Reading Preparation

Full featured:

capcat bundle science --count 40 --html --media
# HTML for browser reading
# Media for offline viewing
# Large count for weekend

Targeted Research

Custom location, specific sources:

capcat fetch nature,scientificamerican --count 30 \
  --html --media --output ~/Research/Climate

Archive Update

Refresh existing collection:

capcat bundle news --count 20 --update --html
# Updates existing articles
# Regenerates HTML
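
If you run the same scenarios often, they can be kept behind short names. The wrapper below is a hypothetical sketch (the function and preset names are made up for illustration); it only prints the command, using the flags and bundles from the scenarios above:

```shell
# Hypothetical wrapper: print the capcat command for a named scenario.
capcat_preset() {
  case "$1" in
    daily)   echo "capcat bundle tech --count 15" ;;
    weekend) echo "capcat bundle science --count 40 --html --media" ;;
    archive) echo "capcat bundle news --count 20 --update --html" ;;
    *)       echo "unknown preset: $1" >&2; return 1 ;;
  esac
}

capcat_preset weekend
```

Printing first makes it easy to check the command before running it; `eval "$(capcat_preset daily)"` would then execute it.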

Logging Control

File Logging

Save detailed logs:

capcat --log-file capcat.log fetch hn --count 10

Log includes:

Detailed processing steps
Network requests
Errors and warnings
Timing information

Verbosity Levels

Normal (default):

capcat fetch hn --count 10

Verbose:

capcat --verbose fetch hn --count 10
capcat -V fetch hn --count 10

Quiet (errors only):

capcat --quiet fetch hn --count 10
capcat -q fetch hn --count 10

Output Organization Tips

By Date:

Default organization by date
Easy to find recent articles
Easy cleanup: delete old date folders

By Source:

Each source gets its own folder
Browse by source preference
Easy to see source volume

By Topic:

Use custom --output for topics
Create topic directories manually
Organize by project/research area
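
Cleaning up old date folders can be scripted. The sketch below only lists candidates and deletes nothing. It assumes the default ../News/news_DD-MM-YYYY layout and uses the folder's modification time as an approximation of its age, which may differ from the date in the folder name. The helper name `old_batches` is hypothetical:

```shell
# Hypothetical helper: list date folders not modified in the last N days.
# It only prints candidates; review the list before removing anything.
old_batches() {
  root="$1"
  days="$2"
  [ -d "$root" ] || return 0   # nothing to list if the root is missing
  find "$root" -maxdepth 1 -type d -name 'news_*' -mtime +"$days"
}

old_batches ../News 30
```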

Performance vs Quality

Fast Collection (no frills):

capcat bundle tech --count 10
# ~1-2 minutes

Balanced:

capcat bundle tech --count 20 --html
# ~3-5 minutes

Full Archive:

capcat bundle tech --count 50 --html --media
# ~10-15 minutes

Environment Variables

Override defaults without editing config:

# More parallel workers
export CAPCAT_PROCESSING_MAX_WORKERS=16
capcat bundle tech

# Custom timeout
export CAPCAT_NETWORK_CONNECT_TIMEOUT=20
capcat fetch bbc --count 30
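
An exported variable stays set for the rest of the shell session. Standard shell syntax also allows an override for a single command by prefixing the assignment. The sketch below demonstrates the scoping, with `sh -c` standing in for a capcat invocation so that it runs anywhere:

```shell
unset CAPCAT_PROCESSING_MAX_WORKERS

# Prefix form: the variable exists only for this one command.
# `sh -c` stands in for a capcat invocation.
seen=$(CAPCAT_PROCESSING_MAX_WORKERS=16 sh -c 'printf %s "$CAPCAT_PROCESSING_MAX_WORKERS"')
echo "inside the command: $seen"

# The current shell is unaffected afterwards.
echo "in the shell: ${CAPCAT_PROCESSING_MAX_WORKERS:-unset}"
```

Applied to Capcat, the one-off form would be `CAPCAT_PROCESSING_MAX_WORKERS=16 capcat bundle tech`.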

Next Steps

Deep customization:

Configuration Comprehensive - All settings documented

Quick Reference

# Article count
--count N

# Generate HTML
--html

# Download all media
--media

# Custom output
--output DIR

# Update existing
--update

# File logging
--log-file FILE

# Verbosity
--verbose / -V
--quiet / -q

# Combined example
capcat bundle tech --count 20 --html --output ~/News