Quick Start Guide

Get Capcat running in 5 minutes with this streamlined setup guide.

Prerequisites

  • Python 3.8+ (3.11+ recommended)
  • Ability to create Python virtual environments
  • pipx, for the standard installation
  • Network access for downloading articles

Installation

pipx install capcat
capcat list sources

Development Install (Contributing)

Clone the repo and install in editable mode:

git clone https://github.com/stayukasabov/capcat.git
cd capcat
python3 -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
capcat list sources

Verify the Installation

# List available sources
capcat list sources

# List available bundles
capcat list bundles

Project Setup

Capcat stores your config, sources, and output in a project directory. The first capcat command you run in a directory automatically initializes it as a Capcat project:

mkdir my-news
cd my-news
capcat catch

This creates:

my-news/
├── .capcat/                              # Internal state (git-ignored)
│   ├── state.json
│   ├── cache/
│   └── registry/
├── Config/                               # User-owned config (version-control it if you like)
│   ├── themes/                           # Article HTML themes
│   │   ├── base.css
│   │   ├── design-system.css
│   │   └── Space-Grotesk/
│   └── sources/
│       └── active/
│           ├── config_driven/configs/    # YAML-configured sources
│           ├── custom/                   # Python-implemented sources
│           └── bundles/bundles.yml       # Bundle definitions
├── News/                                 # Batch output (git-ignored)
└── Capcats/                              # Single-article output (git-ignored)

Auto-init: Running any capcat command in an uninitialized directory scaffolds the project structure above on first use.

Theme upgrades: When the capcat package updates, themes are automatically refreshed in Config/themes/ on next use (with a confirmation prompt in CLI mode).
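The auto-init step amounts to creating the directory tree shown above if it is not already there. The sketch below illustrates that idea only; it is not capcat's scaffolding code. The function names are hypothetical, it assumes a directory counts as a project once `.capcat/` exists, and it creates directories only (no `state.json`, theme files, or `bundles.yml`).

```python
from pathlib import Path
from typing import List

# Directories from the documented project layout (illustrative mirror only).
PROJECT_DIRS = [
    ".capcat/cache",
    ".capcat/registry",
    "Config/themes",
    "Config/sources/active/config_driven/configs",
    "Config/sources/active/custom",
    "Config/sources/active/bundles",
    "News",
    "Capcats",
]


def is_initialized(root: Path) -> bool:
    """Assumption: a directory is a project once .capcat/ exists."""
    return (root / ".capcat").is_dir()


def scaffold(root: Path) -> List[Path]:
    """Create any missing directories and return the ones created.

    Safe to call repeatedly: existing directories are left untouched,
    so a second call returns an empty list.
    """
    created = []
    for rel in PROJECT_DIRS:
        path = root / rel
        if not path.exists():
            path.mkdir(parents=True)
            created.append(path)
    return created
```

Making the step idempotent is what lets "any command" trigger it safely: an already-initialized directory is simply left as it is.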

Basic Usage

Interactive Mode (Recommended for New Users)

Launch the interactive menu for guided workflows:

capcat catch

Main Menu Options:

  What would you like me to do?

  > Catch articles from a bundle of sources
    Catch articles from a list of sources
    Catch from a single source
    Catch a single article by URL
    Manage Sources (add/remove/configure)
    Exit

Why Use Interactive Mode:

  • No command memorization required
  • Guided step-by-step workflows
  • Visual source selection
  • Built-in source management
  • Prevents common errors
  • Ideal for daily use

Quick Examples:

Fetch a News Bundle:

  1. capcat catch
  2. Select "Catch articles from a bundle of sources"
  3. Choose bundle (tech, news, science, etc.)
  4. Select HTML generation option
  5. Confirm and execute

Add a New Source:

  1. capcat catch
  2. Select "Manage Sources"
  3. Select "Add New Source from RSS Feed"
  4. Enter RSS feed URL
  5. Follow prompts

Test a Source:

  1. capcat catch
  2. Select "Manage Sources"
  3. Select "Test a Source"
  4. Choose source to test
  5. View results

For comprehensive interactive mode documentation, see the Interactive Mode Guide.

CLI Mode (Advanced Users & Scripts)

Single Article Download

# Download a single article
capcat single https://example.com/article

# With media files
capcat single https://bbc.com/news/technology --media

Batch Downloads

# Tech news bundle (Hacker News + Lobsters + InfoQ)
capcat bundle tech --count 10

# General news bundle (BBC + CNN + Reuters)
capcat bundle news --count 15 --media

# Specific sources
capcat fetch hn,bbc --count 20
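When driving these commands from a script, it helps to build the argument list in one place. The sketch below uses only the subcommands and flags shown in this guide (`bundle`, `fetch`, `--count`, `--media`, `-V`, `-L`) and places the global options before the subcommand, as the examples do. The helper names are hypothetical.

```python
from typing import List, Optional, Sequence


def build_capcat_command(
    subcommand: str,
    target: str,
    count: int,
    media: bool = False,
    log_file: Optional[str] = None,
    verbose: bool = False,
) -> List[str]:
    """Assemble argv for `capcat bundle ...` or `capcat fetch ...`."""
    if subcommand not in ("bundle", "fetch"):
        raise ValueError(f"unsupported subcommand: {subcommand}")
    argv = ["capcat"]
    # Global options come before the subcommand, as in this guide's examples.
    if verbose:
        argv.append("-V")
    if log_file:
        argv += ["-L", log_file]
    argv += [subcommand, target, "--count", str(count)]
    if media:
        argv.append("--media")
    return argv


def fetch_sources(sources: Sequence[str], count: int, **kwargs) -> List[str]:
    """`capcat fetch` takes source IDs as one comma-separated argument."""
    return build_capcat_command("fetch", ",".join(sources), count, **kwargs)
```

For example, `subprocess.run(fetch_sources(["hn", "bbc"], 20), check=True)` runs the same command as `capcat fetch hn,bbc --count 20`, without going through a shell.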

Discovery Commands

# List all available sources
capcat list sources

# List predefined bundles
capcat list bundles

File Logging

# Save detailed logs to a file (includes all debug information)
capcat -L capcat.log bundle tech --count 10

# Verbose console output + file logging
capcat -V -L debug.log fetch hn --count 15

# Timestamped log files
capcat -L logs/news-$(date +%Y%m%d-%H%M%S).log bundle news --count 10
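The timestamped log name above can be produced in Python with the same `%Y%m%d-%H%M%S` format. This is a minimal sketch with a hypothetical helper name; it creates the log directory up front on the assumption that capcat may not create missing parent directories for `-L`.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def timestamped_log_path(prefix: str, log_dir: str = "logs",
                         now: Optional[datetime] = None) -> Path:
    """Python equivalent of logs/<prefix>-$(date +%Y%m%d-%H%M%S).log."""
    now = now or datetime.now()
    directory = Path(log_dir)
    # Assumption: capcat may not create missing log directories itself.
    directory.mkdir(parents=True, exist_ok=True)
    return directory / f"{prefix}-{now:%Y%m%d-%H%M%S}.log"
```

Pass `str(timestamped_log_path("news"))` as the value of `-L`.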

Output Structure

News/news_DD-MM-YYYY/                 # Batch downloads
├── Hacker-News_DD-MM-YYYY/
│   └── 01_Article_Title/
│       ├── article.md
│       ├── comments.md
│       └── images/
└── BBC-News_DD-MM-YYYY/
    └── 01_Article_Title/
        ├── article.md
        └── images/

Capcats/                              # Single articles
└── DD-MM-YYYY-Article-Title/
    ├── article.md
    └── images/
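Scripts that post-process output need to locate these folders. The sketch below rebuilds the paths from the naming template shown above (`DD-MM-YYYY` dates, two-digit article index, words joined by `_` in batch folders and by `-` in single-article folders). The helper names are hypothetical, and the title-cleaning rule (keep alphanumeric runs only) is an assumption; capcat's real sanitization may differ.

```python
import re
from datetime import date
from pathlib import Path
from typing import List


def _words(text: str) -> List[str]:
    # Hypothetical slug rule: keep runs of letters/digits, drop the rest.
    return re.findall(r"[A-Za-z0-9]+", text)


def batch_article_dir(root: Path, source_name: str, index: int,
                      title: str, day: date) -> Path:
    """News/news_DD-MM-YYYY/<Source-Name>_DD-MM-YYYY/NN_Article_Title"""
    stamp = day.strftime("%d-%m-%Y")
    source = "-".join(_words(source_name))
    article = f"{index:02d}_" + "_".join(_words(title))
    return root / "News" / f"news_{stamp}" / f"{source}_{stamp}" / article


def single_article_dir(root: Path, title: str, day: date) -> Path:
    """Capcats/DD-MM-YYYY-Article-Title"""
    stamp = day.strftime("%d-%m-%Y")
    return root / "Capcats" / (stamp + "-" + "-".join(_words(title)))
```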

Key Features Demo

1. Config-Driven Sources (Simple)

# These sources use YAML configuration (no coding required)
capcat fetch iq,bbc,guardian --count 5

2. Custom Sources (Complex)

# These sources have custom Python implementations
capcat fetch hn,bbc,techcrunch --count 5

3. Media Handling

# Images only (default)
capcat bundle tech --count 5

# All media types (images + videos + documents)
capcat bundle tech --count 5 --media

Verification

Test System Health

# Run the comprehensive source test (from the root of a development checkout)
python test_comprehensive_sources.py

# Quick individual test
capcat fetch hn --count 3
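To test several sources and compare the result with the success rate listed under Expected Results, a small loop over `capcat fetch <source> --count 3` is enough. This sketch assumes capcat exits with a non-zero status when a fetch fails, which this guide does not state; the runner is injectable so the logic can be checked without network access. Function names are hypothetical.

```python
import subprocess
from typing import Callable, Dict, Iterable, List


def run_capcat(argv: List[str]) -> int:
    """Run a capcat command and return its exit code."""
    return subprocess.run(argv, capture_output=True).returncode


def check_sources(sources: Iterable[str], count: int = 3,
                  runner: Callable[[List[str]], int] = run_capcat) -> Dict[str, bool]:
    """Fetch a few articles from each source, one source at a time.

    Assumption: a zero exit code means the fetch succeeded.
    """
    return {
        src: runner(["capcat", "fetch", src, "--count", str(count)]) == 0
        for src in sources
    }


def success_rate(results: Dict[str, bool]) -> float:
    """Fraction of sources that succeeded (0.0 for an empty run)."""
    return sum(results.values()) / len(results) if results else 0.0
```

A result near 0.9 across the curated sources is in line with the expected figures below.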

Expected Results

  • Sources: 17 available
  • Success rate: ~90% of curated sources working
  • Performance: 4-6 seconds average per source
  • Output: clean Markdown files with local images

Source Policy

  • Paywall exclusion: sources with paywalls or subscription requirements are excluded
  • Recently removed: Wired and The Verge (moved to a paywall model)
  • Bot protection: sources with aggressive anti-bot measures are avoided
  • Use capcat list sources to see the currently available sources

Intelligent Protection System

Capcat includes automatic protection against problematic sites:

Download Limits (Automatic)

  • Normal articles: 50 images, 20MB total
  • Suspicious sites: 10 images, 5MB total
  • High-risk sites: 5 images, 2MB total
  • Link aggregators: 0 images, 0MB (blocked)
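The limits above form a per-tier table checked against a planned download. The sketch encodes those numbers; it is an illustration, not capcat's implementation. The tier keys are hypothetical labels, and treating 1MB as 1024 × 1024 bytes is an assumption.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # Assumption: binary megabytes.


@dataclass(frozen=True)
class MediaLimit:
    max_images: int
    max_bytes: int


# Values from the list above; the keys are illustrative labels.
LIMITS = {
    "normal": MediaLimit(50, 20 * MB),
    "suspicious": MediaLimit(10, 5 * MB),
    "high_risk": MediaLimit(5, 2 * MB),
    "link_aggregator": MediaLimit(0, 0),
}


def within_limits(tier: str, image_count: int, total_bytes: int) -> bool:
    """True if a planned download fits both the image and size caps."""
    limit = LIMITS[tier]
    return image_count <= limit.max_images and total_bytes <= limit.max_bytes
```

By this table, the 471-image, 103MB page described below would exceed even the normal-article limits.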

Real-World Protection

# Example: consumed.today attempted to download 471 images (103MB)
# Automatically blocked: "LINK_AGGREGATOR detected"
# Protection saved: 103MB of unwanted downloads

Media Flag Behavior

# Without --media flag: Standard protection limits apply
capcat single https://example.com/article

# With --media flag: Bypass limits for legitimate sites (up to 500MB)
capcat single https://example.com/article --media

# Note: the --media flag is ignored for blocked aggregator sites
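The three cases above resolve to one total-size cap per download. This sketch captures only the size rule stated in the comments (standard cap, 500MB with `--media`, aggregators always blocked). The function name is hypothetical, and it assumes every non-aggregator site counts as "legitimate" for the `--media` bypass, which the guide does not spell out.

```python
MEDIA_FLAG_CAP_MB = 500  # "up to 500MB" with --media


def effective_size_cap_mb(tier_cap_mb: int, media_flag: bool,
                          is_aggregator: bool) -> int:
    """Resolve the total-size cap (in MB) for one article download.

    - Aggregators stay blocked (0MB) even when --media is passed.
    - --media raises the cap to 500MB for all other sites (assumption).
    - Otherwise the tier's standard cap applies.
    """
    if is_aggregator:
        return 0
    if media_flag:
        return MEDIA_FLAG_CAP_MB
    return tier_cap_mb
```

Checking the aggregator case first is what makes `--media` unable to override a block.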

Wrapper System

Capcat uses a two-layer wrapper system for reliability:

Architecture

  • capcat: lightweight 9-line bash script (executable shortcut)
  • run_capcat.py: comprehensive Python wrapper (handles all logic)
  • capcat.py: main application code

Benefits

  • Automatic environment management: no manual venv activation needed
  • Dependency installation: automatically installs the packages listed in requirements.txt
  • Error handling: clear messages for common issues
  • Cross-platform: works on macOS, Linux, and Windows
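The environment-management benefit boils down to three steps: find the venv's interpreter, create the venv and install `requirements.txt` if it is missing, then run the main application with that interpreter. The sketch below is a rough illustration of such a wrapper, not the code of `run_capcat.py`; the function names are hypothetical, and only the `venv/` and `requirements.txt` names come from this guide.

```python
import subprocess
import sys
import venv
from pathlib import Path
from typing import List


def venv_python(venv_dir: Path, platform: str = sys.platform) -> Path:
    """Interpreter inside a venv: Scripts/python.exe on Windows, bin/python elsewhere."""
    if platform.startswith("win"):
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def ensure_venv(venv_dir: Path, requirements: Path) -> Path:
    """Create the venv and install requirements only if it does not exist yet."""
    python = venv_python(venv_dir)
    if not python.exists():
        venv.create(venv_dir, with_pip=True)
        subprocess.run(
            [str(python), "-m", "pip", "install", "-r", str(requirements)],
            check=True,
        )
    return python


def run_app(venv_dir: Path, requirements: Path, app: Path, args: List[str]) -> int:
    """Run the main application with the venv's interpreter; return its exit code."""
    python = ensure_venv(venv_dir, requirements)
    return subprocess.run([str(python), str(app), *args]).returncode
```

Because the venv is rebuilt whenever its interpreter is missing, deleting `venv/` (as in Common Issues below) is a safe reset.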

Entry Points

# Primary method (recommended)
capcat command args

# Alternative method (direct Python)
python3 run_capcat.py command args

# Manual method (requires venv activation)
source venv/bin/activate && python capcat.py command args

Common Issues

1. Wrapper System Issues

# If the bash wrapper fails, use the Python wrapper directly
python3 run_capcat.py list sources

# Check wrapper system health
capcat --help

2. Module Not Found

# Let the wrapper install dependencies automatically
capcat list sources

# Or manually activate environment (advanced users)
source venv/bin/activate

3. Virtual Environment Issues

# Remove the venv; the wrapper rebuilds it on the next run
rm -rf venv
capcat list sources

4. Network Errors

# Some sources may have anti-bot protection (normal)
# A success rate of about 90% (roughly 14-16 of the 17 sources) is expected

Next Steps

Pro Tips

  1. Use bundles for related content: capcat bundle tech, capcat bundle news
  2. Start small with --count 5 when testing new sources
  3. Monitor performance: compare processing times against the expected 4-6 seconds per source
  4. Use --media sparingly: it significantly increases download time
  5. Enable file logging with -L logfile.log for troubleshooting and debugging

You're now ready to use Capcat! For advanced usage, continue to the Architecture Overview.