CLI Commands Comprehensive Reference

Complete documentation of EVERY command-line interface option, flag, parameter, and workflow in Capcat.

Source: Application/cli.py, Application/capcat.py

Global Options

These options work with ALL commands.

--version, -v

capcat --version
capcat -v

Displays: Capcat v2.0.0

--verbose, -V

capcat --verbose <command>
capcat -V <command>

Enable verbose output logging. Shows:

Detailed processing information
Debug-level messages
Source discovery details
Network requests
Article processing steps

Cannot be used with:

--quiet

--quiet, -q

capcat --quiet <command>
capcat -q <command>

Suppress informational messages. Shows only:

Warnings
Errors
Critical messages

Cannot be used with:

--verbose
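The mutual exclusion between --verbose and --quiet can be sketched with argparse's built-in exclusive groups. This is a minimal illustration, not the actual parser in Application/cli.py; the function name build_global_parser is hypothetical.

```python
import argparse


def build_global_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the --verbose and --quiet flags."""
    parser = argparse.ArgumentParser(prog="capcat")
    # argparse rejects any command line that sets both members of this group.
    noise = parser.add_mutually_exclusive_group()
    noise.add_argument("-V", "--verbose", action="store_true")
    noise.add_argument("-q", "--quiet", action="store_true")
    return parser
```

With a group like this, passing both flags makes argparse print a usage error and exit with status 2, which lines up with the "Invalid arguments" exit code listed under Exit Codes.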

--config, -C FILE

capcat --config custom.yml <command>
capcat -C ~/.capcat/config.yml <command>

Specify a custom configuration file path to use instead of the default capcat.yml.

Default location:

Application/capcat.yml

--log-file, -L FILE

capcat --log-file capcat.log <command>
capcat -L logs/debug-$(date +%Y%m%d).log <command>

Write detailed logs to specified file. File logging includes:

All log levels (DEBUG, INFO, WARNING, ERROR, CRITICAL)
Timestamps for every entry
Module/function names
Full stack traces for errors
Network request/response details

Log format:

[YYYY-MM-DD HH:MM:SS] [LEVEL] [module.function:line] Message
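A log line in this shape can be produced with the standard logging module. The sketch below is an assumption about how such a format might be configured, not a copy of Capcat's logging setup; LOG_FORMAT, DATE_FORMAT, and make_file_formatter are illustrative names.

```python
import logging

# Mirrors: [YYYY-MM-DD HH:MM:SS] [LEVEL] [module.function:line] Message
LOG_FORMAT = (
    "[%(asctime)s] [%(levelname)s] "
    "[%(module)s.%(funcName)s:%(lineno)d] %(message)s"
)
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def make_file_formatter() -> logging.Formatter:
    """Return a formatter that renders records in the documented layout."""
    return logging.Formatter(fmt=LOG_FORMAT, datefmt=DATE_FORMAT)
```

Attaching this formatter to a logging.FileHandler set to DEBUG would capture all five levels listed above.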

Commands

single - Download Single Article

Download and process a single article from any URL.

Syntax:

capcat single <url> [options]

Required Arguments:

url - Article URL to download (must be valid HTTP/HTTPS URL)

Options:

--output, -o DIR

capcat single URL --output /path/to/output
capcat single URL -o ../Capcats

Specify output directory for article.

Default:

Current directory (.)

Output structure:

<output-dir>/
└── cc_DD-MM-YYYY-Article-Title/
    ├── article.md
    ├── images/
    └── html/ (if --html specified)
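To make the folder naming concrete, here is a small sketch that builds a cc_DD-MM-YYYY-Article-Title path. The date layout comes from the structure above; the title sanitization rule (collapsing non-alphanumeric runs to hyphens) is an assumption, and article_dir is a hypothetical helper.

```python
import re
from datetime import date
from pathlib import Path


def article_dir(output_dir: str, title: str, day: date) -> Path:
    """Build the per-article folder path for a given title and date."""
    # Assumed rule: runs of non-alphanumeric characters become one hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", title).strip("-")
    return Path(output_dir) / f"cc_{day.strftime('%d-%m-%Y')}-{slug}"
```

Capcat's real sanitization may differ, so treat the slug as illustrative only.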

--media, -M

capcat single URL --media
capcat single URL -M

Download ALL media types:

Images (always downloaded; this flag adds the types below)
Videos (MP4, WebM, etc.)
Audio files (MP3, WAV, etc.)
Documents (PDF, DOCX, etc.)

Default:

Images only

Storage:

Files stored in files/ subdirectory

--html, -H

capcat single URL --html
capcat single URL -H

Generate HTML version of article in addition to Markdown.

Output:

Article_Folder/html/article.html

Features:

Professional styling with themes
Navigation buttons
Responsive design
Dark/light mode support

--update, -U

capcat single URL --update
capcat single URL -U

Update existing article if it already exists.

Behavior:

Overwrites existing article.md
Re-downloads media files
Updates timestamps
Preserves folder structure

Complete Examples:

# Basic single article download
capcat single https://example.com/article

# Download with all media and HTML
capcat single https://bbc.com/news/article --media --html

# Update existing article verbosely
capcat single URL --update --verbose

# Custom output with logging
capcat single URL --output ~/Articles --log-file fetch.log

fetch - Fetch from Specific Sources

Fetch articles from one or more specific sources.

Syntax:

capcat fetch <sources> [options]

Required Arguments:

sources - Comma-separated list of source IDs (no spaces)
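The comma-separated argument could be split and checked roughly as follows. This is a sketch under assumptions: parse_sources is a hypothetical helper, the set of known IDs is passed in by the caller, and the error text is modeled on the "Unknown sources" message shown under Error Handling.

```python
def parse_sources(raw: str, known: set[str]) -> list[str]:
    """Split a comma-separated ID list and reject IDs that are not known."""
    ids = [part for part in raw.split(",") if part]
    if not ids:
        raise ValueError("No sources given")
    unknown = [source_id for source_id in ids if source_id not in known]
    if unknown:
        raise ValueError(f"Unknown sources: {', '.join(unknown)}")
    return ids
```

Because the list is split only on commas, an input such as "hn, bbc" would yield the ID " bbc" with a leading space and be rejected, which is why the argument must contain no spaces.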

Available Sources:

Use capcat list sources to see all available source IDs.

Common Source IDs:

hn - Hacker News
lb - Lobsters
iq - InfoQ
bbc - BBC News
guardian - The Guardian
nature - Nature News
ieee - IEEE Spectrum
mitnews - MIT News
bbcsport - BBC Sport

Options:

--count, -c N

capcat fetch hn --count 10
capcat fetch hn,bbc -c 50

Number of articles to fetch per source.

Default:

30

Range:

1-1000 (practical limit varies by source)

Behavior:

Each source fetches N articles independently
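The 1-1000 range can be enforced at parse time with an argparse type callable. The sketch below assumes a default of 30, the hard-coded value stated under Configuration Priority; article_count and build_fetch_parser are hypothetical names.

```python
import argparse


def article_count(text: str) -> int:
    """Convert the --count value and enforce the documented 1-1000 range."""
    try:
        value = int(text)
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"count must be an integer, got {text!r}"
        ) from None
    if not 1 <= value <= 1000:
        raise argparse.ArgumentTypeError("count must be between 1 and 1000")
    return value


def build_fetch_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the fetch arguments relevant to --count."""
    parser = argparse.ArgumentParser(prog="capcat fetch")
    parser.add_argument("sources")
    parser.add_argument("-c", "--count", type=article_count, default=30)
    return parser
```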

--output, -o DIR

capcat fetch hn --output /path/to/output
capcat fetch hn -o ../News

Specify output directory for articles.

Default:

Current directory (.)

Output structure:

<output-dir>/
└── news_DD-MM-YYYY/
    ├── Source1_DD-MM-YYYY/
    │   ├── 01_Article_Title/
    │   ├── 02_Article_Title/
    │   └── ...
    └── Source2_DD-MM-YYYY/
        └── ...

--media, -M

capcat fetch hn,bbc --media
capcat fetch nature -M

Download ALL media types (videos, audio, documents) in addition to images.

Default:

Images only

Impact:

Significantly increases download time and disk usage

--html, -H

capcat fetch hn --html
capcat fetch hn,bbc -H

Generate HTML versions of all articles.

Output:

Each article gets html/article.html

Sources with comments:

Also generates html/comments.html

--update, -U

capcat fetch hn --update
capcat fetch hn -U

Update existing articles if they already exist.

Complete Examples:

# Fetch 10 articles from Hacker News
capcat fetch hn --count 10

# Fetch from multiple sources
capcat fetch hn,lb,iq --count 20

# Fetch with all features
capcat fetch bbc,guardian --count 15 --media --html --verbose

# Fetch to custom location with logging
capcat fetch nature --count 5 --output ~/Science --log-file science.log

bundle - Fetch from Source Bundles

Fetch articles from predefined source bundles (groups of related sources).

Syntax:

capcat bundle <bundle-name> [options]

Required Arguments:

bundle-name - Name of predefined bundle

Available Bundles:

Use capcat list bundles to see all available bundles.

Predefined Bundles:

tech - Technology news (ieee, mashable, gizmodo)
techpro - Advanced tech news (hn, lb, iq)
news - General news (bbc, guardian)
science - Science news (nature, scientificamerican)
ai - AI/ML news (mitnews, googleai, openai)
sports - Sports news (bbcsport)
all - All available sources

Options:

--count, -c N

capcat bundle tech --count 10
capcat bundle news -c 50

Number of articles to fetch per source in bundle.

Default:

30

Behavior:

Each source in bundle fetches N articles

--output, -o DIR

capcat bundle tech --output /path/to/output
capcat bundle tech -o ../News

Specify output directory.

Default:

Current directory (.)

Structure:

Same as fetch command

--media, -M

capcat bundle tech --media
capcat bundle science -M

Download ALL media types for all sources in bundle.

--html, -H

capcat bundle tech --html
capcat bundle news -H

Generate HTML for all articles from all sources.

--all, -A

capcat bundle --all
capcat bundle -A --count 10

Fetch from ALL available bundles in order: techpro, tech, news, science, ai.

Cannot be used with:

bundle-name argument

Behavior:

Processes bundles sequentially
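One way to express the rule that --all and a bundle name exclude each other is an optional positional plus a post-parse check. This is a sketch only: select_bundles is a hypothetical function, and ALL_ORDER simply repeats the order quoted above.

```python
import argparse

# Processing order for --all, as listed in this section.
ALL_ORDER = ["techpro", "tech", "news", "science", "ai"]


def select_bundles(argv: list[str]) -> list[str]:
    """Return the bundles to process, in order, for a bundle command line."""
    parser = argparse.ArgumentParser(prog="capcat bundle")
    parser.add_argument("bundle_name", nargs="?")
    parser.add_argument("-A", "--all", action="store_true")
    args = parser.parse_args(argv)
    if args.all and args.bundle_name:
        parser.error("--all cannot be combined with a bundle name")
    if not args.all and not args.bundle_name:
        parser.error("give a bundle name or --all")
    return list(ALL_ORDER) if args.all else [args.bundle_name]
```

A caller would then loop over the returned list one bundle at a time, matching the sequential behavior described above.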

--update, -U

capcat bundle tech --update
capcat bundle tech -U

Update existing articles in bundle.

Complete Examples:

# Fetch tech bundle
capcat bundle tech

# Fetch with custom count
capcat bundle news --count 20

# Fetch all bundles
capcat bundle --all --count 10

# Full-featured bundle fetch
capcat bundle science --count 15 --media --html --verbose

# All bundles with logging
capcat bundle --all --log-file bundles-$(date +%Y%m%d).log

list - List Sources and Bundles

Display available sources, bundles, or both.

Syntax:

capcat list [what]

Arguments:

what - What to list: sources, bundles, or all (default: all)

Options:

None (this command has no additional options)

Examples:

List All Sources

capcat list sources

Output format:

--- Available Sources ---

TECH:
  - ieee           IEEE Spectrum
  - mashable       Mashable
  - gizmodo        Gizmodo

TECHPRO:
  - hn             Hacker News
  - lb             Lobsters
  - iq             InfoQ

NEWS:
  - bbc            BBC News
  - guardian       The Guardian

SCIENCE:
  - nature         Nature News
  - scientificamerican Scientific American

AI:
  - mitnews        MIT News

SPORTS:
  - bbcsport       BBC Sport

Total: 12 sources

List All Bundles

capcat list bundles

Output format:

--- Available Bundles ---

tech - Technology News
  Sources: ieee, mashable, gizmodo

techpro - Advanced Technology
  Sources: hn, lb, iq

news - General News
  Sources: bbc, guardian

science - Science News
  Sources: nature, scientificamerican

ai - AI & Machine Learning
  Sources: mitnews

sports - Sports News
  Sources: bbcsport

Total: 6 bundles

List Both

capcat list all
capcat list

Displays both sources and bundles sections.

config - Configuration Management

View or modify Capcat configuration.

Syntax:

capcat config [options]

Options:

--show, -s

capcat config --show
capcat config -s

Display current configuration from capcat.yml.

Output includes:

Network settings (timeouts, user agent)
Processing settings (workers, media flags)
Logging configuration
Output directories
Source-specific settings

--set KEY=VALUE

capcat config --set max_workers=16
capcat config --set download_images=true

Set configuration value.

Cannot be used with:

--show

Modifiable settings:

max_workers - Number of parallel workers (1-32)
download_images - Enable/disable image downloads (true/false)
download_videos - Enable/disable video downloads (true/false)
connect_timeout - Network connection timeout (seconds)
read_timeout - Network read timeout (seconds)
user_agent - Custom user agent string

Note:

Changes are written to capcat.yml immediately.
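Parsing a KEY=VALUE pair against the modifiable settings listed above might look like the sketch below. The setting names and the 1-32 worker range come from this section; the converter functions, the choice of float for timeouts, and the error wording are assumptions.

```python
def _parse_bool(text: str) -> bool:
    """Accept only the literal strings true and false (case-insensitive)."""
    lowered = text.lower()
    if lowered not in ("true", "false"):
        raise ValueError(f"expected true or false, got {text!r}")
    return lowered == "true"


def _parse_workers(text: str) -> int:
    """Enforce the documented 1-32 range for max_workers."""
    value = int(text)
    if not 1 <= value <= 32:
        raise ValueError("max_workers must be between 1 and 32")
    return value


# Setting name -> converter for the raw string value.
SETTINGS = {
    "max_workers": _parse_workers,
    "download_images": _parse_bool,
    "download_videos": _parse_bool,
    "connect_timeout": float,  # assumed numeric type
    "read_timeout": float,     # assumed numeric type
    "user_agent": str,
}


def parse_assignment(text: str) -> tuple:
    """Split KEY=VALUE and convert VALUE with the converter for KEY."""
    key, sep, raw = text.partition("=")
    if not sep:
        raise ValueError("expected KEY=VALUE")
    if key not in SETTINGS:
        raise ValueError(f"unknown setting: {key}")
    return key, SETTINGS[key](raw)
```

Splitting on the first "=" only means values that themselves contain "=" (for example in a user agent string) survive intact.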

Examples:

# Show current config
capcat config --show

# Increase workers
capcat config --set max_workers=16

# Set custom user agent
capcat config --set user_agent="MyBot/1.0"

add-source - Add New RSS Source

Interactively add a new RSS-based source.

Syntax:

capcat add-source --url <rss-feed-url>

Required Options:

--url - URL of RSS feed to add

Interactive Workflow:

RSS feed inspection and validation
Source ID selection (auto-suggested from feed title)
Category selection from existing categories
Optional bundle assignment
Optional test fetch (recommended)

What It Creates:

YAML config file in: sources/active/config_driven/configs/
Filename: <source-id>.yaml
Bundle entry (if selected)

Examples:

# Add new RSS source
capcat add-source --url https://techcrunch.com/feed/

# Interactive flow:
# 1. Inspecting RSS feed...
#    [OK] Feed 'TechCrunch' found.
#
# 2. Configure New Source
#    Source ID: techcrunch
#    Category: tech
#    Add to bundle? Yes
#      Bundle: tech
#    [OK] Added 'techcrunch' to bundle 'tech'.
#
# 3. Running Test Fetch
#    Test fetch? Yes
#    [OK] Source added and verified successfully!

Cancellation:

Press Ctrl+C at any prompt to cancel operation.

remove-source - Remove Existing Sources

Interactively remove one or more sources with safety features.

Syntax:

capcat remove-source [options]

Interactive Mode (No Options):

capcat remove-source

Launches interactive removal with checkbox selection:

Displays all available sources
Select sources with spacebar
Review changes
Confirm removal
Automatic backup created
Sources removed

Options:

--dry-run, -n

capcat remove-source --dry-run
capcat remove-source -n

Preview changes without actually removing anything.

Output shows:

Which files would be removed
Which bundle entries would be deleted
Backup that would be created
No actual changes made

--batch, -b FILE

capcat remove-source --batch sources.txt
capcat remove-source -b remove-list.txt

Remove sources listed in file (one source ID per line).

File format:

techcrunch
oldnewssite
discontinued-source

Behavior:

Reads file line by line
Validates all source IDs first
Confirms before removal (unless --force)
Creates backup
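The read-then-validate steps of the batch behavior can be sketched as two small helpers. Both names are hypothetical, and ignoring blank lines is an assumption, since the file format above shows only one ID per line.

```python
from pathlib import Path


def read_batch_file(path: str) -> list[str]:
    """Read one source ID per line, trimming surrounding whitespace."""
    ids = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line:  # assumption: blank lines are ignored
            ids.append(line)
    return ids


def find_unknown(ids: list[str], known: set[str]) -> list[str]:
    """Return every ID that is not a known source, so nothing is removed
    until the whole list has been checked."""
    return [source_id for source_id in ids if source_id not in known]
```

Validating the full list before touching any file keeps a typo on the last line from leaving a batch half-applied.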

--undo, -u [BACKUP_ID]

capcat remove-source --undo
capcat remove-source --undo latest
capcat remove-source --undo backup_20251025_143022

Restore sources from backup.

Arguments:

No argument or latest - Restore most recent backup
BACKUP_ID - Restore specific backup by ID

Backup ID format:

backup_YYYYMMDD_HHMMSS

Restores:

All config files
Bundle definitions
Registry cache
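Because the backup ID embeds a timestamp in backup_YYYYMMDD_HHMMSS form, resolving "latest" can be reduced to parsing and comparing IDs. The helpers below are a hypothetical sketch, not Capcat's own restore logic.

```python
from datetime import datetime

BACKUP_FORMAT = "backup_%Y%m%d_%H%M%S"


def parse_backup_id(backup_id: str) -> datetime:
    """Parse a backup ID; raises ValueError if it does not match the format."""
    return datetime.strptime(backup_id, BACKUP_FORMAT)


def latest_backup(backup_ids: list[str]) -> str:
    """Return the ID with the most recent embedded timestamp."""
    return max(backup_ids, key=parse_backup_id)
```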

--no-backup

capcat remove-source --no-backup
capcat remove-source --no-backup --force

Skip creating backup before removal.

Warning:

No undo capability without backup. Use with extreme caution.

--no-analytics

capcat remove-source --no-analytics

Skip displaying usage analytics for sources.

Default behavior:

Shows usage statistics before removal

--force, -f

capcat remove-source --force
capcat remove-source -f --batch sources.txt

Skip all confirmation prompts.

Use cases:

Automated scripts
CI/CD pipelines
Batch operations

Warning:

No confirmation, immediate removal.

Complete Examples:

# Interactive removal
capcat remove-source

# Preview without removing
capcat remove-source --dry-run

# Batch removal
capcat remove-source --batch old-sources.txt

# Forced batch removal without backup
capcat remove-source --force --no-backup --batch remove.txt

# Restore last removal
capcat remove-source --undo

# Restore specific backup
capcat remove-source --undo backup_20251025_120000

Backup Locations:

Application/
└── .capcat-backups/
    ├── backup_20251025_143022/
    │   ├── configs/
    │   ├── bundles.yml
    │   └── manifest.json
    └── backup_20251025_150033/
        └── ...

generate-config - Generate Source Config

Launch interactive wizard to create comprehensive YAML source configuration.

Syntax:

capcat generate-config [options]

Interactive Wizard Steps:

Source ID and display name
Category selection
Base URL
Discovery method (RSS or HTML)
Article link selectors (CSS/XPath)
Content extraction selectors
Image processing configuration
Author/date extraction patterns
Rate limiting settings
Request timeout
Skip patterns
Template selection

Options:

--output, -o FILE

capcat generate-config --output custom.yaml
capcat generate-config -o /path/to/config.yaml

Specify output file path.

Default:

sources/active/config_driven/configs/<source-id>.yaml

Output Format:

# Generated configuration
display_name: "Example Source"
base_url: "https://example.com/"
category: "tech"
timeout: 10.0
rate_limit: 1.0

rss_config:
  feed_url: "https://example.com/feed.xml"
  use_rss_content: true

article_selectors:
  - ".headline a"
  - ".article-title a"

content_selectors:
  - ".article-content"
  - ".post-body"

image_processing:
  selectors:
    - "img"
    - ".content img"
  url_patterns:
    - "example.com/"
  skip_selectors:
    - ".sidebar img"

skip_patterns:
  - "/about"
  - "/contact"
  - "?utm_"

Examples:

# Interactive generation
capcat generate-config

# Save to custom location
capcat generate-config --output ~/configs/newsource.yaml

Next Steps After Generation:

# Review generated config
cat sources/active/config_driven/configs/newsource.yaml

# Test source
capcat fetch newsource --count 5

# Add to bundle (manual edit)
vim sources/active/bundles.yml

catch - Interactive Mode

Launch interactive menu interface for all Capcat operations.

Syntax:

capcat catch

No options:

This command has no additional flags or options.

Main Menu Options:

Catch articles from a bundle of sources
Catch articles from a list of sources
Catch from a single source
Catch a single article by URL
Manage Sources (add/remove/configure)
Exit

Navigation:

Arrow keys: Move between options
Enter: Select option
Ctrl+C: Cancel/go back

Sub-menu: Manage Sources

Add New Source from RSS Feed
Generate Custom Source Config
Remove Existing Sources
List All Sources
Test a Source
Back to Main Menu

Terminal Output:

[Capcat ASCII-art banner]

  What would you like me to do?

> Catch articles from a bundle of sources
  Catch articles from a list of sources
  Catch from a single source
  Catch a single article by URL
  Manage Sources (add/remove/configure)
  Exit

(Use arrow keys to navigate)

Example Session:

$ capcat catch

# User selects: "Catch articles from a bundle of sources"
# System shows bundle list, user selects "tech"
# System asks: "Generate HTML for web browsing?"
# User selects: "Yes"

--------------------
SUMMARY
Action: bundle
Bundle: tech
Generate HTML: true
--------------------

Executing command...
[Article processing begins]

See:

docs/tutorials/02-interactive-mode-exhaustive.md for complete interactive mode documentation.

Command Combinations

Typical Workflows

Daily News Collection

# Morning routine: fetch all news
capcat bundle news --count 20 --html

# Evening: fetch tech updates
capcat bundle tech --count 10

Research Workflow

# Collect science articles with all media
capcat bundle science --count 30 --media --log-file research.log

# Add specific article found during research
capcat single https://nature.com/article/12345 --media --html

Source Management Workflow

# List current sources
capcat list sources

# Add new source
capcat add-source --url https://newssite.com/feed.xml

# Test new source
capcat fetch newsource --count 3

# Remove old sources
capcat remove-source --batch old-sources.txt

Automated Collection Script

#!/bin/bash
# Daily collection script

LOG_FILE="logs/daily-$(date +%Y%m%d).log"

# Make sure the log directory exists before anything writes to it
mkdir -p "$(dirname "$LOG_FILE")"

# Collect from all bundles and branch directly on the exit status
if capcat bundle --all --count 20 --html --log-file "$LOG_FILE"; then
    echo "Collection successful" >> "$LOG_FILE"
else
    echo "Collection failed" >> "$LOG_FILE"
    exit 1
fi

Exit Codes

All commands return standard exit codes:

0 - Success (all operations completed)
1 - General error (command failed)
2 - Invalid arguments (validation failed)
130 - User interrupt (Ctrl+C pressed)

Usage in scripts:

capcat fetch hn --count 10
if [ $? -eq 0 ]; then
    echo "Success"
else
    echo "Failed"
    exit 1
fi
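When Capcat is driven from Python rather than a shell script, the same exit codes can be mapped to names. This is a sketch: ExitCode and run_and_classify are hypothetical, and only the four codes listed above are assumed to be meaningful.

```python
import subprocess
from enum import IntEnum


class ExitCode(IntEnum):
    """Exit codes documented in this section."""
    SUCCESS = 0
    GENERAL_ERROR = 1
    INVALID_ARGUMENTS = 2
    USER_INTERRUPT = 130


def run_and_classify(argv: list[str]) -> str:
    """Run a command and return the name of its documented exit code."""
    code = subprocess.run(argv).returncode
    try:
        return ExitCode(code).name
    except ValueError:
        # Any code outside the documented set is reported as unknown.
        return f"UNKNOWN({code})"
```

A call such as run_and_classify(["capcat", "fetch", "hn", "--count", "10"]) would then return one of the four names, or an UNKNOWN marker for anything else.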

Environment Variables

Capcat respects these environment variables:

CAPCAT_OUTPUT_DIR

export CAPCAT_OUTPUT_DIR="/custom/path"
capcat fetch hn

Override default output directory for all commands.

CAPCAT_DEFAULT_COUNT

export CAPCAT_DEFAULT_COUNT=50
capcat fetch hn  # Uses 50 instead of 30

Override default article count.

CAPCAT_CONFIG

export CAPCAT_CONFIG="$HOME/.capcat/custom.yml"
capcat fetch hn

Override default configuration file location.

CAPCAT_LOG_LEVEL

export CAPCAT_LOG_LEVEL=DEBUG
capcat fetch hn

Set logging level: DEBUG, INFO, WARNING, ERROR, CRITICAL

CAPCAT_MAX_WORKERS

export CAPCAT_MAX_WORKERS=16
capcat bundle all

Override parallel worker count.

Configuration Priority

Settings are applied in this order (highest to lowest priority):

Command-line arguments (--count 10)
Environment variables (CAPCAT_DEFAULT_COUNT=10)
Configuration file (capcat.yml: default_count: 10)
Default values (hard-coded: 30)

Example:

# capcat.yml has: default_count: 20
# Environment: CAPCAT_DEFAULT_COUNT=40
# Command: --count 10

capcat fetch hn --count 10  # Uses 10 (CLI wins)
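The four-level priority order can be written as a short lookup chain. The sketch below follows the order given above for the article count; resolve_count is a hypothetical function, and the config file is represented as an already-loaded dictionary.

```python
import os

HARD_CODED_DEFAULT = 30


def resolve_count(cli_value, config: dict, environ=None) -> int:
    """Pick the article count: CLI, then environment, then config, then default."""
    environ = os.environ if environ is None else environ
    if cli_value is not None:
        return cli_value
    if "CAPCAT_DEFAULT_COUNT" in environ:
        return int(environ["CAPCAT_DEFAULT_COUNT"])
    if "default_count" in config:
        return int(config["default_count"])
    return HARD_CODED_DEFAULT
```

Passing the environment as a parameter keeps the function easy to test without modifying the real process environment.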

Error Handling

Network Errors

$ capcat fetch hn
Error: Failed to connect to news.ycombinator.com
Network timeout after 10 seconds

Invalid Source

$ capcat fetch nonexistent
Error: Unknown sources: nonexistent
Available: hn, lb, iq, bbc, guardian, ...

Invalid URL

$ capcat single not-a-url
Error: Invalid URL format: not-a-url
Must start with http:// or https://
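A check that yields this kind of error can be sketched with urllib.parse. The function name validate_url is hypothetical, and the message text is modeled on the output above rather than taken from Capcat's source.

```python
from urllib.parse import urlparse


def validate_url(url: str) -> str:
    """Return the URL unchanged if it is HTTP(S) with a host, else raise."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(
            f"Invalid URL format: {url}\nMust start with http:// or https://"
        )
    return url
```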

Permission Denied

$ capcat fetch hn --output /root/articles
Error: Permission denied: /root/articles
Cannot create output directory

Disk Space

$ capcat bundle all --media
Warning: Low disk space (< 1GB available)
Consider using --no-media flag

Debugging Commands

Verbose Output

# Maximum verbosity with file logging
capcat -V --log-file debug.log fetch hn --count 3

Dry Run (Remove Command)

# Preview what would be removed
capcat remove-source --dry-run

Test Single Article

# Test article processing
capcat single URL --verbose

List and Verify Sources

# Check source availability
capcat list sources

Configuration Validation

# Show current config
capcat config --show

Source Code Locations

All CLI functionality implemented in:

Application/cli.py - Argument parsing, subcommands
Application/capcat.py - Main application logic
Application/core/ - Core processing modules

Function reference:

create_parser() - Application/cli.py:501 - Parser creation
parse_arguments() - Application/cli.py - Argument parsing
validate_arguments() - Application/cli.py:756 - Validation
process_sources() - Application/capcat.py:55 - Source processing
add_source() - Application/cli.py:200 - Add source command
remove_source() - Application/cli.py:339 - Remove source command
generate_config_command() - Application/cli.py:457 - Config generation