Customising Output
Control what Capcat collects and where it goes.
Article Count
capcat fetch hn --count 50 # fetch 50 articles
capcat bundle tech --count 10 # 10 articles per source in the bundle
Default: 30 articles per source. To change the default globally, set it in Config/Global-settings.yaml:
fetch:
  default_count: 30
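The precedence implied above can be sketched in Python. This assumes that an explicit --count always overrides fetch.default_count and that the built-in fallback is 30. The helper resolve_count and the dict shape of the parsed settings are hypothetical and do not reflect Capcat's actual internals.

```python
from typing import Any, Mapping, Optional

BUILTIN_DEFAULT_COUNT = 30  # documented default: 30 articles per source


def resolve_count(cli_count: Optional[int], config: Mapping[str, Any]) -> int:
    """Return the per-source article count (hypothetical helper).

    Assumed precedence: --count on the command line, then
    fetch.default_count from Global-settings.yaml, then 30.
    """
    if cli_count is not None:
        return cli_count
    # A bare "fetch:" key parses to None in YAML, so fall back to {}.
    fetch_cfg = config.get("fetch") or {}
    return int(fetch_cfg.get("default_count", BUILTIN_DEFAULT_COUNT))


# Settings as they might look after parsing Global-settings.yaml into a dict.
settings = {"fetch": {"default_count": 30}}
print(resolve_count(50, settings))    # 50 (--count 50)
print(resolve_count(None, settings))  # 30 (no --count given)
print(resolve_count(None, {}))        # 30 (no config entry at all)
```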
HTML Output
--html generates browsable HTML pages alongside the Markdown files:
capcat fetch hn --count 20 --html
Open News/hn/index.html in a browser. Templates are in Config/themes/.
Media Download
# Images and PDFs
capcat fetch hn --count 20 --html --media
# Single article with full media
capcat single https://example.com/article --html --media
--media downloads both images (download_files=True) and PDFs (download_pdfs=True). These are two independent settings; --media enables both.
PDF downloads can also be controlled per source in YAML:
media:
  download_pdfs: false   # this source never downloads PDFs
  max_pdf_size_mb: 5     # cap at 5 MB for this source
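Below is a minimal sketch of how the two settings and the per-source limits could combine. It rests on three assumptions: --media simply turns both flags on, a per-source download_pdfs: false wins over the command line, and max_pdf_size_mb is compared against the file size with 1 MB taken as 1,048,576 bytes. MediaOptions, options_from_cli and should_download_pdf are hypothetical names, not Capcat's API.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class MediaOptions:
    download_files: bool = False  # images
    download_pdfs: bool = False   # PDF documents


def options_from_cli(media: bool) -> MediaOptions:
    # --media enables both independent settings at once.
    return MediaOptions(download_files=media, download_pdfs=media)


def should_download_pdf(
    opts: MediaOptions,
    source_media: Optional[Mapping[str, Any]],
    size_bytes: int,
) -> bool:
    """Decide whether a single PDF should be saved (hypothetical logic)."""
    if not opts.download_pdfs:
        return False
    source_media = source_media or {}
    # Assumed: a per-source `download_pdfs: false` overrides the CLI flag.
    if source_media.get("download_pdfs") is False:
        return False
    # Assumed: max_pdf_size_mb is an upper bound in mebibytes.
    limit_mb = source_media.get("max_pdf_size_mb")
    if limit_mb is not None and size_bytes > limit_mb * 1024 * 1024:
        return False
    return True


opts = options_from_cli(media=True)
capped = {"max_pdf_size_mb": 5}
print(should_download_pdf(opts, capped, 3 * 1024 * 1024))        # True: under 5 MB
print(should_download_pdf(opts, capped, 8 * 1024 * 1024))        # False: over the cap
print(should_download_pdf(opts, {"download_pdfs": False}, 1024))  # False: disabled for source
```

Modelling the flags as two separate fields mirrors the point above: they are independent, and --media is only a shortcut that sets both.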
Output Location
Default: the current working directory. Override it for a single run with --output:
capcat fetch hn --count 20 --output ~/News
capcat single <url> --output ~/archive
Set permanently in Config/Global-settings.yaml:
output:
  base_dir: "/home/user/News"
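A sketch of how the output root could be chosen. It assumes --output takes priority over output.base_dir, which in turn takes priority over the current working directory. The helper resolve_output_dir is hypothetical. It expands ~ itself because a quoted path or a base_dir value in YAML is not expanded by the shell.

```python
import os
from pathlib import Path
from typing import Any, Mapping, Optional


def resolve_output_dir(cli_output: Optional[str], config: Mapping[str, Any]) -> Path:
    """Pick the output root (hypothetical helper).

    Assumed precedence: --output, then output.base_dir from
    Global-settings.yaml, then the current working directory.
    """
    if cli_output:
        return Path(cli_output).expanduser()
    base_dir = (config.get("output") or {}).get("base_dir")
    if base_dir:
        return Path(base_dir).expanduser()
    return Path.cwd()


settings = {"output": {"base_dir": "/home/user/News"}}
print(resolve_output_dir("~/archive", settings))            # ~ expanded to the home directory
print(resolve_output_dir(None, settings))                   # /home/user/News
print(resolve_output_dir(None, {}) == Path(os.getcwd()))   # True
```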
Update Existing Articles
Re-fetch articles that have already been collected:
capcat fetch hn --count 10 --update
Without --update, article directories that already exist are skipped.
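The skip rule reduces to a single check. The sketch below assumes that "already collected" means the article's directory exists on disk. The helper should_fetch and the folder names in the demo are illustrative only.

```python
import tempfile
from pathlib import Path


def should_fetch(article_dir: Path, update: bool) -> bool:
    """Return True if an article should be (re)fetched (hypothetical helper).

    Mirrors the documented rule: an existing article directory is
    skipped unless --update was passed.
    """
    return update or not article_dir.exists()


with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "existing-article"  # illustrative folder name
    existing.mkdir()
    missing = Path(tmp) / "new-article"
    print(should_fetch(missing, update=False))   # True: not collected yet
    print(should_fetch(existing, update=False))  # False: skipped
    print(should_fetch(existing, update=True))   # True: re-fetched
```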
Output Structure
News/
  <source>/
    index.html          ← source article index
    <date>-<slug>/
      article.md
      article.html
      comments.md       ← HN, Lobsters, LessWrong
      comments.html
      media/
        image1.jpg
        document.pdf
Capcats/
  <date>-<slug>/
    article.md
    article.html
    media/
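To work with collected articles from a script, you can walk the layout above. The sketch relies only on that tree and treats any folder under News/<source>/ that contains article.md as an article. The names iter_articles and ArticleEntry, and the folder names in the demo, are hypothetical. The demo builds a throwaway tree instead of reading real output.

```python
import tempfile
from pathlib import Path
from typing import Iterator, NamedTuple


class ArticleEntry(NamedTuple):
    source: str
    folder: str
    has_html: bool
    has_comments: bool


def iter_articles(news_root: Path) -> Iterator[ArticleEntry]:
    """Yield every article folder under News/<source>/ (hypothetical helper)."""
    for source_dir in sorted(p for p in news_root.iterdir() if p.is_dir()):
        # index.html is a file, so only <date>-<slug>/ folders are visited.
        for article_dir in sorted(p for p in source_dir.iterdir() if p.is_dir()):
            if not (article_dir / "article.md").is_file():
                continue
            yield ArticleEntry(
                source=source_dir.name,
                folder=article_dir.name,
                has_html=(article_dir / "article.html").is_file(),
                has_comments=(article_dir / "comments.md").is_file(),
            )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "News"
    article = root / "hn" / "2024-01-01-example"  # illustrative <date>-<slug>
    (article / "media").mkdir(parents=True)
    (root / "hn" / "index.html").write_text("<html></html>\n")
    (article / "article.md").write_text("# Example\n")
    (article / "comments.md").write_text("Comments\n")
    entries = list(iter_articles(root))
    for entry in entries:
        print(entry)
```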