capcat.core.date_extractor
File: Application/capcat/core/date_extractor.py
Description
Extracts publication dates from already-parsed HTML pages. The module makes no network calls.
Functions
extract_publish_date
def extract_publish_date(soup: BeautifulSoup) -> Optional[str]
Extract publication date from parsed HTML.
Priority:
- JSON-LD datePublished
- First
Args: soup: An already-parsed BeautifulSoup object. The function makes no HTTP requests.
Returns: An ISO date string, or None if no date is found.
Parameters:
soup(BeautifulSoup)
Returns: Optional[str]
_extract_from_json_ld
def _extract_from_json_ld(data) -> Optional[str]
Extract datePublished from JSON-LD data (dict or list).
Parameters:
data
Returns: Optional[str]
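
Example (illustrative sketch, not the module's actual code): one way a helper like this could handle JSON-LD data that arrives as either a dict or a list. The name find_date_published, the recursive descent into nested values such as "@graph", and the sample payload are all assumptions made for illustration. The sketch returns the raw datePublished string unchanged; whether the real helper normalizes it is not stated here.

```python
import json
from typing import Any, Optional


def find_date_published(data: Any) -> Optional[str]:
    """Hypothetical helper: return the first datePublished string in JSON-LD data."""
    if isinstance(data, dict):
        value = data.get("datePublished")
        if isinstance(value, str) and value.strip():
            return value.strip()
        # Assumption: also search nested nodes (for example under "@graph").
        for child in data.values():
            if isinstance(child, (dict, list)):
                found = find_date_published(child)
                if found:
                    return found
        return None
    if isinstance(data, list):
        # A JSON-LD script block may hold a list of nodes; check each in order.
        for item in data:
            found = find_date_published(item)
            if found:
                return found
    # Any other type (str, int, None) carries no datePublished key.
    return None


# Sample payload invented for this example.
raw = (
    '{"@context": "https://schema.org", "@graph": ['
    '{"@type": "WebSite"}, '
    '{"@type": "NewsArticle", "datePublished": "2024-05-01T09:30:00Z"}]}'
)
result = find_date_published(json.loads(raw))
print(result)
```

Input that is neither a dict nor a list yields None, matching the Optional[str] return type documented above.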