Data Sources & Methodology
How we collect and process pharmaceutical data
Our Approach
TheraRadar aggregates data exclusively from official government sources. We do not scrape commercial databases or estimate data. Every data point can be traced back to FDA, ClinicalTrials.gov, SEC, or CMS.
Primary Data Sources
| Data Type | Source | Update Frequency | Coverage |
|---|---|---|---|
| FDA Drug Approvals | FDA Drugs@FDA | Weekly | 8,245 drugs |
| Drug Labels | FDA DailyMed | Monthly | Indications, MOA, warnings |
| Patents & Exclusivity | FDA Orange Book | Monthly | Small molecule patents |
| Biologics | FDA Purple Book | Monthly | Biosimilars, reference products |
| Clinical Trials | ClinicalTrials.gov | Weekly | ~92,000 indexed trials |
| SEC Filings | SEC EDGAR | Weekly | 8-K, 10-K, 10-Q filings |
| PDUFA Dates | SEC EDGAR filings: 8-K, 10-K, 10-Q (extracted) | Weekly | FDA target action dates |
| Medicare Spending | CMS Part D Data | Annual | Drug spending 2013-2022 |
Drug Data Processing
Source: We parse the FDA Drugs@FDA bulk data files, which contain all FDA-approved drug applications (NDAs, ANDAs, BLAs) since 1939.
Enrichment: Drug labels from DailyMed are parsed to extract indications, mechanism of action, targets, warnings, and adverse reactions.
Normalization: Brand names and generic names are normalized. Company names are standardized across acquisitions and name changes.
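A minimal sketch of what the normalization step could look like. The alias table, the suffix list, and the function names are assumptions made for illustration, not the production rules.

```python
import re

# Corporate suffixes stripped before lookup (illustrative, not exhaustive).
_SUFFIXES = re.compile(r"\b(?:inc|corp|corporation|llc|ltd|plc)\b", re.IGNORECASE)

# Hypothetical alias table: lookup key -> current canonical company name.
COMPANY_ALIASES = {
    "celgene": "Bristol Myers Squibb",  # acquired in 2019
    "allergan": "AbbVie",               # acquired in 2020
}


def company_key(name: str) -> str:
    """Lowercase, drop periods/commas and corporate suffixes, collapse spaces."""
    cleaned = re.sub(r"[.,]", " ", name.lower())
    cleaned = _SUFFIXES.sub(" ", cleaned)
    return " ".join(cleaned.split())


def canonical_company(name: str) -> str:
    """Return the canonical name if an alias is known, else a tidied name."""
    key = company_key(name)
    return COMPANY_ALIASES.get(key, key.title())


def normalize_drug_name(name: str) -> str:
    """Uppercase and collapse whitespace so brand/generic names compare equal."""
    return " ".join(name.upper().split())
```

Keeping the alias table as plain data means an acquisition or rename can be recorded without touching the matching logic.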
Clinical Trials Index
Source: We maintain a local index of ~92,000 clinical trials from ClinicalTrials.gov for dashboard analytics.
Live queries: Individual drug pages query ClinicalTrials.gov in real time for the most current trial information.
Filtering: We focus on interventional drug trials (excluding device, behavioral, and observational studies).
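The filtering rule can be sketched as a predicate over one study record. The nested field names below follow our understanding of the ClinicalTrials.gov v2 JSON layout and should be treated as assumptions, as should the choice to count `BIOLOGICAL` interventions as drug trials.

```python
from typing import Any

# Intervention types treated as "drug" trials (assumption: biologics included).
DRUG_TYPES = {"DRUG", "BIOLOGICAL"}


def is_interventional_drug_trial(study: dict[str, Any]) -> bool:
    """True for interventional studies with at least one drug-type intervention."""
    protocol = study.get("protocolSection", {})

    # Observational and other non-interventional study types are excluded.
    study_type = protocol.get("designModule", {}).get("studyType", "")
    if study_type.upper() != "INTERVENTIONAL":
        return False

    # Device-only and behavioral-only trials have no drug-type intervention.
    interventions = protocol.get("armsInterventionsModule", {}).get("interventions", [])
    return any(item.get("type", "").upper() in DRUG_TYPES for item in interventions)
```

Note that a trial combining a drug with a device is kept under this rule, because at least one intervention is drug-type.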
Indication Taxonomy
Manual curation: We maintain a hand-curated taxonomy that maps FDA indication text and clinical trial conditions to standardized therapeutic areas.
Coverage: See our taxonomy dashboard for current mapping coverage and completeness metrics.
Therapeutic areas: Oncology, CNS, Cardiovascular, Metabolic, Infectious Disease, Immunology, Respiratory, Rare Disease, and more.
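As an illustration of how a curated taxonomy can map free text to therapeutic areas, here is a small keyword-based sketch. The terms listed are a toy subset chosen for the example, and `map_indication` and `coverage` are hypothetical names, not the actual implementation.

```python
import re

# Toy subset of a curated taxonomy: therapeutic area -> matching terms.
TAXONOMY = {
    "Oncology": ["leukemia", "lymphoma", "carcinoma", "melanoma"],
    "Cardiovascular": ["hypertension", "heart failure"],
    "Metabolic": ["type 2 diabetes", "obesity"],
    "Infectious Disease": ["hiv", "hepatitis c"],
}

# One whole-word, case-insensitive pattern per therapeutic area.
_PATTERNS = {
    area: re.compile(r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE)
    for area, terms in TAXONOMY.items()
}


def map_indication(text: str) -> list[str]:
    """Return every therapeutic area whose terms appear in the text."""
    return [area for area, pattern in _PATTERNS.items() if pattern.search(text)]


def coverage(texts: list[str]) -> float:
    """Fraction of texts mapped to at least one area (0.0 for an empty list)."""
    if not texts:
        return 0.0
    return sum(1 for t in texts if map_indication(t)) / len(texts)
```

An empty result marks an unmapped indication, which is the kind of gap a coverage metric is meant to surface.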
PDUFA Date Extraction
Method: We run full-text searches on SEC EDGAR for PDUFA-related keywords in 8-K, 10-K, and 10-Q filings.
Keywords: "PDUFA", "Prescription Drug User Fee Act", "FDA target action date", "FDA goal date"
Limitations: Not all companies disclose PDUFA dates. We supplement with manual entries for known upcoming decisions.
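The keyword step can be sketched as follows, operating on filing text that has already been downloaded. The 200-character look-ahead window, the "Month D, YYYY" date format, and the function name are assumptions for illustration; real filings phrase dates in more ways than this handles.

```python
import re
from datetime import date

# The keywords listed above, matched case-insensitively.
KEYWORDS = re.compile(
    r"PDUFA|Prescription Drug User Fee Act|FDA target action date|FDA goal date",
    re.IGNORECASE,
)

MONTHS = {
    "January": 1, "February": 2, "March": 3, "April": 4, "May": 5, "June": 6,
    "July": 7, "August": 8, "September": 9, "October": 10, "November": 11,
    "December": 12,
}

# Matches dates written like "March 15, 2025".
DATE = re.compile(r"\b(" + "|".join(MONTHS) + r") (\d{1,2}), (\d{4})\b")


def find_pdufa_dates(text: str, window: int = 200) -> list[date]:
    """Return distinct dates found shortly after a PDUFA-related keyword."""
    found: list[date] = []
    for keyword in KEYWORDS.finditer(text):
        # Only look at the text immediately following the keyword.
        snippet = text[keyword.end(): keyword.end() + window]
        match = DATE.search(snippet)
        if not match:
            continue
        month, day, year = match.groups()
        try:
            candidate = date(int(year), MONTHS[month], int(day))
        except ValueError:
            continue  # impossible calendar date such as "February 30"
        if candidate not in found:
            found.append(candidate)
    return found
```

Restricting the search to a window after the keyword reduces false matches on unrelated dates elsewhere in the filing, such as fiscal year ends.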
Patent & Exclusivity Data
Orange Book: Contains patents and exclusivity for small molecule drugs (NDAs). Updated monthly by FDA.
Purple Book: Contains biosimilar interchangeability and reference product designations for biologics (BLAs).
Limitation: Biologics licensed under BLAs are listed in the Purple Book, not the Orange Book, so Orange Book patent data does not cover them. Patent cliff analysis focuses on drugs with Medicare Part D spending data.
Known Limitations
- Drug names: ClinicalTrials.gov uses base names (e.g., "Fludarabine") while FDA uses salt forms (e.g., "FLUDARABINE PHOSPHATE"). We maintain manual mappings but some may be missing.
- Company attribution: FDA lists the original sponsor, not necessarily the current manufacturer. We attempt to track acquisitions but may miss some.
- Indication mapping: Our taxonomy covers major indications but some rare conditions may not be mapped. See coverage metrics on the taxonomy dashboard.
- PDUFA completeness: We capture only dates disclosed in SEC filings. Private companies generally do not file with the SEC, and public companies do not always disclose their PDUFA dates.
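To make the drug-name limitation concrete, the sketch below matches a ClinicalTrials.gov base name against an FDA salt-form name by checking a manual mapping first and then stripping trailing salt words. The suffix list and the mapping entry are illustrative assumptions; a heuristic like this cannot replace curated mappings, which is why some matches may still be missed.

```python
# Trailing salt/ester words to strip (illustrative, far from exhaustive).
SALT_SUFFIXES = {
    "PHOSPHATE", "HYDROCHLORIDE", "SODIUM", "MESYLATE",
    "SULFATE", "ACETATE", "MALEATE", "TARTRATE",
}

# Hypothetical manual overrides for names that differ by more than a salt.
MANUAL_MAPPINGS = {
    "PARACETAMOL": "ACETAMINOPHEN",  # international vs. US name
}


def base_name(name: str) -> str:
    """Reduce a drug name to an uppercase base form for comparison."""
    normalized = " ".join(name.upper().split())
    if normalized in MANUAL_MAPPINGS:
        return MANUAL_MAPPINGS[normalized]
    words = normalized.split()
    # Strip trailing salt words but never the only remaining word.
    while len(words) > 1 and words[-1] in SALT_SUFFIXES:
        words.pop()
    return " ".join(words)


def same_drug(trial_name: str, fda_name: str) -> bool:
    """True when both names reduce to the same base form."""
    return base_name(trial_name) == base_name(fda_name)
```

For example, `same_drug("Fludarabine", "FLUDARABINE PHOSPHATE")` returns `True` because the trailing "PHOSPHATE" is stripped before comparison.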
Found an error?
If you notice incorrect data, please report it and we'll investigate. Include the drug/company name and what you believe is incorrect.