Methods and Data Documentation

0) Scope of the Atlas

  • Intended purpose: cross-source adverse event signal exploration and evidence contextualization.
  • Not intended for pharmacovigilance regulatory decision support as a standalone system.
  • Not intended to estimate population incidence or real-world risk.
  • Not intended to provide clinical guidance, diagnosis, treatment, or prescribing advice.
  • Population limitations apply because data sources differ in inclusion criteria, reporting behavior, and completeness.

The Atlas is intended for hypothesis generation and exploratory analysis rather than confirmatory evidence.

1) Data sources

This section documents how data from FAERS, ClinicalTrials.gov results, PubMed, DK, UK, and WHO are ingested and interpreted.

1.1 FAERS data

What FAERS is

FAERS is a spontaneous reporting system and is used for signal detection rather than causal incidence estimation.

Input files used

All FAERS data files are downloaded directly from the FDA's website.

  • https://fis.fda.gov/extensions/FPD-QDE-FAERS/FPD-QDE-FAERS.html

FAERS quarterly data files from Q1 2016 through Q4 2025 were included.

Case versioning logic

For each caseid, the latest primaryid version is retained.
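The versioning rule can be sketched in Python as follows. This is a minimal illustration, not the Atlas ETL. It assumes each DEMO row is a dict with integer `caseid` and `primaryid` fields, and that within one case a larger primaryid marks a later version (FAERS builds primaryid by concatenating caseid and the case version number). The function name `latest_versions` is hypothetical.

```python
from typing import Dict, Iterable


def latest_versions(rows: Iterable[dict]) -> Dict[int, dict]:
    """Keep only the most recent version of each FAERS case.

    Assumes integer 'caseid' and 'primaryid' fields and that, within a case,
    a larger primaryid denotes a later version.
    """
    latest: Dict[int, dict] = {}
    for row in rows:
        current = latest.get(row["caseid"])
        if current is None or row["primaryid"] > current["primaryid"]:
            latest[row["caseid"]] = row
    return latest


demo = [
    {"caseid": 1001, "primaryid": 10011, "sex": "F"},
    {"caseid": 1001, "primaryid": 10012, "sex": "F"},  # follow-up version of case 1001
    {"caseid": 2002, "primaryid": 20021, "sex": "M"},
]
kept = latest_versions(demo)
print(sorted(r["primaryid"] for r in kept.values()))  # [10012, 20021]
```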

Substance parsing

  • Source: reported active ingredient prod_ai
  • Normalization: lowercase, trim whitespace, remove blanks

Terminology: AEAtlas consistently labels this field "Substance name (active ingredient)".

prod_ai is a reporter-entered field and may include spelling variants, brand names, combinations, salt forms, or incomplete ingredient names.

Adverse Event parsing

  • Source: Preferred term (pt)
  • Normalization: lowercase, trim whitespace, remove blanks

Counting definition

Counts represent the number of unique case identifiers (caseid) in which a given substance-event pair was reported, after retaining the most recent version of each case.
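A minimal sketch of the normalization and counting rules, assuming DRUG and REAC rows have already been restricted to the latest case versions and arrive as (caseid, prod_ai) and (caseid, pt) tuples. FAERS links drugs and reactions only through the report, so the sketch pairs every substance in a case with every PT in that case. That pairing and the helper names `normalize` and `count_pairs` are assumptions for illustration.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple


def normalize(term: Optional[str]) -> str:
    """Lowercase and trim a reporter-entered term; missing values become ''."""
    return (term or "").strip().lower()


def count_pairs(drug_rows: Iterable[Tuple[int, Optional[str]]],
                reac_rows: Iterable[Tuple[int, Optional[str]]]) -> dict:
    """Count unique caseids per (substance, event) pair."""
    substances = defaultdict(set)
    for caseid, prod_ai in drug_rows:
        name = normalize(prod_ai)
        if name:  # remove blanks
            substances[caseid].add(name)

    cases_per_pair = defaultdict(set)
    for caseid, pt in reac_rows:
        event = normalize(pt)
        if not event:
            continue
        # Assumption: every substance in a case is paired with every PT in that case.
        for name in substances.get(caseid, ()):
            cases_per_pair[(name, event)].add(caseid)

    # Sets ensure a case is counted once per pair, even if terms repeat.
    return {pair: len(ids) for pair, ids in cases_per_pair.items()}
```

For example, a case listing " Metformin " and "METFORMIN" with "Lactic acidosis" reported twice contributes a single count to ("metformin", "lactic acidosis").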

MedDRA System Organ Classes (SOC) mapping

  • Multiple SOCs per PT are collapsed to a joined list

Known limitations

  • Under-reporting and over-reporting
  • Notoriety bias and residual duplicate reports (the same event submitted under different case identifiers)
  • Missing age/sex data and confounding by concomitant medications
  • Indication/channeling bias
  • Exposure denominators (number of patients receiving a drug) are not available in FAERS, preventing incidence estimation

1.2 Clinical trials (ClinicalTrials.gov results)

Database

Postgres: clinicaltrials.gov

ClinicalTrials.gov results data are aggregated arm-level summaries reported by study sponsors and do not represent individual participant data.

Substance to trial mapping

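Interventions are matched to a substance with a case-insensitive substring pattern (ILIKE '%drug%'). This can cause substring collisions, and brand-name variants are missed unless they are explicitly included in the search terms.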

Intervention matching is heuristic and may misclassify trials when intervention names include multiple drugs, brand names, or descriptive phrases.
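The substring behavior of ILIKE '%drug%' can be illustrated with a small Python emulation. The helper is hypothetical and ignores LIKE wildcards inside the drug name as well as Unicode case-folding differences from PostgreSQL. It shows both failure modes named above: a collision, where "ethanol" also matches "Methanol", and a missed brand name, where "Humira" does not match "adalimumab".

```python
def ilike_contains(text: str, drug: str) -> bool:
    """Approximate SQL `text ILIKE '%drug%'` as a case-insensitive substring test."""
    return drug.lower() in text.lower()


interventions = ["Methanol", "Ethanol 5% infusion", "Placebo", "Humira"]

# Substring collision: 'Methanol' contains 'ethanol'.
ethanol_hits = [t for t in interventions if ilike_contains(t, "ethanol")]
print(ethanol_hits)  # ['Methanol', 'Ethanol 5% infusion']

# Brand-name miss: the brand name does not contain the ingredient name.
print(ilike_contains("Humira", "adalimumab"))  # False
```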

Included trials

Only studies involving drugs listed as approved by the European Medicines Agency (EMA) are included. The EMA list is downloaded from https://www.ema.europa.eu/en/medicines/download-medicine-data.

Only studies with results_first_posted_date IS NOT NULL are included, meaning that only trials with posted results appear in the Atlas.

Group logic

  • Reported event groups: EGxxx
  • Baseline groups: BGxxx
  • Map EG to BG by normalized group-title equality

Group mapping relies on title normalization heuristics and may fail for complex study designs (e.g., crossover, extension phases, pooled arms).

Treatment vs control selection

  • Control arm: the title matches placebo/control/comparator/standard-of-care (SOC) patterns and does not mention the drug
  • Treatment arm: the title mentions the drug and the arm is not a control-pattern arm
  • Multiple candidates: the arm with the highest subjects_at_risk is chosen

Arm classification is rule-based and may not perfectly reflect the study's intended comparison structure, particularly in multi-arm or crossover designs.
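The arm-selection rules above might look like this in Python. The control regex is a hypothetical stand-in for the Atlas's actual placebo/control/comparator/SOC patterns, and arms are assumed to be dicts with `title` and `subjects_at_risk` keys.

```python
import re

# Hypothetical control-arm pattern; the exact patterns used by the Atlas are not listed here.
CONTROL_PATTERN = re.compile(r"placebo|control|comparator|standard of care|\bsoc\b",
                             re.IGNORECASE)


def _largest(arms):
    """Arm with the highest subjects_at_risk, or None if there are no candidates."""
    return max(arms, key=lambda arm: arm["subjects_at_risk"], default=None)


def pick_arms(arms, drug):
    """Return (treatment_arm, control_arm); either may be None."""
    drug_lower = drug.lower()

    def mentions_drug(arm):
        return drug_lower in arm["title"].lower()

    controls = [arm for arm in arms
                if CONTROL_PATTERN.search(arm["title"]) and not mentions_drug(arm)]
    # An arm that mentions the drug can never be a control arm under the rule above.
    treatments = [arm for arm in arms if mentions_drug(arm)]
    return _largest(treatments), _largest(controls)
```

For example, with arms "Adalimumab 40 mg" (200 at risk), "Adalimumab 80 mg" (150), and "Placebo" (190), the 40 mg arm is selected as treatment and "Placebo" as control.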

Outcome extracted

For each trial and AE term: subjects_affected and subjects_at_risk in the selected treatment and control arms.

Effect measure

For each trial and adverse event term:

  • a: affected participants in treatment arm
  • n_treat: participants at risk in treatment arm
  • c: affected participants in control arm
  • n_control: participants at risk in control arm
\[ \mathrm{risk}_{\mathrm{treat}}=\frac{a}{n_{\mathrm{treat}}} \]
\[ \mathrm{risk}_{\mathrm{control}}=\frac{c}{n_{\mathrm{control}}} \]
\[ \mathrm{RD}=\mathrm{risk}_{\mathrm{treat}}-\mathrm{risk}_{\mathrm{control}} \]

Interpretation:

  • RD > 0: higher AE risk in treatment arm
  • RD < 0: lower AE risk in treatment arm
  • RD = 0: equal observed risk

Approximate standard error and confidence interval (Wald form):

\[ SE(\mathrm{RD})= \sqrt{ \frac{\mathrm{risk}_{\mathrm{treat}}\left(1-\mathrm{risk}_{\mathrm{treat}}\right)}{n_{\mathrm{treat}}} + \frac{\mathrm{risk}_{\mathrm{control}}\left(1-\mathrm{risk}_{\mathrm{control}}\right)}{n_{\mathrm{control}}} } \]
\[ 95\%\,CI_{\mathrm{RD}}= \left[ \mathrm{RD}-1.96\cdot SE(\mathrm{RD}), \mathrm{RD}+1.96\cdot SE(\mathrm{RD}) \right] \]
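As a worked check of the formulas above, here is a direct Python translation (function name hypothetical). Note that the Wald interval collapses to zero width when an arm's observed risk is 0 or 1, so it is unreliable for rare events.

```python
import math


def risk_difference(a, n_treat, c, n_control, z=1.96):
    """Unadjusted risk difference with a Wald 95% CI."""
    risk_treat = a / n_treat
    risk_control = c / n_control
    rd = risk_treat - risk_control
    se = math.sqrt(risk_treat * (1 - risk_treat) / n_treat
                   + risk_control * (1 - risk_control) / n_control)
    return rd, (rd - z * se, rd + z * se)


rd, (low, high) = risk_difference(30, 200, 10, 200)
```

Here 30/200 affected on treatment and 10/200 on control give RD = 0.10 with a 95% CI of roughly 0.042 to 0.158.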

The UI currently displays point estimates only; trial-level RD confidence intervals are not shown.

Risk differences are unadjusted and do not account for baseline imbalances, stratification factors, or time-at-risk differences.

Sex and age extraction

Source: trial baseline demographic records reported in the results dataset.

  • Sex: sum of female and male rows per baseline group (BG)
  • Age mean: rows whose param_type matches mean/average
  • Age SD: rows whose param_type matches sd/stddev/standard deviation

These sex and age values describe baseline trial cohort demographics for selected arms and are not AE-specific participant subsets.

Limitations: trials that report only median/IQR, missing SD values, mislabelled param_type values, and ambiguous semantics when a trial reports multiple age rows.

Known limitations

  • Published-results subset only
  • Selective reporting
  • Heterogeneous AE coding and baseline reporting
  • Differences in treatment duration and follow-up across trials are not adjusted for in the current metrics

1.3 PubMed (literature signal)

What is queried

Publications from 2016 onward in which the title or abstract mentions the substance and adverse-event terms. We currently search only for EMA-approved drugs; the list can be downloaded from https://www.ema.europa.eu/en/medicines/download-medicine-data.

Term extraction method

  • Dictionary-based term matching derived from MedDRA Preferred Terms
  • Abstract cleaning: lowercase, punctuation cleanup, header-line cleanup
  • Match extraction from abstract blocks

Indication filtering

Terms matching therapeutic-area concepts are filtered to reduce indication-as-AE false positives.

Publication year

Taken from the pubdate or epublishdate field returned by entrez_summary().

Known limitations

  • Not a curated AE dataset
  • False positives from contextual mention
  • Title/abstract text only, with limited negation handling (e.g., "no hepatotoxicity was observed" may still count as a mention)
  • Literature mentions do not necessarily represent observed adverse events in study populations and may include speculative or background discussion
  • Published literature may preferentially report unusual or severe events, introducing reporting bias
  • Generic terms such as “serious adverse event” are not PT diagnoses

1.4 DK data

What the DK dataset is used for

The Denmark (DK) section of the Atlas draws on an adverse event reporting dataset and provides substance-level case summaries, adverse event rankings, signal metrics, and age/sex/year breakdowns.

Application-facing data structure

  • Base adverse-event records for Denmark
  • Pre-aggregated summaries by substance and substance-event combination
  • Server-side totals used for disproportionality metrics
  • Year-stratified pooled signal outputs for eligible substance-event pairs
  • Exposure-denominator summaries used for incidence-style displays

Normalization and matching

Substance queries are normalized to lowercase and trimmed before lookup. For detail-level queries, the UI filters with drug_name=ilike and pt=ilike, which supports case-insensitive matching but can still miss spelling variants that are not present in the source data.

Counting definition

Top-line DK totals in the Atlas are displayed as case counts. In record-level queries, de-duplication for charts is performed with the adr field, so sex, age, and yearly DK visualizations reflect unique adr values rather than raw row counts.
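A sketch of the chart de-duplication, assuming record-level rows are dicts with `adr` and `recvd_year` keys (the dict layout and function name are hypothetical). Rows without an adr value cannot be de-duplicated and are skipped here, which is one way the completeness caveat below shows up in the charts.

```python
from collections import defaultdict


def unique_adr_by_year(records):
    """Count distinct adr values per recvd_year; rows sharing an adr count once."""
    adrs_by_year = defaultdict(set)
    for rec in records:
        if rec.get("adr") is not None:  # rows without an identifier are skipped
            adrs_by_year[rec["recvd_year"]].add(rec["adr"])
    return {year: len(adrs) for year, adrs in sorted(adrs_by_year.items())}
```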

Time coverage currently used in the UI

DK yearly, age, and sex analyses are queried for recvd_year 2015 through 2023.

Signal metrics

The Atlas computes DK disproportionality metrics from a source-level 2x2 table (ROR, PRR, and a smoothed IC). When any cell is zero, a continuity correction of 0.5 is applied. If yearly exposure strata are available, the UI also displays a year-stratified Mantel-Haenszel odds ratio.

Exposure denominators

Incidence-style yearly displays depend on exposure denominator data. When no denominator is available, the Atlas falls back to case-only display and suppresses denominator-dependent interpretation.

Known limitations

  • The underlying source provenance for the DK dataset is not described in this workspace and should be documented separately if public-source citation is required.
  • Case counts and exposure counts are drawn from different derived objects and may not support strict causal or incidence interpretation.
  • Case-insensitive string matching does not guarantee harmonization across brand names, salts, misspellings, or combination products.
  • Age and sex summaries depend on source-field completeness and on the availability of unique adr identifiers.

1.5 UK data

What the UK dataset is used for

The UK section is used for substance-level case summaries, top adverse-event rankings, and substance + adverse-event signal metrics (ROR, PRR, smoothed IC), with year/sex/age charts in the same UI pattern as other spontaneous-reporting sources.

Application-facing data structure

  • Base UK adverse-event records with fields including drug_name, year, sex, age_group, pt, and count
  • Pre-aggregated materialized views for substance totals and substance-event totals used by cards and signal RPCs
  • Server-side totals used for 2x2 disproportionality metrics in the UI

Normalization and matching

UK substance and event matching is normalized with lowercase and whitespace trimming. UI lookups use case-insensitive matching and server-side normalized aggregates.

Counting definition

UK values are source-provided event counts. In AE mode, UK signal/year/sex/age cards are queried on the same configured year window to keep card totals aligned.

Time coverage currently used in the UI

UK AE-mode cards currently query years 2015 through 2025.

Known limitations

  • As with other spontaneous reporting systems, disproportionality metrics reflect reporting patterns rather than incidence or causality.
  • Source-field completeness for age and sex can be limited; unknown categories may represent a large share for some substance-event pairs.

1.6 WHO data

What the WHO dataset is used for

The WHO section of the Atlas is used for substance-level adverse event summaries, full PT-level tables, disproportionality metrics, and substance-level year, sex, and age distributions.

Application-facing data structure

  • Preferred-term event counts
  • Substance-level totals grouped by organ-system context
  • Yearly, sex, and age-group summaries
  • Server-side aggregated totals used for signal metrics

Normalization and matching

WHO substance and event matching is normalized with lowercase and whitespace trimming. The UI queries WHO tables with ilike filters, and the backend materialized views used for signal metrics also store normalized lowercase keys.

Counting definition

WHO adverse event tables use source-provided event counts. Substance totals are taken from the maximum available total per substance in the organ-system summary to avoid double-counting repeated organ-system rows for the same substance.
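The max-based total can be sketched as below; the (substance, organ_system, substance_total) tuple layout and the function name are assumptions.

```python
def substance_totals(soc_rows):
    """Per-substance total as the maximum total seen across organ-system rows."""
    totals = {}
    for substance, _organ_system, total in soc_rows:
        totals[substance] = max(total, totals.get(substance, total))
    return totals


rows = [("x", "cardiac", 120), ("x", "hepatic", 120), ("x", "skin", 118), ("y", "renal", 5)]
```

For substance "x", the max gives 120, whereas summing the repeated rows would give 358 and count the same total once per organ system.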

Time coverage currently used in the UI

The year chart queries WHO data from 2016 through 2025.

Signal metrics

The Atlas computes WHO ROR, PRR, and a smoothed IC from server-side aggregated totals. If those totals are unavailable, the client falls back to direct source-table aggregation. The current WHO implementation does not display a Mantel-Haenszel model or an exposure-based incidence estimate.

Demographic summaries

WHO sex and age panels are currently substance-level summaries, not substance-event-specific summaries. Age groups are mapped into fixed display buckets, and any residual counts outside recognized buckets are reported as unknown age.

Known limitations

  • The WHO charts shown in the UI are substance-level for year, sex, and age, even when an adverse event is selected elsewhere on the page.
  • Substance totals inferred from organ-system summaries depend on the assumption that the maximum per-substance total is the correct denominator.
  • As with other spontaneous reporting systems, disproportionality metrics reflect reporting patterns rather than incidence or causality.
  • The underlying external WHO source citation is not described in this workspace and should be documented separately if a formal provenance statement is needed.

2) Terminology and mapping

2.1 MedDRA mapping & Organ system

SOC mapping is retained even when non-unique. A PT can validly map to multiple SOCs and is stored as a joined list.

2.2 What counts as an adverse event term in the Atlas?

Serious adverse event classification

AEAtlas does not include or display seriousness classifications (e.g., serious vs. non-serious adverse events).

Seriousness is a regulatory designation based on patient outcomes (such as death, hospitalization, life-threatening events, or disability) rather than a specific clinical diagnosis. Because AEAtlas is designed to analyze and compare clinical adverse event terms across studies and data sources, seriousness classifications were not incorporated into the current data model.

Seriousness is distinct from severity; severity reflects clinical intensity, while seriousness reflects regulatory outcomes.

Additionally, seriousness definitions and reporting practices vary across data sources (clinical trials, spontaneous reporting systems, and publications), which limits comparability. Excluding seriousness avoids introducing inconsistent or non-comparable metrics into the Atlas.

Future versions may incorporate seriousness as an optional filter if harmonized definitions and reliable cross-source mappings become available.

Substance reaction

The term is too nonspecific as free text and is included only when mapped to specific PT concepts.

Policy | Definition | Handling
Include | Specific clinical concepts (PT-like) | Keep in Atlas AE nodes
Exclude | Seriousness tags (e.g., serious adverse event) | Not included or displayed in current model
Exclude/flag | Administrative terms (e.g., treatment emergent) | Exclude from AE nodes
Exclude/flag | Catch-alls (e.g., drug reaction) | Exclude unless a standardized PT mapping exists
Conditional | Procedural terms | Include only when clinically meaningful

3) Analytics and metrics

3.1 Disproportionality

FAERS, DK, UK, and WHO disproportionality metrics reflect reporting patterns and should not be interpreted as incidence, prevalence, or causal relative risk in exposed populations.

Across all four spontaneous-reporting sources, these metrics can be influenced by co-reported drugs, reporting practices, stimulated reporting, duplicate handling, and confounding by indication.

In the Atlas, FAERS, DK, UK, and WHO all use the same 2x2 disproportionality framework for substance-event signal detection. FAERS, UK, and WHO display source-level ROR, PRR, and smoothed IC/IC025 from source-specific totals. DK displays the same source-level metrics and, when yearly exposure strata are available, an additional pooled yearly signal based on a year-stratified Mantel-Haenszel model.

2x2 table

a, b, c, and d are the counts for the substance with the event (a), the substance with all other events (b), all other substances with the event (c), and all other substances with all other events (d).

\[ \begin{array}{c|cc} & E & \neg E \\ \hline D & a & b \\ \neg D & c & d \end{array} \]

Here D denotes the substance and E the adverse event term. For FAERS, DK, UK, and WHO, the source-specific totals define a, b, c, and d, and the formulas below are applied within each source.

ROR and CI

\[ \mathrm{ROR} = \frac{a d}{b c} \]
\[ \log(\mathrm{ROR}) = \log(a) + \log(d) - \log(b) - \log(c) \]
\[ SE\!\left(\log(\mathrm{ROR})\right)=\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}} \]
\[ 95\%\,CI_{\mathrm{ROR}}= \left[ \exp\!\left(\log(\mathrm{ROR})-1.96\,SE\right), \exp\!\left(\log(\mathrm{ROR})+1.96\,SE\right) \right] \]

PRR and CI

\[ \mathrm{PRR}=\frac{a/(a+b)}{c/(c+d)} \]
\[ SE\!\left(\log(\mathrm{PRR})\right)= \sqrt{ \frac{1}{a}-\frac{1}{a+b}+\frac{1}{c}-\frac{1}{c+d} } \]
\[ 95\%\,CI_{\mathrm{PRR}}= \left[ \exp\!\left(\log(\mathrm{PRR})-1.96\,SE\right), \exp\!\left(\log(\mathrm{PRR})+1.96\,SE\right) \right] \]
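The ROR and PRR formulas translate directly into Python. This sketch assumes that the continuity correction, when requested, adds 0.5 to all four cells; the text states only that a 0.5 correction is applied for DK when any cell is zero. The function name is hypothetical.

```python
import math


def ror_prr(a, b, c, d, continuity=False, z=1.96):
    """ROR and PRR with 95% CIs computed on the log scale.

    If continuity is True and any cell is zero, 0.5 is added to every cell
    (assumed form of the DK correction). Without it, all cells must be > 0.
    """
    if continuity and 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    ror = (a * d) / (b * c)
    se_log_ror = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    prr = (a / (a + b)) / (c / (c + d))
    se_log_prr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))

    def ci(estimate, se):
        return (math.exp(math.log(estimate) - z * se),
                math.exp(math.log(estimate) + z * se))

    return {"ROR": ror, "ROR_CI": ci(ror, se_log_ror),
            "PRR": prr, "PRR_CI": ci(prr, se_log_prr)}
```

For a = 20, b = 980, c = 200, d = 98800, PRR = 0.02 / (200/99000) = 9.9 and ROR = (20 × 98800) / (980 × 200) ≈ 10.08.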

IC and IC025

Let N=a+b+c+d. Smoothed IC approximation in the Atlas:

\[ \mathrm{IC}_{\mathrm{smoothed}} = \log_2\!\left( \frac{(a+0.5)(N+1)} {(a+b+0.5)(a+c+0.5)} \right) \]

Approximate lower 95% bound displayed as IC025:

\[ \mathrm{IC}_{025}\approx \mathrm{IC}_{\mathrm{smoothed}}-1.96\cdot SE(\mathrm{IC}) \]

The implemented SE(IC) is derived on log scale and converted to base-2 scale.
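A sketch of the smoothed IC above. The exact SE(IC) derivation is not spelled out here, so `ic025` takes the log-scale standard error as an input, assumes it is on the natural-log scale, and divides by ln 2 to convert it to base 2. Both function names are hypothetical.

```python
import math


def ic_smoothed(a, b, c, d):
    """Smoothed IC (base 2) as defined above, with N = a + b + c + d."""
    n = a + b + c + d
    return math.log2((a + 0.5) * (n + 1) / ((a + b + 0.5) * (a + c + 0.5)))


def ic025(ic, se_ln, z=1.96):
    """Approximate lower bound IC - z * SE, converting a natural-log SE to base 2."""
    return ic - z * se_ln / math.log(2)
```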

Source-specific use in the Atlas

  • FAERS: displays source-level ROR, PRR, and smoothed IC/IC025 from server-side source totals.
  • DK: displays source-level ROR, PRR, and smoothed IC/IC025 from server-side source totals, and also shows a pooled yearly signal when yearly exposure strata are available.
  • UK: computes source-level ROR, PRR, and smoothed IC/IC025 from server-side totals (pre-aggregated materialized views).
  • WHO: displays source-level ROR, PRR, and smoothed IC/IC025 from server-side source totals; the current implementation does not display a yearly pooled Mantel-Haenszel signal.

DK pooled yearly signal

For DK, when yearly strata are available, each year y contributes a separate 2x2 table with cells a_y, b_y, c_y, and d_y, and total n_y=a_y+b_y+c_y+d_y.

The displayed pooled yearly signal is a year-stratified Mantel-Haenszel pooled odds ratio:

\[ \widehat{\theta}_{MH} = \frac{\sum_y \frac{a_y d_y}{n_y}} {\sum_y \frac{b_y c_y}{n_y}} \]

The interface displays this pooled estimate as the DK "Pooled Yearly Signal" (Mantel-Haenszel odds ratio), together with its 95% confidence interval computed on the log scale:

\[ 95\%\,CI_{MH} = \left[ \exp\!\left(\log(\widehat{\theta}_{MH})-1.96\,SE\!\left(\log(\widehat{\theta}_{MH})\right)\right), \exp\!\left(\log(\widehat{\theta}_{MH})+1.96\,SE\!\left(\log(\widehat{\theta}_{MH})\right)\right) \right] \]

Year-to-year heterogeneity is summarized with Cochran's Q and I^2 over the yearly log-odds-ratio estimates:

\[ Q=\sum_y w_y\left(\log(OR_y)-\log(OR_{FE})\right)^2 \]
\[ I^2=\max\!\left(0,\frac{Q-(k-1)}{Q}\right)\times 100\% \]

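Here w_y is the weight of year y (in Cochran's Q, conventionally the inverse variance of log(OR_y)), OR_FE is the fixed-effect pooled odds ratio, and k is the number of yearly strata contributing to the pooled estimate. In the UI, these are displayed as the pooled odds ratio, its confidence interval, the heterogeneity p-value, I^2, and k_years.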
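The pooled yearly estimate and heterogeneity summary can be sketched as below. The text does not name the variance estimator for log(θ_MH); this sketch assumes the Robins-Breslow-Greenland variance and, for Cochran's Q, inverse-variance weights around an inverse-variance fixed-effect pooled log OR. All yearly cells are assumed to be positive, and the function names are hypothetical.

```python
import math


def mantel_haenszel_or(strata, z=1.96):
    """Year-stratified MH odds ratio with a Robins-Breslow-Greenland 95% CI.

    strata: list of (a_y, b_y, c_y, d_y) tuples, one per year.
    """
    sum_r = sum_s = sum_pr = sum_ps_qr = sum_qs = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        r, s = a * d / n, b * c / n
        p, q = (a + d) / n, (b + c) / n
        sum_r += r
        sum_s += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
    or_mh = sum_r / sum_s
    var_log = (sum_pr / (2 * sum_r ** 2)
               + sum_ps_qr / (2 * sum_r * sum_s)
               + sum_qs / (2 * sum_s ** 2))
    se = math.sqrt(var_log)
    return or_mh, (math.exp(math.log(or_mh) - z * se), math.exp(math.log(or_mh) + z * se))


def heterogeneity(strata):
    """Cochran's Q and I^2 (%) over yearly log ORs with inverse-variance weights."""
    logs = [math.log(a * d / (b * c)) for a, b, c, d in strata]
    weights = [1 / (1 / a + 1 / b + 1 / c + 1 / d) for a, b, c, d in strata]
    pooled = sum(w * y for w, y in zip(weights, logs)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, logs))
    k = len(strata)
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return q, i2
```

With a single stratum, the Robins-Breslow-Greenland variance reduces to 1/a + 1/b + 1/c + 1/d, which is a handy sanity check. The heterogeneity p-value would additionally need a chi-square survival function, such as `scipy.stats.chi2.sf(q, k - 1)`.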

Edge cases

For FAERS and WHO, metrics are suppressed when required cells make computations unstable. For DK source-level ROR/PRR/IC, the Atlas applies a continuity correction of 0.5 when any cell is zero; yearly pooled DK metrics require sufficient yearly strata and are otherwise suppressed.

3.2 Clinical trials summary metrics

The Top-5 card is sorted by percent_affected. The Substance + AE risk card reports the raw weighted incidence and supporting counts from the aggregated trial table.

Study-level inputs for meta-analysis

Meta-analysis is run at trial level (one row per drug_name + adverse_event_term + nct_id) using:

  • a: affected in treatment arm
  • n_treat: at-risk in treatment arm
  • c: affected in control arm
  • n_control: at-risk in control arm

Per-study effects

Risk definitions:

\[ p_{t,i}=\frac{a_i}{n_{t,i}},\quad p_{c,i}=\frac{c_i}{n_{c,i}} \]

Risk difference (RD):

\[ RD_i=p_{t,i}-p_{c,i} \]
\[ \operatorname{Var}(RD_i)= \frac{p_{t,i}(1-p_{t,i})}{n_{t,i}}+ \frac{p_{c,i}(1-p_{c,i})}{n_{c,i}} \]

Risk ratio (RR) on log scale (with continuity correction when needed for zero-event cells):

\[ y_i=\log(RR_i)=\log\!\left(\frac{p_{t,i}}{p_{c,i}}\right) \]
\[ v_i=\operatorname{Var}(y_i)= \frac{1}{a_i}-\frac{1}{n_{t,i}}+\frac{1}{c_i}-\frac{1}{n_{c,i}} \]

Zero-event continuity correction is applied to RR components when required to keep logarithms and variances computable.
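Per-study inputs could be computed as follows. The correction shown is a common convention assumed here, not a documented Atlas constant: when either arm has zero events, 0.5 is added to each cell of the study's 2x2 table, which means 0.5 is added to each event count and 1 to each arm size. RD uses the uncorrected counts, and the function name is hypothetical.

```python
import math


def study_effects(a, n_t, c, n_c, cc=0.5):
    """Return (RD, Var(RD), log RR, Var(log RR)) for one trial."""
    p_t, p_c = a / n_t, c / n_c
    rd = p_t - p_c
    var_rd = p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c
    if a == 0 or c == 0:
        # Assumed correction: add cc to every 2x2 cell (events +cc, arm sizes +2*cc).
        a, c, n_t, n_c = a + cc, c + cc, n_t + 2 * cc, n_c + 2 * cc
    log_rr = math.log((a / n_t) / (c / n_c))
    var_log_rr = 1 / a - 1 / n_t + 1 / c - 1 / n_c
    return rd, var_rd, log_rr, var_log_rr
```

Studies with zero events in both arms carry no information about the RR and are often excluded. This sketch does not handle that case specially.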

DerSimonian-Laird random effects

Fixed-effect weights:

\[ w_i^{FE}=\frac{1}{v_i} \]

Fixed-effect pooled estimate:

\[ \hat\mu_{FE}=\frac{\sum_i w_i^{FE}y_i}{\sum_i w_i^{FE}} \]

Cochran's Q and DL between-study variance:

\[ Q=\sum_i w_i^{FE}(y_i-\hat\mu_{FE})^2 \]
\[ \tau^2_{DL}=\max\!\left(0,\frac{Q-(k-1)}{\sum_i w_i^{FE}-\frac{\sum_i (w_i^{FE})^2}{\sum_i w_i^{FE}}}\right) \]

Random-effects weights and pooled estimate:

\[ w_i^{RE}=\frac{1}{v_i+\tau^2_{DL}},\quad \hat\mu_{RE}=\frac{\sum_i w_i^{RE}y_i}{\sum_i w_i^{RE}} \]
\[ SE(\hat\mu_{RE})=\sqrt{\frac{1}{\sum_i w_i^{RE}}} \]
\[ 95\%\,CI=\hat\mu_{RE}\pm1.96\cdot SE(\hat\mu_{RE}) \]

For RR display, the pooled estimate and its CI bounds are back-transformed from the log scale:

\[ RR_{pooled}=\exp(\hat\mu_{RE}) \]

For RD display, the same DL framework is applied directly to RD_i and Var(RD_i).

The DerSimonian-Laird estimator assumes approximately normally distributed study effects and may underestimate uncertainty when the number of studies is small.
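A compact DerSimonian-Laird implementation that follows the formulas above (function name hypothetical). It pools whatever effect is passed in, so it serves both log RR, with `exp()` applied to the estimate and CI bounds afterwards, and RD.

```python
import math


def dersimonian_laird(effects, variances, z=1.96):
    """Random-effects pooling; returns (mu_re, (ci_low, ci_high), tau2, i2_percent)."""
    k = len(effects)
    w = [1 / v for v in variances]
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, effects))
    denom = sw - sum(wi ** 2 for wi in w) / sw  # zero when k == 1
    tau2 = max(0.0, (q - (k - 1)) / denom) if denom > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    mu_re = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return mu_re, (mu_re - z * se, mu_re + z * se), tau2, i2
```

For example, two studies with effects 0.0 and 1.0 and variances of 0.1 each give Q = 5, τ²_DL = 0.4, and I² = 80%.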

Heterogeneity

\[ I^2=\max\!\left(0,\frac{Q-(k-1)}{Q}\right)\times100\% \]

Substantial heterogeneity may indicate differences in study populations, design, or reporting practices rather than true variation in drug effects.

The card reports:

  • Meta RR (DL) with 95% CI
  • Meta RD (DL) with 95% CI
  • I^2 and tau^2
  • Model label: DerSimonian-Laird random effects

3.3 PubMed summary metrics

  • publication_count: unique PMID count per substance + AE
  • Recent studies: top 3 unique PMIDs by pub_year DESC
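These two PubMed summaries reduce to a unique-PMID count and a sort by year. The record layout and function name below are hypothetical, and ties in pub_year keep their first-seen order, which the text does not specify.

```python
def pubmed_summary(records, substance, ae, top_n=3):
    """Return (publication_count, most recent PMIDs) for one substance + AE pair."""
    year_by_pmid = {}
    for rec in records:
        if rec["substance"] == substance and rec["ae"] == ae:
            year_by_pmid[rec["pmid"]] = rec["pub_year"]  # one entry per unique PMID
    recent = sorted(year_by_pmid, key=year_by_pmid.get, reverse=True)[:top_n]
    return len(year_by_pmid), recent
```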

3.4 Signal consensus index

The Signal Consensus Index (SCI) is a cross-source summary score for one substance + adverse event pair. It combines source-level evidence from DK, UK, FAERS, and clinical trials into a 0-100 score, then adds a separate confidence score that reflects source coverage across the available contributors.

Sources included

The SCI uses four primary sources:

  • DK: preferred signal is the lower 95% CI bound of the year-stratified Mantel-Haenszel odds ratio; if that is unavailable, DK falls back to the same disproportionality score structure used for FAERS.
  • UK: lower confidence bounds from ROR and PRR plus IC025, using the same disproportionality-score structure as FAERS.
  • FAERS: lower confidence bounds from ROR and PRR plus IC025.
  • Clinical trials: lower 95% CI bound of the pooled random-effects risk ratio, scaled directly without additional heterogeneity or study-count attenuation.

WHO is not included in the SCI numeric average. It is used only as a concordance badge.

Scaling functions

Several source metrics are first converted to unit-scale quantities in the interval [0,1].

\[ \operatorname{clamp}_{[0,1]}(x)=\min(1,\max(0,x)) \]
\[ f_{ratio}(x)= \begin{cases} 0, & x \le 1 \\ \operatorname{clamp}_{[0,1]}\!\left(\dfrac{\ln(x)}{\ln(3)}\right), & x > 1 \end{cases} \]
\[ f_{IC}(x)= \begin{cases} 0, & x \le 0 \\ \operatorname{clamp}_{[0,1]}\!\left(\dfrac{x}{2}\right), & x > 0 \end{cases} \]

f_ratio is used for lower confidence bounds of odds ratios, proportional reporting ratios, and trial risk ratios. f_IC is used for IC025.
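The scaling functions in Python, with names mirroring the symbols above. f_ratio reaches 1 at a lower bound of 3, and f_IC reaches 1 at IC025 = 2.

```python
import math


def clamp01(x):
    """Clamp x to the interval [0, 1]."""
    return min(1.0, max(0.0, x))


def f_ratio(x):
    """0 at or below 1; ln(x)/ln(3), capped at 1, above 1."""
    return 0.0 if x <= 1 else clamp01(math.log(x) / math.log(3))


def f_ic(x):
    """0 at or below 0; x/2, capped at 1, above 0."""
    return 0.0 if x <= 0 else clamp01(x / 2)
```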

FAERS source score

Let ROR_{LCL} be the lower 95% confidence bound for ROR, PRR_{LCL} the lower bound for PRR, IC_{025} the lower bound for IC, and cases the source case count. The case-count attenuation factor is:

\[ F_{cases}=\operatorname{clamp}_{[0,1]}\!\left(\frac{\log_{10}(cases+1)}{3}\right) \]

The FAERS source score is:

\[ S_{FAERS}=100\cdot\left(0.40\,f_{ratio}(ROR_{LCL})+0.35\,f_{ratio}(PRR_{LCL})+0.25\,f_{IC}(IC_{025})\right)\cdot F_{cases} \]

FAERS is counted as directionally positive when:

\[ ROR_{LCL}>1,\quad PRR_{LCL}>1,\quad IC_{025}>0 \]
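The FAERS score and directional test can be sketched as below; the same score function applies to UK and to the DK fallback. The sketch reads the three positivity conditions as jointly required, which is an assumption, and the helper names are hypothetical.

```python
import math


def _clamp01(x):
    return min(1.0, max(0.0, x))


def _f_ratio(x):
    return 0.0 if x <= 1 else _clamp01(math.log(x) / math.log(3))


def _f_ic(x):
    return 0.0 if x <= 0 else _clamp01(x / 2)


def disproportionality_score(ror_lcl, prr_lcl, ic025, cases):
    """FAERS-style source score in [0, 100], attenuated by case count."""
    f_cases = _clamp01(math.log10(cases + 1) / 3)
    raw = 0.40 * _f_ratio(ror_lcl) + 0.35 * _f_ratio(prr_lcl) + 0.25 * _f_ic(ic025)
    return 100 * raw * f_cases


def is_positive(ror_lcl, prr_lcl, ic025):
    """Directional positivity, assuming all three conditions must hold."""
    return ror_lcl > 1 and prr_lcl > 1 and ic025 > 0
```

With all three lower bounds at saturation, F_cases drives the score: 9 cases give about 33.3, and 999 or more cases give 100.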

UK source score

The UK source score uses the same disproportionality structure as FAERS:

\[ S_{UK}=100\cdot\left(0.40\,f_{ratio}(ROR_{LCL})+0.35\,f_{ratio}(PRR_{LCL})+0.25\,f_{IC}(IC_{025})\right)\cdot F_{cases} \]

UK is counted as directionally positive when:

\[ ROR_{LCL}>1,\quad PRR_{LCL}>1,\quad IC_{025}>0 \]

Clinical trials source score

Let RR_{LCL} be the lower 95% confidence bound for the pooled random-effects risk ratio. The clinical-trials source score is:

\[ S_{Trials}=100\cdot f_{ratio}(RR_{LCL}) \]

Clinical trials are counted as directionally positive when RR_{LCL} > 1.

DK source score

DK uses a preferred exposure-anchored score when yearly Mantel-Haenszel output is available. Let MH_{LCL} be the lower 95% confidence bound of the pooled yearly Mantel-Haenszel odds ratio. The preferred DK score is:

\[ S_{DK}=100\cdot f_{ratio}(MH_{LCL}) \]

DK is counted as directionally positive when MH_{LCL} > 1.

If the yearly Mantel-Haenszel signal is unavailable, the DK score falls back to the disproportionality-score formula used for FAERS:

\[ S_{DK}=100\cdot\left(0.40\,f_{ratio}(ROR_{LCL})+0.35\,f_{ratio}(PRR_{LCL})+0.25\,f_{IC}(IC_{025})\right)\cdot F_{cases} \]
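The DK preference order can be expressed as a small selector. `fallback_score` is a hypothetical parameter that holds the FAERS-style score computed from DK's ROR/PRR/IC025 and case count; `None` marks a missing input.

```python
import math
from typing import Optional


def dk_score(mh_lcl: Optional[float], fallback_score: Optional[float]) -> Optional[float]:
    """Prefer the pooled yearly MH lower bound; otherwise use the fallback score."""
    if mh_lcl is not None:
        # f_ratio: 0 at or below 1, ln(x)/ln(3) capped at 1 above it.
        f = 0.0 if mh_lcl <= 1 else min(1.0, math.log(mh_lcl) / math.log(3))
        return 100 * f
    return fallback_score
```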

Weighted SCI aggregation

Let the fixed source weights be:

\[ w_{DK}=0.25,\quad w_{UK}=0.25,\quad w_{FAERS}=0.25,\quad w_{Trials}=0.25 \]

If some sources are unavailable for a given substance + AE pair, the Atlas renormalizes over the available sources only. If A is the set of available sources, then:

\[ W_{used}=\sum_{j \in A} w_j \]
\[ SCI=\sum_{j \in A}\left(\frac{w_j}{W_{used}}\right)S_j \]

If no source score is available, SCI is not displayed.

Confidence score

The confidence score is a source-coverage measure only. If |A| is the number of available sources among DK, UK, FAERS, and clinical trials, then:

\[ Confidence=100\cdot\frac{|A|}{4} \]
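Renormalization and the confidence score can be computed together. This sketch assumes source scores arrive in a dict where `None` marks an unavailable source; the layout and names are hypothetical.

```python
WEIGHTS = {"DK": 0.25, "UK": 0.25, "FAERS": 0.25, "Trials": 0.25}


def sci_and_confidence(scores):
    """Return (sci, confidence); sci is None when no source score is available."""
    available = {s: v for s, v in scores.items() if v is not None and s in WEIGHTS}
    confidence = 100 * len(available) / len(WEIGHTS)
    if not available:
        return None, confidence
    w_used = sum(WEIGHTS[s] for s in available)
    sci = sum(WEIGHTS[s] / w_used * v for s, v in available.items())
    return sci, confidence
```

With only DK = 80 and FAERS = 40 available, each source gets weight 0.5, so SCI = 60 and Confidence = 50.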

Direction and WHO concordance

The consensus direction is derived from the agreement proportion:

\[ Direction= \begin{cases} \text{positive}, & Agreement \ge \dfrac{2}{3} \\ \text{negative}, & Agreement \le \dfrac{1}{3} \\ \text{neutral}, & \text{otherwise} \end{cases} \]

WHO is evaluated only as a concordance badge. If WHO has no direction, the badge is WHO: unavailable. Otherwise, the badge is WHO: concordant when WHO direction matches the SCI direction, or when either direction is neutral; it is WHO: discordant only when both are non-neutral and opposite.
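The direction rule and WHO badge might be implemented as below. The text does not define Agreement; this sketch assumes it is the fraction of available sources (DK, UK, FAERS, trials) counted as directionally positive. Function names are hypothetical.

```python
def consensus_direction(positive_flags):
    """Direction from the share of available sources flagged as positive (assumed definition)."""
    if not positive_flags:
        return None
    agreement = sum(positive_flags) / len(positive_flags)
    if agreement >= 2 / 3:
        return "positive"
    if agreement <= 1 / 3:
        return "negative"
    return "neutral"


def who_badge(who_direction, sci_direction):
    """Discordant only when both directions are non-neutral and opposite."""
    if who_direction is None:
        return "WHO: unavailable"
    if {who_direction, sci_direction} == {"positive", "negative"}:
        return "WHO: discordant"
    return "WHO: concordant"
```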

Displayed interpretation bands

The UI classifies the final result as follows:

\[ \text{Strong} \iff SCI \ge 70 \text{ and } Confidence \ge 70 \]
\[ \text{Moderate} \iff SCI \ge 50 \text{ and } Confidence \ge 50 \text{ but not Strong} \]
\[ \text{Weak} \iff \text{otherwise} \]

SCI is an exploratory consensus metric and should not be interpreted as proof of causality.
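The banding rule can be written as a small function (name hypothetical):

```python
def interpretation_band(sci, confidence):
    """Map SCI and Confidence to the displayed Strong/Moderate/Weak band."""
    if sci >= 70 and confidence >= 70:
        return "Strong"
    if sci >= 50 and confidence >= 50:
        return "Moderate"
    return "Weak"
```

Because Confidence can only take the values 25, 50, 75, or 100, a Strong result requires at least three available sources and a Moderate result at least two.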

4) Application Logic and Display Assumptions

4.1 Search behavior

  • Substance list from dedicated source table
  • AE suggestions merged from FAERS, clinical trials, PubMed
  • Endpoints mix exact (eq) and case-insensitive pattern (ilike) matching

4.2 Pagination and caching

  • Configured page-size settings drive payload and table pagination
  • FAERS signal uses cached server-side totals to reduce repeated requests
  • PubMed and trial modals use local response caching

4.3 Charts

  • FAERS charts read pre-aggregated by-year/sex/age structures
  • Trials sex chart sums treatment + control participant counts across included rows (not unique persons)
  • Trials age chart uses mean+SD normal approximation and tracks unknown age separately
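A sketch of the mean+SD normal approximation, using Python's standard-library `statistics.NormalDist`. The bucket edges and function name are hypothetical, and probability mass below 0 or above 120 years is simply dropped.

```python
from statistics import NormalDist

# Hypothetical display buckets; the Atlas's actual bucket edges are not documented here.
BUCKETS = [(0, 18), (18, 45), (45, 65), (65, 120)]


def age_distribution(groups):
    """Spread each group's participants over age buckets via N(mean, sd).

    groups: iterable of (n_participants, mean_age, sd_age); mean or sd may be None.
    Returns (expected counts per bucket, unknown-age count).
    """
    counts = [0.0] * len(BUCKETS)
    unknown = 0
    for n, mean, sd in groups:
        if mean is None or sd is None or sd <= 0:
            unknown += n  # tracked separately, as in the chart
            continue
        dist = NormalDist(mean, sd)
        for i, (lo, hi) in enumerate(BUCKETS):
            counts[i] += n * (dist.cdf(hi) - dist.cdf(lo))
    return counts, unknown
```

For 100 participants with mean age 55 and SD 10, about 68.3 fall in the 45 to 65 bucket (one SD on either side of the mean).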

Important interpretation note: Clinical trials sex and age charts represent demographics of the included trial cohorts selected by the current substance + AE query context. They are not AE-specific participant distributions (that is, not restricted to only participants who experienced the selected adverse event).

4.4 SCI Explorer behavior

  • The explorer loads SCI rows from the precomputed consensus dataset with server-side pagination and filter composition.
  • Filter controls include substance, adverse event, SCI level, SCI/confidence ranges, and WHO badge state.
  • Summary counters are computed for the full filtered result set, independent of current page rows.
  • Row click expands per-pair details (source scores, source usage, WHO badge, and interpretation note).
  • Download CSV exports the full filtered result set in chunks, not only the visible page.

5) Data refresh & reproducibility

5.1 Pipeline overview

  • FAERS ETL: ingest, dedup/versioning, aggregate outputs with date-stamped artifacts
  • Trials ETL: extraction, arm mapping logic, effect fields and demographics
  • PubMed ETL: query and extraction logic, year parsing, output artifacts
  • Supabase load: copy commands, NA/null handling, downstream refresh steps
  • Recommended build stamp in UI footer/wiki home for traceability

6) Limitations, bias, and appropriate use

  • Reporting bias and underreporting in spontaneous-reporting sources (DK, UK, FAERS, WHO)
  • PubMed publication bias
  • Clinical trial selective reporting
  • Confounding by indication/channeling
  • Duplicate counting pitfalls across sources
  • WHO year/sex/age summaries are currently substance-level and not AE-specific in AE mode
  • Source windows differ (for example DK/UK AE-mode year constraints), which can affect cross-source comparisons
  • Counting units differ across sources (cases, aggregated counts, trial summaries, publications)
  • No time-to-event modeling in current metrics
  • No dose-response modeling in current metrics
  • No unified causal inference framework across sources
  • No patient-level covariate adjustment in displayed summaries
  • Metrics across DK, UK, FAERS, WHO, clinical trials, and literature are not directly comparable due to differences in data generation processes

Atlas metrics are for signal detection and prioritization, not standalone causal inference.

7) FAQ

Why does FAERS show more events than trials?

The sources differ in design and capture: FAERS collects spontaneous reports from routine post-marketing use, whereas trials report structured outcomes for a limited, predefined study population.

Why do some terms have multiple organ systems?

A single PT can map to multiple SOCs; Atlas preserves that mapping.

Why is my AE missing from PubMed/trials?

Term mismatch, coverage limitations, or filter behavior can exclude specific terms.

Why do sex/age charts show unknowns?

Unknown appears when denominators exist but demographic fields are absent or non-standard.

Why can WHO signal cases differ from WHO year/sex/age cards?

WHO signal metrics are AE-specific, whereas the WHO year/sex/age cards are substance-level summaries in the current data model.

Why can DK or UK totals differ between cards?

Cards may use different aggregations (for example signal 2x2 totals versus chart-specific grouped summaries) and can be constrained by specific year windows.

Why are ROR/PRR/IC sometimes blank?

Metrics are suppressed when counts do not support stable computation.