Methodology – ESG AI Leaderboard

Transparent, multi-model, industry-weighted ESG scoring built on publicly available evidence. Read how scores are produced, where the data comes from, what we're confident about, and where the limits are.

Overview

The ESG Purposefy AI Leaderboard ranks publicly traded companies on three pillars — Environmental, Social, and Governance — using qualitative scores generated by multiple independent AI reasoning models. Each model performs deep web research against publicly available information (sustainability reports, news, regulatory filings, rating-agency analyses) and returns a structured 0–100 score per pillar with cited sources and reasoning.

We deliberately use multiple models, run scoring quarterly, and weight pillars by industry so the ranking reflects what actually matters for each sector — not a one-size-fits-all checklist.

Pillars & what we score

Every company is scored on three pillars on a 0–100 scale. The descriptions below summarize the kinds of evidence each pillar weighs. Reasoning models are instructed to cite specific public sources for each assessment.

Environmental (E)

Carbon emissions and climate strategy, energy mix, water and waste management, biodiversity impact, supply-chain environmental risk.

Social (S)

Workforce diversity, equity & inclusion, labor practices, health & safety, community impact, customer privacy and data protection, human rights.

Governance (G)

Board independence and composition, executive compensation, business ethics, transparency and disclosure, risk management, anti-corruption.

A pillar score of 0 means the model could not find sufficient publicly available evidence to make an assessment (we treat this as a gap, not a failure). A score of 100 represents best-in-class performance against industry peers as of the latest assessment quarter.
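As an illustration, one model's per-pillar result could be stored as a small validated record. The 0–100 range and the reading of 0 as an evidence gap come from the text above. The class name `PillarScore`, its field names, and the rule that any non-zero score must cite at least one source are assumptions for this sketch, not the leaderboard's actual schema.

```python
from dataclasses import dataclass, field

PILLARS = ("E", "S", "G")


@dataclass(frozen=True)
class PillarScore:
    """One model's assessment of one pillar (hypothetical schema)."""

    pillar: str  # "E", "S" or "G"
    score: int  # 0-100; 0 means insufficient public evidence
    reasoning: str  # the model's explanation
    sources: list[str] = field(default_factory=list)  # cited source URLs

    def __post_init__(self) -> None:
        if self.pillar not in PILLARS:
            raise ValueError(f"unknown pillar: {self.pillar!r}")
        if not 0 <= self.score <= 100:
            raise ValueError(f"score out of range: {self.score}")
        # Assumption: a real assessment (score > 0) must cite a source.
        if self.score > 0 and not self.sources:
            raise ValueError("non-zero score without cited sources")

    @property
    def is_gap(self) -> bool:
        # A 0 records missing evidence, not poor performance.
        return self.score == 0


gap = PillarScore("E", 0, "No emissions disclosure found.")
rated = PillarScore("G", 71, "Majority-independent board.", ["https://example.com/proxy"])
print(gap.is_gap, rated.is_gap)  # True False
```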

Industry-specific weighting

ESG materiality varies dramatically by sector. An energy major's environmental performance matters more than a financial services firm's; a bank's governance is more material than a software vendor's. Instead of weighting E, S and G equally everywhere, we apply per-industry weights when computing the total score, so each company's ranking reflects the issues most material to its sector.

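Total score is calculated as: Total = (E × w_E) + (S × w_S) + (G × w_G), where w_E, w_S and w_G are the industry's pillar weights from the table below, expressed as fractions (45% = 0.45), and the three weights sum to 1.0.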
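As a worked example, take the Oil & Gas weights (E 45%, S 25%, G 30%) and hypothetical pillar scores E = 80, S = 60, G = 70: Total = 80 × 0.45 + 60 × 0.25 + 70 × 0.30 = 36 + 15 + 21 = 72. The sketch below performs the same arithmetic and rejects weights that don't sum to 1.0; the function name `total_score` and the floating-point tolerance are illustrative choices, not part of the published method.

```python
import math


def total_score(e: float, s: float, g: float,
                w_e: float, w_s: float, w_g: float) -> float:
    """Industry-weighted total: E*w_E + S*w_S + G*w_G."""
    # Weights are fractions (45% -> 0.45) and must sum to 1.0.
    if not math.isclose(w_e + w_s + w_g, 1.0, abs_tol=1e-9):
        raise ValueError("pillar weights must sum to 1.0")
    return e * w_e + s * w_s + g * w_g


# Oil & Gas weights with hypothetical pillar scores.
print(round(total_score(80, 60, 70, 0.45, 0.25, 0.30), 2))  # 72.0
```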

Industry	E	S	G	Rationale
Oil & Gas	45%	25%	30%	Carbon, pollution, resource depletion are primary risks
Mining & Metals	45%	25%	30%	Carbon, pollution, resource depletion are primary risks
Utilities & Power	45%	25%	30%	Carbon, pollution, resource depletion are primary risks
Renewable Energy	45%	25%	30%	Carbon, pollution, resource depletion are primary risks
Banking & Financial Services	15%	30%	55%	Governance, risk management, anti-corruption are paramount
Insurance	15%	30%	55%	Governance, risk management, anti-corruption are paramount
Asset Management & Investment	15%	30%	55%	Governance, risk management, anti-corruption are paramount
Technology / Software	20%	40%	40%	Data privacy, AI ethics, board governance dominate
Semiconductors & Hardware	20%	40%	40%	Data privacy, AI ethics, board governance dominate
Telecommunications	20%	40%	40%	Data privacy, AI ethics, board governance dominate
Pharmaceuticals	20%	45%	35%	Patient safety, drug access, clinical trial ethics
Healthcare Services	20%	45%	35%	Patient safety, drug access, clinical trial ethics
Medical Devices & Equipment	20%	45%	35%	Patient safety, drug access, clinical trial ethics
Retail	30%	40%	30%	Supply chain labor, packaging, community impact
Consumer Goods & Packaged Foods	30%	40%	30%	Supply chain labor, packaging, community impact
Food & Beverage	30%	40%	30%	Supply chain labor, packaging, community impact
Apparel & Luxury	30%	40%	30%	Supply chain labor, packaging, community impact
Restaurants & Hospitality	30%	40%	30%	Supply chain labor, packaging, community impact
Aerospace & Defense	40%	30%	30%	Resource intensity + worker safety + chemical management
Automotive	40%	30%	30%	Resource intensity + worker safety + chemical management
Industrial Manufacturing	40%	30%	30%	Resource intensity + worker safety + chemical management
Chemicals	40%	30%	30%	Resource intensity + worker safety + chemical management
Construction & Engineering	40%	30%	30%	Resource intensity + worker safety + chemical management
Transportation & Logistics	45%	25%	30%	Fleet emissions, fuel transition, worker conditions
Airlines	45%	25%	30%	Fleet emissions, fuel transition, worker conditions
Real Estate	40%	30%	30%	Energy efficiency, materials, community displacement
Media & Entertainment	20%	40%	40%	Data privacy, AI ethics, board governance dominate
Agriculture & Agribusiness	30%	40%	30%	Supply chain labor, packaging, community impact
Professional Services & Consulting	20%	40%	40%	Data privacy, AI ethics, board governance dominate
Diversified / Conglomerate	33%	33%	34%	Equal weighting as fallback
E-Commerce & Internet Services	20%	40%	40%	Data privacy, AI ethics, board governance dominate
Biotechnology	20%	45%	35%	Patient safety, drug access, clinical trial ethics
Retail & Consumer Goods	30%	40%	30%	Supply chain labor, packaging, community impact
Hospitality & Tourism	30%	40%	30%	Supply chain labor, packaging, community impact
Pharma	27%	33%	40%	Newly added industry; rationale not yet published
BioPharma	23%	43%	34%	Rationale not yet published
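The Diversified / Conglomerate row is described as the equal-weighting fallback. A minimal sketch of a weight lookup that uses it follows. It holds only a few rows copied from the table, stored as whole percentages so the sum-to-100 check is exact. The helper name `weights_for` and the assumption that any industry missing from the table receives the fallback weights are illustrative, not a description of the live system.

```python
# A few rows copied from the table above: (E, S, G) in whole percent.
WEIGHTS_PCT: dict[str, tuple[int, int, int]] = {
    "Oil & Gas": (45, 25, 30),
    "Banking & Financial Services": (15, 30, 55),
    "Technology / Software": (20, 40, 40),
    "Pharmaceuticals": (20, 45, 35),
    "Diversified / Conglomerate": (33, 33, 34),
}
FALLBACK = "Diversified / Conglomerate"


def weights_for(industry: str) -> tuple[float, float, float]:
    """Return (w_E, w_S, w_G) as fractions, falling back to equal weighting."""
    e, s, g = WEIGHTS_PCT.get(industry, WEIGHTS_PCT[FALLBACK])
    # Integer percentages make the sum-to-100 check exact.
    if e + s + g != 100:
        raise ValueError(f"weights for {industry!r} do not sum to 100%")
    return e / 100, s / 100, g / 100


print(weights_for("Banking & Financial Services"))  # (0.15, 0.3, 0.55)
print(weights_for("Unmapped Industry"))  # (0.33, 0.33, 0.34)
```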

Weights are reviewed and updated by the Purposefy ESG team and may change as scoring practices evolve. The values above are pulled live from our database — they always match what the leaderboard is currently using.

Multi-model scoring

We score every company with multiple state-of-the-art reasoning models from different AI labs. Each model independently performs deep web research, returns structured per-pillar scores with cited sources, and is evaluated as a peer rather than an oracle. The leaderboard ranks companies on the cross-model average, which limits how far a single hallucination from any one model can move a company's position.
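A minimal sketch of the cross-model average, assuming each model returns one score per pillar and the leaderboard takes a plain arithmetic mean. The model keys are shorthand labels (not API model IDs) and the scores are made up. With three models, one model's error moves the mean by a third of that error, which is why averaging dilutes a single hallucination without removing it. The page doesn't say how a 0 (evidence gap) from one model enters the average; this sketch simply includes it.

```python
# Hypothetical per-model pillar scores for one company.
scores: dict[str, dict[str, float]] = {
    "openai-o4-mini-deep-research": {"E": 70, "S": 64, "G": 81},
    "anthropic-claude-sonnet-4.6": {"E": 66, "S": 60, "G": 77},
    "xai-grok-4.1-fast-reasoning": {"E": 58, "S": 62, "G": 79},
}


def cross_model_average(per_model: dict[str, dict[str, float]]) -> dict[str, float]:
    """Plain arithmetic mean of each pillar across all models."""
    n = len(per_model)
    return {
        p: sum(model[p] for model in per_model.values()) / n
        for p in ("E", "S", "G")
    }


avg = cross_model_average(scores)
print({p: round(v, 1) for p, v in avg.items()})  # {'E': 64.7, 'S': 62.0, 'G': 79.0}
```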

OpenAI

o4-mini Deep Research

Deep research model (background mode) — extensive web research for thorough, evidence-backed ESG analysis (~2-10 min per company)

Anthropic

Claude Sonnet 4.6

Batch API with extended thinking + web search — 50% cost savings (~1-60 min per batch)

xAI

Grok 4.1 Fast Reasoning

Batch API with reasoning + web search — 10x cheaper than beta, 50% batch savings

Each score row stores the model's full reasoning and source citations, which you can inspect on the leaderboard detail view. When models disagree, the spread is visible in the "Model comparison" tab — that variance itself is a useful signal of how much consensus exists about a company's ESG performance.
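The page doesn't define how the spread shown in the "Model comparison" tab is computed. The sketch below assumes it is the range (highest minus lowest model score) for a pillar, and also shows the population standard deviation as an alternative measure; the model labels and scores are hypothetical.

```python
from statistics import pstdev

# Hypothetical Governance scores for one company, one per model.
governance = {"openai": 81, "anthropic": 77, "xai": 59}


def score_range(values: list[float]) -> float:
    """Highest minus lowest model score: a simple disagreement measure."""
    return max(values) - min(values)


vals = list(governance.values())
print(score_range(vals))  # 22
print(round(pstdev(vals), 1))  # 9.6
```

A wide range on one pillar says the models read the public evidence differently, which is the consensus signal described above.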

Quarterly scoring & trends

Scoring runs are executed on a quarterly cadence (e.g. 2026-Q1). Each quarter we re-research every visible company across all supported models and store the new scores alongside the prior quarter's, so historical performance is preserved rather than overwritten.
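One way to keep every quarter's scores side by side is to key a company's history by quarter label. In this sketch the `YYYY-Qn` label format follows the 2026-Q1 example above, while the helper names and the choice to raise an error rather than overwrite an existing quarter are assumptions.

```python
import re

QUARTER_RE = re.compile(r"(\d{4})-Q([1-4])")


def parse_quarter(label: str) -> tuple[int, int]:
    """Turn a label like '2026-Q1' into a sortable (year, quarter) tuple."""
    m = QUARTER_RE.fullmatch(label)
    if m is None:
        raise ValueError(f"bad quarter label: {label!r}")
    return int(m.group(1)), int(m.group(2))


def previous_quarter(label: str) -> str:
    """Label of the quarter before `label` (2026-Q1 -> 2025-Q4)."""
    year, q = parse_quarter(label)
    return f"{year - 1}-Q4" if q == 1 else f"{year}-Q{q - 1}"


def add_quarter(history: dict[str, dict[str, int]], label: str,
                pillar_scores: dict[str, int]) -> None:
    """Store a new quarter alongside earlier ones; never overwrite."""
    parse_quarter(label)  # validate the label format
    if label in history:
        raise ValueError(f"{label} already recorded")
    history[label] = pillar_scores


history: dict[str, dict[str, int]] = {}
add_quarter(history, "2025-Q4", {"E": 62, "S": 70, "G": 74})
add_quarter(history, "2026-Q1", {"E": 78, "S": 69, "G": 74})
latest = max(history, key=parse_quarter)
print(latest, previous_quarter(latest))  # 2026-Q1 2025-Q4
```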

On the leaderboard, the displayed ranking uses the latest quarter by default; you can switch quarters from the filter bar. Each company's detail view includes a "Quarterly history" tab that shows every quarter's per-pillar scores with trend arrows and a short "quarter-over-quarter changes" summary explaining what moved.

We deliberately don't smooth or back-fill scores — if a company's E score jumped from 62 to 78 between quarters, that's the model's assessment, not a rolling average. The cited sources for each quarter explain why.
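A minimal sketch of the quarter-over-quarter summary: each pillar's change is the raw difference between two quarterly scores, with no smoothing, and the arrow reflects only its sign. The arrow glyphs, function names and output format are assumptions; the E move from 62 to 78 mirrors the example above.

```python
def trend(prev: int, curr: int) -> str:
    """Arrow for a raw (unsmoothed) quarter-over-quarter change."""
    if curr > prev:
        return "↑"
    if curr < prev:
        return "↓"
    return "→"


def qoq_changes(prev: dict[str, int], curr: dict[str, int]) -> dict[str, str]:
    """Per-pillar arrow and signed delta from two raw quarterly scores."""
    return {
        p: f"{trend(prev[p], curr[p])} {curr[p] - prev[p]:+d}"
        for p in ("E", "S", "G")
    }


changes = qoq_changes({"E": 62, "S": 70, "G": 74}, {"E": 78, "S": 69, "G": 74})
print(changes)  # {'E': '↑ +16', 'S': '↓ -1', 'G': '→ +0'}
```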

Sources we use

Reasoning models are instructed to draw on publicly available evidence and to cite every source they rely on. The leaderboard's detail dialog groups sources by type so you can quickly judge their credibility for any given company:

Sustainability reports — Annual ESG / sustainability / impact reports published by the company itself, including 10-K and proxy disclosures where relevant.
Rating agencies — Public assessments from established ESG ratings bodies (e.g. MSCI, S&P Global, Sustainalytics), where the agency makes a summary publicly available.
Regulatory filings — SEC filings, EU CSRD disclosures, and other mandatory regulatory disclosures.
News & investigations — Reputable news outlets and investigative journalism covering material ESG events (controversies, lawsuits, settlements, awards).

We exclude social media, anonymous blogs, sponsored content, and any source the model can't link to. Companies are scored on what they have publicly disclosed and what reputable third parties have reported — not on speculation.
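The grouping and exclusion rules above can be sketched as a simple filter over a model's cited sources. This assumes each source arrives with a type label chosen from the four categories and an optional URL; the `Source` fields and `group_sources` name are hypothetical. The filter can only exclude what it can detect (sources with no link, or with a type outside the four accepted ones); recognising social media or sponsored content would need a separate check not shown here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# The four source types listed above.
SOURCE_TYPES = (
    "Sustainability reports",
    "Rating agencies",
    "Regulatory filings",
    "News & investigations",
)


@dataclass(frozen=True)
class Source:
    title: str
    source_type: str  # hypothetical: type label assigned by the model
    url: Optional[str]  # None when the model could not link to the source


def group_sources(sources: list[Source]) -> dict[str, list[Source]]:
    """Group linkable sources by type; drop unlinked or unrecognised ones."""
    grouped: dict[str, list[Source]] = defaultdict(list)
    for src in sources:
        if not src.url or src.source_type not in SOURCE_TYPES:
            continue  # excluded: no link, or not an accepted source type
        grouped[src.source_type].append(src)
    return dict(grouped)


sources = [
    Source("2025 Sustainability Report", "Sustainability reports", "https://example.com/esg-2025.pdf"),
    Source("Annual report on Form 10-K", "Regulatory filings", "https://example.com/10k"),
    Source("Viral post", "Social media", "https://example.com/post"),
    Source("Unlinked claim", "News & investigations", None),
]
grouped = group_sources(sources)
print({k: len(v) for k, v in grouped.items()})
# {'Sustainability reports': 1, 'Regulatory filings': 1}
```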

Limitations & assumptions

ESG scoring is inherently a matter of judgment. We've designed this leaderboard to be transparent about what it can and can't tell you.

Public data only. Models score based on publicly available evidence at the time of the run. Companies that disclose less will tend to score lower on pillars where evidence is thin — that's a feature of the methodology, not a bug.
Not investment advice. Scores are AI-generated qualitative assessments and should not be used as the sole basis for investment, procurement, partnership, or hiring decisions.
Model fallibility. Reasoning models can misread sources or hallucinate. We mitigate this with multi-model averaging, source citations, and quarterly variance analysis — but no automated assessment is infallible.
Industry coverage. The leaderboard currently focuses on a curated list of large publicly traded companies. Coverage expands quarter over quarter.
Time lag. A company's latest sustainability report may be 6–18 months old. The model captures what's currently public, not real-time performance.

Questions, corrections, or requests to be reviewed in the next quarter? Reach out via the contact page.

Methodology version & updates

This methodology evolves as our scoring practices mature. Material changes — new models, revised industry weights, expanded source types — are versioned and dated so you can tell whether a leaderboard view used today's methodology or an earlier one.

Current version: v1.0 · Last updated May 6, 2026