Hosted and managed by the University of Alabama in Huntsville

Agentarium

scientific agent registry

CMR CARE Agent

v1.0.0 · active

Earth science dataset discovery agent using NASA's Common Metadata Repository (CMR). Helps users discover, organize, and understand NASA Earthdata datasets across atmosphere, ocean, land, cryosphere, biosphere, and solid earth domains. Outputs are delivered via a structured schema and interactive chat for clarification, guidance, approval gates, or status updates.

by Nidhi Jha (NASA-IMPACT) · earth-science · search-retrieval

tested on: gpt-5.2
license: Apache-2.0
framework: openai-agents-sdk
citable url: https://agentarium.science/a/cmr-care/v/1.0.0
Guardrails & validation
explicitly declared by the author; shown so you can judge, not registry-verified

Guardrails declared — author-stated

Non-prescriptive
Never recommends, selects, or endorses datasets — only surfaces metadata.
User-approval gate (multi-hop)
Indirect/multi-hop discovery requires explicit user approval before running.
Search confirmation
Never executes CMR queries without explicit user confirmation of mapped GCMD terms.
Earth-science scope lock
Refuses queries outside Atmosphere/Ocean/Land/Cryosphere/Biosphere/Solid Earth.
No fabrication
Missing/ambiguous metadata is reported as unknown; never inferred or invented.
No credentialed actions
Never requests credentials, performs downloads, or stores tokens.
Assumption surfacing
Every assumption labeled 'Assumed' or 'Default applied' under Interpreted Scope.
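
The scope lock, search confirmation, and multi-hop approval guardrails above can be read as a pre-flight check that runs before any CMR query. The following is a minimal sketch under that reading, not the author's implementation; `SearchRequest`, `preflight`, and the returned messages are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Domains named in the scope-lock guardrail of this listing.
ALLOWED_DOMAINS = {
    "atmosphere", "ocean", "land", "cryosphere", "biosphere", "solid earth",
}


@dataclass
class SearchRequest:
    """Hypothetical container for one pending search (not the agent's real type)."""
    domain: str
    gcmd_terms: list
    user_confirmed: bool = False      # user confirmed the mapped GCMD terms
    multi_hop: bool = False           # request involves indirect discovery
    multi_hop_approved: bool = False  # user approved the multi-hop step


def preflight(req: SearchRequest):
    """Return (allowed, reason); checks mirror the declared guardrails."""
    if req.domain.strip().lower() not in ALLOWED_DOMAINS:
        return False, "refused: outside Earth-science scope"
    if not req.user_confirmed:
        return False, "blocked: GCMD terms not confirmed by user"
    if req.multi_hop and not req.multi_hop_approved:
        return False, "blocked: multi-hop discovery needs user approval"
    return True, "ok"


print(preflight(SearchRequest("Cryosphere", ["SEA ICE"], user_confirmed=True)))
print(preflight(SearchRequest("Astrophysics", ["X-RAY"], user_confirmed=True)))
```

Ordering the scope check first means an out-of-scope request is refused before the user is ever asked to confirm terms, which is one plausible way to sequence the gates.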

Validation methodology — author-stated

Tested: 50 known Earth-science queries with ground-truth CMR concept IDs (seed-record placeholder).
Data: Curated query set from NASA-IMPACT teams (seed-record placeholder).
Metric: Reference collection appears in ranked top-5 (seed-record placeholder).
Result: Seed record; the author should publish a real validation before public release.
Caveat: This is a registry seed record; the validation block was filled with placeholders. The author is encouraged to submit a new version with real numbers and a real caveat.
Required tools: live health
live status of MCP endpoints this agent depends on; not registry-verified
cmr_mcp_server · approval: never · healthy

NASA Common Metadata Repository (CMR) MCP server. Read-only Earth-observation catalog search.

allowed: search_collections, get_granules, get_collection_metadata
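
A declared tool allowlist like the one above is typically enforced by checking each requested tool name before dispatch. The sketch below shows one way that check could look; it is an illustration only, and `is_allowed`, `partition_calls`, and the tool name `delete_collection` are hypothetical, not part of the CMR MCP server.

```python
# Tool names copied from the "allowed" line of this listing.
ALLOWED_TOOLS = frozenset(
    {"search_collections", "get_granules", "get_collection_metadata"}
)


def is_allowed(tool_name: str) -> bool:
    """True if the tool name appears on the declared allowlist."""
    return tool_name in ALLOWED_TOOLS


def partition_calls(calls):
    """Split requested tool names into (allowed, rejected), preserving order."""
    allowed = [c for c in calls if is_allowed(c)]
    rejected = [c for c in calls if not is_allowed(c)]
    return allowed, rejected


# "delete_collection" is a made-up name used only to show a rejection.
print(partition_calls(["search_collections", "delete_collection", "get_granules"]))
```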

Evaluation evidence
fetched live from akd-labs · author-published, machine-verifiable
benchmark dataset: CMR CARE Gold Dataset v1.0.0

SME-curated dataset of Earth-science queries with reference CMR concept IDs.

score: 27/31 (87%)
queries: 31
total cost: $2.252086
model: gpt-5.2
input tokens: 597,930
cached tokens: 0
output tokens: 86,122
reasoning tokens: 58,837

Tool usage

search_collections: 216
get_collection_metadata: 44
get_granules: 29
gcmd_keyword_lookup: 10
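
The headline figures above can be turned into per-query averages with a few lines of arithmetic. This sketch only re-derives numbers from the values reported in this listing; the variable names are chosen for illustration.

```python
# Values copied from the evaluation block of this listing.
correct, n_queries = 27, 31
total_cost_usd = 2.252086
tool_calls = {
    "search_collections": 216,
    "get_collection_metadata": 44,
    "get_granules": 29,
    "gcmd_keyword_lookup": 10,
}

accuracy = correct / n_queries                 # fraction of queries scored correct
cost_per_query = total_cost_usd / n_queries    # mean cost per query in USD
total_tool_calls = sum(tool_calls.values())    # all tool invocations in the run
calls_per_query = total_tool_calls / n_queries

print(f"accuracy: {accuracy:.1%}")                  # accuracy: 87.1%
print(f"cost per query: ${cost_per_query:.4f}")     # cost per query: $0.0726
print(f"tool calls: {total_tool_calls}")            # tool calls: 299
print(f"calls per query: {calls_per_query:.1f}")    # calls per query: 9.6
```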

Per-query cost breakdown (8)

# | query | outcome | total $ | in $ | out $ | reasoning $
Q1 | List the Chandra Obs IDs for Chandra HETGS observations of NGC 1068… | Y | $0.058067 | $0.025195 | $0.007224 | $0.025648
Q2 | What are the Chandra Obs Proposals for Chandra HETGS observations… | Partial | $0.062685 | $0.018683 | $0.010360 | $0.033642
Q3 | What is the total exposure time in kiloseconds for all the Chandra… | Y | $0.063864 | $0.021010 | $0.013412 | $0.029442
Q4 | Provide the HEASARC Data Product URLs for Chandra HETGS observations… | Y | $0.099692 | $0.024092 | $0.012446 | $0.063154
Q5 | List the Chandra Obs IDs for Chandra HETGS observations of PSR… | Y | $0.033600 | $0.020706 | $0.004998 | $0.007896
Q6 | What is the total exposure time in kiloseconds for all the Chandra… | Y | $0.042327 | $0.021243 | $0.007644 | $0.013440
Q7 | Provide the HEASARC Data Product URLs for Chandra HETGS observations… | Y | $0.092538 | $0.019500 | $0.019726 | $0.053312
Q8 | What are the Chandra Obs Proposals for Chandra HETGS observations… | Partial | $0.080100 | $0.048613 | $0.011634 | $0.019853
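
A quick consistency check on the breakdown above is to confirm that each row's input, output, and reasoning costs add up to its total. The sketch below does this with `decimal.Decimal` to avoid floating-point rounding; the figures are copied from the eight listed rows, and the names `ROWS` and `row_is_consistent` are illustrative. Because only 8 of the 31 queries are listed, the subtotal is expected to be smaller than the run's total cost.

```python
from decimal import Decimal as D

# (total, in, out, reasoning) in USD, copied from the eight rows listed above.
ROWS = {
    "Q1": ("0.058067", "0.025195", "0.007224", "0.025648"),
    "Q2": ("0.062685", "0.018683", "0.010360", "0.033642"),
    "Q3": ("0.063864", "0.021010", "0.013412", "0.029442"),
    "Q4": ("0.099692", "0.024092", "0.012446", "0.063154"),
    "Q5": ("0.033600", "0.020706", "0.004998", "0.007896"),
    "Q6": ("0.042327", "0.021243", "0.007644", "0.013440"),
    "Q7": ("0.092538", "0.019500", "0.019726", "0.053312"),
    "Q8": ("0.080100", "0.048613", "0.011634", "0.019853"),
}


def row_is_consistent(total, cost_in, cost_out, cost_reasoning):
    """True when the three components sum exactly to the row total."""
    return D(total) == D(cost_in) + D(cost_out) + D(cost_reasoning)


mismatches = [q for q, row in ROWS.items() if not row_is_consistent(*row)]
subtotal = sum(D(row[0]) for row in ROWS.values())

print(mismatches)  # []
print(subtotal)    # 0.532873
```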
Sample traces — inspectable end-to-end
Reproductions
independent runs by other scientists; Tier 5 trigger
Ran this agent yourself? File an independent reproduction; it can move the listing to Tier 5.
Benchmarks (other)
author-stated; runs outside the primary published evaluation

GCMD-mapping accuracy
GCMD keyword mapping for free-text Earth science queries.
dataset: 200 curated queries with reference GCMD terms (NASA-IMPACT internal).
metric: Top-1 keyword match
result: 0.86
vs. baseline: TF-IDF lookup baseline at 0.62
date: 2026-05-01

CMR-search reproducibility
Same query run 10x; reference dataset appears in top-5 each time.
dataset: 30 known study-area queries, 10 runs each.
metric: Variance of top-5 set across runs
result: 0 (fully deterministic at reasoning_effort=medium)
date: 2026-05-12
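
The listing does not define how "variance of top-5 set across runs" is computed. One plausible reading, sketched below as an assumption, is the mean pairwise Jaccard distance between the top-5 sets of repeated runs, where 0 means every run returned the same five collections. The function names and the concept IDs are placeholders invented for this example.

```python
from itertools import combinations


def hit_in_top_k(ranked_ids, reference_id, k=5):
    """True if the reference collection appears among the first k results."""
    return reference_id in ranked_ids[:k]


def mean_jaccard_distance(runs, k=5):
    """Mean pairwise Jaccard distance between top-k sets; 0.0 means identical sets."""
    top_sets = [frozenset(run[:k]) for run in runs]
    pairs = list(combinations(top_sets, 2))
    if not pairs:
        return 0.0
    distances = []
    for a, b in pairs:
        union = a | b
        # Two empty result sets are treated as identical.
        distances.append(0.0 if not union else 1 - len(a & b) / len(union))
    return sum(distances) / len(distances)


# Placeholder IDs, not real CMR concept IDs.
run = ["C1-EXAMPLE", "C2-EXAMPLE", "C3-EXAMPLE", "C4-EXAMPLE", "C5-EXAMPLE"]
runs = [list(run) for _ in range(10)]  # ten identical runs of the same query

print(hit_in_top_k(run, "C3-EXAMPLE"))   # True
print(mean_jaccard_distance(runs))       # 0.0
```

Comparing sets rather than ordered lists matches the benchmark's wording ("top-5 set"), so a reordering within the top five would not count as variation under this measure.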
Other disclosures
as described by the author
Intended use
Helps Earth scientists discover candidate NASA Earthdata collections before formal pipelines. Built for exploratory, human-in-the-loop discovery — the user retains control over scientific framing, assumptions, and final dataset selection.
Out of scope
Does not recommend, select, endorse, or judge datasets for suitability. Does not assess data quality, accuracy, or uncertainty. Does not operate outside Earth science domains. Not for downloads or credentialed access.
Known failure modes
Ambiguous place names may map to the wrong region; named regions are kept at metadata level rather than forced to bounding boxes. Sensor/instrument aliases occasionally miss newer collections. Multi-hop (Semantic Scholar) discovery is rate-limited and may time out on broad queries.
Worked example

input

Can you find me datasets about sea ice?

output

Clarifying Questions:
  1. Which sea-ice variable family? (extent, concentration, thickness, drift, age)
  2. Region — Arctic, Antarctic, or global?
  3. Time range?
If you'd rather skip these and run the CMR search with reasonable defaults, just reply
'skip' — I'll proceed and surface every assumption I made.
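
The 'skip' path in the example above, combined with the assumption-surfacing guardrail, implies that any unanswered question is filled with a default and labeled as such. The sketch below illustrates that idea only; the default values and the names `DEFAULTS` and `interpreted_scope` are hypothetical, since the listing does not say which defaults the agent applies.

```python
# Hypothetical defaults; the listing does not state the agent's actual ones.
DEFAULTS = {
    "variable": "all sea-ice variable families",
    "region": "global",
    "time_range": "all available dates",
}


def interpreted_scope(answers):
    """Render an 'Interpreted Scope' block, labeling every default that was applied."""
    lines = ["Interpreted Scope:"]
    for key, default in DEFAULTS.items():
        value = answers.get(key)
        if value:
            lines.append(f"  {key}: {value}")
        else:
            lines.append(f"  {key}: {default} (Default applied)")
    return "\n".join(lines)


# User answered only the region question, then replied 'skip'.
print(interpreted_scope({"region": "Arctic"}))
```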
What this listing is. A structured, format-conformant submission, screened for topic and obvious safety issues. The registry does not verify that the agent is correct, that it works, or that the author's disclosures are accurate. Evaluate before relying on it for research.