NASA Project Exploration & eXtraction (NPEX) is a specialized search system built to improve access to NASA’s research and technology portfolio. It transforms complex and varied datasets into a unified searchable knowledge base, making scientific and technical work easier to discover.
## Technical Evolution & Performance
The system was developed in stages, moving from simple text matching to a hybrid retrieval approach. The final architecture combines the precision of keyword search with the broader understanding of semantic search, producing the strongest overall results.
| Model Stage | Architecture Focus | Effectiveness (MAP) |
|---|---|---|
| Base | Simple lexical matching (baseline) | 0.462 |
| Tuned | Custom schema & field boosting | 0.483 |
| Advanced | Technical synonymy & stop-word retention | 0.499 |
| Semantic | AI-powered vector embeddings (OpenAI) | 0.510 |
| Hybrid | RRF Fusion of Lexical + Semantic | 0.560 |
The hybrid model achieved the best result, a Mean Average Precision (MAP) of 0.560, nearly ten points above the 0.462 lexical baseline. Combining lexical and semantic retrieval recovered relevant documents that keyword matching alone missed.
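The fusion step can be sketched with Reciprocal Rank Fusion (RRF), which scores each document by the reciprocal of its rank in every input list and sums across lists. The document IDs and the exact function below are illustrative; the report does not publish the implementation.

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked lists with Reciprocal Rank Fusion (RRF).

    rankings: iterable of ranked document-ID lists, best first.
    k: smoothing constant; 60 is the value from the original RRF paper.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical lexical and semantic result lists for one query:
lexical = ["doc_a", "doc_b", "doc_c"]
semantic = ["doc_c", "doc_a", "doc_d"]
fused = rrf_fuse([lexical, semantic])
# doc_a is fused first because it sits near the top of both lists.
```

Because RRF uses only ranks, not raw scores, it sidesteps the problem of calibrating Solr's BM25 scores against embedding similarities.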
## The Architecture
Behind the search interface is a data pipeline designed to handle specialized technical content at scale.

The pipeline includes a custom Python ETL process that collects project narratives directly from the NASA research portal. It achieved a 98.4% extraction success rate and gathered detailed descriptions and maturity ratings for more than 16,000 projects. The data is then cleaned and normalized, including ISO 8601 date formatting and matching across different technology taxonomies.
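The date-normalization step might look like the sketch below. The source formats listed are assumptions, since the report does not enumerate them; unparseable values are left for manual review rather than guessed.

```python
from datetime import datetime

# Assumed source formats; the actual ETL may handle others.
KNOWN_FORMATS = ("%m/%d/%Y", "%B %d, %Y", "%Y-%m-%d")

def to_iso_8601(raw):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # flag for manual review rather than guessing

print(to_iso_8601("March 15, 2021"))  # -> 2021-03-15
```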
For storage and retrieval, the system uses Apache Solr for indexing and MongoDB for metadata persistence. Semantic retrieval is powered by 3072-dimensional OpenAI embeddings, which let the system identify relevant content even when a query does not use the same wording as the source material.
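At query time, semantic retrieval amounts to comparing the query's embedding against stored document embeddings, typically by cosine similarity. A minimal sketch, where tiny 3-dimensional vectors stand in for the real 3072-dimensional embeddings and the document names are made up:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy stand-ins for a query embedding and two document embeddings:
query_vec = [0.2, 0.8, 0.1]
doc_vecs = {"doc_a": [0.1, 0.9, 0.0], "doc_b": [0.9, 0.1, 0.2]}
ranked = sorted(doc_vecs,
                key=lambda d: cosine_similarity(query_vec, doc_vecs[d]),
                reverse=True)
# doc_a ranks first: its vector points in nearly the same direction as the query's.
```

In production this nearest-neighbor scan is usually delegated to the index itself (for example, Solr's dense-vector field type) rather than computed in application code.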
## Evaluation Pipeline
To measure retrieval quality, the project used an evaluation pipeline based on the standard Text Retrieval Conference (TREC) framework.

To build a manageable ground-truth set from a large collection, the evaluation used a pooling strategy that merged the top 100 results from each retrieval model into a single judgment pool. Relevance assessment followed a two-stage process that combined automated screening with human review.
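The pooling step can be sketched as follows, with each system's run represented as a mapping from topic ID to a ranked list of document IDs. The data shapes and IDs are assumptions for illustration.

```python
def build_pool(runs, depth=100):
    """Union of the top-`depth` results from every run, grouped by topic."""
    pool = {}
    for run in runs:  # run: {topic_id: ranked list of doc IDs}
        for topic_id, ranking in run.items():
            pool.setdefault(topic_id, set()).update(ranking[:depth])
    return pool

# Two toy runs, pooled at depth 2:
lexical_run = {"topic_1": ["a", "b", "c"]}
semantic_run = {"topic_1": ["b", "d", "e"]}
pool = build_pool([lexical_run, semantic_run], depth=2)
# pool["topic_1"] == {"a", "b", "d"}
```

Only pooled documents are judged; everything outside the pool is treated as non-relevant, which keeps the assessment workload bounded.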
GPT-4o mini was used to score the pooled results and provide a written justification for each judgment. Documents with uncertain scores were then reviewed manually, resolving 48 borderline document-topic pairs. Performance was measured with standard information-retrieval metrics, including MAP, P@10, and nDCG.
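Such metrics are typically computed with tooling like `trec_eval`, but the two rank-based measures are simple enough to sketch directly (document IDs below are illustrative; MAP is the mean of average precision over all topics):

```python
def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k

def average_precision(ranking, relevant):
    """Mean of the precision values at each rank where a relevant doc appears."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

relevant = {"doc_a", "doc_b"}
ranking = ["doc_a", "doc_x", "doc_b"]
# average_precision = (1/1 + 2/3) / 2 = 0.8333...
```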
## Documentation
More details about the system’s architecture, methodology, and evaluation can be found in the technical academic report and in the repository below.