2025 UAP Workshop: Narrative Data, Infrastructures, and Analysis
Workshop Synthesis and Recommendations
Associated Universities, Inc. (AUI)
Workshop sponsored by: All-domain Anomaly Resolution Office (AARO)

Table of Contents
Breakout Session #1: Identifying, accessing, and integrating data sources [DAY 1]
Breakout Session #2: Pathways for data analysis and interpretation at scale [DAY 1]
Breakout Session #3: Cleaning, organizing, and linking data: What can and should be done? [DAY 2]
Relevant data types and sources of UAP narrative reports
Linking data sources and developing a unified approach

Executive Summary
From both government and scientific perspectives, advancing Unidentified Anomalous Phenomena (UAP) research requires rigorous data collection, standardization, and analysis. Most UAP reports are fragmented, sparse, and unstructured, ranging from military logs and pilot reports to archival records, social media posts, and civilian testimony. Interpreting this heterogeneous data at scale is complicated by barriers of classification, translation, and retention. At the same time, UAP reports also present opportunities for novel methods of integration, metadata design, and analysis. The 2025 UAP Workshop on Narrative Data, Infrastructures, and Analysis brought together 40 participants from government, academia, and independent research organizations. The meeting focused specifically on the challenges and opportunities of working with UAP narrative reports and related data sources.

Workshop discussions highlighted several cross-cutting findings. First, effective progress requires clear standards and common reporting templates, with robust metadata capturing time, location, provenance, morphology, and contextual details.
Second, linking across datasets (military and civilian, including archival, environmental, and technical) must balance interoperability with privacy, ethical, and classification constraints. Third, credibility is best assessed through corroboration, but for efficiency there is a need for automated methods to filter reports and surface the most promising for investigation. Fourth, AI and machine learning tools offer capacity for transcription, triage, clustering, and semantic search, but they must be deployed cautiously to avoid hallucination, bias, and amplification of hoaxes. Human oversight and iterative workflows remain essential. Finally, the workshop underscored the importance of community engagement and trust-building, encouraging the scientific community to cultivate a sustainable community of practice for UAP research through further work and convenings.

This report concludes with recommended actionable next steps: establish metadata templates; combine human expertise with AI tools; leverage existing tools and infrastructures; support triage with awareness of bias; convene community members; facilitate qualitative methods in investigation, such as interviews; prioritize collection of new high-quality reports while integrating historical data; and improve reporting interfaces to enhance accessibility, collaboration, and transparency. Together, these findings and recommendations point toward a multi-disciplinary, community-engaged approach to UAP narrative data, which may in turn influence how and where technical sensors are deployed.

Introduction and Purpose
Understanding the nature of Unidentified Anomalous Phenomena (UAP) has emerged in recent years as a pressing area of inquiry in need of rigorous scientific approaches, as well as cross-disciplinary, cross-sector, and international collaboration.
Analyzing reports of UAP-related sightings and experiences presents unique challenges due to the large-scale, heterogeneous, and qualitative nature of the reports originating from military and civilian sources. These reports typically lack standardized metadata, making comparative analysis difficult. Additionally, the integration of UAP reports from disparate sources, such as military databases, online reporting systems, digital and digitized archival records, and social media, poses significant challenges for harmonization and verification of data and construction of evidence. The complexity of these datasets requires innovative data infrastructure solutions to enhance reliability, accessibility, and interoperability. The workshop explored these challenges and sought strategies to improve UAP data standardization, integration, and analytical approaches.

Recent advances in artificial intelligence (AI) and machine learning present both opportunities and potential hazards. Tools such as Large Language Models (LLMs) can assist with transcription, clustering, and pattern detection at scale, but they risk introducing bias and hallucination. Responsible use of AI to help organize, analyze, and integrate UAP reports at scale requires evaluation, human oversight, and shared frameworks for interpretation, alongside new models to ensure transparency and trust across diverse research communities. Therefore, the overall purpose of the workshop was to gather perspectives from the broader scientific community and advance the science of UAP.

About the Workshop
The workshop centered on the collection, organization, and interpretation of UAP reports, with attention to the challenges and opportunities of working with narrative data.
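The corpus-level clustering and pattern detection mentioned above can be illustrated with a minimal, standard-library sketch. All report snippets, the similarity threshold, and the function names below are hypothetical; a real workflow would use embedding models or LLM-assisted extraction, paired with human review as the discussion emphasizes.

```python
# Illustrative sketch: grouping free-text sighting reports by lexical overlap.
# Report snippets and the 0.3 threshold are invented for demonstration only.
import re

STOPWORDS = {"the", "a", "an", "in", "at", "over", "and", "of", "was", "it"}

def tokens(text: str) -> set:
    """Lowercase word set with common stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def jaccard(a: set, b: set) -> float:
    """Set-overlap similarity between two token sets (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster(reports, threshold=0.3):
    """Greedy single-pass clustering: attach each report to the first
    cluster whose seed report is sufficiently similar; else start a new one."""
    clusters = []  # each cluster is a list of report indices
    seeds = []     # token set of each cluster's first report
    for i, text in enumerate(reports):
        t = tokens(text)
        for c, s in zip(clusters, seeds):
            if jaccard(t, s) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
            seeds.append(t)
    return clusters

reports = [
    "Bright triangular object hovering over the bay",
    "Triangular object hovering silently over the bay at night",
    "Fast green light crossing the sky, likely a meteor",
]
print(cluster(reports))  # → [[0, 1], [2]]
```

Even this toy example shows why human oversight matters: lexical overlap conflates similar wording with similar events, so clusters surfaced this way are candidates for review, not conclusions.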
The primary objectives established for the workshop were to:
- Assess the current landscape of UAP reporting systems and data repositories;
- Identify key challenges and gaps in UAP data collection, standardization, and accessibility;
- Explore methodologies for data analysis and pattern recognition in UAP reports;
- Nurture trust and collaboration between researchers, government agencies, and civilian organizations; and
- Propose recommendations for developing a robust UAP data infrastructure.

Outside participation was limited due to budget constraints and institutional capacity. Potential participants were identified based on demonstrated expertise in one or more of the following areas: AI and machine learning; UAP research and data; physical and natural sciences; information and data science; archives and records; analysis methods; cyberinfrastructure and computation; and human and social sciences. If an invitee declined to attend, an invitation was extended to another candidate with similar skills and experience identified through online research and word of mouth. The final workshop included 40 participants.

Establishing open dialogue
Participant privacy was an important consideration throughout workshop planning, and Institutional Review Board (IRB) approval governed data collection and security for the workshop. The organizing committee further wished to establish a neutral environment in which participants holding diverse beliefs and backgrounds would feel comfortable engaging and sharing their thoughts and ideas without concern about what others might say or do. The planning committee also decided not to publicize the workshop online beforehand, to limit outside attention and encourage open discourse among an intimate group of participants. Participants were urged to avoid taking photos or attributing statements to individuals without permission.
The organizers made efforts to accommodate privacy concerns after they identified a final list of attendees:
- Name tags: individuals could simply list their first name with no institutional affiliation;
- Individuals could choose to remove themselves from sessions or conversations if they felt uncomfortable engaging in particular topics;
- Photographing other attendees was not permitted unless the photographer received consent from all individuals appearing in the photo; and
- Respect for all, and approaching conversations with an open mind, was a requirement for participation. If an individual did not feel this was possible, they were asked not to attend.

See the email communication sent to all attendees in Appendix B: Guidelines for Conduct.

Workshop Summary
Agenda overview
The event began with a casual, pre-workshop networking social on the evening of August 4, 2025. The organizers provided welcome and opening remarks on the morning of August 5, 2025, followed by brief participant introductions. A keynote address about the importance of good UAP data primed participants for the first breakout session (Identifying, accessing, and integrating data sources), held before breaking for lunch. The afternoon of August 5, 2025 began with a plenary talk, followed by the first panel discussion, Opportunities and challenges with AI, and a second breakout session (Pathways for data analysis and interpretation at scale). Day 1 concluded with a brief whole-group discussion and a workshop dinner at a restaurant near the venue. Day 2 began with a second plenary talk and a second panel discussion, Harmonizing qualitative and quantitative perspectives on narrative data. After lunch, a series of lightning talks was delivered by participants ahead of the final breakout session (Cleaning, organizing, and linking data: What can and should be done?). Throughout the event, the organizing team collected notes that were later transcribed and anonymized.
For each breakout session, moderators collected records, and notetakers were assigned to further ensure a robust record of the workshop proceedings.

Breakout Discussion Summaries
Prompts for each breakout session are included in Appendix D: Breakout Session Prompts.

Breakout Session #1: Identifying, accessing, and integrating data sources [DAY 1]
The first breakout session addressed central challenges of UAP research. Discussions revealed the scope of the UAP data landscape as a patchwork of historical case files, contemporary narrative reports, sensor-based data (radar, imagery, flight data), and environmental or contextual datasets (weather, astronomical, seismological). Participants expressed enthusiasm for the potential to link these disparate sources, but they also acknowledged the barriers posed by inconsistent metadata, classification restrictions, missing or inaccessible records, and stigma around UAP reporting. Despite these challenges, groups converged on the view that, with clear standards, prototype integration projects, and intentional collaboration across organizations, it is possible to create interoperable and sharable datasets that would enable more rigorous and scalable analysis of UAP reports.

Breakout Session #2: Pathways for data analysis and interpretation at scale [DAY 1]
The second breakout session explored methods and limitations for analyzing UAP narrative data. Across groups, participants grappled with the tension between extracting operationally useful signals and respecting the experiential, cultural, and historical richness embedded in reports. Overall, groups agreed that UAP narratives cannot be reduced to a single analytic approach.
Corpus-level methods (time/space clustering, keyword trends, statistical correlation, graph analysis) are useful for pattern detection and hypothesis generation, while narrative/experiential methods (phenomenology, discourse analysis) are useful for preserving meaning, cultural context, and witness voices. Infrastructures should allow these modes to coexist.

Breakout Session #3: Cleaning, organizing, and linking data: What can and should be done? [DAY 2]
The third and final breakout activity analyzed the structure of a hypothetical online reporting form that had collected 1,000 UAP reports stored as PDF files, in order to identify possibilities for analyzing the collected data as well as potential improvements to the form. The discussion led to the following overarching suggestions, which are broadly informative for online UAP reporting:

1. Intake flow and structure: Begin with a free-text box (and optional audio upload) where the witness provides their account in their own words. Use AI-assisted extraction to propose structured fields, which the witness can then confirm or correct. Frame questions around what was perceived (angular size, shape, movement, sound, effects) rather than presumed properties (exact distance, solid object dimensions).

2. Additions to the form: Ask witnesses to explain how they estimated size, distance, or speed (i.e., the context for their estimates). Capture whether this has happened before and, if so, how often. Instead of a yes/no "mass sighting" field, include the approximate number of witnesses. Include a field for whether the object seemed to react to observer presence. Add examples of technological effects (e.g., radio static, car failure) and basic prompts about feelings or aftereffects that could be informative (e.g., Did you discuss this with others? Would you want professional/peer support?). Automatically ingest and display photo metadata (camera model, timestamp, location), giving users the option to redact sensitive fields.

3. Standardization and cleaning: Accept location information in multiple forms (city, address, ZIP code, latitude/longitude), with simple guidance and drop-downs, and normalize on the back end. Enforce a single entry format for dates and times (calendar widget or drop-downs). Allow multiple input units (imperial/metric) but convert and store consistently. Include structured numeric fields for object count, with adaptive follow-up to describe each object separately.

4. Taxonomical considerations: Provide a concise taxonomy of common shapes (disk, sphere, triangle, cigar, other) but allow free text for unusual forms. Update descriptive references for cultural familiarity (e.g., using objects such as coins or a debit card to estimate size) and internationalize/translate forms for broader accessibility.

5. Integration and linkage: Include a field to indicate whether the event was reported elsewhere (NUFORC, MUFON, FAA, etc.). Design the schema so reports can be linked to FAA/NASA Aviation Safety Reporting System (ASRS) data, Automatic Dependent Surveillance-Broadcast (ADS-B) flight tracks, weather radar, astronomical databases, fireball networks, etc. Enable dynamic follow-ups for multiple objects, multiple witnesses, or sequential events.

6. Governance and trust: Give reporters clear control over what information (such as geolocation or photo metadata) is shared publicly. Commit to aggregated, de-identified data releases (maps, trend summaries) to build trust without encouraging hoaxes. Light-touch well-being questions were suggested to help identify whether respondents would like professional or peer follow-up, without stepping into clinical assessment.

Outcomes and Recommendations
Synthesis of Findings

Relevant data types and sources of UAP narrative reports
Participants emphasized that UAP research requires drawing on a diverse ecosystem of data, extending beyond witness testimony.
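The back-end normalization recommended in Breakout Session #3 (single stored formats for dates and units, regardless of how witnesses enter them) can be sketched minimally as follows. Field names, accepted input formats, and conversion factors here are illustrative assumptions, not a specification of any actual reporting system.

```python
# Illustrative sketch of back-end normalization for form inputs.
# Accepted formats and units are hypothetical examples; a production
# form would also constrain input with widgets, as suggested above.
from datetime import datetime

DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %B %Y"]  # assumed accepted spellings

def normalize_date(raw: str) -> str:
    """Coerce several common date spellings to ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date: {raw!r}")

def normalize_distance_m(value: float, unit: str) -> float:
    """Store all distance estimates in meters regardless of input unit."""
    factors = {"m": 1.0, "km": 1000.0, "ft": 0.3048, "mi": 1609.344}
    return value * factors[unit]

print(normalize_date("08/05/2025"))     # → 2025-08-05
print(normalize_distance_m(2.0, "km"))  # → 2000.0
```

Normalizing at ingest rather than at analysis time keeps every downstream consumer (triage, clustering, linkage to ADS-B or weather data) working from one consistent representation.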
Primary narrative reports, in formats ranging from PDFs and CSVs to emails and oral histories, remain central, offering firsthand accounts that, when digitized and transcribed as needed, can be structured for analysis. These reports are complemented by smartphone photos and videos, which are widely available but often of poor quality, though improving over time. Government sources hold both classified and unclassified records, including finished intelligence and historic documents. Military reports and ship logs are particularly robust, providing structured information on platforms, flight plans, and pilots, while the FAA continues to collect pilot reports. Other data streams include social media posts, which are often multimodal (such as online and social media videos); international partner databases; and structured technical or scientific sensor data, such as radar or spectrum analyses. Supplementary contextual data is also critical, including flight and weather records, seismological data, and satellite imagery; even doorbell or CCTV video can corroborate sightings.

Barriers and challenges in data collection and use
Despite many potential sources of data, significant obstacles remain. Access to social media data has become more restricted due to corporate licensing policies, while ethical and jurisdictional considerations complicate its use. Classification remains a dominant barrier: substantial UAP data may be captured on classified sensors, automatically rendering it inaccessible until declassified. Other challenges include language and translation barriers, with both human and automated systems prone to errors, especially in low-resource languages. Stigma in reporting, particularly among pilots, undermines data timeliness and completeness, while the lack of standardized reporting formats across agencies and organizations further fragments the landscape.
Time sensitivity and weak retention policies have led to the loss of critical records, as in the well-known Nimitz case. Technical issues are also substantial.