SIParCS 2018 - Joshua "Josh" Jones
NCAR DASH Search and Linked Data: Investigation and Implementation using Schema.org
As datasets grow more complex and span multiple scientific domains, larger projects are being undertaken by groups of scientific collaborators. This changes how voluminous data is handled over the long term, and data management has become a critical issue facing the sciences. Many scientists are now developing and improving ways to describe data, including a push for new methods of relating, structuring, or “linking” datasets. Schema.org is one such method: it links data through relevant metadata stored in a specific structured format. This format refines dataset searches based on the chosen structured fields, which improves the discoverability and visibility of scientific datasets across different search engines. Schema.org can also add fields from additional metadata that directly link a dataset to a subset or a parent dataset. This project, a SIParCS internship, presents how Schema.org can be implemented with NCAR’s search service, the Digital Asset Service Hub (DASH). DASH, created by the Data Stewardship Engineering Team, allows all of the scientific assets produced by NCAR to be searched in one central location. We show that, when properly utilized in DASH Search, Schema.org generates the appropriate structured fields for each searchable product from NCAR laboratories. Linked data structures can make scientists’ projects, models, and data easily discoverable, and can also help scientists establish good practices for metadata collection.
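As an illustration of the structured format described above, the following is a minimal sketch of a Schema.org `Dataset` record serialized as JSON-LD, including links to a parent dataset and to subsets. It is not the DASH implementation: the function name `dataset_jsonld`, the example titles, and the `example.org` URLs are all hypothetical. Only the Schema.org terms (`Dataset`, `name`, `description`, `url`, `isPartOf`, `hasPart`) come from the Schema.org vocabulary.

```python
import json


def dataset_jsonld(name, description, url, parent_url=None, subset_urls=()):
    """Build a Schema.org Dataset record as a JSON-LD dictionary.

    Hypothetical helper for illustration only; not part of DASH.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    # Link "up" to a parent dataset, if one is given.
    if parent_url:
        record["isPartOf"] = {"@type": "Dataset", "url": parent_url}
    # Link "down" to any subsets of this dataset.
    if subset_urls:
        record["hasPart"] = [
            {"@type": "Dataset", "url": subset} for subset in subset_urls
        ]
    return record


# Placeholder values; these are not real NCAR assets.
example = dataset_jsonld(
    name="Example Model Output, 2018",
    description="Hypothetical dataset used to illustrate linked metadata.",
    url="https://example.org/datasets/model-output-2018",
    parent_url="https://example.org/datasets/model-output",
    subset_urls=["https://example.org/datasets/model-output-2018/january"],
)

print(json.dumps(example, indent=2))
```

In a typical setup, JSON-LD like this is embedded in a dataset's landing page inside a `<script type="application/ld+json">` element, where search engines can read the structured fields.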
Mentors: Sophie Hou, Eric Nienhouse, Nathan Hook