National Center for Atmospheric Research
CISL SEMINAR SERIES PRESENTS
Dr. Sangmi Pallikara, Colorado State University
Queries to Support Analytics Over Unstructured Multidimensional
Much of the data generated by observational devices is unstructured. Galileo, a distributed storage system for managing geospatial time-series datasets, is designed to handle such data. To be effective, data retrieval approaches must scale with both growing data volumes and increasing query complexity. A related requirement is sustained throughput under high rates of data and query arrivals. My talk focuses on query evaluation in such settings. I will describe Galileo's support for query evaluation over large, multidimensional datasets. Besides support for wildcard and range queries, I will describe how to autonomously improve query evaluation by accounting for both the distribution of data values and patterns in the queries themselves. Queries can also be constrained by arbitrary polygons, which queries issued as part of visual analytics can use directly. Finally, I will discuss how an autonomous caching scheme served as the basis for supporting continuous queries over fast-evolving datasets.