Proceedings of the Fifth International Workshop on Climate Informatics: CI 2015. J. G. Dy, J. Emile-Geay, V. Lakshmanan, Y. Liu (Eds.). September 2015. ISBN: 978-0-9973548-0-5
Lindsey R. Dietz, Snigdhansu Chatterjee
Abstract—Spatio-temporal models add complexity, but not necessarily value, to some climate analyses. To confirm the presence of spatio-temporal dependence, a hypothesis test should be conducted. The Space-Time Index is one statistic to detect such dependence; this statistic is simple, easily interpretable, and used in several disciplines. In an application to Indian monsoon precipitation thresholds, residuals from logit-normal mixed models were tested for spatio-temporal dependence. No evidence of dependence was detected in high thresholds.
2. CLIMATE CHANGE FROM AN INSURANCE PERSPECTIVE: A CASE STUDY OF NORWAY
Calvin Chu, Yulia R. Gel, Vyacheslav Lyubchich
Abstract—Presented at the workshop, but not available for public circulation.
Nachiketa Acharya, Allan Frei, Emmet M. Owens
Abstract—Stochastic weather generators (WGs) are often used to simulate synthetic weather time series based on observed statistical properties at a particular location. Most studies evaluate WG skill based on average properties. Our objective is to assess how WGs perform in simulating extremes, especially high precipitation amounts. We analyzed 13 different WGs using two parallel approaches: extreme event indices associated with large precipitation events, and recurrence intervals based on the Generalized Extreme Value (GEV) distribution.
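The link between a fitted GEV distribution and a recurrence interval can be sketched as follows. This is a minimal illustration using the standard GEV quantile formula, not the authors' code; the function name `gev_return_level` and the parameter values (location 50, scale 12, shape 0.1) are hypothetical.

```python
import math


def gev_return_level(mu, sigma, xi, period):
    """Level exceeded on average once every `period` blocks (e.g. years)."""
    if period <= 1:
        raise ValueError("period must be greater than 1")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Reduced variate y = -log(1 - 1/T), where 1 - 1/T is the non-exceedance probability
    y = -math.log(1.0 - 1.0 / period)
    if abs(xi) < 1e-9:
        # Gumbel limit of the GEV as the shape parameter tends to zero
        return mu - sigma * math.log(y)
    return mu + (sigma / xi) * (y ** (-xi) - 1.0)


# Hypothetical parameters for annual maximum daily precipitation (mm)
levels = [gev_return_level(50.0, 12.0, 0.1, t) for t in (10, 50, 100)]
```

Comparing such return levels between simulated and observed series is one way to judge how well a generator reproduces the upper tail.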
Christian Rodriguez, Imme Ebert-Uphoff, Yi Deng
Abstract—The energy budget of the earth accounts for energy entering from the sun, energy lost to space, and energy stored in the atmosphere and the planet. The exchange of energy between space, the atmosphere and the planet is a very complex process affected by many factors, including surface and atmospheric temperature, surface albedo, the amounts of clouds, aerosols and various trace gases, such as water vapor and carbon dioxide in the atmosphere. A thorough understanding of the earth’s energy budget is essential to predict how the climate responds to perturbations in external forcing. In this project we seek to develop a better understanding of the interactions between radiative flux measurements at the top of atmosphere and air/surface temperatures, using methods from causal discovery. This project is still in its initial stages, so this abstract focuses on the basic methodology and illustrates it with initial results from some first test runs.
Dorit Hammerling, Allison H. Baker, Imme Ebert-Uphoff
Abstract—This study applies methods from causal discovery theory to the output data of climate models. Causal discovery seeks to identify potential cause-effect relationships from data and is used here to learn so-called causal signatures that indicate interactions between the different atmospheric variables. We hope that these causal signatures can act like fingerprints for the underlying dynamics, and can as such be used in a variety of applications. Sample applications include (1) distinguishing correct model runs from incorrect ones, i.e., providing an additional error check for climate model runs, and (2) assessing the impact of data compression on the causal signatures, as a means to determine which type and amount of compression is acceptable. As this project is still in its early stages, we primarily describe work in progress and future work.
Kantha Rao B, Prashant Goswami
Climate and Environmental Modelling Programme, CSIR Centre for Mathematical Modelling and Computer Simulation, Wind Tunnel Road, Bangalore-560037, India.
Abstract—As the structure and scope of mathematical models become increasingly complex, computer simulations can no longer be validated easily; this, in turn, makes the reliability of the computed results more critical, especially in climate projections. The tremendous growth in high-performance computing has seen a corresponding increase in the scope and complexity of weather and climate models. An implicit assumption in designing a forecast or projection system has been that the accuracy or skill of the simulation or forecast is independent of the computing platform (comprising the compute server, compiler, etc.), at least in a statistical sense. In most cases, computing systems are chosen (benchmarked) as generic and passive tools, based on hardware specifications and performance benchmarking involving time, latency, etc. Although careful consideration is given to data consistency (such as with reference to earlier simulations), accuracy (that is, comparison with target data) is not generally considered; however, for end use, and especially with respect to applications like weather forecasting, comparison with independent observations (accuracy) is critical. While the importance of the computing platform in the cycle of dynamical forecasting is well recognized, the emphasis so far has been primarily on aspects like scalability and interconnects or optimal hardware configuration. Although consistency of simulations by a new computing system is often considered, it is customary to compare forecasts from different computing platforms in experiments like model inter-comparison to determine the relative skill of forecast or simulation. In more software (model) centric benchmarking, every important model version is generally rerun on the new machine; these simulations are then compared with simulations on the old machine.
It is not unknown that a new architecture (and/or compiler) can expose bugs in the code (often in the parallelization), but generally the basic characteristics of the climate of the models can be ascertained and compared. The objective of the present work is to examine and quantify the role of the computing platform in the accuracy and reliability of simulations, and especially weather forecasts. We show here, using simulations of increasing complexity, from basic library functions to the 3-variable Lorenz system to (internationally tested) models of weather forecasting, that simulations from two different computing platforms of the same brand, with identical initial conditions, can give rise to significantly different forecast skills, although both systems generate acceptable simulations. It is ensured that these differences cannot be attributed to numerical chaos, precision, compilation or dependence on a particular initial condition. Implications of machine dependence in areas like climate projections are discussed; an application-specific accuracy benchmarking is proposed. The primary conclusion from our results is that there is a significant contribution from the computing platform to the quality and accuracy of simulations, especially for complex systems. In addition to the differences in the simulations due to model formulations and sensitivity to initial conditions, a (residual) difference can remain even for the same model and initial condition simulated on a different machine architecture. The machine dependence of accuracy is found to persist even beyond 30 days; it is reasonable to assume that such machine dependence will also remain in climate simulations and projections (since machine dependence introduces error at every integration step). Considering the generally weak signal (such as trend) in climate change, projections of climate change are likely to have added and considerable uncertainty due to machine dependence.
Many climate change effects have small signatures but large impact on policy; thus it is necessary to ensure that climate projections are free from uncertainties arising from machine dependence to improve the quality and the applicability of climate informatics.
Impact of computing platform on simulations at different time scales and horizontal resolutions (degrees of freedom).
(a) Area-averaged (Lon=75°-85°, Lat=8°-28°) daily rainfall averaged over 5 years (1982,1985, 1987, 1988, and 1990) from observation (IMD) and forecasts from the two computing systems; the number in the parentheses represents average (over 30 days) daily error for the respective forecasts.
(b) Impact of model configuration (horizontal resolution) on the difference in simulations between the two computing systems, ALTIX and ORIGIN. The solid line shows the difference between the simulations, in area-averaged daily rainfall for the month of May 2000, at a horizontal model resolution of 0.5°x0.5°; the dashed line shows the corresponding results for a lower resolution (5°x5°).
Pierre Tandeo, Pierre Ailliot, Bertrand Chapron, Redouane Lguensat, Ronan Fablet
Abstract—The reconstruction of geophysical dynamics remains a key challenge in ocean, atmosphere and climate sciences. Data assimilation methods are the state-of-the-art techniques to reconstruct the space-time dynamics from noisy and partial observations. They typically involve multiple runs of an explicit dynamical model and may have severe operational limitations, including the computational complexity, the lack of model consistency with respect to the observed data, as well as modeling uncertainties. Here, we demonstrate how large amounts of historical satellite data can open new avenues to address data assimilation issues, and to develop a fully data-driven assimilation. Assuming that a representative catalog of historical state trajectories is available, the key idea is to use the analog method to propose forecasts with no online evaluation of any physical model. The combination of these analog forecasts with observations resorts to classical stochastic filtering methods. To illustrate the proposed analog data assimilation, the brute-force use of 20 years of altimetric data is demonstrated to reconstruct mesoscale sea surface dynamics.
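The analog forecasting step alone (without the filtering stage) can be sketched on a toy system. This is a minimal sketch under stated assumptions: the catalog is a scalar trajectory of a circle rotation chosen only because it is easy to verify, and `analog_forecast`, `k`, and `GOLDEN` are hypothetical names, not part of the authors' method.

```python
def analog_forecast(catalog, state, k=3):
    """Forecast the next state as the mean successor of the k nearest analogs."""
    # Only states that have a recorded successor can serve as analogs
    candidates = range(len(catalog) - 1)
    nearest = sorted(candidates, key=lambda i: abs(catalog[i] - state))[:k]
    successors = [catalog[i + 1] for i in nearest]
    return sum(successors) / len(successors)


# Toy catalog: rotation on the unit circle by the golden-ratio conjugate.
# This stands in for a historical archive; no model is evaluated at forecast time.
GOLDEN = (5 ** 0.5 - 1) / 2
catalog = []
theta = 0.0
for _ in range(500):
    catalog.append(theta)
    theta = (theta + GOLDEN) % 1.0

forecast = analog_forecast(catalog, 0.25)
```

In a full analog data assimilation scheme, forecasts like this one would replace the model propagation step inside a stochastic filter.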
Rose Yu, Dehua Cheng, Yan Liu
Abstract—Low-rank tensor learning has many applications in machine learning. A series of batch learning algorithms have achieved great successes. However, in climate data analysis, we are confronted with large-scale tensor streams, which pose significant challenges to existing solutions. In this paper, we propose an accelerated low-rank tensor online learning algorithm (ALTO) and apply our method to a climate multi-model ensemble task. Experimental results show that our method achieves comparable predictive accuracy with significant speed-up.
Subhabrata Majumdar, Lindsey Dietz, Snigdhansu Chatterjee
Abstract—We introduce a novel one-step model selection technique for general regression estimators, and implement it in a linear mixed model setup to identify important predictors affecting Indian Monsoon precipitation. Under very general assumptions, this technique correctly identifies the set of non-zero values in the true coefficient (of length p) by comparing only p+1 models. Here we use wild bootstrap to estimate the selection criterion. Mixed models built on predictors selected by our procedure are more stable and accurate than full models across testing years in predicting median daily rainfall at a station.
Yi Li, Yale Chang, Thomas Vandal, Debasish Das, Adam Ding, Auroop Ganguly, Jennifer Dy
Abstract—It is imperative to accurately assess the impacts of climate change at regional scale in order to inform stakeholders making policy decisions on critical infrastructure, management of natural resources, humanitarian aid, and emergency preparedness. However, Global Climate Models (GCMs) currently provide relatively coarse-resolution outputs, which preclude their application to accurately assess the effects of climate change on finer regional-scale events. Statistical downscaling methods use statistical models to infer regional-scale or local-scale climate information from coarsely resolved climate models. To make accurate predictions, covariate selection must be used to reduce the dimensionality of high-dimensional climate data. Covariates in climate data tend to be highly dependent and non-linear in nature, requiring advanced covariate selection methods. In this work, we propose a novel copula-based dependence measure that can capture non-linear relationships between variables as a criterion for feature selection. We demonstrate its effectiveness in discovering relevant features important for prediction with a non-parametric Bayesian mixture of sparse regression models applied to statistical downscaling.
Simon Goring, J Sakari Salonen, Miska Luoto, Jack Williams
Abstract—Fossil pollen is a widespread proxy for past vegetation that is used for paleoclimatic reconstruction, but the limits of its utility are not well known. Newer methods for climate reconstruction (CR) using machine learning techniques may improve the abilities of CR techniques, but little is known about model accuracy under conditions of non-analogue vegetation known to have occurred in the past. Here we generate non-analogue pollen assemblages by excluding close neighbors from calibration datasets, testing the ability of five CR techniques using pollen, including two machine learning techniques, to accurately reconstruct climate under non-analogue conditions.
Yuan Yan, Marc G. Genton
Abstract—The tilting method ranks each observation in terms of its influence on a general statistic, which can be mean, covariance, etc. This approach is based on ‘tilting’ or re-weighting each data value to achieve a given small change of the statistic, while minimizing the total amount of tilt. Then the influence ranking for each data corresponds to the rank of the tilted data weights. The tilting method can be applied to univariate, multivariate, functional or multivariate functional data. It allows for robust analysis and outlier detection. Climate data are intrinsically functional, either temporally or spatially or both. We illustrate the use of the tilting method by applying it to sea surface temperature data and bivariate data of mean monthly temperatures and precipitations recorded at Canadian weather stations.
Timothy DelSole, Claire Monteleoni, Scott McQuade, Michael K. Tippett, Kathleen Pegion, J. Shukla
Abstract—A machine learning algorithm for combining predictions is applied to seasonal predictions of the NINO3.4 index from six coupled atmosphere-ocean models. The algorithm adaptively tracks a dynamic sequence of “best experts” and produces a probability that a particular expert is best. Averaging based on this probability effectively yields a multi-model prediction. The algorithm gives seasonal predictions that are more skillful than any individual model and better than the multi-model mean.
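One well-known way to track a changing "best expert" is a fixed-share update in the style of Herbster and Warmuth; the abstract does not specify the exact algorithm, so the sketch below is an assumption and may differ from the authors' method. The names `fixed_share_update` and `combine`, the learning rate `eta`, the switching rate `alpha`, and the toy predictions are all hypothetical.

```python
import math


def fixed_share_update(weights, losses, eta=2.0, alpha=0.05):
    """One round of a fixed-share update over at least two experts."""
    n = len(weights)
    # Loss update: downweight each expert exponentially in its loss
    scaled = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(scaled)
    posterior = [s / total for s in scaled]
    # Share update: each expert passes a fraction alpha of its weight to the others,
    # which lets the weights recover quickly when the best expert changes
    return [(1.0 - alpha) * p + alpha * (1.0 - p) / (n - 1) for p in posterior]


def combine(weights, predictions):
    """Weighted average of the expert predictions (the multi-model forecast)."""
    return sum(w * p for w, p in zip(weights, predictions))


# Toy run: expert 0 is exact for 20 steps, then expert 2 becomes exact
weights = [1.0 / 3.0] * 3
truth = 1.0
combined_forecasts = []
for step in range(40):
    predictions = [1.0, 0.0, 2.0] if step < 20 else [0.0, 2.0, 1.0]
    combined_forecasts.append(combine(weights, predictions))
    losses = [(p - truth) ** 2 for p in predictions]
    weights = fixed_share_update(weights, losses)
    if step == 19:
        weights_mid = list(weights)
```

The weights play the role of the probability that each model is currently best, and the combined forecast is the average under that probability.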
Homer Strong, Andrew W. Robertson, Padhraic Smyth
Abstract—Predicting ground rainfall from satellite estimates is useful as input for many applications, especially for areas with sparse rain gauges. We propose a predictive model based on an Additive Gaussian process (AGP), which can be viewed as the sum of a GP for the influence of the satellite estimate and a GP for the spatial distribution of rainfall between gauges. The hyperparameters of the covariance functions are estimated by maximizing the leave-one-out predictive densities. Initial results indicate that the proposed AGP model provides more accurate predictions than traditional kriging and inverse weighting methods.
Mina Moradi Kordmahalleh, Mohammad Gorji Sefidmazgi, Abdollah Homaifar, Stefan Liess
Abstract—A proposed sparse recurrent neural network with flexible topology is used for trajectory prediction of Atlantic hurricanes. To predict the future trajectory of a target hurricane, the hurricanes most similar to the target are found by comparing the directions of the hurricanes. Then, the first and second differences of their positions over their lifetimes are used for training the proposed network. Comparison of the obtained predictions with the actual trajectories of hurricanes Sandy and Humberto shows that our approach is quite promising for this aim.
Andre R. Erler, W. Richard Peltier
Abstract—The analysis of precipitation extremes with decadal return periods requires long data records, longer than are typically available from station observations. To address this problem, a method is proposed for clustering station records based on the similarity of their precipitation climatologies. It is demonstrated that the proposed aggregation method is superior to naïve aggregation methods for Extreme Value Analysis and enables the detection of changes in the historical record that would otherwise not be distinguishable from noise.
Chi-shing Calvin Cheung, Steve Hung, Lam Yim
Abstract—Atmospheric stability has strong effects on air quality, driving vertical mixing of air pollutants. Literature has investigated the impact of climate changes on stability. However, the influence of associated impacts on air quality has yet to be understood. This study aims at projecting the impact of changes in stability on air quality under climate change. We took the Pearl River Delta region as an example to demonstrate the predictability of air quality using stability indices based on regional climate data simulated by the Weather Research and Forecasting Model, which dynamically downscaled the past and future climate under the A1B scenario simulated by ECHAM5/MPIOM. Stability indices were calculated and used to classify atmospheric conditions into two stability groups: stable and neutral & unstable. Using Generalized Linear Model, the stability indices were used to estimate the changes in Sulfur Dioxide, Ozone and PM10 due to changes in stability in the future periods of 2015-2039 and 2075-2099.
Sabrina Vettori, Raphaël Huser, Marc G. Genton
Abstract—The spatial dependence structure of climate extremes may be represented by the class of max-stable distributions. When the domain is very large, describing the spatial dependence between and within subdomains is particularly challenging and requires very flexible, yet interpretable, models. In this work, we use the inherent hierarchical dependence structure of the (max-stable) nested logistic distribution for clustering and dimension reduction in multivariate extremes, taking into account the occurrence times of extreme events. Methods are tested both through a simulation study and by analysing extreme air temperatures at different stations in Switzerland.
Mahesh Mohan, Cheng Tang, Claire Monteleoni, Timothy DelSole, Benjamin Cash
Abstract—We propose to use machine learning to discover indices from the SST field data, and to compare their prediction performance to that of the Niño3.4 index, on tasks related to ENSO. As a first step in this direction, this work focuses on predicting the time-series of monthly temperature anomalies in Texas, from time series for the whole ocean SST field, ending 6 months prior.
Xin Huang, Vyacheslav Lyubchich, Alexander Brenning, Yulia R. Gel
Abstract—Many existing methods for temporal trend detection in environmental space-time data are based on grouping observations simply by geographical proximity. Such grouping is often static and thus does not account for changes in the space-time data distribution. In this project we evaluate properties of TRUST, a new data mining algorithm for dynamic detection of space-time patterns proposed by Ciampi et al. In particular, we explore the sensitivity of TRUST with respect to the choice of tuning parameters and consider a case study of dynamic cluster detection in yearly precipitation records among 167 stations in Central Germany over 1951–2010.
Charles Anderson, Imme Ebert-Uphoff, Yi Deng, Melinda Ryan
Abstract—Recent experiments involving the training of artificial neural networks with multiple layers, sometimes referred to as deep learning, have demonstrated the ability to automatically identify features that are critical to solving complex pattern classification tasks, such as speech recognition. Similar to speech, atmospheric data sets often consist of multiple time series with unknown, complex interrelationships. In this project we seek to explore what kind of interrelationships can be discovered in climate data by applying the framework of artificial neural networks. As a first application we look at establishing relationships between top of atmosphere radiative flux and air/surface temperatures. This is an important application, since a thorough understanding of those relationships is essential for understanding the effect of CO2-induced warming on the Earth’s energy balance and future climate. We describe the basic idea, first observations and plans for future work.
David John Gagne II, Amy McGovern, and Ming Xue
Abstract—Track-based analysis of convection-allowing models is an efficient way to compare the properties of storms produced with different model configurations to observations. Storm morphology and evolution can be easily analyzed simultaneously. This project examines the properties of hailstorm proxy objects from a storm-scale ensemble and compares them with radar-observed hailstorms. An evaluation of successful matches by ensemble member is performed, and other properties of the forecast and observed hailstorms are examined.
Huang Huang, Ying Sun
Abstract—Datasets in the fields of climate and environment are often very large and irregularly spaced. To model such datasets, the widely used Gaussian process models in spatial statistics face tremendous challenges due to the prohibitive computational burden. Various approximation methods have been introduced to reduce the computational cost. However, most of them rely on unrealistic assumptions about the underlying process, and retaining statistical efficiency remains an issue. In this work, we develop a new approximation scheme for maximum likelihood estimation. We show how the composite likelihood method can be adapted to provide different types of hierarchical low-rank approximations that are both computationally and statistically efficient. The performance of the proposed method is investigated by numerical and simulation studies, and parallel computing techniques are explored for very large datasets. Our methods are applied to nearly 1 million measurements of soil moisture in the area of the Mississippi River basin, which facilitates better understanding of the climate variability.
Chintan A. Dalal, Vladimir Pavlovic, Robert E. Kopp
Abstract—Analyzing datasets such as sea-level records poses a challenging statistical problem for reasons including non-stationarity, non-uniformly smooth spatial boundaries, and sparsity in the data. In this paper, we propose a framework to estimate the non-stationary covariance function by employing intrinsic statistics on the local covariates. These local covariates represent the underlying local correlation in the measurements, and they are assumed to lie on a Riemannian manifold of positive definite matrices. Additionally, we provide a technique for data assimilation of correlated natural processes in order to improve the regression estimates arising from spatially sparse datasets. Experiments on a synthetic and a real dataset of relative sea-level changes across the world demonstrate improvements in the error metrics for the regression estimates using our newly proposed approach.
Cameron Bracken, Balaji Rajagopalan, Linyin Cheng, Subhrendu Gangopadhyay
Abstract—An efficient Bayesian hierarchical model for spatial extremes on a large domain is proposed. In the data layer a Gaussian elliptical copula having generalized extreme value (GEV) marginals is applied. Spatial dependence in the GEV parameters is captured with a latent spatial regression. Using a composite likelihood approach and a method for incorporating stations with missing data, we are able to efficiently incorporate a large precipitation dataset. The model is demonstrated by application to seasonal precipitation extremes at approximately 2800 stations covering the western United States, -125E to -100E longitude and 30N to 50N latitude. The hierarchical model provides parameters on a 1/8th degree grid and consequently maps of return levels and associated uncertainty for each season. The model results indicate that return levels vary coherently both spatially and across seasons, providing valuable information about the space-time variations of risk of extreme precipitation in the western US, helpful for infrastructure planning.
Varun Mithal, Guruprasad Nayak, Ankush Khandelwal, Vipin Kumar, Ramakrishna Nemani, Nikunj Oza
Abstract—This paper presents a new predictive modeling framework designed to learn classification models from imperfectly labeled samples, in the absence of expert-annotated training samples, for identifying rare classes. Our results show that, under some reasonable assumptions, the classifiers trained from imperfectly labeled training data using this approach have performance comparable to the models trained using expert-annotated training data. This capability of learning from imperfect supervision is advantageous in a wide range of applications where the target class of interest is relatively rare and obtaining a precise labeling of even a small number of training samples is infeasible. We present the application of the framework for creating historical maps of forest fires from satellite data for the tropical forests. This new forest fire product identifies approximately 1 million sq. km. of burned areas in the tropical forests in South America and South-east Asia during the years 2001-2014, which is more than three times the total burned area reported by the state-of-the-art NASA products in these regions. We validate these results using burn scars visible in satellite images to confirm the veracity of these forest fires.
Carlos Lima, Upmanu Lall
Abstract—Frequency studies of flood hydrology are commonly based on statistical models and therefore rely on the classical assumptions of independence, homogeneity and stationarity of the flood data. In this ongoing work we aim to advance traditional flood frequency studies by investigating extreme streamflow events under the flood hydroclimatology framework, where a formal consideration of the physical mechanisms responsible for the generation of extreme floods is contemplated through the analysis of the synoptic atmospheric and oceanic fields in the days that preceded the events. Large-scale fields of wind vector, sea surface temperature and moisture divergence as well as storm track data built on the integrated moisture flux in the atmosphere are evaluated in a reduced dimensional space obtained by autoencoder networks, which consist of multilayer neural networks whose goal is to reduce the dimensionality of high-dimensional input data. Extreme hydrological events in two flood-prone regions in Brazil are used as case studies for the current work, and a hypothesis of the causal chain of extreme floods in such regions is offered and investigated using the proposed methodology and the machine learning tools.
Eniko Székely, Dimitrios Giannakis, Andrew J. Majda
Abstract—We investigate in this paper the dominant intraseasonal signals in both convection and circulation data using the nonlinear Laplacian spectral analysis (NLSA) method. Three Madden-Julian oscillation (MJO) indices are constructed based on temporal modes extracted from pure cloudiness, lower- and upper-level zonal wind anomalies. All three indices reveal strong intermittency and capture well – through the use of kernel-based similarity methods from machine learning – the inherent nonlinear nature of both convection and circulation in the tropics.
Gilberto Iglesias, David C. Kale, Yan Liu
Abstract—The growing volume and detail of digital climate data offer opportunities for better understanding climate and weather phenomena, but the size and complex nature of these data pose many challenges to traditional statistical learning. In this abstract, we present a preliminary examination of whether and how recent advances in deep learning can capture the complex interactions between climate factors and help make accurate predictions of extreme weather events, such as heatwaves.
Saurabh Agrawal, Stefan Liess, Snigdhansu Chatterjee, Vipin Kumar
Abstract—Climate teleconnections are relationships between widely separated regions. In this work, we introduce a novel climate teleconnection pattern called a tripole. A tripole involves three regions A, B, and C, such that the anomaly time series at region C is more strongly correlated with either the sum or the difference of the anomaly time series observed at regions A and B than with the anomaly time series at region A or region B alone.
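The tripole condition stated above can be checked directly with Pearson correlations. The following is a minimal sketch on synthetic series; the helper names `pearson` and `tripole_strength` and the sinusoidal test data are hypothetical, and the authors' actual search procedure is not described in the abstract.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def tripole_strength(a, b, c):
    """Return (best |corr| of c with a+b or a-b, best |corr| of c with a or b alone)."""
    plus = [u + v for u, v in zip(a, b)]
    minus = [u - v for u, v in zip(a, b)]
    combined = max(abs(pearson(c, plus)), abs(pearson(c, minus)))
    single = max(abs(pearson(c, a)), abs(pearson(c, b)))
    return combined, single


# Synthetic anomaly series: region C is driven by the difference of A and B
a = [math.sin(0.3 * t) for t in range(120)]
b = [math.cos(0.7 * t) for t in range(120)]
c = [u - v for u, v in zip(a, b)]

combined, single = tripole_strength(a, b, c)
is_tripole = combined > single
```

In practice a margin or significance test would be needed before declaring a tripole, since `combined > single` alone can hold by chance.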
Xi C. Chen, Yuanshun Yao, Sichao Shi, Vipin Kumar, James H. Faghmous
Abstract—The lack of global knowledge of land-cover changes limits our understanding of the earth system, hinders natural resource management and also compounds risks. Remote sensing data provide an opportunity to automatically detect and monitor land-cover changes. Although changes in land cover can be observed from remote sensing time series, most traditional change point detection algorithms do not perform well due to the unique properties of remote sensing data, such as noise, missing values and seasonality. We propose an online change point detection method that addresses these challenges. Using an independent validation set, we show that the proposed method performs better than the four baseline methods in both testing regions, which have ecologically diverse features.
Chi-shing Calvin Cheung, Melissa Anne Hart, Mervyn R. Peart
Abstract—This study demonstrates a method to project the future rainfall in a city, Hong Kong, based on future climate scenarios. The procedure consists of using logistic regression (LM) to predict rain occurrence and a generalized linear model (GLM) to predict rainfall volume. A perturbation method is then applied to statistically downscale the future projections to a local rainfall study. The future climate projections are given by three General Circulation Models (GCMs): GISS-ER, GFDL-CM2.1 and MRI-CGCM2.3.2, from the Coupled Model Intercomparison Project phase 3 (CMIP3). The logistic regression and generalized linear model were first calibrated using NCEP/NCAR reanalysis and local-scale observation data from 1971 – 2000, followed by validation using data from 2001 – 2010. For model validation, the index of agreement is > 0.62 for monthly rain occurrence (days) and > 0.73 for monthly rainfall volume (mm).
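For readers unfamiliar with the validation score, the sketch below computes an index of agreement in Willmott's original form. The abstract does not say which variant was used, so this form is an assumption, and the function name and the monthly rain-day counts are hypothetical.

```python
def index_of_agreement(observed, predicted):
    """Willmott's index of agreement d: 1 is a perfect match, 0 is no agreement."""
    n = len(observed)
    o_mean = sum(observed) / n
    # Numerator: sum of squared prediction errors
    num = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    # Denominator: potential error, measured around the observed mean
    den = sum((abs(p - o_mean) + abs(o - o_mean)) ** 2
              for o, p in zip(observed, predicted))
    if den == 0:
        raise ValueError("index is undefined when all values equal the observed mean")
    return 1.0 - num / den


# Hypothetical monthly rain-day counts
observed = [10, 12, 8, 15]
predicted = [11, 11, 9, 13]
d = index_of_agreement(observed, predicted)
```

Because the denominator bounds the numerator, the score always lies between 0 and 1, which makes thresholds such as 0.62 and 0.73 directly comparable across variables with different units.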
Scott McQuade, Claire Monteleoni
Abstract—We approach the problem of adaptively combining the predictions of an ensemble of seasonal climate models as a Multi-task Learning (MTL) problem. Unlike the traditional MTL setting, we only have a single functional task (combining the predictions of ensemble members), where we consider multiple forecast periods from the same suite of models as our multiple learning tasks. Even though the same models generate the predictions in our “multiple tasks,” we demonstrate that knowledge transfer between these forecast periods can improve ensemble predictions of the sea surface temperature in the Niño 3.4 region.
Yahui Di, Wei Ding, Sanaz Imen, Ni-Bin Chang
Abstract— The main purpose of this study is to determine the association rules between hydro-climatic variables and the atmospheric / oceanic variables separated by large distances, which are known as the phenomenon of hydro-climatic teleconnection. In order to discover physically meaningful patterns from big climate databases, we aim at developing efficient data-driven approaches with the aid of machine learning, signal processing, and domain knowledge for constrained search. The big data analytics tool with the streaming feature selection in machine learning extracts hydro-climatic variables from large temporal and spatial feature space and formulates the global search for teleconnection signals effect on terrestrial precipitation. The wavelet analysis retrieves the scale-averaged wavelet power to signify the teleconnection signals via a pixel-wise linear lagged correlation analysis. Preliminary comparisons between streaming feature selection in machine learning and wavelet analysis were made possible to pin down some known teleconnection patterns in this interdisciplinary study.
Vidyashankar Sivakumar, Moumita Saha, Pabitra Mitra, Arindam Banerjee
Abstract—We consider the problem of predicting total Indian summer monsoon rainfall (ISMR). A popular approach in prior literature has been to fit a regression model with the precipitation as the predictand and various climatological indices and parameters as predictors. The predictor indices and parameters are selected through an analysis of their linear correlations with Indian monsoon precipitation. Because such prior work based on a fixed regression model has had limited success, we investigate ISMR prediction based on the hypothesis that the Indian monsoon operates in a few different regimes, in which different predictors become relevant and influential. We model this multi-regime setting as a finite mixture of linear regressions (MLR), with a ridge regression model for each regime of operation. The parameters of the model are determined using the Expectation Maximization (EM) algorithm. The prediction procedure consists of identifying the regime of operation and then applying the corresponding regression model. The MLR model seems to improve overall prediction accuracy compared to a single fixed regression model (SLR).
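As a rough illustration of the model class named in this abstract, the sketch below fits a two-component mixture of ridge-penalized linear regressions with EM, using a single predictor and synthetic data. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the penalty value, the random initialization and the data are all hypothetical. The abstract does not say how the regime is identified for a new season, so the sketch only assigns training samples to their most probable regime.

```python
import math
import random


def weighted_ridge_fit(x, y, w, lam):
    """Weighted ridge fit of y ~ a + b*x; the penalty lam applies to the slope b only."""
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * (sxx + lam) - sx * sx  # positive whenever sw > 0 and lam > 0
    a = ((sxx + lam) * sy - sx * sxy) / det
    b = (sw * sxy - sx * sy) / det
    return a, b


def normal_pdf(residual, sigma):
    return math.exp(-0.5 * (residual / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def fit_mixture_of_regressions(x, y, k=2, lam=1e-2, n_iter=100, seed=0):
    """EM for a k-component mixture of linear regressions (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(x)
    # Random soft initialization of the responsibilities (an assumption of this sketch).
    resp = []
    for _ in range(n):
        row = [rng.random() + 1e-3 for _ in range(k)]
        total = sum(row)
        resp.append([r / total for r in row])
    components = []
    for _ in range(n_iter):
        # M-step: weighted ridge fit, noise scale and mixing weight per regime.
        components = []
        for j in range(k):
            w = [max(resp[i][j], 1e-12) for i in range(n)]  # floor avoids empty regimes
            a, b = weighted_ridge_fit(x, y, w, lam)
            sw = sum(w)
            var = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y)) / sw
            sigma = max(math.sqrt(var), 1e-3)
            components.append((sw / n, a, b, sigma))
        # E-step: posterior probability of each regime for each sample.
        for i in range(n):
            dens = [pi * normal_pdf(y[i] - a - b * x[i], s) for pi, a, b, s in components]
            total = sum(dens)
            resp[i] = [d / total for d in dens] if total > 0 else [1.0 / k] * k
    return components, resp


# Synthetic two-regime data, for illustration only (not monsoon data).
rng = random.Random(42)
x_demo, y_demo = [], []
for i in range(200):
    xi = rng.uniform(-1.0, 1.0)
    slope, intercept = (2.0, 1.0) if i % 2 == 0 else (-2.0, -1.0)
    x_demo.append(xi)
    y_demo.append(intercept + slope * xi + rng.gauss(0.0, 0.1))

components, responsibilities = fit_mixture_of_regressions(x_demo, y_demo)
# Hard regime assignment: the component with the largest responsibility.
regimes = [max(range(len(r)), key=r.__getitem__) for r in responsibilities]
```

Each entry of `components` is a tuple of mixing weight, intercept, slope and noise standard deviation. EM can converge to a local optimum, so in practice several random restarts would be compared.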
Soumyadeep Chatterjee, Stefan Liess, Arindam Banerjee, Vipin Kumar
Abstract—Statistical modeling of local precipitation involves understanding the local, regional and global factors that are informative of precipitation variability in a region. We consider seasonal precipitation over the Great Lakes region in order to identify the dominant factors for each season. We use sparse regression methods followed by random permutation tests to select the dominant factors. This approach offers hypotheses about possible mechanisms for seasonal regional precipitation, which domain scientists can further analyze for physical validity.
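The permutation-test step mentioned in this abstract can be sketched as follows. The abstract does not give the test statistic or the sparse regression method, so this sketch makes two assumptions: it covers only the permutation step, and it uses the absolute Pearson correlation between one candidate factor and precipitation as the statistic. All names and data are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(factor, precipitation, n_perm=999, seed=0):
    """Two-sided permutation p-value for the factor-precipitation association."""
    rng = random.Random(seed)
    observed = abs(pearson_r(factor, precipitation))
    shuffled = list(precipitation)
    exceed = 0
    for _ in range(n_perm):
        # Shuffling breaks any real link between the factor and precipitation.
        rng.shuffle(shuffled)
        if abs(pearson_r(factor, shuffled)) >= observed:
            exceed += 1
    # The +1 terms count the observed arrangement as one of the permutations.
    return (exceed + 1) / (n_perm + 1)


# Hypothetical series with a strong linear link, for illustration only.
factor_demo = [float(v) for v in range(30)]
precip_demo = [2.0 * v + 0.1 * ((v * 7) % 5) for v in range(30)]
p_demo = permutation_p_value(factor_demo, precip_demo, n_perm=199, seed=1)
```

A factor would be retained only if its p-value falls below a chosen significance level. When many factors are tested, a multiple-testing correction would also be needed.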
Daniel J. Short Gianotti, Bruce T. Anderson, Guido D. Salvucci
Abstract—Weather and climate prediction rely on inherently different predictors at different time scales. Some of the variability in precipitation on climate time-scales (i.e., interannual) could be attributed to accumulated noise on weather (i.e., daily) time-scales. To determine a lower bound on how much climate variability is due to processes at climate time-scales, weather generator models with no mechanism for interannual variability aside from short-term (weather-scale) memory structure are created as null models against which to compare the observational record of precipitation variability. These models can then be used to assess potential predictability in the observational record as well as in Global Climate Models, even at different spatial scales. Initial comparisons are presented.
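The null-model idea can be illustrated with a very simple weather generator. The abstract does not specify the authors' models, so the sketch below assumes a two-state, first-order Markov chain for daily rain occurrence with fixed transition probabilities. Because nothing in it changes from year to year, any spread in the simulated seasonal wet-day counts comes from weather-scale noise alone. The function names and probability values are hypothetical.

```python
import random


def simulate_wet_day_counts(p_wet_after_dry, p_wet_after_wet, n_days, n_years, seed=0):
    """Seasonal wet-day counts from a stationary two-state Markov chain."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_years):
        wet = False  # each simulated season starts from a dry day
        total = 0
        for _ in range(n_days):
            # The only memory is yesterday's state (weather-scale memory).
            p_wet = p_wet_after_wet if wet else p_wet_after_dry
            wet = rng.random() < p_wet
            total += 1 if wet else 0
        counts.append(total)
    return counts


def sample_variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


# Hypothetical transition probabilities and season length, for illustration only.
null_counts = simulate_wet_day_counts(0.3, 0.6, n_days=90, n_years=200, seed=7)
null_variance = sample_variance(null_counts)
```

Comparing `null_variance` with the interannual variance of observed seasonal counts indicates how much of the observed variability weather noise alone could produce. Observed variance above the null level points to processes acting on climate time-scales.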
Julien Emile-Geay, Nicholas P. McKay, Jianghao Wang, Darrell Kaufman, & PAGES2k consortium
Abstract—Reconstructions of surface temperature over the past 2,000 years extend our knowledge of climate system behavior beyond the instrumental era, helping to distinguish between exogenous and endogenous sources of climate variability, a fundamental frontier of climate science. In this study, we describe the latest incarnation of the PAGES 2k global multi-proxy database, a community-curated pool of paleoclimate records. The database is structured as Linked Open Data using a JSON-LD container, allowing semantic relations to be discovered between its objects and other Linked Data. We describe elementary statistical analyses possible with this new data resource, present a reconstruction of global surface temperature via Markov random fields, and encourage experimentation via other forms of machine learning and artificial intelligence.
39. Independence and the use of climate model ensembles
James Annan and Julia Hargreaves
Abstract—The question of the correct manner in which to interpret the ensemble of current state-of-the-art climate models has attracted substantial attention in recent years. This has frequently been discussed in terms of the “independence” or otherwise of the models. Related questions include: how to measure independence, how many “effectively independent” models there are, and how we may correct or adjust for any interdependence amongst the models. One common goal is to select a subset of the models, e.g. for use in further analysis and prediction, which may be considered independent. However, the term “independent” has rarely if ever been defined in a clear, unambiguous and useful manner. Here we argue for a fundamentally Bayesian interpretation of this question, and analyse the implications of this. We will also show that the same framework can be applied to other topics, such as the independence of observational evidence concerning climate system behaviour and, in particular, estimates of the equilibrium climate sensitivity. We present here, for the first time, some clear and consistent guidance as to what independence means in these contexts, including a mathematically precise and clear definition.