SIParCS 2018 Projects

Projects for summer 2018

* U and G denote availability for undergraduate and graduate applicants, respectively.

  1. Creating a Jupyter Notebook Kernel for NCAR Command Language (NCL) *U
  2. Cybersecurity Center for Internet Security (CIS) Rollout *U
  3. Evaluating the Performance of Large Scale Data Assimilation in Modern Geophysical Models *U
  4. Fortran Standards Toolkit *U
  5. Improving the Traceability of Data Provenance and the Demonstration of Data Workflows Using the Linked Data Model *U,G
  6. Machine Learning Long Term Weather Forecasts *U
  7. Novel Methods for HPC Training *U
  8. Supercomputer Infiniband Fabric Analysis *G
  9. Using Machine Learning to Simplify the Identification of Code Optimization *G
  10. Visualization of Time Series and 3D Spatial Data Using Python *U
  11. Weather Research and Forecasting (WRF) Scaling, Performance Assessment and Optimization *U,G

  1. Creating a Jupyter Notebook Kernel for NCAR Command Language (NCL) *U
    Areas of interest: software engineering, application optimization, data visualization

    The Jupyter Notebook (http://jupyter.org/) is an open-source web application that allows users to create and share documents that contain live code, equations, visualizations, and explanatory text. Jupyter Notebook is a popular tool for scientists and students performing data analysis and visualization. It is also useful for instructors wishing to build interactive tutorials for students, since Jupyter Notebooks can be converted to slide shows with live code execution. Initially, Jupyter Notebook included built-in support for the Julia, Python, and R programming languages, but it now includes an interface for supporting other languages as well.

    The NCAR Command Language (NCL) is an interpreted language designed specifically for scientific data analysis and visualization. The focus for this project is to build a Jupyter Notebook kernel to enable support for NCL. The project will be broken up into three phases. The first phase of the project will be to build the layer that connects the Jupyter Notebook web application to the NCL interpreter. The second phase is to add support for displaying NCL graphics directly inside a Jupyter Notebook. The final phase will be to add support for syntax highlighting of the NCL source code within the Jupyter Notebook.
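
    As a rough illustration of the first phase, the sketch below passes the text of one notebook cell to an external interpreter and collects what it prints. It is not a Jupyter kernel: the function name run_cell, the layout of the returned dictionary, the "ncl script.ncl" invocation, and the use of the exit code to signal errors are all assumptions made for this example. A real kernel would wrap a call like this in the Jupyter messaging protocol.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_cell(code, interpreter=("ncl",), suffix=".ncl", timeout=60):
    """Run the text of one notebook cell through an external interpreter.

    Hypothetical helper: writes the cell to a temporary script, runs
    `<interpreter> <script>`, and returns the captured output.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / f"cell{suffix}"
        script.write_text(code)
        proc = subprocess.run(
            [*interpreter, str(script)],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    return {
        # Assumption: a non-zero exit code means failure. An interpreter
        # that always exits with 0 would need its stderr inspected instead.
        "status": "ok" if proc.returncode == 0 else "error",
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


if __name__ == "__main__":
    # Python stands in for NCL so the sketch runs without NCL installed.
    print(run_cell("print(1 + 1)", interpreter=(sys.executable,), suffix=".py"))
```

    Starting a new interpreter for every cell loses variables between cells, so a working kernel would more likely keep one long-running interpreter process; this sketch only shows the hand-off.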

    Students - The project is open to undergraduate students only.

    Skills and Qualifications
    Students must have experience with Python. Experience with C, Jupyter Notebook, ZeroMQ, JSON, HTML, and JavaScript is desired but not required.




  2. Cybersecurity Center for Internet Security (CIS) Rollout *U
    Areas of interest: Cybersecurity, formal verification
     

    The CIS Critical Security Controls are a recommended set of actions for cyber defense. UCAR and NCAR use these controls to prioritize defense strategies to protect digital and physical computing assets. The student engaged in this project will develop a process to address the SANS/CIS Critical Security Control #1: Inventory of Authorized and Unauthorized Devices. The intern will create an automatically updating inventory of all hardware devices on the network so that only authorized devices are given access and unauthorized devices are prevented from gaining access to the network. The resulting process and inventory should be suitable for use in NCAR's distributed and dynamically changing IT environment.
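
    The core comparison behind such an inventory can be sketched in a few lines. The example below assumes devices are identified by MAC address and that the lists of observed and authorized addresses come from elsewhere; the function names and the three result categories are hypothetical and are not taken from the CIS controls.

```python
def normalize_mac(mac):
    """Return a MAC address in lower-case, colon-separated form."""
    digits = "".join(ch for ch in mac.lower() if ch in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def classify_devices(observed, authorized):
    """Compare devices seen on the network with the authorized inventory."""
    seen = {normalize_mac(m) for m in observed}
    allowed = {normalize_mac(m) for m in authorized}
    return {
        "authorized": sorted(seen & allowed),    # seen and in the inventory
        "unauthorized": sorted(seen - allowed),  # seen but not in the inventory
        "missing": sorted(allowed - seen),       # in the inventory but not seen
    }


if __name__ == "__main__":
    # Made-up addresses for illustration only.
    report = classify_devices(
        observed=["00-1A-2B-3C-4D-5E", "aa:bb:cc:dd:ee:ff"],
        authorized=["00:1a:2b:3c:4d:5e", "11:22:33:44:55:66"],
    )
    print(report)
```

    Keeping the inventory up to date automatically would mean rerunning a comparison like this whenever the observed list changes; how that list is collected is left open here.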

    Students - The project is open to undergraduate students only.

    Skills and Qualifications
    Students should possess basic computer skills, knowledge of Linux, an interest in cybersecurity, and written and verbal communication skills. Experience with the Linux shell command line, basic shell scripting, and basic database use is desired but not required.




  3. Evaluating the Performance of Large Scale Data Assimilation in Modern Geophysical Models *U
    Areas of interest: Data science, data visualization, numerical methods

    Sensors of all kinds are providing an increasing amount of observational data about our planet, and this data is increasingly used in modern geophysical models to tune the state of simulations, which results in more accurate forecasts and projections. This project will use NCAR’s supercomputers to explore and characterize the performance of ‘data assimilation’ methods in the scope of several key models and across several varying parameters including model sizes, the number of observations, and processor counts. The focus will be on ‘large scale’ simulations which are computationally demanding and might be common in 3-5 years' time.

    The student will run simulations and perform analysis in a systematic and rigorous manner, as well as visualize and present their results.

    Students - The project is open to undergraduate students only.

    Skills and Qualifications
    Students should have experience with scripting languages like csh and Python, as well as an interest in data science.




  4. Fortran Standards Toolkit *U
    Areas of interest: Software engineering

    Large programs may be built over extended periods of time, and therefore may accumulate older code. As language standards evolve, some of this older code may no longer be standards-compliant. However, where portability is a concern, adherence to the standard becomes important. The sheer magnitude of accumulated old code may inhibit efforts to keep sources up-to-date, but automated help can reduce the scientific programmer's workload.

    Fortran has a well-known laundry list of older features, some once standard and some never standard, that should be addressed.

    A source-to-source translator, with compiler-like logic in the front end, is needed to solve these issues. A simpler approach might work for some of the easier issues, but the more general the tool, the greater the value to the scientific programmer. Conversations with programmers at the University of Oregon indicate that such technology exists, but is currently used to explore source-to-source techniques for vectorizing code, preparing code for GPUs, and similar tasks. Thus, making a source-to-source tool for modernizing Fortran will require some effort.

    The task for a SIParCS student is to develop the work from the University of Oregon to perform these and similar transformations. The specific goal is to work with end users to prioritize items from the above list and produce a tool that can handle them. The list overall is substantial; appendices in the Fortran standard list deleted and obsolescent features that still haunt older codes. Thus, extensibility is a priority goal.
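
    To give a feel for the "simpler approach" and its limits, the sketch below scans source text line by line for two features that later standards removed: the PAUSE statement (deleted in Fortran 95) and the arithmetic IF (deleted in Fortran 2018). The rule table and function name are hypothetical, Python is used only for brevity, and this is not the University of Oregon technology. It reports findings and does not rewrite anything.

```python
import re

# Hypothetical rule table: (label, pattern). Each pattern allows an
# optional numeric statement label before the statement itself.
RULES = [
    ("PAUSE statement",
     re.compile(r"^\s*(\d+\s+)?pause\b(?!\s*=)", re.IGNORECASE)),
    ("arithmetic IF",
     re.compile(r"^\s*(\d+\s+)?if\s*\(.*\)\s*\d+\s*,\s*\d+\s*,\s*\d+\s*$",
                re.IGNORECASE)),
]


def scan(source):
    """Return (line number, label) pairs for each rule that matches a line."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Naive comment stripping: this also cuts at a '!' inside a string
        # literal and ignores fixed-form comment lines and continuations.
        code = line.split("!", 1)[0]
        for label, pattern in RULES:
            if pattern.search(code):
                findings.append((lineno, label))
    return findings


if __name__ == "__main__":
    sample = "\n".join([
        "      IF (X - Y) 10, 20, 30",
        "   10 PAUSE",
        "      if (x > y) z = 1   ! not arithmetic: if (a) 1, 2, 3",
        "      pause = 3",
    ])
    for lineno, label in scan(sample):
        print(lineno, label)
```

    The comment-stripping shortcut is exactly the kind of case where line-based matching breaks down and a front end that actually parses Fortran becomes necessary.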

    Students - The project is open to undergraduate students only.

    Skills and Qualifications: Students should have knowledge of Fortran and compilers or a willingness to learn Fortran.




  5. Improving the Traceability of Data Provenance and the Demonstration of Data Workflows Using the Linked Data Model *U,G
    Areas of interest: Software engineering, data science

    The purpose of this project is to investigate how NCAR’s Digital Asset Services Hub (DASH) Search system could leverage linked data models to improve how the system tracks the provenance of its assets and presents provenance information to users. We intend this internship to inform initiatives related to data and scientific knowledge, including the DASH Search and CAPSTONE Scientific Workflows projects. The project is designed as a collaboration between an undergraduate student and a graduate student working in a broader team-based environment. The undergraduate student will be expected to learn about linked data models, understand the needs of the DASH Search system in terms of provenance and workflows, and provide feedback on any proposed solutions. The graduate student will be expected to evaluate how the Resource Description Framework (RDF) could be implemented with the DASH Search system, create a prototype demonstrating the selected linked data model, and outline the next steps and areas for further evaluation if the DASH Search system is to proceed with integrating the linked data model.
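
    For readers new to linked data, the sketch below shows the basic idea of RDF-style provenance: each fact is a (subject, predicate, object) triple, and an asset's lineage is found by following "was derived from" links. The predicate is the wasDerivedFrom property from the W3C PROV vocabulary; the asset identifiers and the lineage function are hypothetical and do not describe how DASH Search stores its records. Plain tuples stand in for an RDF library to keep the example self-contained.

```python
PROV = "http://www.w3.org/ns/prov#"
DERIVED = PROV + "wasDerivedFrom"

# Hypothetical asset identifiers, for illustration only.
TRIPLES = [
    ("ex:figure-3", DERIVED, "ex:regridded-output"),
    ("ex:regridded-output", DERIVED, "ex:model-run-42"),
    ("ex:model-run-42", DERIVED, "ex:observations-2017"),
]


def lineage(asset, triples):
    """Follow wasDerivedFrom links from an asset back to its sources."""
    parents = {}
    for subject, predicate, obj in triples:
        if predicate == DERIVED:
            parents.setdefault(subject, []).append(obj)

    chain, stack, seen = [], [asset], {asset}
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            if parent not in seen:  # guards against cycles in the data
                seen.add(parent)
                chain.append(parent)
                stack.append(parent)
    return chain


if __name__ == "__main__":
    print(lineage("ex:figure-3", TRIPLES))
```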

    Students - This project is open to undergraduate and graduate students.

    Skills and Qualifications: Students must have a basic understanding of programming and markup languages such as Python, Java, JavaScript, XML, and HTML. Students should also have a willingness and ability to study and understand linked data principles and models such as RDF, as well as the willingness to learn and use computing tools and programs. Knowledge of Jupyter or other notebook environments and Agile Scrum development processes is desired but not required.

     




  6. Machine Learning Long Term Weather Forecasts *U
    Areas of interest: Machine learning, data science, numerical methods

    Long term and seasonal weather forecasts are usually performed by running the same physics-based Numerical Weather Prediction (NWP) models used for relatively short-term (less than 11 days) forecasts. Since the atmosphere is non-linear, a single NWP run is not representative; an ensemble of several runs, with slightly different initial conditions, is often used to better represent the uncertainty. The combination of uncertainty in measurements and the computational cost of long and multiple NWP runs limits the usefulness of long-term forecasts. Moreover, the nature of the problem makes the outcome probabilistic.

    The Research Data Archive (RDA) in CISL contains a large number of global weather datasets. This goldmine of data can be leveraged through machine learning techniques because it contains the input that supervised machine learning techniques need. The labeling can be obtained from the same data by some relatively simple processing (e.g., was the summer of 1994 hotter than average in the USA? Rainier?). Most importantly, because of the qualitative nature of the questions being answered, this problem could be easier to solve with techniques such as Deep Learning than with traditional NWP.
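
    The "relatively simple processing" for labels can be as small as comparing each season with the long-term mean. The sketch below does that for summer temperatures; the function name and the numbers are made up for illustration and are not RDA data.

```python
from statistics import mean


def label_hot_summers(summer_means):
    """Map year -> 1 if that summer was warmer than the multi-year mean, else 0."""
    baseline = mean(summer_means.values())
    return {year: int(temp > baseline) for year, temp in summer_means.items()}


if __name__ == "__main__":
    # Hypothetical summer-mean temperatures in degrees Celsius.
    temps = {1992: 21.0, 1993: 22.0, 1994: 23.5, 1995: 21.5}
    print(label_hot_summers(temps))
```

    Labels of this kind would serve as the training targets, with the corresponding archived data as the inputs.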

    This project seeks to dramatically reduce the computational resources and time needed for long term and seasonal forecasts, as well as improve their accuracy. If successful and applied to extreme weather events, it could help nations that cannot afford their own NWP and supercomputing infrastructure to prepare for and weather extreme events safely.

    Students - This project is open to undergraduate students only. 

    Skills and Qualifications: Desire to use machine learning and technology for the public good. Interest in physics, weather and computer experiments. Ability to work in a Unix or Unix-like environment, using makefiles, scripts, compilers, plotting tools, etc. as needed. The successful candidate must be very organized and highly motivated. Familiarity with machine learning techniques such as Deep Learning, interoperable data formats, weather-related hazards around the world and/or physical science is desired. Prior experience programming in at least one language, such as Python, Java, C or Fortran.




  7. Novel Methods for HPC Training *U
    Areas of interest: Software engineering, documentation

    Every user of High Performance Computing (HPC) systems must learn a variety of computing tasks including but not limited to shell scripting, job scheduling, resource management, and data movement. Many individuals self-teach these techniques, but novel methods of interactive learning can make this process more efficient and rewarding.

    The objective of this project is to investigate and develop novel training approaches for topics covered in NCAR’s HPC documentation. The initial focus of work will be prototyping interactive learning modules using Jupyter Notebooks. These Notebooks will teach essential and/or complex concepts in the NCAR HPC ecosystem. Potential topics include using the PBS job scheduler, customizing the user environment, understanding permissions on disk and tape, and managing cron jobs. The student will then work with User Support staff to determine how best to integrate modules into existing NCAR documentation. If time allows, the student may investigate other self-guided teaching technologies and contribution of work to the HPC Carpentry project.

    Students - The project is open to undergraduate students only.

    Skills and Qualifications: Applicants should be comfortable with the Unix computing environment and have experience with shell and Python scripting. An ideal candidate will also have knowledge of HPC resources and exposure to Jupyter Notebooks. Some experience with technical writing would also be beneficial.




  8. Supercomputer Infiniband Fabric Analysis *G
    Areas of interest: Networking, supercomputer systems operations

    Supercomputing centers such as NCAR’s strive to provide users with a productive and efficient supercomputing interconnect, not only by observing performance, but also through static analysis of network topology, routing and design. A particularly important question is how best to optimize applications so that they use the available system network fully and efficiently. For this, we need static analysis of the interconnect to gain a better understanding of the design’s performance.

    The primary goal of this summer project will be to write a Tulip plugin in C++ to do static analysis on the Infiniband interconnect fabrics of NCAR’s “Yellowstone” and “Cheyenne” petaflop supercomputers. If time permits, the project will include analysis of realtime and archived performance data collected on the supercomputers.
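
    One simple form of static analysis is counting the minimum number of links between nodes of the fabric. The sketch below does this with a breadth-first search on a tiny made-up topology. It is written in Python for brevity and does not use the Tulip API; the link list and function names are hypothetical, and a real fabric would also need its routing tables taken into account.

```python
from collections import deque

# Hypothetical two-level fabric: four hosts, two leaf switches, one spine.
LINKS = [
    ("host0", "leaf0"), ("host1", "leaf0"),
    ("host2", "leaf1"), ("host3", "leaf1"),
    ("leaf0", "spine0"), ("leaf1", "spine0"),
]


def build_graph(links):
    """Build an undirected adjacency map from a list of (a, b) links."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def hop_counts(graph, source):
    """Breadth-first search: minimum number of links from source to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


if __name__ == "__main__":
    hops = hop_counts(build_graph(LINKS), "host0")
    for node in sorted(hops):
        print(node, hops[node])
```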

    Students - This project is open to graduate students only. 

    Skills and Qualifications:
    Students interested in the design and performance analysis of high performance computing systems are encouraged to apply for this project. Students need strong programming skills in C++ and an understanding of graph theory. Students with specific experience in Linux, Infiniband, gcc and cmake are preferred.




  9. Using Machine Learning to Simplify the Identification of Code Optimization *G
    Areas of interest: Machine learning, application optimization

    Many of the scientific applications that execute on large scale parallel computing platforms run in a sub-optimal fashion. Frequently, modest changes or optimizations to the internal calculation of the applications can significantly reduce the time-to-solution. While it is frequently easy to make these code modifications, it is non-trivial to know exactly which modification should be made.

    In this project we will attempt to utilize machine learning techniques to identify which code changes should be applied to certain sections of code based on a detailed performance analysis of an application. This project will involve the creation of a training set of performance pathologies and likely solutions, which will be used to create a neural network to guide performance optimizations.

    Students - This project is open to graduate students only. 

    Skills and Qualifications: Students should have experience or an interest in machine learning and microprocessor architecture. Experience with scripting languages such as csh or Python is desired but not required.




  10. Visualization of Time Series and 3D Spatial Data Using Python *U
    Areas of interest: Software engineering, data visualization

    We are looking for a student with a pragmatic understanding of statistics and a strong interest in Python and Data Visualization to help us develop new tools to visualize time series of 3D spatial data.

    The Data Assimilation Research Testbed (DART) is a community facility for ensemble Data Assimilation developed and maintained by the Data Assimilation Research Section (DAReS) at the National Center for Atmospheric Research (NCAR). DART uses an Ensemble Kalman Filter to combine observations with computer models, running many different versions of the model with slightly different initial conditions. This helps account for uncertainty and shows forecasters a spread of possible outcomes. Visualizing this data is important for modelers to improve their forecasts and to better understand how observations are impacting their model.
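
    To make the idea of combining observations with an ensemble concrete, the sketch below applies one deterministic ensemble update to a single, directly observed scalar. It is a textbook-style illustration under simplifying assumptions (Gaussian errors, one variable, one observation) and is not presented as DART's implementation; the function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def adjust_ensemble(prior, obs, obs_var):
    """Shift and rescale ensemble members toward a scalar observation.

    The posterior mean and variance are the usual products of two
    Gaussians; every member is moved so that the ensemble has exactly
    that mean and (sample) variance.
    """
    prior_mean = mean(prior)
    prior_var = variance(prior)  # sample variance; needs spread > 0
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    scale = sqrt(post_var / prior_var)
    return [post_mean + scale * (x - prior_mean) for x in prior]


if __name__ == "__main__":
    # Three members with mean 2.0 and variance 1.0; observation 4.0, variance 1.0.
    posterior = adjust_ensemble([1.0, 2.0, 3.0], obs=4.0, obs_var=1.0)
    print(posterior, mean(posterior), variance(posterior))
```

    With equal prior and observation variances, the updated ensemble mean lands halfway between the prior mean and the observation, and the spread shrinks, which is the kind of before-and-after change a visualization tool would need to show.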

    This project is an opportunity to make an impact on leading edge Data Assimilation software and how modelers visualize their data. The intern will be working hands on with a diverse team of agile software engineers and data scientists to develop new Data Visualization tools that are flexible and easy to use for various 3D models usin

    Students - This project is open to undergraduate students only. 

    Skills and Qualifications - Students should have a working knowledge of Unix and Python and a pragmatic understanding of statistics. Experience with data assimilation or data visualization is desired but not required.



     


  11. Weather Research and Forecasting (WRF) Scaling, Performance Assessment and Optimization *U,G
    Areas of interest: Software engineering, supercomputer systems operations, data visualization 

    The Weather Research and Forecasting (WRF) model is a parallel mesoscale numerical weather forecasting application used in both operational and research environments. The objective of this project is to assess the performance and scalability of the WRF model, taking into account computation, disk I/O, and network communication costs. Interns will benchmark WRF on NCAR’s current HPC platform (Cheyenne), using and comparing different compilers and MPI libraries, and various compile and runtime settings. Time permitting, more advanced student interns may also be able to investigate and help understand application bottlenecks and propose code changes or application-level settings to improve performance.
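
    A scaling assessment usually reduces to two numbers per run: speedup relative to the smallest run, and parallel efficiency. The sketch below computes both from a table of wall-clock times. The function name and the timings are hypothetical and are not WRF measurements.

```python
def scaling_table(timings):
    """Return (cores, speedup, efficiency) rows from {cores: seconds}.

    Speedup is measured against the run with the fewest cores;
    efficiency is speedup divided by the increase in core count.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Made-up wall-clock times in seconds for the same case at three sizes.
    for cores, speedup, efficiency in scaling_table({36: 100.0, 72: 55.0, 144: 32.0}):
        print(f"{cores:4d} cores  speedup {speedup:5.2f}  efficiency {efficiency:4.2f}")
```

    Efficiency that falls as cores are added points to costs outside the computation itself, such as the disk I/O and network communication mentioned above.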

    Students - The project is open to undergraduate and graduate students.

    Skills and Qualifications: Ability to work in a Unix environment, using makefiles, scripts, compilers, plotting tools, etc. as needed. The successful candidate must be very organized and systematic in their experimental technique. Previous experience with parallel applications and an understanding of MPI and OpenMP parallel paradigms are desirable. Basic programming skills in at least one programming language such as C or Fortran will also be beneficial.

