SIParCS internships benefit students, mentors, and CISL

By Brian Bevirt
08/27/2010 - 12:00am

During the second year of his doctoral studies, Chris Mckinlay heard about SIParCS from a colleague in UCLA’s mathematics department. He applied and was accepted in 2008, spending 10 weeks investigating wavelet compression methods to accelerate simulations. The experience helped him refine some ideas for his thesis, so he applied again and was accepted for a second summer in SIParCS.

SIParCS Mentor
Chris Mckinlay (seated, right) learned from both formal and informal mentors during his three summers working in CISL. John Clyne (seated, left) was his 2010 SIParCS mentor for a project involving GPU programming. Chris developed his interest in GPU acceleration through lunchtime conversations with CISL software engineers Jose Garcia (standing, left) and Rory Kelly (standing, right) while working on a SIParCS project on another topic in 2009. Chris’ rich research experiences at NCAR have changed the direction of his graduate work at UCLA.

During SIParCS 2009, he met CISL engineers Rory Kelly and Jose Garcia, who helped him pursue some curiosity-driven research into how Graphics Processing Unit (GPU) hardware can accelerate the application of large matrix operators. Chris learned that because the field of accelerating scientific computation with graphics processor hardware is still in its infancy, their work required interdisciplinary skills in both software engineering and applied mathematics. Although this topic was unrelated to his second SIParCS project, Chris began investigating GPU signal processing. His extracurricular GPU research attracted the attention of CISL software engineer John Clyne, who is applying GPU acceleration to visualization software.

John invited Chris to apply for a third SIParCS project, this time to implement a 3D wavelet compression algorithm on GPUs, and Chris was accepted to do that work. He said, “I am grateful for the breadth of computing experiences available at NCAR, both through the SIParCS program and also through the many extracurricular learning opportunities that students in this program are encouraged to participate in. The most valuable aspect of my time at NCAR has been interacting with my formal and informal mentors in CISL. People do inspiring work here, and some of it has changed the direction of my thesis research and my upcoming work toward my Ph.D.”

What is SIParCS?

The Summer Internships in Parallel Computational Science (SIParCS) program provides opportunities for exceptional students to gain practical experience solving current, real-world problems in parallel computational science. SIParCS offers students with a background in computational science, computer science, applied mathematics, or the computational geosciences the chance to discover what a career in these fields can offer. During summer internships that typically last 10-12 weeks, CISL staff mentors guide each intern’s exploration of HPC systems and applications related to NCAR’s Earth System science mission.

SIParCS supports CISL's core educational mission of cultivating the future HPC workforce, with the aim of increasing the size and improving the quality of the U.S. computational science workforce. Through outreach, it broadens participation in the computational sciences. By integrating research and education, SIParCS increases the number of trained scientists and engineers capable of using and maintaining 21st-century supercomputers. Students conduct research in diverse technical areas of numerical algorithms, geostatistics, and computer science. The following sample of research projects indicates the wide range of topics.

GPU Acceleration of a Cloud-resolving Model, Hong Zhang, 2010.
The System for Atmospheric Modeling (SAM) cloud-resolving model is widely used to study detailed cloud processes. But cloud-resolving simulation is very computationally expensive, especially in large-scale 3D. Due to their increasing computational power and flexibility, multi-core Graphics Processing Units (GPUs) have become a powerful enhancement to scientific computing. This project accelerated SAM on GPUs and described the potential for extending our effort to other cloud-resolving models.

Searching Multidimensional Parameter Spaces to Optimize Parallel I/O, Kate Ericson, 2009.
This project optimized the Parallel I/O (PIO) library, an MPI-IO based I/O library used in the Community Climate System Model (CCSM) version 4. Two methods were used to explore the large parameter space with PIO: Active Harmony, which uses a simplex method, and simulated annealing. Results of optimizing disk I/O bandwidths were demonstrated on both a Blue Gene/L at NCAR and a Cray XT5 at NICS. Preliminary Blue Gene/L results indicate that we sustain 70-90% of the peak disk I/O bandwidth.
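
As a rough illustration of the second search method named above, the following Python sketch runs simulated annealing over a small discrete parameter grid. It is not the project's code: the parameter names (`io_tasks`, `stride`, `buffer_mb`), their value ranges, and the `synthetic_bandwidth` objective are all hypothetical stand-ins for real PIO settings and real benchmark runs.

```python
import math
import random

# Hypothetical tunable parameters; these are NOT actual PIO settings.
PARAM_SPACE = {
    "io_tasks": [1, 2, 4, 8, 16, 32, 64],
    "stride": [1, 2, 4, 8],
    "buffer_mb": [1, 4, 16, 64],
}


def synthetic_bandwidth(cfg):
    """Stand-in for a measured I/O bandwidth (assumed, not real data).

    The score peaks at io_tasks=16, stride=4, buffer_mb=16.
    """
    score = 100.0
    score -= 10.0 * abs(math.log2(cfg["io_tasks"]) - 4)
    score -= 5.0 * abs(math.log2(cfg["stride"]) - 2)
    score -= 2.0 * abs(math.log2(cfg["buffer_mb"]) - 4)
    return score


def neighbor(cfg, rng):
    """Move one randomly chosen parameter to an adjacent value in its list."""
    new_cfg = dict(cfg)
    name = rng.choice(sorted(PARAM_SPACE))
    values = PARAM_SPACE[name]
    i = values.index(cfg[name])
    j = min(max(i + rng.choice([-1, 1]), 0), len(values) - 1)
    new_cfg[name] = values[j]
    return new_cfg


def anneal(objective, steps=2000, t_start=20.0, t_end=0.1, seed=0):
    """Maximize `objective` over PARAM_SPACE with simulated annealing."""
    rng = random.Random(seed)
    current = {k: rng.choice(v) for k, v in sorted(PARAM_SPACE.items())}
    current_score = objective(current)
    best, best_score = current, current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        candidate = neighbor(current, rng)
        candidate_score = objective(candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse moves with
        # probability exp(delta / t), which shrinks as t cools.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score


if __name__ == "__main__":
    config, score = anneal(synthetic_bandwidth)
    print(config, score)
```

In a real tuning run, each call to the objective would be an expensive benchmark on the target machine, so the number of steps would be far smaller than in this toy setting.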

Parallelizing Climate Model Analysis Using Swift, Taleena Sines, 2010.
Recent advances in climate models like the Community Earth System Model (CESM) have enabled an order-of-magnitude increase in their ability to both utilize additional parallelism and generate output data. However, the scripts used for data analysis remain serial and create a bottleneck in post-processing. In this project, the Swift scripting language was used to parallelize part of the CESM data analysis suite, vastly improving post-processing times.

Optimizing Lookup Tables in the Community Atmosphere Model, Alan LaMielle, 2009.
Lookup tables are static data structures that provide a convenient way to access data that is either experimentally determined or too expensive to recalculate. The size of large lookup tables can have a significant impact on overall memory usage, particularly at larger processor counts, as the lookup table data is duplicated across all MPI tasks. This work reduced the high memory requirements of duplicated lookup tables with a distributed lookup table approach using one-sided MPI. This approach significantly reduced the memory consumption of the radiative source function lookup table.
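
The idea of distributing a table instead of duplicating it can be sketched without MPI. The Python example below assumes a simple contiguous block distribution, which may differ from the layout the project used, and simulates the ranks with in-memory lists. The function names `block_layout`, `locate`, `build_shards`, and `lookup` are hypothetical. In an MPI code, the final read would be a one-sided get from the owning task's memory window rather than a list access.

```python
def block_layout(table_size, num_ranks):
    """Return a (start, count) pair per rank for a contiguous block split."""
    base, extra = divmod(table_size, num_ranks)
    layout = []
    start = 0
    for rank in range(num_ranks):
        # The first `extra` ranks hold one additional entry each.
        count = base + (1 if rank < extra else 0)
        layout.append((start, count))
        start += count
    return layout


def locate(global_index, table_size, num_ranks):
    """Map a global table index to (owner_rank, local_offset)."""
    if not 0 <= global_index < table_size:
        raise IndexError(global_index)
    base, extra = divmod(table_size, num_ranks)
    # First global index owned by a rank that holds only `base` entries.
    boundary = extra * (base + 1)
    if global_index < boundary:
        return divmod(global_index, base + 1)
    rank_offset, local = divmod(global_index - boundary, base)
    return extra + rank_offset, local


def build_shards(table, num_ranks):
    """Split the table so each simulated rank stores only its own block."""
    return [table[start:start + count]
            for start, count in block_layout(len(table), num_ranks)]


def lookup(shards, global_index, table_size):
    """Fetch one entry from whichever simulated rank owns it."""
    owner, offset = locate(global_index, table_size, len(shards))
    # With MPI, this would be a one-sided get from rank `owner`.
    return shards[owner][offset]


if __name__ == "__main__":
    table = [i * i for i in range(10)]
    shards = build_shards(table, 4)
    print(lookup(shards, 7, len(table)))
```

With this layout each task stores roughly 1/N of the table, at the cost of a remote read whenever the requested entry lives on another task.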

Adding GridSpec Capabilities to NCL, Aaron Maus, 2010.
As climate models around the globe supply data for the next IPCC Assessment, a major task is analyzing the data to determine results for the AR5 report. The models in this effort run on different underlying grids, so analyzing the data requires comparisons between the models and their grids. This creates the need to easily convert from one grid to another to perform these comparisons. This project implemented the emerging GridSpec standard in the NCAR Command Language (NCL). The grid creation and regridding functions of GridSpec are now available in NCL to facilitate efficient analysis of climate model output worldwide for the IPCC AR5 and future assessments.

Porting VAPOR’s Wavelet Compression Utility to a Many-core Architecture, Chris Mckinlay, 2010.
VAPOR utilizes a JPEG-like wavelet compression algorithm to store 3D data. The standalone algorithm is a good candidate for GPU hardware acceleration under certain bandwidth assumptions. This work intends to resolve some of the issues involved with obtaining useful speedups for VAPOR.
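
To give a sense of what wavelet compression involves, here is a minimal Python sketch of a one-dimensional Haar transform with coefficient thresholding. This is a simplification chosen for illustration, not VAPOR's algorithm: VAPOR works on 3D data, and the wavelet family and coefficient selection it uses are not described here. The function names are hypothetical.

```python
def haar_forward(signal):
    """Multi-level, unnormalized Haar transform.

    The input length must be a power of two. The output holds the overall
    average first, followed by detail coefficients from coarse to fine.
    """
    data = [float(x) for x in signal]
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    while n > 1:
        half = n // 2
        averages = [(data[2 * i] + data[2 * i + 1]) / 2 for i in range(half)]
        details = [(data[2 * i] - data[2 * i + 1]) / 2 for i in range(half)]
        data[:n] = averages + details
        n = half
    return data


def haar_inverse(coeffs):
    """Invert haar_forward, rebuilding the signal level by level."""
    data = list(coeffs)
    n = 1
    while n < len(data):
        averages, details = data[:n], data[n:2 * n]
        merged = []
        for a, d in zip(averages, details):
            merged.extend((a + d, a - d))
        data[:2 * n] = merged
        n *= 2
    return data


def compress(coeffs, keep):
    """Zero all but the `keep` largest-magnitude coefficients (lossy)."""
    order = sorted(range(len(coeffs)),
                   key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:keep])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


if __name__ == "__main__":
    signal = [4, 6, 10, 12, 8, 8, 0, 2]
    coeffs = haar_forward(signal)
    approx = haar_inverse(compress(coeffs, keep=4))
    print(coeffs)
    print(approx)
```

Keeping only the largest coefficients preserves the coarse shape of the signal while discarding fine detail. Each output value depends on only a few neighboring inputs, which is one reason transforms of this kind are often considered for data-parallel hardware such as GPUs.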

SIParCS interns clearly benefit from their work experience in CISL, and sometimes their summer projects influence their careers. But the SIParCS program also provides significant benefits to CISL and the mentors, who gain new insights by working with people who bring new curiosity and fresh points of view. Well-conceived projects often advance the mentors’ research in directions that they don’t have enough time or staff resources to pursue. Just as importantly, the SIParCS program instills and reinforces CISL’s culture of service, giving, and lifelong learning.

Some of the ways SIParCS supports CISL’s goals are indicated by the example projects above. Most interns produce very high-quality results for the laboratory that include:

  • Developing new technologies such as GPU acceleration
  • Increasing the speed and parallelism of leading community models
  • Optimizing high-performance software libraries
  • Adding new capabilities to community tools
  • Exploring new methods for improving scientific computing or Earth Science research

SIParCS is growing because it fills a vital need

SIParCS is growing rapidly because it is meeting the educational needs of students and the goals of CISL, NCAR, affiliated research institutions, and the National Science Foundation. This year, support from the Colorado School of Mines, the University of Wyoming, and the Cooperative Institute for Research in Environmental Sciences/NOAA provided funding for three additional students.

SIParCS has also grown through sponsorship by the NCAR Director’s Diversity Fund, which supported two students from groups that are underrepresented in the computational sciences. In the program’s fourth summer, its intern demographics continued to broaden: 20 students from 17 colleges and universities were selected from a nationwide pool of 55 applicants. This represents significant growth since the program’s 2007 inception with seven interns from four universities.

Interest and involvement from the students’ advisors at their home institutions also continued to increase in 2010. This promises to improve the strategic impact of the program, generating new collaborations and building stronger relationships with member universities and the research community.

SIParCS is also engaging its participants in outside education and training opportunities wherever possible. For example, 10 interns attended the Virtual School of Computational Science and Engineering (VSCSE) Petascale Programming Environments and Tools class sponsored by the Great Lakes Consortium for Petascale Computation. This class focused on petascale programming tools and techniques and was held using Access Grid technology at 10 institutions around the U.S., including CISL’s Visualization Laboratory. Because NCAR is part of the TeraGrid, SIParCS also connects with students from other TeraGrid sites. For example, this year SIParCS hosted two students from universities affiliated with TeraGrid Resource Providers, namely UIUC and LSU.

SIParCS supports CISL's core educational mission to cultivate the future HPC workforce by providing the necessary research experiences and training in mathematical and computational science concepts. The SIParCS program integrates research and education and, through outreach, broadens participation in the computational sciences.

SIParCS 2010
In Boulder, Colorado from late June to early August 2010, these SIParCS interns were paid to work for CISL on projects in computational science, applied mathematics, and geostatistics. They kept research journals, attended appropriate technical seminars and skills-enhancing workshops, and gave oral presentations of their results. They may receive additional support to write papers for research journals or present their findings at conferences.

The SIParCS year

This brief overview shows the sequence of significant events during each year’s SIParCS program.

  1. Throughout the year, CISL mentors envision ways that their work can be supported or expanded by summer intern projects. Mentors write project briefs and submit them to the SIParCS program office.
  2. The SIParCS program office announces upcoming internship opportunities, typically by late November.
  3. Internship opportunities are made available to potential applicants via the SIParCS website, by email to points of contact at numerous colleges and universities, and via brochures distributed at conferences, job fairs, and other venues.
  4. Interested undergraduate and graduate students apply before mid-February via the SIParCS website.
  5. The SIParCS program office determines how many projects can be funded, matches applicants with projects, selects the best-qualified applicant for each project, then announces placements by late March.
  6. The selected interns travel to Boulder, Colorado in June. In addition to salary, they receive round-trip airfare from within the U.S. and a local bus pass. Free housing is also provided during their 10-week stay in Boulder.
  7. Summer research and development projects involve a wide variety of experiences with a diverse collection of high-performance computing equipment, software development projects, parallel computational science problems, and analysis of data and numerical methods. All of these projects are tied to the HPC systems and activities that support NCAR’s scientific mission. Interns work in offices among groups of computer scientists, researchers, and software developers who are pushing the boundaries of high-performance computing.
  8. All interns keep a research journal, attend appropriate technical seminars, participate in skills-enhancing workshops, and give an oral presentation of their research results.
  9. Interns may receive support to publish papers, attend conferences, and present results related to their summer internship experience to the research community.

SIParCS is shaping the lives and careers of the young researchers who participate. David Appelhans, a 2010 intern from the Colorado School of Mines, offered this advice to prospective applicants: “If you’re a ... student considering this program, I would say don’t worry if you’re not accepted the first time. There’s a lot of applicants, and it’s worth it to keep applying, and it’s also worth it to take the internship.”

SIParCS Director Rich Loft is optimistic about the future of the program: “SIParCS has tripled in size since 2007 and has gone national in terms of student participation. If SIParCS is a successful program, and I think that’s the case, it is because of the dedication and hard work of our mentors, as well as the administrative support staff who ensure that the students have a quality experience. We continue to work tirelessly to attract underrepresented groups. We have done well in some areas: for example 20% of 2010’s interns were female. Our goal remains to attract a truly diverse talent base to the computational sciences.”