In an article last month, we described how CISL staff, led by Associate Scientist Raghu Raj Kumar in the Technology Development Division, planned to use “Pi Day” on March 14th (3/14/15, or 3.1415… = π) as a teaching moment for students interested in high-performance computing.

On Pi Day, CISL deployed its 12-node Raspberry Pi cluster on a mission to calculate the first 1,000,000 digits of π in parallel. Here’s an update on what happened.

First, the Raspberry Pi cluster did eventually succeed in cranking out all 1 million digits. For those seeking an incredibly accurate value of π, the 831,604 digits of that magical ratio of a circle's circumference to its diameter can be found here. Don’t be confused if you find numbers and letters in the digits: the value is expressed in compact hexadecimal notation. Base 10 decimals are for wimps! In case you’re wondering, or are just into π trivia, the 831,604th digit of π in “hex” is 2.
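For readers curious how hexadecimal digits of π can be produced independently of one another, here is a minimal sketch. The article does not name the algorithm the cluster ran, so this is an assumption: it uses the well-known Bailey-Borwein-Plouffe (BBP) formula, and the function names `_series` and `pi_hex_digit` are hypothetical. It relies on ordinary floating point, so it is only meant for small positions.

```python
def _series(j, n):
    """Fractional part of the sum over k of 16^(n-k) / (8k + j)."""
    total = 0.0
    # Terms with k <= n: modular exponentiation keeps only the fractional part.
    for k in range(n + 1):
        denom = 8 * k + j
        total = (total + pow(16, n - k, denom) / denom) % 1.0
    # Terms with k > n shrink by a factor of 16 each step; stop when negligible.
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        total += term
        k += 1
    return total % 1.0


def pi_hex_digit(n):
    """Hex digit of pi at position n + 1 after the point (n = 0 is the first)."""
    x = (4 * _series(1, n) - 2 * _series(4, n)
         - _series(5, n) - _series(6, n)) % 1.0
    return "0123456789abcdef"[int(x * 16)]


# First eight hex digits after the point.
print("".join(pi_hex_digit(n) for n in range(8)))
```

In this scheme the first loop in `_series` runs over every earlier position, so a digit farther to the right takes more work than one near the start.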

The interesting thing is that we learned a few things ourselves (the hard way) about doing this kind of parallel computation on a cluster.

First, the cost of calculating each digit grows for digits farther to the right of the decimal point, which means that we needed to *load-balance* the π calculations across our cluster. In fact, in our demo, we assigned Raspberry Pi node 0 to work on more than a quarter of the digits, whereas node 11 was given a mere 7,604. However, the cost of calculating π digits did not vary as smoothly as we thought, and so our static decomposition of the calculation across nodes did a rather poor job of balancing the workload: some of our nodes finished in one day (our original goal), while some took a week to finish. In hindsight, a better parallelization technique could have been chosen to dynamically balance the workload.
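To make the difference between static and dynamic balancing concrete, here is a small simulation sketch. It is not the demo's code: the cost model (chunk k costs k + 1 time units) is hypothetical, and the static split here gives each worker the same number of chunks, which is simpler than the weighted split described above. The dynamic version assumes a shared queue from which each worker takes the next chunk as soon as it is idle.

```python
import heapq


def static_makespan(costs, n_workers):
    """Finish time when chunks are split into contiguous equal-count blocks."""
    per_worker = -(-len(costs) // n_workers)  # ceiling division
    loads = [sum(costs[i:i + per_worker])
             for i in range(0, len(costs), per_worker)]
    return max(loads)


def dynamic_makespan(costs, n_workers):
    """Finish time when each idle worker pulls the next chunk from a queue."""
    # Min-heap of the times at which each worker next becomes idle.
    idle_at = [0.0] * n_workers
    heapq.heapify(idle_at)
    for cost in costs:
        earliest = heapq.heappop(idle_at)
        heapq.heappush(idle_at, earliest + cost)
    return max(idle_at)


# Hypothetical cost model: later chunks are more expensive than earlier ones.
costs = [float(k + 1) for k in range(120)]
workers = 12
ideal = sum(costs) / workers  # perfectly even split of the total work

print("static  / ideal:", static_makespan(costs, workers) / ideal)
print("dynamic / ideal:", dynamic_makespan(costs, workers) / ideal)
```

With a queue, no worker has to guess the cost of its share in advance: a node that draws cheap chunks simply takes more of them. On a real cluster the queue would live on one coordinating node or in shared storage, which adds communication overhead this sketch ignores.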

Also, our I/O solution left something to be desired. Without a flush command in the code, digits were buffered by the system and only appeared in huge batches on our website after hours or days of calculating, instead of producing an eye-pleasing steady stream. Finally, poor node 7 perished in the performance of its duties. It appears to have overheated due to a work overload, and died a horrible numerical death, sacrificed on the altar of our π experiment. We had to distribute its workload to the surviving nodes to finish up.
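The buffering problem has a simple general remedy, sketched below. The article does not say which language or I/O library the demo used, so this is only an illustration: the function `emit_digits` and its `flush_every` parameter are hypothetical names.

```python
import sys


def emit_digits(digits, stream=sys.stdout, flush_every=1):
    """Write digits one at a time, flushing regularly so output appears steadily."""
    for count, digit in enumerate(digits, start=1):
        stream.write(digit)
        # Without this call, the digits may sit in a buffer until it fills.
        if count % flush_every == 0:
            stream.flush()
    stream.flush()  # push out whatever remains at the end


emit_digits("243f6a88", flush_every=4)
```

Flushing after every digit gives the smoothest stream but costs the most system calls; flushing every few digits is a reasonable compromise when each digit is slow to compute anyway.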

Lesson learned. In the words of baseball fans: wait until next year!