Cybera’s Rapid Access Cloud is one small contributor to a global network of compute clusters, brought together by the Open Science Grid to support science projects.
A computer cluster is a group of “tightly connected” computers made to work on a project together, essentially forming a single system. These clusters are a great means of directing powerful computing resources at science and research problems.
But for the most data-intensive projects, a single cluster may not provide enough resources. Enter the Open Science Grid (OSG) — a federation of clusters from around the world that can be used for a variety of global science initiatives. As OSG executive director Frank Wuerthwein describes it: “The OSG is a consortium of the willing.”
Founded in 2005, the OSG provides common services and support for resource providers and scientific institutions “using a distributed fabric of high-throughput computational services”. Drawing on computing resources from around the world (including Cybera’s Rapid Access Cloud), it serves scientists and resource providers alike, and offers training for users, facilitators, and systems administrators on how to use its software stack and services.
“You can join the federation either by deploying our software, or letting us add your hardware to our stack,” says Wuerthwein.
He estimates that the OSG currently has around 20 compute clusters for which OSG itself operates the API connecting the hardware to the federation; another 80 or so clusters that have downloaded the OSG API software and run it themselves to connect their hardware; and a further 100 clusters internationally that run other people’s software stacks and API implementations. Wuerthwein calls the resulting model “high-throughput computing”, or “embarrassingly parallel” computing.
“Traditional high-performance computing requires shared memory and message passing, and the executables can be tightly coupled,” says Wuerthwein. “High-throughput means there is no communication between the executables. This means there is no order in which they need to be executed, so the tasks can be done anywhere, at different times.”
This flexibility allows for more efficient use of computing resources, letting more projects run on the same clusters.
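To make the distinction concrete, here is a minimal illustrative sketch in Python (purely hypothetical, not part of the OSG software stack; the `analyze` task and its inputs are stand-ins) of what a high-throughput workload looks like: each task depends only on its own input and never talks to the others, so a scheduler is free to run the tasks in any order, on any machine.

```python
from concurrent.futures import ProcessPoolExecutor

def analyze(sample_id: int) -> int:
    """Hypothetical self-contained task: it reads only its own input
    and shares no state with other tasks, so it can run anywhere,
    at any time, in any order."""
    # Stand-in for real work, e.g. processing one data file.
    return sum(i * i for i in range(sample_id * 1000)) % 97

if __name__ == "__main__":
    samples = range(1, 101)  # 100 independent tasks
    # Because the tasks never communicate, the pool can schedule them
    # across workers in any order -- the same property a grid scheduler
    # exploits to spread work across clusters around the world.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(analyze, samples))
    print(f"Processed {len(results)} samples")
```

A traditional high-performance job, by contrast, would have its workers exchanging messages mid-computation (for example, via MPI), tying them to one schedule and, typically, one cluster.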
Supporting all kinds of science
Currently, the OSG supports around 300 research projects, roughly two dozen of which are international in scope.
“What’s great about this setup is it can support any kind of science,” says Wuerthwein.
The projects utilizing the OSG’s computing power run the gamut from economic forecasting to designing cochlear implants and amputee prosthetics.
“We have one user who is studying evolutionary models — both in terms of genetics and in terms of modelling the hereditary transmission of traits — to better understand the spread of disease in groups of people,” says Wuerthwein. “Other projects involve looking at proteins, and where and how they bind, to create appropriate targets for pharmacology.”
In Canada, another big contributor to (and user of) the OSG is the Cancer Computer, a volunteer organization that “supports cancer researchers by connecting them with the computer hardware, processing capacity and IT support they need to save lives.”
Promoting interoperability
A major benefit of the OSG setup is that it can integrate a variety of infrastructures.
“If you’re running a global experiment that requires in-kind contributions from universities around the world, ordinarily, you’d have to build project-specific infrastructure that everyone could access,” says Wuerthwein. “What the OSG is able to do, instead, is take their existing infrastructure pieces and allow them to work together. We’re an ideal integrator for these kinds of projects.”
Some of these projects are very large indeed, involving the interoperability of 100 clusters around the world (for a total of 200,000 cores). At the other end of the spectrum, some projects need only a few tens to hundreds of cores.
For Wuerthwein, the important thing is allowing all universities and projects — of all sizes — to contribute and make use of the shared resources. Another benefit of the OSG’s operating model is that it provides a good neutral ground for developers to share ideas, while still promoting competition.
“When software developers evolve the API we run, it allows others to learn from them. We have five different software teams that provide interoperability while still competing with each other, and influencing new ideas.” This cycle of learning and competing is important for developing new research capabilities.
For example: “Right now, with projects like the Large Hadron Collider in Switzerland, infrastructure researchers are trying to find ways to share and analyse an exabyte of data per year. We can allow infrastructure R&D towards these capabilities on the fringes of our shared science infrastructure. And over time, we can adopt the new APIs and tools that will emerge from this R&D.”
What’s next?
“I want every cluster on the planet to be accessible to collaboration,” says Wuerthwein. “I would like to see campuses (including zoos and museums) federate globally, and by that I mean share their data and resources.” This would create a global computer that could serve countless research projects and enable countless new developments.
For now, there are specific hardware additions that he says the OSG could really use: 1) GPUs (“we have around 3,500 GPUs available now during peak use”); 2) CPUs (“always handy to have”); and 3) more data caches.
For the latter, Wuerthwein says the OSG would specifically benefit from more strategically placed caches, i.e. disk space in different network locations. Most of its current caches are in the US, with a few in Europe and Asia. “I would like to integrate more Canadian clusters, which means we could have a Canadian cache location and network provider.”
His ultimate call to action? “Bring me your cluster!” For more information, and to find out how to contact the Open Science Grid, visit the organization’s website.