Introduction
A reproducible environment on public or private clouds is important for both commercial and scientific executables. When software installation and configuration management are automated in such an environment, much data analysis can run without manual intervention. Programmers and scientists can focus on writing their own applications without worrying about the details of software installation and configuration, and the learning curve for different software tools and cloud infrastructures is reduced. salsaDPI is on-demand dynamic provisioning software that runs on public or private clouds. It has been tested running user-defined binaries on more than 80 VM instances on the FutureGrid Eucalyptus cloud. This lets users conveniently take advantage of the best features of clouds.
Features
Automates environment settings and application execution
Supports various cloud infrastructures and storage models
Executes user-defined binaries and returns results
Provides a GUI from the portal to the cloud
Tutorial
The tutorial page provides instructions for using this online interface.
Showcase: salsaDPI in Science Cloud Summer School 2012
In the summer of 2012, Indiana University and nine other university sites hosted a week-long virtual Science Cloud Summer School. The event introduced cloud technologies and science applications to over 200 graduate students and staff participants across the nation. salsaDPI was used as one of the cloud tools and contributed to a tutorial on reproducible environments. It can deploy both a single node (sandbox) and a virtual cluster. After installing the selected software stack, it also automatically executes Hadoop and Twister applications such as WordCount and Kmeans. The tutorial is available online. A video example of Hadoop WordCount is available on YouTube (best viewed at 1080p).