Abstracts and Bio
Richard Moore, SDSC, UC Shared Research Computing
About the Speaker:
Dr. Richard Moore is the Deputy Director of the San Diego Supercomputer Center at UCSD.
He has been deeply involved with UC systemwide and campus projects, as well as being the
principal investigator for the National Science Foundation's TeraGrid high-performance
computing program at SDSC.
Abstract:
UCOP-sponsored Shared Research Computing Services, or ShaRCS, program.
Bill Strossman, UCR
About the Speaker:
Bill Strossman received his Ph.D. in high-energy physics from UCR in 1995.
His research was in neutrino oscillations. He is currently the UCR campus grid administrator
for both Open Science Grid and UC Grid. He also provides HPC support for UCR researchers.
Shava Smallen, SDSC (Title: Experiences with the FutureGrid Testbed)
About the Speaker:
Shava Smallen received an M.S. in Computer Science at the University of California,
San Diego in 2001 and did her thesis work on tunable application-level scheduling
for an online parallel tomography application. Currently a programmer in the
UC Compute Systems group at the San Diego Supercomputer Center, she is the project
lead and developer for the Inca monitoring software tool (originally developed for TeraGrid)
and is UCSD's representative for the UC Grid project. Shava also serves as SDSC's PI
and chair of the Performance Committee for the FutureGrid project led by Indiana University,
where she previously did Grid portal development work for the Extreme! Computing
Lab in the Computer Science department.
Abstract:
FutureGrid is an experimental testbed that will be used to develop and test novel approaches to
parallel, grid, and cloud computing. These include topics ranging from authentication,
authorization, scheduling, virtualization, middleware design, interface design and cybersecurity,
to the optimization of grid-enabled and cloud-enabled computational schemes for researchers in
astronomy, chemistry, biology, engineering, atmospheric science and epidemiology. FutureGrid
supports user “experiments” that can be run either on the bare hardware or on virtual machines
using Cloud technologies Nimbus or Eucalyptus. This talk will describe FutureGrid’s features,
architecture, and early users as well as describe our early experiences in setting up this
infrastructure. The FutureGrid project is funded by the National Science Foundation (NSF)
and is led by Indiana University with University of Chicago, University of Florida,
San Diego Supercomputer Center, Texas Advanced Computing Center, University of Virginia,
University of Tennessee, University of Southern California, Dresden, Purdue University,
and Grid 5000 as partner sites.
Rich Wolski, Eucalyptus
About the Speaker:
Dr. Rich Wolski is the Chief Technology Officer and
Co-founder of Eucalyptus Systems Inc., and a Professor of
Computer Science at the University of California, Santa
Barbara (UCSB). Having received his M.S. and Ph.D. degrees
from the University of California at Davis (while a
researcher at Lawrence Livermore National Laboratory) he has
also held positions at the University of California, San
Diego, and the University of Tennessee. He is currently also
a strategic advisor to the San Diego Supercomputer Center
and an adjunct faculty member at the Lawrence Berkeley
National Laboratory. Rich has led several national scale
research efforts in the area of high-performance distributed
computing and grid computing, is the author of numerous
research articles concerning the empirical study of
distributed systems, and is the progenitor of the Eucalyptus
project.
Jeffrey Barr, Amazon Web Services
About the Speaker:
Jeff Barr is focused on furthering awareness among software developers of the
opportunity to innovate and build businesses using Amazon Web Services.
Jeff has written over 1000 posts for the AWS blog and another 1100 for his
personal blog and is also the author of Host Your Website in the Cloud (2010).
He earned his bachelor's degree in computer science from American University
and is a part-time graduate student in communication and digital media at the
University of Washington.
Shane Canon, LBL, Magellan: A Testbed to Explore Cloud Computing for Science.
About the Speaker:
Dr. Shane Canon received his Bachelor’s degree in Physics from Auburn University in 1993,
and his PhD in Physics from Duke University in 2000. From there he was recruited as a
System Administrator at Lawrence Berkeley National Laboratory and was responsible for
administering Linux clusters and other HPC systems. In 2005, Dr. Canon joined the
Oak Ridge National Laboratory as a group leader. One of the more significant
accomplishments while at ORNL was architecting the 10 petabyte Spider File System.
In 2008, Shane returned to LBL to lead the Data Systems Group. In 2009, he
transitioned to leading the newly created Technology Integration Group which is
investigating Cloud Computing for science and works with the bioinformatics community.
Chandra Krintz, UCSB (Title: An Opening in the Clouds: Open-source Cloud Computing at UCSB)
About the Speaker:
Chandra Krintz is an Associate Professor at the University of
California, Santa Barbara (UCSB). She joined the UCSB faculty in 2001
after receiving her M.S. and Ph.D. degrees in Computer Science from
the University of California, San Diego (UCSD). Chandra's research
interests include automatic and adaptive compiler, programming
language, virtual runtime, and operating system techniques that
improve performance (for high-end systems) and that increase battery
life (for mobile, resource-constrained devices). Her recent work
focuses on programming language and runtime support for cloud
computing. Chandra has supervised and mentored over 40 students,
has published her work in a wide range of ACM venues including
ECOOP, PACT, PLDI, OOPSLA, ASPLOS, and others, and leads several
educational and outreach programs that introduce computer science
to young people, particularly those from underrepresented groups.
Chandra's efforts have been recognized with an NSF CAREER award,
the CRA-W Anita Borg Early Career Award (BECA), and the UCSB
Academic Senate Distinguished Teaching Award. Chandra is
also an ACM and IEEE Senior Member.
Abstract:
Cloud Computing is a term coined for a recent trend toward
service-oriented computing on cluster-based distributed systems. Such
systems provide users with a computing paradigm based on service-level
agreements with which users gain access to vast, large-scale resources
remotely. Cloud computing has experienced rapid uptake in the
commercial sector with companies such as Amazon, Google, Microsoft and
others providing "elastic" or "utility" products and services that
enable resource virtualization, application and system isolation,
large-scale, adaptive resource management, and per-application
customization via software infrastructures at different levels of
abstraction. These systems however are closed and proprietary which
limits both our understanding of execution behavior in these settings
as well as the degree of control and customization available to
application and system developers.
To address this limitation, to enable the use of such systems on local
cluster resources (private and hybrid clouds), and to provide the
software infrastructure necessary for the investigation of the
next-generation of cloud services, technologies, application domains,
etc., the cloud computing group from the UCSB computer science
department has developed and released, as open source, the AppScale
platform-as-a-service cloud fabric. AppScale is a software ensemble
that both facilitates energy-efficient cloud computing research and
implements the Google App Engine (GAE) APIs (emulating the GAE cloud).
AppScale operates over virtualized cluster resources, Eucalyptus (also
from UCSB), and Amazon Web Services (AWS) Elastic Compute Cloud
(EC2), automatically, and executes GAE applications without modification.
AppScale implements a number of different database back-ends, a map-reduce
interface, as well as performance and resource monitoring for
automatic scaling. In this talk, we overview AppScale, describe its
design and implementation, and discuss the various research directions
that our group is pursuing using AppScale.
Philip Papadopoulos, SDSC, Using The Rocks Cluster Toolkit to extend local clusters into EC2
About the Speaker:
Dr. Philip Papadopoulos received his PhD in 1993 from UC Santa Barbara in Electrical Engineering.
He is currently the Director for UC Systems at the San Diego Supercomputer Center (SDSC) and
Adjunct in the Department of Computer Science. He is well-known as the principal investigator
of the open-source Rocks Cluster Toolkit. More than 2000 clusters have been built with Rocks,
ranging in size from 2 to 3000 nodes. Current development
efforts include extensions for virtual machines, virtual clusters, and high-performance storage.
Abstract:
Rocks is a toolkit most often used to define, install and manage
hardware clusters. It is used in 1000s of installations worldwide,
scales to very large clusters and is readily extensible using the
concept of Rolls. Rolls are collections of packages and configuration
instructions to integrate new capability into a basic cluster stack.
Rolls for compilers, bioinformatics applications, parallel file
systems, software modules, statistical packages, debuggers and more
all exist and are in use on production-level systems. In 2008, we
introduced the Xen Roll to add a new kind of cluster appliance: a
virtual machine host or VM container. With this addition, Rocks
easily supports real clusters, complete virtual clusters (frontends +
compute as virtual machines) and hybrid mixes of real and virtual
hardware. The next logical step, now enabled by the EC2 roll, is to
re-use all of the cluster software machinery to first author VMs that
are compatible with EC2 and then to integrate running instances into a
locally owned cluster. In this talk, we'll describe both the
Rocks-defined Xen-based hosting environment and the
automated process to build, bundle, upload, and run Rocks-authored VMs
in EC2. We'll also describe some of the details and challenges of
adding nodes in EC2 to a running cluster.
Gregory Kurtzer, LBL
About the Speaker:
Greg Kurtzer has been cited as being among the top 1% of Linux and High Performance
Computing (HPC) professionals in the world. After earning his BS in
Biochemistry in 1997, he started his career as a Software Engineer at
Linuxcare. In October 2000 he joined Lawrence Berkeley National
Laboratory, where he has been one of the main people responsible for
developing the scientific computing service. Greg developed
Warewulf and Perceus, among the most widely utilized and successful cluster
management solutions in the industry. Outside of work, he started and led
the Caos Foundation from 2002 to 2005, which has become CentOS today.
At present he is working as an HPC Technical Lead and Systems Architect
at LBNL and is responsible for the technical architecture and
management of approximately 40 scientific cluster solutions covering a
wide range of scientific disciplines.
Jianwu Wang, SDSC (Title: Facilitate parallel computation using Kepler workflow system on virtual resources)
About the Speaker:
Dr. Jianwu Wang is an assistant project scientist at Scientific
Workflow Automation Technologies (SWAT) Laboratory, San Diego
Supercomputer Center, UCSD. He received his Ph.D. in computer science from
the Chinese Academy of Sciences in Beijing, China, in 2007. He has published
over 20 papers. His main research interests are Scientific Workflow,
Distributed Computing, and Service-Oriented Computing. More information
about his research can be found at his web site:
http://users.sdsc.edu/~jianwu/
Abstract:
Kepler is one of the most popular open-source scientific workflow systems (http://kepler-project.org),
developed by a cross-project collaboration. It has been used in many scientific disciplines and over
20 e-Science projects as a solution to the need for management, processing and analysis of rapidly
increasing amounts of scientific data. Since its initiation in 2003, Kepler has been downloaded over
40K times. In this talk, we will first present basic background information on the Kepler system and
its support for various distributed execution requirements. Then we will focus on data-parallel
scientific workflow scheduling via Kepler on physical/virtual compute cluster resources.
Data-parallel workflows apply the same set of tasks to a number of independent input data simultaneously.
Yet the workflow execution times for these input data might vary, which causes a scheduling challenge
when running the workflow on a compute cluster resource. To deal with the scheduling problem,
we use Kepler to balance compute resource workload for workflow execution. Two Computational Chemistry
workflows were built using the Kepler scientific workflow system; they interact with the Oracle Grid Engine and
Hadoop job schedulers, respectively. Our experiments show that our approach can balance computational
resource workload well by interacting with the above-mentioned compute cluster job schedulers and provides
good workflow execution performance on three kinds of execution environments: physical cluster,
partially virtual cluster and fully virtual cluster.
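
The scheduling challenge described in this abstract can be illustrated with a minimal sketch. It is not Kepler's implementation (the talk delegates placement to Oracle Grid Engine and Hadoop); the function names and the sample durations are assumptions made only for illustration. It compares pre-assigning inputs round-robin with handing each input to whichever worker becomes free first, when execution times vary.

```python
import heapq


def static_makespan(durations, n_workers):
    """Pre-assign inputs round-robin; the run ends when the busiest worker finishes."""
    loads = [0] * n_workers
    for i, d in enumerate(durations):
        loads[i % n_workers] += d
    return max(loads)


def dynamic_makespan(durations, n_workers):
    """Give each input, in order, to whichever worker becomes free first."""
    free_at = [0] * n_workers          # min-heap of times at which workers become idle
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)  # earliest-available worker
        heapq.heappush(free_at, start + d)
    return max(free_at)


if __name__ == "__main__":
    # Hypothetical per-input execution times (arbitrary units).
    sample = [8, 1, 1, 1, 8, 1, 1, 1]
    print("static :", static_makespan(sample, 2))
    print("dynamic:", dynamic_makespan(sample, 2))
```

With uneven execution times, the round-robin assignment can leave one worker idle while another is overloaded; assigning work as workers become free is one simple way a job scheduler evens this out.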
Gideon Juve, USC, Scientific Workflows in the Cloud (USC/ISI)
About the Speaker:
Gideon is a PhD candidate at USC. He is working at the Information Sciences
Institute on workflow technologies for scientific computing.
Abstract:
The development of cloud computing has generated significant interest in
the scientific computing community. In this talk we describe our efforts
to deploy scientific workflows on cloud platforms. We examine the
benefits and challenges of running workflows in the cloud, and describe
the tools we use to automatically provision and configure virtual
clusters for workflows. We present the results of several studies that
we have done to evaluate the cost and performance of cloud workflows,
and relate our experiences running workflows on Amazon EC2, Magellan,
and FutureGrid.
Sriram Krishnan, SDSC (Title: myHadoop - Hadoop-on-Demand on Traditional HPC Resources)
About the Speaker:
Sriram is a Senior Distributed Systems Researcher and Group Leader in the
Advanced CyberInfrastructure Development Group (ACID) at the San Diego
Supercomputer Center (SDSC). At SDSC, he is a (Co-)Principal
Investigator on a couple of research grants studying the
implementation and performance of data intensive applications on cloud
and HPC resources. He is also the Informatics Lead at the
Bioinformatics and Biostatistics Shared Resource (BBSR) at the Moores
UCSD Cancer Center. His areas of expertise are Service-oriented
Architectures (SOA), and Grid and Cloud computing middleware and
applications.
Abstract:
Traditional High Performance Computing (HPC) resources, such as those available on the TeraGrid,
support batch job submissions using Distributed Resource Management Systems (DRMS) like TORQUE
or the Sun Grid Engine (SGE). For large-scale data intensive computing, programming paradigms
such as MapReduce are becoming popular. A growing number of codes in scientific domains such as
Bioinformatics and Geosciences are being written using open source MapReduce tools such as Apache Hadoop.
It has proven to be a challenge for Hadoop to co-exist with existing HPC resource management systems,
since both provide their own job submissions and management, and because each system is designed to have
complete control over its resources. Furthermore, Hadoop uses a shared-nothing style architecture,
whereas most HPC resources employ a shared-disk setup. In this talk, we describe myHadoop, a
framework for configuring Hadoop on-demand on traditional HPC resources, using standard batch
scheduling systems. With myHadoop, users can develop and run Hadoop codes on HPC resources,
without requiring root-level privileges. Here, we describe the architecture of myHadoop, and
evaluate its performance for a few sample scientific use-case scenarios. myHadoop is open source, and available for download on SourceForge.
Shreyas Cholia, NEWT - A RESTful service for HPC on the web (NERSC)
About the Speaker:
Shreyas Cholia works on science gateway, cloud and grid technologies
for NERSC, with the goal of making high-performance and distributed
computing more transparent and accessible. He has also been involved
in various grid and data-driven science efforts at NERSC since 2002.
Prior to his appointment at NERSC, Shreyas was a developer and
consultant at IBM.
Abstract:
NEWT (NERSC Web Toolkit) brings High Performance Computing (HPC) to
the web through easy-to-write web applications. We want to make HPC
resources accessible and useful to scientists who are more comfortable
with the web than with traditional HPC tools, batch queues and command
line scripts. NEWT is a web service that allows you to access
computing resources at NERSC through a simple RESTful API. The NEWT
API and web service will let you interact with the NERSC center
through simple HTTP URLs and commands. NEWT responds to client
requests using JSON. This makes it very easy to build powerful web
applications using nothing but HTML and JavaScript. NEWT is targeted
at science groups that want to build web applications around their HPC
workflows, while hiding the low level HPC/grid mechanics that enable
this.
Anthony Joseph, UCB, Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center.
About the Speaker:
Anthony D. Joseph is Director of Intel Labs Berkeley, and a
Chancellor's Associate Professor in Electrical Engineering and
Computer Science at UC Berkeley. He is developing adaptive techniques
for: cloud computing, distributed network monitoring and triggering,
network and computer security, and security defenses for machine
learning-based decision systems. He also co-leads the DETERlab
testbed, a secure scalable testbed for conducting cybersecurity
research. His principal field of interest is systems and networking:
cybersecurity, datacenter architectures, mobile systems and
networking, and overlay networks.
Abstract:
Mesos is a platform for running multiple diverse cluster computing
frameworks, such as Hadoop, MPI, and web services, on commodity
clusters. Sharing improves cluster utilization and avoids per-framework
data replication. Mesos shares resources in a fine-grained manner, which
allows frameworks to achieve data locality by taking turns reading data
stored on each machine. To support the sophisticated schedulers of
today's frameworks, Mesos introduces a distributed two-level scheduling
mechanism, called resource offers. Mesos decides how many resources to
offer each framework, while frameworks decide which resources to accept
and which computations to schedule on these resources. Our experimental
results show that Mesos can achieve near-optimal locality when sharing
the cluster among diverse frameworks, can scale up to 50,000 (emulated)
nodes, and is resilient to node failures.
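
The two-level idea in this abstract can be sketched in a few lines. This is a toy model under stated assumptions, not the Mesos API: the `Framework` class, the `run_offers` function, the "offer to the framework with the fewest launched tasks" policy, and the "accept one node per offer" rule are all hypothetical choices made for illustration. The master decides who receives an offer; each framework decides which offered nodes to accept, here preferring nodes that hold its data.

```python
from dataclasses import dataclass, field


@dataclass
class Framework:
    """Hypothetical framework scheduler: it sees offers and chooses what to accept."""
    name: str
    preferred_nodes: set            # nodes that store this framework's input data
    pending_tasks: int
    launched: list = field(default_factory=list)

    def consider(self, offer):
        """Second level: accept at most one offered node, and only a data-local one."""
        if self.pending_tasks == 0:
            return []
        for node in offer:
            if node in self.preferred_nodes:
                self.launched.append(node)
                self.pending_tasks -= 1
                return [node]
        return []                   # decline the whole offer


def run_offers(free_nodes, frameworks):
    """First level: the master picks which framework gets the next offer."""
    free = set(free_nodes)
    declined = set()                # frameworks that declined the current free set
    while free and len(declined) < len(frameworks):
        candidates = [f for f in frameworks if f.name not in declined]
        # Assumed fairness policy: offer to the framework running the fewest tasks.
        fw = min(candidates, key=lambda f: len(f.launched))
        accepted = fw.consider(sorted(free))
        if accepted:
            free -= set(accepted)
            declined.clear()        # the free set changed, so everyone may be re-offered
        else:
            declined.add(fw.name)
    return free                     # nodes nobody wanted


if __name__ == "__main__":
    a = Framework("A", {"n1", "n2"}, pending_tasks=2)
    b = Framework("B", {"n2", "n3"}, pending_tasks=2)
    leftover = run_offers(["n1", "n2", "n3", "n4"], [a, b])
    print(a.launched, b.launched, leftover)
```

Keeping the accept/decline decision inside each framework is what lets the master stay simple: it needs no knowledge of where any framework's data lives.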
Dingying Wei, Distributed Data Management for Biomedical Research (UCI)
About the Speaker:
Dingying has an M.S. in Meteorology and a Ph.D. in Physics. In the 1980s she worked at the State Oceanic
Administration of China on optimizing far-ocean cargo routing with stochastic modeling.
In the 1990s she worked on marine pollution forecasting at the University of Virginia and on cloud physics
at the New Mexico Institute of Technology. Since 1998 she has worked at UCI on a financial system for the
UCI Foundation and on a federated data management system for the Biomedical Informatics Research Network.
Abstract:
The functional Biomedical Informatics Research Network (fBIRN) has developed and implemented a
data management system to facilitate the consortium's multi-site brain imaging research projects.
The system uses globusFTP for data storage and federated databases for metadata. The system contains
a variety of clients for data exchange and data management needs in all phases of projects,
from data acquisition to data publishing.
Kent Wada, UCLA, UC Policy Developments Regarding Privacy and Security in the Cloud
About the Speaker:
Kent Wada is Director of Strategic IT and Privacy Policy for UCLA. He works with
counterparts nationally, across the UC system and with campus governance bodies
to address institutional issues such as copyright and privacy that sit at the
intersection of policy, law and people. He currently chairs the University of
California's Working Group on Privacy and Information Security for the 18-month
initiative charged by UC President Yudof to examine this delicate balancing act
in light of today's obligations and expectations. His primary interest continues
to be in the application of emergent technologies to people's lives and in the
protection of civil liberties in cyberspace.
Abstract:
We've been learning a lot in the last couple of years about the privacy and security implications
of using cloud services, whether by individual faculty and staff taking advantage of apps like Skype,
or by institutions negotiating agreements with Google for campuswide Gmail service. At the same time,
UC President Yudof has charged an 18-month initiative to reexamine the appropriate balance between
privacy and information security for the University, potentially changing the tenor of the workplace
for all UC employees and students and having implications for research, cloud-based or
otherwise (universityofcalifornia.edu/privacyinitiative).
This session will briefly touch upon these recent developments so audience members have a better
awareness of current privacy and security issues at UC and can follow up on those that could affect
them directly.
Curtis Salinas, UCB, Building a Private Virtualization Infrastructure: Lessons Learned and Future Directions @ UC Berkeley
About the Speaker:
Curtis Salinas currently works in the Information Services and Technology department at
UC Berkeley as a Senior Systems Administrator on the Cloud Services and Enterprise
Windows Teams. He has been the lead for the campus' centralized virtualization
infrastructure for the past few years, including the Virtual Private Server and
Virtual Private Cluster Services. These services form the backbone for the servers running
much of UC Berkeley's critical IT infrastructure, and are actively being extended to
other UC campuses such as UC Merced and UC San Francisco.
Abstract:
This presentation will examine the research, design revisions, and automation efforts that have
shaped the virtualization services hosted by the Information Services and Technology
department at UC Berkeley. Despite the promise that the public cloud offers, it is
still the case in higher education that many factors currently prevent the complete
migration of computing off of internally owned and managed IT infrastructure.
Until the time comes when we can fully trust outside organizations with the
stewardship of our data, many institutions will have to invest in private
infrastructure to house the data that policy and practicality will not currently
allow us to move outside of a controlled environment. After roughly 7 years of datacenter
virtualization adoption, UC Berkeley has in place a strong foundation on which to build our
private cloud needs and to bridge the gap to the public cloud.
Kai Song, LBL, Beyond Clusters: Adapting Lattice Optimization for Cloud and GPU
About the Speaker:
Kai received a bachelor's degree in Computer Science from U.C. Berkeley, and is now
working in the High Performance Computing Support group at Lawrence Berkeley Lab.
He is interested in parallel and GPU computing.
Abstract:
A C/C++ Lattice Optimization application has been developed and is running on our local Lawrencium cluster.
As Amazon Cloud and GPU platforms give us more options for on-demand, energy efficient computing,
we explored the feasibility and limitations of both technologies. Further comparisons were made
between Amazon Cluster Compute Instances (CCI) and our local clusters. The GPU, on the other hand,
with its unique architecture and programming model, has made porting existing C/C++ code challenging.
However, there are many interesting lessons learned regarding GPU development and programming methods.
Paul Weakliem, UCSB
About the Speaker:
Paul received his BA from Boston University in 1986 and PhD in Chemistry
from UCLA in 1992 where he studied growth and etching of semiconductor
surfaces by computer simulation. After a post-doc studying water and
ion migration in zeolites he became the Information Technology Director
for CNSI at UCSB.
Harry Mangalam, UCI
About the Speaker:
Harry received his PhD in Regulation of Gene Transcription
at UCSD and completed his post-doc in Drosophila biology at the Salk
Institute. Although he has a strong background in Biology,
he is also very knowledgeable in computing technology. He
applies this to his work in bioinformatics and biological
databases. He is currently a member of both the Office of
Information Technology and Genomics High-Throughput Facility staff at UC Irvine.
Krishna Muriki, LBL
About the Speaker:
Krishna Muriki received his Bachelor of Technology, Electronics and Computer
Engineering from Jawaharlal Nehru Technological University in 2001, and his
Master of Science, Computer System Architecture from Clemson University in
2004. Muriki joined the San Diego Supercomputer Center as a High Performance
Computing Consultant in 2005 and continued to work there, maintaining SDSC’s
computational and storage resources in a condition that allows National
Science Foundation researchers to get their science done quickly and easily,
until 2008 when he was recruited to the Lawrence Berkeley National
Laboratory team as a Computer Systems Engineer, where he now resides.
Prakashan Korambath, UCLA
About the Speaker:
Prakashan Korambath received his Ph.D. from U. of Memphis, Memphis, TN.
He was an NSF postdoctoral fellow at The Ohio State University, Columbus, OH,
followed by an NRC associateship position at AFRL, Kirtland Air Force Base, Albuquerque, NM.
He is a contributor to computational chemistry software packages such as MOPAC, GAMESS and Q-Chem.
He is one of the UC Grid software developers at UCLA. His current interests are Grid, IaaS,
Storage as a Service, workflows and theoretical study of nonlinear optical properties. He is an HPC
consultant at UCLA.
Abstract:
This session is about building your own private clouds using open source tools.
Some of our experience using Xen, KVM, Eucalyptus, Nimbus, OpenNebula and Amazon EC2 will
be discussed. Implementation of Hadoop in an SGE cluster will also be discussed.