README

UGP README

                        Academic Technology Services
                    University of California, Los Angeles
                               December, 2006

Introduction

A Grid is a collection of independently owned and administered resources that have been joined together by a software and hardware infrastructure. This infrastructure interacts with the resources and their users to provide coordinated, dynamic resource sharing in a dependable and consistent way, according to policies agreed to by all parties. Because a large number of resources are available on a Grid at any given time, an individual researcher can be given the best resources available at that point in time for his or her needs, and overall resource utilization can be distributed for maximum efficiency.

The UCLA Grid Portal Software (UGP) and the UCLA Grid Architecture bring computational clusters together into a Grid. The hardware resources making up the Grid consist entirely of computational clusters, each of which consists of a head node, compute nodes, storage nodes, network resources, software, and data resources. Individual computational clusters can be quite large, containing hundreds of nodes.

By incorporating the concepts of pooled resources and Pool Users, UGP facilitates the sharing of resources among users. Administrative overhead is reduced because there is no longer a need to add individual user login ids on multiple clusters. UGP is a software product that provides a web-based Grid Portal. It allows users to interact with distributed computing clusters at the campus or institution level. The Grids it works with can also be hierarchical: a higher-level Grid Portal can be created on top of them. For example, at the University of California, we deploy a University of California (UC) Grid Portal that interacts with the Grids and Grid Portals at the ten University of California campuses to allow users to access resources across campuses.

Single Campus/Institution Architecture

In the UCLA Grid Architecture, clusters are connected to the Grid Portal by the addition of a single Grid Appliance to each cluster. A Grid Appliance is a separate machine that serves as a gateway between the portal and the cluster. It runs a customized Globus Toolkit with additional UGP components. Each Grid Appliance is essentially a parallel head node for the compute cluster to which it is attached. For security, it runs a firewall and is open only to the cluster to which it is attached and to the Grid Portal machine. All transactions between the Portal and the Appliances use public key cryptography conforming to the X.509 certificate standard. Adding a Grid Appliance to a cluster in no way modifies policy decisions at the cluster level. Any participating cluster can still be used directly, without going through the Grid Portal.

Users interact with the Grid Portal through an https connection from a web browser. The Grid Portal uses GridSphere to run the portal and Jakarta Tomcat to run the web interface. UGP is implemented as a set of portlets and interacts with Globus Toolkit. UGP uses a MySQL database to hold the information about users, clusters, applications and job status that it needs to run the Portal. The architecture also includes a MyProxy server to store user certificates, a storage server connected to the Grid Portal to provide storage space for Pool-Only Users, and an optional visualization server that drives the software needed to provide through-the-web data visualization services to users. With a stand-alone (independent) Grid and Grid Portal, UGP:

 • Provides a stand-alone through-the-web interface to all of the clusters in
   a Grid. This interface hides user interface and scheduler differences among
   the clusters and it makes it easy to work with multiple clusters at once.
 • Provides a single login for users. A user logs into the Grid Portal, not
   into each of the individual clusters that the user will use.
 • Provides resources both to users who have login ids on individual clusters
   (Cluster Users) and to users who do not (Pool-Only Users). Any person with
   campus/institution affiliation who does not have access to any of the
   clusters can easily gain access to resources throughout the Grid by
   becoming a Pool-Only User.
 • Is as secure as up-to-date technology allows. Clusters can sit behind
   firewalls if that is their policy. A Grid Appliance is open only to the
   cluster to which it is attached and the Grid Portal. Proxy certificates
   are used for authentication at every step of the way (between Grid Portal
   and Grid Appliances). Users never handle their certificates.
 • Gives a user access to resources on other independent Grids that the user
   already has access to, provided that those independent Grids: 1) also run
   Globus Toolkit and 2) store user credentials in a MyProxy server that is
   accessible from the Grid Portal.
 • Provides for resource pools and Pool Users. This allows cluster owners to
   expose unused resources by contributing them to resource pools on the Grid.
   Pooled resources in the default pool are available for use by anyone who
   can log in to the Grid Portal. When a user submits a job to the pool, UGP
   selects the cluster that will give the job the best turnaround. In
   addition to providing the best turnaround for individual jobs, this
   optimizes performance and resource utilization throughout the Grid.
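
As a rough illustration of the pool submission described in the last item above, the sketch below picks the pooled cluster with the lowest estimated turnaround. This README does not describe UGP's actual selection algorithm, so everything here is an assumption made for illustration: the Cluster fields, the turnaround estimate (queue wait plus run time scaled by a relative speed factor), and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    free_cpus: int            # idle CPUs contributed to the pool (assumed field)
    est_queue_wait_s: float   # assumed estimate of scheduler queue wait, seconds
    relative_speed: float     # assumed speed factor; 1.0 = reference cluster


def estimated_turnaround(cluster, cpus_needed, ref_runtime_s):
    """Estimated seconds until the job finishes, or None if it cannot fit."""
    if cluster.free_cpus < cpus_needed:
        return None
    return cluster.est_queue_wait_s + ref_runtime_s / cluster.relative_speed


def select_cluster(pool, cpus_needed, ref_runtime_s):
    """Return the name of the pooled cluster with the smallest estimate."""
    candidates = []
    for c in pool:
        t = estimated_turnaround(c, cpus_needed, ref_runtime_s)
        if t is not None:
            candidates.append((t, c.name))
    if not candidates:
        return None  # no pooled cluster can run the job right now
    return min(candidates)[1]
```

A real selector would also have to account for scheduler differences and application availability on each cluster; the sketch ignores both.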

While the UGP architecture presents a uniform appearance to users, it also provides for a Grid made up of diverse computing environments (hardware, operating systems, job schedulers) and autonomous administrative domains. Local organizations own and maintain control of the resources involved, and local administrative procedures and security solutions take precedence.

UGP is built entirely on open source software: Globus Toolkit, Tomcat, Java, GridSphere and MySQL. UGP itself is also open source.


Hierarchical Architecture

Grids running UGP have been set up in a hierarchical fashion at multiple University of California campuses and at the University of California as a whole.

The hierarchical design allows:

 • Each user of a Campus Grid Portal to automatically be signed up to use the
   UC Grid Portal.
 • A user who has a login id on clusters on multiple campuses to access the
   Campus Grid Portal of each of those campuses as well as the UC Grid Portal.
 • The UC Grid Portal to access every cluster that belongs to each of the
   Campus Grids, i.e., every cluster that participates in a Campus Grid also
   participates in the UC Grid.
 • Clusters at the campus level to contribute both cycles and applications to
   the default and specialty UC resource pools in addition to the campus
   resource pools of the local Campus Grid.
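
The relationships listed above can be modeled in a few lines. The following is only a toy data model under stated assumptions, not UGP code: the CampusGrid and UCGrid classes, their fields, and the portal label "UC" are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class CampusGrid:
    name: str
    clusters: set = field(default_factory=set)  # cluster names on this campus
    users: set = field(default_factory=set)     # Campus Grid Portal users


@dataclass
class UCGrid:
    campuses: list  # list of CampusGrid objects

    def clusters(self):
        """Every cluster in any Campus Grid also participates in the UC Grid."""
        return {(g.name, c) for g in self.campuses for c in g.clusters}

    def users(self):
        """Every Campus Grid Portal user is also a UC Grid Portal user."""
        return set().union(*(g.users for g in self.campuses))

    def portals_for(self, user):
        """Campus portals the user can access, plus the UC portal if any."""
        portals = [g.name for g in self.campuses if user in g.users]
        if portals:
            portals.append("UC")
        return portals
```

Clusters are keyed by (campus, cluster) pairs so that two campuses can use the same cluster name without colliding.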


Services Provided by UGP

The main services that are currently provided by UGP to its users are:

Data Manager

   Allows one to: list and manage files on the clusters and Storage Servers;
   view and edit text files, view image files, and visualize computational
   results; copy files and directories between a cluster or Storage Server and
   the user's local machine (upload/download); and transfer files and
   directories between clusters or between a cluster and a Storage Server.

Job Service

   Allows one to submit a job and view the job status and results. Special
   application services provide easy access to popular applications.
   Cluster Users can submit jobs to specific clusters. All users can submit
   jobs to one of the resource pools. When a job is submitted to a resource
   pool, UGP uses a best-fit algorithm to select the cluster to run it and
   stages all the input files to that cluster from any accessible cluster or
   Storage Server.
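
   The staging step can be pictured as follows. This is a minimal sketch
   under assumed data shapes, not the way UGP implements staging: each input
   file is represented as a hypothetical (location, path) pair, and the
   function name staging_plan is invented for the example.

```python
def staging_plan(inputs, target):
    """Return the transfers needed to bring every input file to the target.

    inputs: list of (location, path) pairs, where location names the cluster
            or Storage Server that currently holds the file (assumed shape).
    target: name of the cluster selected to run the job.
    """
    transfers = []
    for location, path in inputs:
        if location != target:  # files already on the target need no copy
            transfers.append((location, path, target))
    return transfers
```

   The result lists one (source, path, target) transfer per file that is not
   already on the selected cluster, in the order the inputs were given.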

Grid Developmental Environment (GDE)

   Allows users to compile Fortran and C/C++ code on any of the computational
   clusters. Both user-supplied and GDE-created Makefiles are supported.

Resources

   Allows one to see the status of all the clusters at a glance. Both
   summarized and detailed status information are provided.

Other Grids

   Provides Data Manager and Job Services for clusters that are not part of
   the Grid connected to the Grid Portal the user is using, but that are part
   of other Grids that are open and not behind firewalls. TeraGrid resources
   at SDSC are an example of this.


The Grid at UCLA

At UCLA, Academic Technology Services (ATS) maintains the UCLA Grid and runs the UCLA Grid Portal. ATS is the Grid Administrator for UCLA.

If you administer a cluster at UCLA, you can join your cluster to the UCLA Grid by getting a Grid Appliance from ATS and attaching it to your cluster. The Grid Appliance comes to you fully installed and configured by ATS. As a Cluster Administrator you have control over which users of your cluster can access your cluster via the UCLA Grid Portal. Users apply to use the Grid directly via the Grid Portal. You will receive an email whenever a user of your cluster asks to be Grid enabled. A web-based administration tool is available for you to either approve or deny the request. Additional web-based administration tools allow you to manipulate the database, add trusted clusters, and list and add users and groups.


Setting up a Grid using UGP at your own Institution

If you are not at UCLA, you can download the UCLA Grid software and use it to set up a Grid, complete with Grid Portal, at your own organization. The software is open source and is distributed under the Apache License. An easy-to-use installer is included with the UCLA Grid software. A graphical interface to the installer is available to customize the installation and install it in a single step. For those who do not have access to a GUI environment, a command line installer is also available; it requires only editing a single file and then issuing a single command. Complete installation instructions can be found in the files INSTALL.html and INSTALL.txt that come with the UGP distribution.

To set up a grid from scratch you will need to set up and install:

 • A Grid Portal machine with all the software for the Grid. You must first
   install, configure and start MySQL as root on this machine. Then the UGP
   installer will install and automatically configure everything else you
   need: Java 1.5, Ant 1.6, Globus Toolkit 4.0.3, GridSphere 2.1, Jakarta
   Tomcat 5.5, Simple CA and UGP itself.
 • A Grid Appliance for each cluster that will participate in the Grid.
   Installation of a Grid Appliance node requires just Globus Toolkit 4.0.3.
   However, when a Grid Appliance is attached to the cluster that it serves,
   some file systems have to be cross mounted and the Appliance has to be made
   equivalent to a head node as far as the job scheduler is concerned. You can
   start a grid with as few as a single Grid Appliance and add additional
   Appliances later to connect additional clusters. Currently three job
   schedulers are supported: SGE, PBS, and LoadLeveler.
 • Additionally, you can choose whether or not to install one or more MyProxy
   servers. UGP comes with two Pluggable Authentication Modules (PAMs). These
   authenticate the user upon sign-in against the Grid Portal's local Simple CA
   and the MyProxy servers. The Grid Administrator can set the priority among
   the authenticators, and a user is allowed to log in to the Grid Portal if
   the user can be authenticated by any of them.
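
The login rule in the last item above (try the authenticators in priority order, accept the user if any one succeeds) can be sketched as below. This is an illustration only and does not use UGP's or any real PAM API: the authenticators are plain Python callables standing in for the Simple CA and MyProxy modules, and the function name and tuple layout are hypothetical.

```python
def authenticate(username, password, authenticators):
    """Return the name of the first authenticator that accepts the user.

    authenticators: list of (priority, name, check) tuples, where a lower
    priority number is tried first and check(username, password) returns
    True or False. Returns None if every authenticator rejects the user.
    """
    # Sort on priority only, so the callables are never compared.
    for _priority, name, check in sorted(authenticators, key=lambda a: a[0]):
        if check(username, password):
            return name
    return None
```

Returning the name of the authenticator that succeeded, rather than a bare True, makes it easy to log which credential source admitted the user.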

Please read the INSTALL documentation for complete step-by-step install instructions.

The UCLA Grid Team

UCLA Academic Technology Services
University of California, Los Angeles
Box 951557, Mail Code 155705
Los Angeles, CA 90095-1557

Contact email: atshpc@ucla.edu
