FAQ

From UGP-Wiki

The Data Manager does not work

   The Grid Portal machine must be able to access itself and the Grid
   Appliances on port 2811. To test whether it can, from the Grid Portal try
   to telnet to the Grid Appliance on that port:
       telnet appliance.ucla.edu 2811
   If your appliance is running as a NIS client of the head node of the cluster,
   make sure that NIS is still working, e.g. that ypcat passwd works. If NIS has
   stopped for some reason, end users will get a 'NULL' or 'Socket disconnect'
   error from the DM, and you will see 'Bad Password' errors in the catalina.out
   file on the Grid Portal.
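
   Both checks can be scripted, as in the minimal sketch below. It assumes bash
   (for its /dev/tcp redirection) and the coreutils timeout command are
   available; port_open and nis_ok are hypothetical helper names, not part of
   UGP or Globus. Port 2811 is the default GridFTP port.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of UGP or Globus) for the Data Manager checks.

# port_open HOST PORT [SECONDS]
# Succeeds if a TCP connection to HOST:PORT opens within SECONDS (default 5).
# Relies on bash's /dev/tcp redirection and the coreutils "timeout" command.
port_open() {
    local host=$1 port=$2 secs=${3:-5}
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# nis_ok
# Succeeds if the NIS passwd map can be read, i.e. "ypcat passwd" works.
nis_ok() {
    ypcat passwd >/dev/null 2>&1
}

# Example usage: run port_open on the Grid Portal and nis_ok on the Appliance.
#   port_open appliance.ucla.edu 2811 || echo "cannot reach port 2811"
#   nis_ok || echo "NIS is not answering; expect 'Bad Password' errors"
```

   Unlike an interactive telnet session, the function returns an exit status,
   so it can be used in cron jobs or monitoring scripts.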

GRAM job submission fails

   Make sure that the following files are configured. In the example below,
   GLOBUS_LOCATION is /home/globus/GT4; substitute your own GLOBUS_LOCATION.
  1. /etc/sudoers
   Each entry below is one logical line; the trailing backslashes continue it
   onto the next line. Replace username1,username2 with the local accounts
   that GRAM may run jobs as.
   globus  ALL=(username1,username2) \
       NOPASSWD: /home/globus/GT4/libexec/globus-gridmap-and-execute \
       -g /etc/grid-security/grid-mapfile \
       /home/globus/GT4/libexec/globus-job-manager-script.pl *
   globus  ALL=(username1,username2) \
       NOPASSWD: /home/globus/GT4/libexec/globus-gridmap-and-execute \
       -g /etc/grid-security/grid-mapfile \
       /home/globus/GT4/libexec/globus-gram-local-proxy-tool *
  2. /etc/grid-security/grid-mapfile
   Make sure the user's certificate DN is entered in the grid-mapfile.
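
   To confirm that a DN is present, the grid-mapfile can be searched with a
   small function like the sketch below. It assumes the usual one-entry-per-line
   format (a quoted DN followed by the local account name); mapped_user is a
   hypothetical helper name.

```shell
#!/usr/bin/env bash
# mapped_user MAPFILE DN
# Hypothetical helper: prints the local account(s) that DN maps to in a
# grid-mapfile and succeeds; fails if the DN has no entry.
# Assumes entries of the form:  "/O=Grid/OU=Example/CN=Jane Doe" jdoe
mapped_user() {
    local mapfile=$1 dn=$2 line rest
    local prefix="\"$dn\""          # the DN exactly as quoted in the file
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            "$prefix"[[:space:]]*)
                rest=${line#"$prefix"}
                # Strip the whitespace between the quoted DN and the account.
                rest=${rest#"${rest%%[![:space:]]*}"}
                printf '%s\n' "$rest"
                return 0
                ;;
        esac
    done < "$mapfile"
    return 1
}

# Example usage:
#   mapped_user /etc/grid-security/grid-mapfile "/O=Grid/OU=Example/CN=Jane Doe"
```

   The DN is compared literally, so it must match the certificate subject
   exactly, including spaces and capitalization.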


The Resources table does not show the Status (up or down) of a particular cluster

     - Resource information is retrieved in one of two ways: it is aggregated
       in the background, or it is polled on request.
     - Aggregation runs periodically in the background, so you will normally
       have to wait up to 10 minutes after starting GT4 on the Grid Appliance
       connected to the cluster before the information about the cluster
       shows up under Resources.
     - Did you execute srun.sh? See the section titled "After Installation" in
       the INSTALL.txt document.
     - Check whether you can connect to the Grid Portal machine at port 8443
       from the Appliance. From the Grid Appliance enter:
           telnet portal.machine.address 8443

You do not see anything after you login to the Grid Portal

   As the Grid Administrator, you must have added at least one Grid Appliance
   to the Portal for the Portal to work.

Users cannot submit jobs to a particular cluster

   You must make sure certificates have been created and placed in
   "credentialDir" on the Grid Portal machine.
   You cannot submit jobs when logged in to the Grid Portal as the GridSphere
   super user. You must log in as either the Grid Administrator or any other
   regular user for whom a certificate has been created. When the user logs
   in, a proxy certificate is created for that session. The proxy certificate
   is used as the credential to access the remote clusters.
   Also, the Grid Appliance and Grid Portal must have a trust relationship with
   the CA. For an Appliance, make sure you followed the instructions about
   copying the file:
       .globus/simpleCA/globus_simple_ca....tar.gz

There are various connection problems between the Grid Portal and an Appliance

     - The Appliance CA must be signed or accepted by the Grid Portal CA, or
       whatever CA you are using.
     - Both machines must trust each other.
   See the section titled "Configuring your Certificate Authority (CA)" for
   the details.
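
   One way to test the trust relationship from the command line is openssl
   verify, which reports whether a certificate chains to a given CA
   certificate. The sketch below wraps it in a hypothetical signed_by function;
   the paths in the usage comment are assumptions, so substitute the CA and
   host certificate files from your own installation.

```shell
#!/usr/bin/env bash
# signed_by CA_CERT CERT
# Hypothetical helper: succeeds if CERT verifies against the CA certificate
# in CA_CERT (both PEM files), using the standard "openssl verify" command.
signed_by() {
    local out
    out=$(openssl verify -CAfile "$1" "$2" 2>&1) || return 1
    # Belt and braces: also require the "<file>: OK" line that openssl
    # prints when verification succeeds.
    case $out in
        *": OK") return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage on an Appliance (paths are assumptions; adjust to your setup):
#   signed_by /path/to/portal-ca.pem /etc/grid-security/hostcert.pem \
#       && echo "host certificate is trusted by the Portal CA"
```

   Run it in both directions (Appliance certificate against the Portal's CA,
   and Portal certificate against the Appliance's CA) to check that the trust
   is mutual.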

The Grid Portal cannot retrieve the status of jobs; it always reports a status of "Unsubmitted"

     - For SGE, you must be running version 6.0 or higher for the status to
       appear. You must have $SGE_ROOT and $SGE_CELL defined in the
       environment, and the following file must exist:
           $SGE_ROOT/$SGE_CELL/common/reporting
     - For PBS, the file system containing the PBS server log files must be
       cross-mounted on the Grid Appliance. You must have $PBS_HOME defined
       in the environment, and the following directory must exist:
           $PBS_HOME/server_logs
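
   The prerequisites above can be checked on the Grid Appliance with two small
   functions, sketched below. check_sge_reporting and check_pbs_logs are
   hypothetical names; they test only the environment variables and paths
   listed above, not the SGE version or how the file system is mounted.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that test the job-status prerequisites on the Appliance.

# check_sge_reporting
# Succeeds if SGE_ROOT and SGE_CELL are set and the reporting file exists.
check_sge_reporting() {
    [ -n "${SGE_ROOT:-}" ] && [ -n "${SGE_CELL:-}" ] &&
        [ -f "$SGE_ROOT/$SGE_CELL/common/reporting" ]
}

# check_pbs_logs
# Succeeds if PBS_HOME is set and $PBS_HOME/server_logs is a readable directory.
check_pbs_logs() {
    [ -n "${PBS_HOME:-}" ] && [ -d "$PBS_HOME/server_logs" ] &&
        [ -r "$PBS_HOME/server_logs" ]
}

# Example usage:
#   check_sge_reporting || echo "SGE reporting file not found"
#   check_pbs_logs || echo "PBS server_logs not visible on this machine"
```

   Run the checks in the same environment that starts GT4, since that is the
   environment in which the variables must be defined.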

You have two network cards in the Grid Appliance machine, and the Globus Toolkit is running only on the local network IP. The Globus Toolkit needs to be running on the public IP.

   The solution is to modify the two files:
       $GLOBUS_LOCATION/etc/globus_wsrf_core/client-server-config.wsdd
       $GLOBUS_LOCATION/etc/globus_wsrf_core/server-config.wsdd
   and change them as follows so it runs using the hostname instead of the IP
   address.
       <parameter name="logicalHost" value="appliance.ucla.edu"/>
       <parameter name="disableDNS" value="true"/>
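
   After editing, you can confirm that both files carry the two parameters with
   a grep-based check such as the sketch below. wsdd_uses_hostname is a
   hypothetical name; it matches the attribute text literally (so it assumes
   the spacing shown above) and does not check which XML element the
   parameters sit in (in GT4 they normally belong in the globalConfiguration
   section).

```shell
#!/usr/bin/env bash
# wsdd_uses_hostname FILE HOST
# Hypothetical helper: succeeds if FILE sets logicalHost to HOST and
# disableDNS to true, matching the attribute text literally.
wsdd_uses_hostname() {
    local file=$1 host=$2
    grep -Fq "name=\"logicalHost\" value=\"$host\"" "$file" &&
        grep -Fq 'name="disableDNS" value="true"' "$file"
}

# Example usage (hostname taken from the example above):
#   for f in "$GLOBUS_LOCATION"/etc/globus_wsrf_core/client-server-config.wsdd \
#            "$GLOBUS_LOCATION"/etc/globus_wsrf_core/server-config.wsdd; do
#       wsdd_uses_hostname "$f" appliance.ucla.edu || echo "not updated: $f"
#   done
```

   Because the match is literal, a commented-out parameter would also pass, so
   treat a success as a hint rather than proof.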

GridFTP is not working

   Check /etc/hosts.allow and make sure it contains a line like this:
       globus-gridftp-server: [Portal machine IP]
   If you do not see that line, add it.
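
   The check can be scripted with awk, as in the sketch below. gridftp_allowed
   is a hypothetical helper; it assumes IPv4 addresses and the simple
   "daemon: client, client" form of hosts.allow entries, treats an ALL client
   as a match, and ignores hosts.deny, so it is a first check rather than a
   full TCP wrappers evaluation.

```shell
#!/usr/bin/env bash
# gridftp_allowed FILE IP
# Hypothetical helper: succeeds if FILE (normally /etc/hosts.allow) has a
# globus-gridftp-server entry whose client list contains IP or ALL.
# Assumes IPv4 addresses, since ':' is used as the field separator.
gridftp_allowed() {
    awk -F: -v ip="$2" '
        /^[ \t]*#/ { next }                      # skip comment lines
        {
            daemon = $1
            gsub(/[ \t]/, "", daemon)
            if (daemon != "globus-gridftp-server") next
            # Client list: the second field, split on spaces, tabs and commas.
            n = split($2, clients, /[ \t,]+/)
            for (i = 1; i <= n; i++)
                if (clients[i] == ip || clients[i] == "ALL") found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Example usage (the IP address is a placeholder for the Portal machine IP):
#   gridftp_allowed /etc/hosts.allow 192.0.2.10 || echo "add the Portal IP"
```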

When you try to submit a job to a cluster from the Grid Portal you get the following error:

       Job submission failed: Caused By Job Submission Exception:
       org.globus.wsrf.container.ContainerException: Container failed to
       initialize [Caused by: Configuration file directory './etc' does not
       exist or is not a directory or is not readable.]
   You must have the GLOBUS_LOCATION variable defined for Tomcat
   (jakarta-tomcat). Look for lines in $CATALINA_HOME/bin/catalina.sh that
   look like the following, and add the last line if it is missing:
       # Set juli LogManager if it is present
       if [ -r "$CATALINA_HOME"/bin/tomcat-juli.jar ]; then
         JAVA_OPTS="$JAVA_OPTS "-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager" "-Djava.util.logging.config.file="$CATALINA_BASE/conf/logging.properties"
       fi
       JAVA_OPTS="$JAVA_OPTS -DGLOBUS_LOCATION=/home/globus/UCPortal/GT4"
   In the last of these lines, replace /home/globus/UCPortal with the full path
   of your install directory.
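
   The './etc' in the error suggests the container did not receive
   GLOBUS_LOCATION and looked for etc relative to its working directory. The
   sketch below reads the value that catalina.sh passes and checks that it
   points at a usable directory; globus_location_from_catalina and
   globus_etc_ok are hypothetical helpers, and the sed expression assumes the
   path contains no spaces or quotes.

```shell
#!/usr/bin/env bash
# globus_location_from_catalina CATALINA_SH
# Hypothetical helper: prints the last -DGLOBUS_LOCATION=... value found on a
# non-comment line of CATALINA_SH. Assumes the path has no spaces or quotes.
globus_location_from_catalina() {
    sed -n -e '/^[[:space:]]*#/d' \
           -e 's/.*-DGLOBUS_LOCATION=\([^" ]*\).*/\1/p' "$1" | tail -n 1
}

# globus_etc_ok DIR
# Hypothetical helper: succeeds if DIR/etc exists, is a directory, and is readable.
globus_etc_ok() {
    [ -d "$1/etc" ] && [ -r "$1/etc" ]
}

# Example usage:
#   loc=$(globus_location_from_catalina "$CATALINA_HOME/bin/catalina.sh")
#   globus_etc_ok "$loc" || echo "GLOBUS_LOCATION '$loc' has no readable etc/"
```

   Restart Tomcat after changing catalina.sh so that the new JAVA_OPTS take
   effect.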

Can the Grid Admin manually add a user?

   Yes. Go to the Grid Admin:User Admin:Create New User form.
   After the form is submitted, the user receives an email for activating his/her account. The user creates
   a new password during that activation process.

How does a department with a cluster join the UC Grid?

     Any department or organization within any UC campus can join the UC Grid:
        1. Contact the Campus Grid Admin and ask for permission.
        2. Install the Grid Appliance node and attach it to your cluster.
        3. Configure the Grid Appliance node so that it can submit jobs to the cluster.
        4. Install the UGP software.
        5. Generate a host certificate and send it to the UC Grid Admin.
        6. The UC Grid Admin signs the certificate and sends back the signed host certificate.
        7. Put the signed host certificate in /etc/grid-security.
        8. Ask the Campus Grid Admin to add your cluster to the Campus Grid Portal.
        9. Once the cluster is added to the Campus Grid Portal, it will show up in the UC Grid Portal.
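
   Steps 5 and 7 can be sanity-checked on the Grid Appliance with the sketch
   below. It assumes the conventional Globus file names hostcert.pem and
   hostkey.pem in /etc/grid-security (in GT4 the request itself is usually
   created with grid-cert-request -host). host_cert_ready is a hypothetical
   helper, and its rule that the key must not be accessible to group or others
   (mode 400 or 600) reflects common Globus practice rather than a stated UGP
   requirement.

```shell
#!/usr/bin/env bash
# host_cert_ready [DIR]
# Hypothetical helper: succeeds if DIR (default /etc/grid-security) contains
# hostcert.pem and hostkey.pem, and the key's mode is 400 or 600.
host_cert_ready() {
    local dir=${1:-/etc/grid-security} mode
    [ -f "$dir/hostcert.pem" ] && [ -f "$dir/hostkey.pem" ] || return 1
    # GNU stat uses -c %a; BSD/macOS stat uses -f %Lp.
    mode=$(stat -c %a "$dir/hostkey.pem" 2>/dev/null ||
           stat -f %Lp "$dir/hostkey.pem")
    case $mode in
        400|600) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage:
#   host_cert_ready || echo "host certificate or key missing, or key too permissive"
```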

How does a user reset his/her password?

   On the main page, click "Forgot your Grid Password?". You must authenticate yourself through Shibboleth.
   An email will be sent to you for confirmation. Once you confirm your email, the Grid Admin is notified
   of the reset request. The Grid Admin logs in to the portal, clicks "Grid Admin:User Requests:ResetPasswordApproval",
   and clicks "Approve". The user then receives an email to reset his/her password.

How does a user add an additional resource?

   First, contact the resource admin for an account. Once you have an account,
   log in to the portal, click "Add Resources", and select the resource to add.
   When the resource admin approves your request, the resource is added to your resource access list.