Install Troubleshooting
VI. Troubleshooting
1. The Data Manager does not work
The Grid Portal machine must be able to access itself and the Grid Appliances on port 2811. To test whether it can, from the Grid Portal try to telnet to the Grid Appliance on that port:
telnet appliance.ucla.edu 2811
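Where telnet is not installed, the same reachability test can be sketched with bash's /dev/tcp redirection. This is only a minimal sketch: the helper name check_port and the 5-second timeout are assumptions, not part of the original instructions, and it needs bash plus the coreutils timeout command.

```shell
# check_port HOST PORT: exit 0 if a TCP connection to HOST:PORT can be opened.
# Hypothetical helper; uses bash's /dev/tcp pseudo-device and coreutils' timeout.
check_port() {
    local host="$1" port="$2"
    # Run the connection attempt in a child bash so the timeout can kill it.
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Usage, from the Grid Portal machine (host name is the example used above):
#   check_port appliance.ucla.edu 2811 && echo "reachable" || echo "NOT reachable"
```

The function only reports whether the port accepts connections; it does not verify that the service behind it is working.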
2. The Resources table does not show the Status (up or down) of a particular cluster
□ Resources are aggregated periodically in the background. You will
normally have to wait up to 10 minutes after starting GT4 on the Grid
Appliance connected to the cluster before the information about the
cluster shows up under Resources.
□ Did you execute srun.sh? See the section titled "After Installation" in
this INSTALL.txt document.
□ Check whether you can connect to the Grid Portal machine at port 8443
from the Appliance. From the Grid Appliance enter:
telnet portal.machine.address 8443
3. You do not see anything after you log in to the Grid Portal
The Portal does not work until the Grid Administrator has added at least one Grid Appliance to it.
4. As Grid Administrator, you added a user manually, without the user filling in the form on the web. The user can log in but cannot work with any of the clusters. What is wrong?
If you add a user manually as Grid Administrator, a Cluster Administrator of each cluster on which that user has a login ID must also Grid-enable the user for that cluster. See the steps in the "Adding Users" section above for details.
5. Users cannot submit jobs to a particular cluster
You must make sure certificates have been created and placed in "credentialDir" on the Grid Portal machine.
You cannot submit jobs when logged in to the Grid Portal as the GridSphere super user. You must log in as either the Grid Administrator or any other regular user for whom a certificate has been created. When the user logs in, a proxy certificate is created for that session. The proxy certificate is used as the credential to access the remote clusters.
The Grid Appliance and the Grid Portal must also have a trust relationship with the CA. For an Appliance, make sure you followed the instructions about copying the file:
.globus/simpleCA/globus_simple_ca....tar.gz
If the users get a "Socket Timed Out" message each time that they submit a job, check the /etc/sudoers file on the Appliance and make sure that you have commented out the line that reads:
Defaults requiretty
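The sudoers check above can be sketched as a small shell function. The name has_active_requiretty is hypothetical, and the pattern assumes the directive appears on a line of its own, as shown above.

```shell
# has_active_requiretty FILE: exit 0 if FILE contains an uncommented
# "Defaults requiretty" line (hypothetical helper name).
has_active_requiretty() {
    grep -Eq '^[[:space:]]*Defaults[[:space:]]+requiretty[[:space:]]*$' "$1"
}

# Usage, as root on the Appliance:
#   has_active_requiretty /etc/sudoers && echo "requiretty is still active"
```

If the line is still active, edit the file with visudo rather than a plain editor so that syntax errors are caught before the file is saved.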
6. There are numerous different connection problems between the Grid Portal and an Appliance
□ The Appliance CA must be signed or accepted by the Grid Portal CA, or
whatever CA you are using.
□ Both machines must trust each other.
See the section titled "Configuring your Certificate Authority (CA)" for the details.
7. The Grid Portal cannot retrieve the status of jobs; it always reports a status of "Unsubmitted"
□ For SGE, you must be running version 6.0 or higher for the status to
appear. You must have $SGE_ROOT and $SGE_CELL defined in the
environment, and also have the file named:
$SGE_ROOT/$SGE_CELL/common/reporting
□ For PBS, you must cross-mount the file system containing PBS's server
log file on the Grid Appliance. You must have $PBS_HOME defined in the
environment and also have the directory named:
$PBS_HOME/server_logs
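The SGE and PBS prerequisites listed above can be checked with a sketch like the following, run in the environment that GT4 uses on the Grid Appliance. The function names are hypothetical. The checks cover only the variables and paths named above, not the SGE version or the cross-mount itself.

```shell
# check_sge_reporting: exit 0 if SGE_ROOT and SGE_CELL are set and the
# reporting file exists (hypothetical helper name).
check_sge_reporting() {
    [ -n "${SGE_ROOT:-}" ] && [ -n "${SGE_CELL:-}" ] &&
        [ -f "$SGE_ROOT/$SGE_CELL/common/reporting" ]
}

# check_pbs_server_logs: exit 0 if PBS_HOME is set and the server_logs
# directory is visible from this machine (hypothetical helper name).
check_pbs_server_logs() {
    [ -n "${PBS_HOME:-}" ] && [ -d "$PBS_HOME/server_logs" ]
}

# Usage:
#   check_sge_reporting   || echo "SGE reporting file or variables missing"
#   check_pbs_server_logs || echo "PBS server_logs directory or PBS_HOME missing"
```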
8. You have two network cards in the Grid Appliance machine and the Globus Toolkit is running only on the local network IP. The Globus Toolkit needs to be running on the public IP.
The solution is to modify the two files:
$GLOBUS_LOCATION/etc/globus_wsrf_core/client-server-config.wsdd
$GLOBUS_LOCATION/etc/globus_wsrf_core/server-config.wsdd
Change both files as follows so that the Globus Toolkit runs using the hostname instead of the IP address:
<parameter name="logicalHost" value="appliance.ucla.edu"/>
<parameter name="disableDNS" value="true"/>
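After editing, a quick way to confirm that both files carry the setting is a fixed-string search. This is a sketch only: has_logical_host is a hypothetical helper, and it matches the parameter line exactly as written above, so different spacing or attribute order in your files will not match.

```shell
# has_logical_host FILE HOSTNAME: exit 0 if FILE contains the logicalHost
# parameter with the given value (hypothetical helper; exact-string match).
has_logical_host() {
    grep -Fq "<parameter name=\"logicalHost\" value=\"$2\"/>" "$1"
}

# Usage (host name is the example value used above):
#   for f in client-server-config.wsdd server-config.wsdd; do
#       has_logical_host "$GLOBUS_LOCATION/etc/globus_wsrf_core/$f" appliance.ucla.edu \
#           || echo "logicalHost not set in $f"
#   done
```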
9. GridFTP is not working
Check /etc/hosts.allow and make sure it contains a line like this:
globus-gridftp-server: [Portal machine IP]
If you do not see that line, add it.
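The check-and-add step can be sketched as an idempotent shell function. The name ensure_gridftp_allowed is hypothetical, and the entry format is taken from the line shown above. Replace the IP argument with the address of your Portal machine.

```shell
# ensure_gridftp_allowed FILE IP: append the GridFTP entry for IP to FILE
# unless an identical line is already present (hypothetical helper name).
ensure_gridftp_allowed() {
    local file="$1" ip="$2"
    local entry="globus-gridftp-server: $ip"
    # -F fixed string, -x whole-line match, -q quiet
    grep -Fxq "$entry" "$file" || printf '%s\n' "$entry" >> "$file"
}

# Usage, as root on the Appliance (the IP below is a placeholder):
#   ensure_gridftp_allowed /etc/hosts.allow 192.0.2.10
```

Because the function matches the whole line exactly, an existing entry that lists several addresses on one line is not detected. Inspect the file by hand in that case.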
10. When you try to submit a job to a cluster from the Grid Portal you get the following error:
Job submission failed: Caused By Job Submission Exception:
org.globus.wsrf.container.ContainerException: Container failed to
initialize [Caused by: Configuration file directory './etc' does not
exist or is not a directory or is not readable.]
The GLOBUS_LOCATION variable must be defined for Tomcat (jakarta-tomcat). Look for lines in $CATALINA_HOME/bin/catalina.sh that look like:
# Set juli LogManager if it is present
if [ -r "$CATALINA_HOME"/bin/tomcat-juli.jar ]; then
JAVA_OPTS="$JAVA_OPTS "-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager" "-Djava.util.logging.config.file="$CATALINA_BASE/conf/logging.properties"
JAVA_OPTS="$JAVA_OPTS "-DGLOBUS_LOCATION=/home/globus/UCPortal/GT4
In the last of these lines replace /home/globus/UCPortal with the full path of your install directory.
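The error message complains about a relative './etc' directory, which suggests the container could not find the real GT4 install directory. Before restarting Tomcat, you can confirm that the path you put in catalina.sh has a readable etc directory. This is a minimal sketch, and check_globus_etc is a hypothetical helper name.

```shell
# check_globus_etc DIR: exit 0 if DIR/etc exists, is a directory, and is
# readable by the current user (hypothetical helper name).
check_globus_etc() {
    [ -d "$1/etc" ] && [ -r "$1/etc" ]
}

# Usage, as the user that runs Tomcat (path is the example value above):
#   check_globus_etc /home/globus/UCPortal/GT4 || echo "GLOBUS_LOCATION has no readable etc directory"
```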

