VDT Office Hours 2 February 2006

Happy Groundhog day!


Gridmonitor problem

People have become aware of a problem with the Condor-G gridmonitor: if someone submits a job from Condor 6.7.8 or earlier (this includes OSG 0.2.x) to a gatekeeper running GT 4.0 pre-web service GRAM, the Condor-G gridmonitor won't work. Neither the batch system nor the version of Condor on the gatekeeper matters: only the version of Condor used for submission does.

The result is that the load on the gatekeeper is higher than expected because the gridmonitor is not used.
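As a rough illustration of the version condition above, the following sketch checks whether a submit-side Condor version falls in the affected range. The function names are made up for this example, and it assumes plain dotted numeric version strings; the only fact taken from these notes is that 6.7.8 and earlier are affected.

```python
def parse_version(text):
    """Turn a dotted version string such as '6.7.8' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Last submit-side Condor version reported as affected in these notes.
LAST_AFFECTED = (6, 7, 8)


def gridmonitor_affected(condor_version):
    """Return True if the submitting Condor is 6.7.8 or earlier.

    Comparing integer tuples (not strings) keeps '6.7.10' newer than '6.7.8'.
    """
    return parse_version(condor_version) <= LAST_AFFECTED
```

Only the submitting side needs to be checked this way, since the Condor version on the gatekeeper is irrelevant to the problem.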

The problem has two possible fixes:

  1. People submitting jobs can upgrade to a newer version of Condor. If they prefer, they can upgrade just the gridmonitor.
  2. Site administrators can fix the problem by making the Globus tmp directory world-writable. We are not yet sure of the implications of this change: Stu Martin from Globus is looking into it and will report back on whether it is safe.
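For site administrators considering the second fix, here is a minimal sketch of checking and changing the permission bits. The helper names are hypothetical, and the notes do not give the actual location of the Globus tmp directory, so the path must be supplied by the administrator. The sketch says nothing about whether the change is safe, which is still an open question.

```python
import os
import stat


def is_world_writable(path):
    """Return True if users outside the owner and group may write to path."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IWOTH)


def make_world_writable(path):
    """Add write permission for user, group, and other; keep all other bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH)
```

Running `is_world_writable` first shows whether a site already has the permissive setting before anything is changed.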

Pre-WS scalability enhancement

As part of a larger effort to get the VDT and the TeraGrid software stacks to use identical versions of Globus and other software, we would like to consider using an enhancement to pre-web service GRAM that TeraGrid is using.

The enhancement is a back-port of a small piece of web-services GRAM. It changes the pre-web service jobmanagers so that, instead of constantly querying the batch system for the state of each job, they consult the batch system logs. It is claimed that this reduces the load of the jobmanagers by ninety percent.
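The idea of consulting logs rather than querying can be sketched as follows. This is not the TeraGrid or Globus implementation: the class name and the three-field log format (`timestamp job_id state`) are assumptions made purely for illustration. The point is that each refresh reads only the lines appended since the last one, so the cost does not grow with the number of jobs being watched.

```python
class JobLogFollower:
    """Track job states by reading new lines from a batch system log.

    Assumes a hypothetical log format: 'timestamp job_id state' per line.
    """

    def __init__(self, log_path):
        self.log_path = log_path
        self.offset = 0   # byte position of the first unread line
        self.states = {}  # job_id -> most recent state seen in the log

    def refresh(self):
        """Read lines appended since the last call and update job states."""
        with open(self.log_path, "rb") as log:
            log.seek(self.offset)
            while True:
                raw = log.readline()
                if not raw.endswith(b"\n"):
                    break  # end of file or a partially written line
                self.offset = log.tell()
                fields = raw.decode("utf-8", errors="replace").split()
                if len(fields) != 3:
                    continue  # skip lines that do not match the format
                _timestamp, job_id, state = fields
                self.states[job_id] = state
        return self.states
```

A jobmanager built this way would call `refresh()` periodically and look up its job in `states`, rather than running a batch system query for each job it manages.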

There are two possible downsides to this.

  1. The batch system logs must be accessible from the gatekeeper node. This is already true in WS-GRAM, but it was not true in pre-WS GRAM. Some people (like Mark Green) prefer not to have the logs accessible via NFS or on the gatekeeper. Mark is willing to make it work on his system, and no one else complained about it.
  2. It requires running an extra process that translates the job logs into something the jobmanager can understand. This process must be able to read the logs, so for some people it will need to be setuid.

Overall, there were no real objections to this modification and people agreed that it would be good. Alain will bring it up at the OSG integration meeting today to run it by a larger crowd.