Note: This version of the VDT (1.10.1) is supported, but is not our latest stable release. The current stable release is 2.0.0.
Congratulations! You've installed the VDT. That wasn't so hard now, was it? If it was, please let us know. We're always working to make the VDT easier to install and your feedback is essential.
After the VDT install completes there are still a few server components left unconfigured. Take a quick look at this list and handle anything that applies to you. You almost certainly do not need to do everything in this list--just do the ones that are relevant to you.
To learn more about the software you've just installed, take a look at the documentation for VDT 1.10.1.
After you install the VDT, none of the services are running. To start
the services, you need to run vdt-control:
> cd $VDT_LOCATION
> . setup.sh
> vdt-control --on
This will install each service into a system-wide location (such as root's crontab, /etc/xinetd.d, or /etc/init.d). For programs that are run via an init script, the script is run.
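If you script the start-up, it can help to confirm the install directory looks sane before turning anything on. The following is a minimal sketch, not part of the VDT: the function names vdt_ready and start_vdt_services are our own invention for illustration, and the only VDT commands used are the setup.sh and vdt-control --on steps shown above.

```shell
# Hypothetical helpers (not shipped with the VDT).

# Succeed only if the given directory looks like a VDT install.
# $1: the VDT install directory (normally $VDT_LOCATION)
vdt_ready() {
    if [ -z "$1" ]; then
        echo "VDT_LOCATION is not set" >&2
        return 1
    fi
    if [ ! -f "$1/setup.sh" ]; then
        echo "no setup.sh found in $1" >&2
        return 1
    fi
    return 0
}

# Run the same three steps as above, but stop early if the check fails.
start_vdt_services() {
    vdt_ready "$VDT_LOCATION" || return 1
    cd "$VDT_LOCATION" || return 1
    . ./setup.sh
    vdt-control --on
}
```

Calling start_vdt_services with VDT_LOCATION unset prints an error and returns without touching any services.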
More documentation on vdt-control
Runas_Alias GLOBUSUSERS = user1, user2
globus ALL=(GLOBUSUSERS) \
NOPASSWD: /opt/vdt/globus/libexec/globus-gridmap-and-execute \
-g /etc/grid-security/grid-mapfile \
/opt/vdt/globus/libexec/globus-job-manager-script.pl *
globus ALL=(GLOBUSUSERS) \
NOPASSWD: /opt/vdt/globus/libexec/globus-gridmap-and-execute \
-g /etc/grid-security/grid-mapfile \
/opt/vdt/globus/libexec/globus-gram-local-proxy-tool *
Note that you must replace 'user1, user2' with a list of
comma-separated user id names. If you prefer, you can allow Globus
to sudo to all users except root by substituting the following
Runas_Alias line for the one above:
Runas_Alias GLOBUSUSERS = ALL, !root
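If you have many accounts, you may prefer to generate the Runas_Alias line rather than type it. This is a small sketch under our own assumptions: make_runas_alias is a hypothetical helper name, and it only prints the line in the format shown above. You still have to paste the result into your sudoers file yourself.

```shell
# Hypothetical helper: print a Runas_Alias line for the given user names.
make_runas_alias() {
    if [ "$#" -eq 0 ]; then
        echo "usage: make_runas_alias user1 [user2 ...]" >&2
        return 1
    fi
    # Start with the first name, then append the rest separated by ", ".
    list=$1
    shift
    for user in "$@"; do
        list="$list, $user"
    done
    printf 'Runas_Alias GLOBUSUSERS = %s\n' "$list"
}
```

For example, make_runas_alias alice bob prints a line listing alice and bob. After editing the sudoers file, running visudo -c is a reasonable way to check its syntax.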
> pacman -get http://vdt.cs.wisc.edu/vdt_1101_cache:Globus-WS-Condor-Setup
or
> pacman -get http://vdt.cs.wisc.edu/vdt_1101_cache:Globus-WS-LSF-Setup
or
> pacman -get http://vdt.cs.wisc.edu/vdt_1101_cache:Globus-WS-PBS-Setup
> pacman -get http://vdt.cs.wisc.edu/vdt_1101_cache:Globus-Condor-Setup
or
> pacman -get http://vdt.cs.wisc.edu/vdt_1101_cache:Globus-LSF-Setup
or
> pacman -get http://vdt.cs.wisc.edu/vdt_1101_cache:Globus-PBS-Setup
or
> pacman -get http://vdt.cs.wisc.edu/vdt_1101_cache:Globus-SGE-Setup
> ls -l /etc/grid-security/ldap
total 12
-rw-r----- 1 daemon daemon 1194 Feb 2 13:34 ldapcert.pem
-rw-r----- 1 daemon daemon 1351 Feb 2 13:34 ldapcert_request.pem
-r-------- 1 daemon daemon 887 Feb 2 13:34 ldapkey.pem
> ls -l /etc/grid-security/http
total 12
-rw-rw-r-- 1 daemon daemon 1193 Feb 2 13:34 httpcert.pem
-rw-r--r-- 1 daemon daemon 1379 Feb 2 13:34 httpcert_request.pem
-r-------- 1 daemon daemon 887 Feb 2 13:34 httpkey.pem
$GLOBUS_LOCATION/setup/globus/setup-simple-ca
(This will ask you several questions, including the name of your CA and your passphrase.)
$GLOBUS_LOCATION/setup/globus_simple_ca_HASH_setup/setup-gsi -default
(The HASH will be replaced with the hash for your CA--the first command will print it out.)
For more details, see: Globus's Simple CA directions
If you are running PBS or LSF and your site has multiple queues, users can specify which queue they wish to use when they submit jobs. For GRAM 2 (pre-web services) jobs, they specify it in their RSL with a string like:
...(queue=longjobs)...
Globus verifies that users specify correct queue names and rejects jobs if they do not. If you add or delete queues, you need to tell Globus to rebuild its list of queues.
For PBS:
You can rebuild the list of queues with the following
commands:
> cd $VDT_LOCATION
> . setup.sh
> globus/setup/globus/setup-globus-job-manager-pbs
You can verify that the queues are listed by looking at:
$VDT_LOCATION/globus/share/globus_gram_job_manager/pbs.rvf
You do not need to restart any processes after you do this.
For LSF:
You can rebuild the list of queues with the following
commands:
> cd $VDT_LOCATION
> . setup.sh
> globus/setup/globus/setup-globus-job-manager-lsf
You can verify that the queues are listed by looking at:
$VDT_LOCATION/globus/share/globus_gram_job_manager/lsf.rvf
You do not need to restart any processes after you do this.
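Rather than reading the .rvf file by eye, you can search it for a queue name. Here is a rough sketch: queue_listed is a hypothetical helper of ours, and it assumes only that the queue name appears somewhere in the file as a whole word. It does not parse the .rvf format.

```shell
# Hypothetical helper: succeed if the queue name appears in the given .rvf file.
# $1: path to pbs.rvf or lsf.rvf   $2: queue name to look for
queue_listed() {
    rvf_file=$1
    queue=$2
    if [ ! -r "$rvf_file" ]; then
        echo "cannot read $rvf_file" >&2
        return 2
    fi
    # -F: treat the name as a fixed string, -w: match whole words only,
    # -q: print nothing, just set the exit status.
    grep -F -w -q -- "$queue" "$rvf_file"
}
```

For example, queue_listed $VDT_LOCATION/globus/share/globus_gram_job_manager/pbs.rvf longjobs returns success if longjobs is mentioned in the PBS file.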
The problem
GRAM 2 (a.k.a. pre-web services GRAM) has a known problem: every request to submit a job creates a new job manager process, and this process polls the underlying batch system at ten-second intervals. When many job managers are running, the computer can be overwhelmed.
One good solution for this problem is for clients to use Condor-G to submit jobs, and to make sure the Condor Grid Monitor is turned on. If users have installed Condor-G from the VDT, it is turned on. For details, see the Grid Monitor details in the Condor 6.8 manual.
However, the Condor-G grid monitor is only a partial solution, for several reasons. First, it requires participation from all clients. Any single client that doesn't use Condor-G or doesn't use the grid monitor can bring GRAM 2 to its knees. Second, there is a single grid monitor per user, so if a lot of individual clients submit jobs there is still a problem. Third, there are rare occasions where the grid monitor fails to work correctly. Fourth, the grid monitor still relies on the job manager to work correctly, and restarts it when a job finishes. If many jobs finish at the same time, there can be a lot of job managers running.
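To see whether your server is affected, you can count the job managers that are currently running. This is a sketch under an assumption we cannot verify for your site: that job manager processes show up in the process list with a command ending in globus-job-manager. The helper name count_job_managers is ours.

```shell
# Hypothetical helper: count the lines on standard input that look like a
# running job manager. Assumes the command ends in "globus-job-manager".
count_job_managers() {
    # The [g] keeps this grep from matching its own command line.
    # The "( |$)" skips helpers such as globus-job-manager-script.pl.
    # grep -c exits with status 1 when the count is 0, so ignore that.
    grep -E -c '[g]lobus-job-manager( |$)' || true
}
```

A typical use is ps -e -o args= | count_job_managers, run a few times while jobs are being submitted.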
A solution
A solution is to limit how many GRAM 2 job managers can be running at a time. Unfortunately, Globus does not provide a way to do this. We have patched Globus in the VDT to give you a method. Before we explain it, we will explain how a job manager is created.
When a user submits a job, the Globus gatekeeper handles the authentication and authorization of the user. The gatekeeper is started by the standard xinetd process: one gatekeeper is started for each connection. As soon as the gatekeeper has authorized the user successfully, the gatekeeper starts a new process which becomes the job manager. (For you developer types, the gatekeeper uses fork() and exec().) After the job manager is started, the gatekeeper exits.
Xinetd has facilities to limit how many processes are created. However, because the gatekeeper exits after the job manager is created, xinetd will not limit the total number of job managers.
Our patch to the gatekeeper allows xinetd to control how many job managers are created. To use the new behavior, you need to do two things: edit the gatekeeper configuration and edit the xinetd configuration.
Edit the gatekeeper configuration
To edit the gatekeeper configuration, you need to add a
single line to $VDT_LOCATION/globus/etc/globus-gatekeeper.conf.
There are three options, and the first two will allow you to
rate-limit the job managers with xinetd.
| Option | Meaning |
|---|---|
| -launch_method dont_fork | After authorization, the gatekeeper becomes the job manager, so you can rate limit the job managers with xinetd. (Technically speaking, the gatekeeper just does an exec() instead of a fork()/exec() combination.) This does have one interesting side effect. The globus-gatekeeper.log file will no longer have the message "PID: 19792 -- Notice: 0: Child 19793 started" but will instead have the message "Starting child 12345". |
| -launch_method fork_and_wait | This option leaves the gatekeeper running as long as the job manager runs. This is slightly safer than the dont_fork option because it does not change the gatekeeper's log file at all. The downside is that there is a gatekeeper process for each job manager, but these should have a small impact because they are not doing anything interesting. |
| -launch_method fork_and_exit | This is the original behavior that does not let you rate-limit the job managers. This is the default. |
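If you manage several servers, you may want to add the line from a script instead of by hand. The following is a sketch only: add_launch_method is a hypothetical helper of ours, and it assumes the option goes on a line of its own at the end of globus-gatekeeper.conf, as described above. It refuses to add a second -launch_method line if one is already present.

```shell
# Hypothetical helper: append "-launch_method METHOD" to a gatekeeper conf file.
# $1: path to globus-gatekeeper.conf   $2: one of the three launch methods
add_launch_method() {
    conf=$1
    method=$2
    case $method in
        dont_fork|fork_and_wait|fork_and_exit) ;;
        *) echo "unknown launch method: $method" >&2; return 1 ;;
    esac
    if [ ! -f "$conf" ]; then
        echo "no such file: $conf" >&2
        return 1
    fi
    # Leave the file alone if it already sets a launch method.
    if grep -q -e '-launch_method' "$conf"; then
        echo "$conf already contains a -launch_method line" >&2
        return 0
    fi
    # If the last line lacks a trailing newline, add one before appending.
    if [ -n "$(tail -c 1 "$conf")" ]; then
        echo >> "$conf"
    fi
    printf '%s\n' "-launch_method $method" >> "$conf"
}
```

For example, add_launch_method $VDT_LOCATION/globus/etc/globus-gatekeeper.conf fork_and_wait adds the second option from the table. Keeping a copy of the original file before you run it is a good idea.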
Edit the xinetd configuration
Edit the xinetd configuration in three steps:
1. Turn off the gatekeeper:
> vdt-control --off globus-gatekeeper
2. Edit $VDT_LOCATION/etc/services/xinetd-globus-gatekeeper. Change the following line:
instances = UNLIMITED
to something like this:
instances = 100
3. Turn the gatekeeper back on:
> vdt-control --on globus-gatekeeper
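The edit to the instances line can also be scripted. This is a minimal sketch, assuming the file contains a line of the form "instances = VALUE" as shown above. The helper name set_xinetd_instances is ours, and it keeps a .bak copy of the original file next to it.

```shell
# Hypothetical helper: set the "instances" value in an xinetd service file.
# $1: path to the xinetd service file   $2: new numeric limit
set_xinetd_instances() {
    file=$1
    limit=$2
    case $limit in
        ''|*[!0-9]*) echo "limit must be a whole number" >&2; return 1 ;;
    esac
    if [ ! -f "$file" ]; then
        echo "no such file: $file" >&2
        return 1
    fi
    # Keep a backup, then rewrite the original in place so its
    # ownership and permissions are unchanged.
    cp -p "$file" "$file.bak" || return 1
    sed "s/^\([[:space:]]*instances[[:space:]]*=[[:space:]]*\).*/\1$limit/" \
        "$file.bak" > "$file"
}
```

Run it between the vdt-control --off globus-gatekeeper and vdt-control --on globus-gatekeeper steps, for example set_xinetd_instances $VDT_LOCATION/etc/services/xinetd-globus-gatekeeper 100. The value 100 is only an illustration; pick a limit that suits your machine.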