Note: This version of the VDT (1.5.0) is no longer supported. Feel free to look through the documentation and install it, but we cannot guarantee support for it. The current stable release is 2.0.0.
Managed Fork Jobmanager
What is the Managed Fork Jobmanager?
Unlike the standard fork jobmanager, which simply forks off jobs on demand, the managed fork jobmanager submits
jobs to a Condor queue as local universe jobs. Jobs still run on the head node, but site administrators can, for example, limit the number of jobs running at one time or give preferential treatment to particular groups.
Installing
Use Pacman to install the Globus-ManagedFork-Setup package. This will install Condor, unless you have set
environment variables that specify a pre-existing Condor installation.
More information
Using Condor
The managed fork jobmanager uses Condor to manage the fork jobs. If
you are already using Condor to manage your batch system, the managed
fork jobmanager will use that same Condor installation and the same
Condor schedd to manage the jobs. That is, if you look at your Condor
queue, you will see both your batch jobs and your managed fork jobs at
the same time. The jobs can be distinguished by their Condor universe,
because the managed fork jobs are in the local universe (universe
number 12). You can look at just your local universe/managed fork jobs
with the following command:
> condor_q -constraint "JobUniverse == 12"
-- Submitter: chopin.cs.wisc.edu : <128.105.121.21:40759> : chopin.cs.wisc.edu
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
24.0 roy 3/2 16:47 0+00:00:01 R 0 0.0 simple.van Hello 3
You can see the jobs that are not local universe/managed fork jobs
with the following command:
> condor_q -constraint "JobUniverse != 12"
-- Submitter: chopin.cs.wisc.edu : <128.105.121.21:40759> : chopin.cs.wisc.edu
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
23.0 roy 3/2 16:46 0+00:00:05 R 0 1.3 simple.van Hello 3
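The same constraint can be used in a script, for example to count how many managed fork jobs are currently in the queue. The following is a minimal sketch, not part of the VDT: the function name count_managed_fork_jobs is hypothetical, and it assumes condor_q is on the PATH and supports the -format option.

```shell
# Count the jobs in the local universe (JobUniverse == 12), i.e. the
# managed fork jobs. One ClusterId is printed per job and the lines
# are then counted.
count_managed_fork_jobs() {
    condor_q -constraint 'JobUniverse == 12' -format '%d\n' ClusterId \
        | wc -l | tr -d ' '
}

# Example:
#   count_managed_fork_jobs
```

Changing the constraint to 'JobUniverse != 12' would count the batch jobs instead.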
Other batch systems
If you are using a batch system other than Condor, the managed fork
jobmanager will not interfere with it. Condor will run only on your
gatekeeper computer and will not affect job submissions to your
batch system. Although Condor will be running on the gatekeeper node, it
should not cause significant load or interfere with other processes.
Replacing the default fork with the managed fork
By default, the managed fork jobmanager is available via jobmanager-managedfork. To replace the standard jobmanager-fork with the managed fork jobmanager, run configure_globus_gatekeeper:
> vdt/setup/configure_globus_gatekeeper --managed-fork y
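One way to check that the managed fork jobmanager is reachable is to send it a trivial job with globus-job-run. The sketch below only builds the contact string; the function name managedfork_contact and the host gatekeeper.example.org are hypothetical, and the commented command assumes you have a valid grid proxy.

```shell
# Build a GRAM contact string for the managed fork jobmanager on a
# given gatekeeper host.
managedfork_contact() {
    printf '%s/jobmanager-managedfork\n' "$1"
}

# Example: run a trivial job through the managed fork jobmanager.
#   globus-job-run "$(managedfork_contact gatekeeper.example.org)" /bin/hostname
```

While the test job runs, it should appear in the Condor queue as a local universe job.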
Example configurations
By default, the managed fork jobmanager behaves just like the fork jobmanager. To restrict it, you need to modify your local Condor configuration. If you are using Condor from the VDT, this can be done by editing $VDT_LOCATION/condor/local.<hostname>/condor_config.local. Please note that the following example configurations only work in Condor 6.7.15 or later; older versions of Condor run all local universe jobs as soon as they are submitted.
Only allow 20 local universe jobs to execute concurrently:
START_LOCAL_UNIVERSE = TotalLocalJobsRunning < 20
Set a hard limit on most jobs, but always let grid monitor jobs run (strongly recommended):
START_LOCAL_UNIVERSE = TotalLocalJobsRunning < 20 || GridMonitorJob == TRUE
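Applying a setting like the ones above can be scripted. The following is a minimal sketch under stated assumptions: the function name set_local_universe_limit is hypothetical, the configuration file path is passed in by the caller, and condor_reconfig is assumed to be enough for the running schedd to pick up the change.

```shell
# Append a START_LOCAL_UNIVERSE policy to a Condor local config file.
# Usage: set_local_universe_limit <config-file> <max-jobs>
set_local_universe_limit() {
    config_file=$1
    max_jobs=$2
    # A later definition in a Condor config file overrides an earlier
    # one, so appending replaces any previous setting.
    printf 'START_LOCAL_UNIVERSE = TotalLocalJobsRunning < %s || GridMonitorJob == TRUE\n' \
        "$max_jobs" >> "$config_file"
}

# Example (substitute the actual path of your condor_config.local):
#   set_local_universe_limit /path/to/condor_config.local 20
#   condor_reconfig   # ask the running daemons to re-read their config
```

The sketch always keeps the GridMonitorJob clause, in line with the recommendation above to let grid monitor jobs run.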