One large topic was Globus's assumption that the gatekeeper has a shared filesystem. Alain has worked around this experimentally by modifying the Condor jobmanager to use file transfer, and it works fairly well. He wondered if the same approach could work with GT4 WS-GRAM. Stu and Jens joined the meeting to talk about it.
After some discussion, we concluded that it's theoretically possible, but it will take hacking similar to what Alain did in GT2/3. Alain will investigate a bit more as he works on integrating GT4 into the VDT, and may recommend some changes to the Globus folks to see how the issue can be solved in the future.
The VDT folks learned some interesting things during this discussion. Pre-WS GRAM requires that the gatekeeper share a filesystem with the execution nodes. WS-GRAM has a similar requirement, but it is actually the GridFTP server that must share a filesystem with the execution nodes. In practice this amounts to the same thing, because most people will run the GridFTP server on the gatekeeper node, but it can be run on a different computer. We had assumed people would run both on the same node.
We agreed that there might be various levels of support for sites without a shared filesystem. Perhaps jobs would specify two things: which files to transfer to the GridFTP server, and which files to transfer to the job. These might differ if some files were pre-staged, but we might also assume the two lists are the same.
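The two-list idea can be sketched with ordinary set operations. This is a hypothetical illustration: the file names and the pre-staged set are made up, not taken from any real job description.

```python
# Hypothetical sketch of the two staging lists discussed above:
# files that must be moved to the GridFTP server, and files the job
# needs on the execution node. Pre-staged files appear only in the
# second list, since they are already on the GridFTP server.
job_inputs = {"input.dat", "params.cfg", "big-reference.db"}
pre_staged = {"big-reference.db"}  # already on the GridFTP server

transfer_to_gridftp = job_inputs - pre_staged  # still needs staging
transfer_to_job = job_inputs                   # everything the job reads

print(sorted(transfer_to_gridftp))  # ['input.dat', 'params.cfg']
```

If no files are pre-staged, the two lists collapse into one, which is the simpler assumption mentioned above.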
Alain talked about the progression of getting GT4 into the VDT. This week he built GT4 but hasn't yet tested it. The plan is to release VDT 1.3.7 with pre-WS GT4 in "a few weeks". While it should be a smooth upgrade, we need to do a fair amount of testing (can GT2 and GT4 clients submit jobs? will VOMS still compile against GT4? etc.). Then we'll add support for WS-GRAM (and therefore RFT) in VDT 1.3.8, which will come out "soon" after 1.3.7.
Right now RFT (which is required for WS-GRAM) requires Postgres, but Globus 4.0.1 will add support for MySQL, which the VDT may take advantage of. Globus 4.0.1 should be released in a couple of weeks.
Jens gave some advice for configuring Globus 4. He recommends 100-200 RFT threads, and he points out that there are now four PORT_RANGE variables to set if you are using a firewall: the old two for TCP, plus two new ones for UDP (UDP is used for some low-priority notifications). He also recommends starting the Globus Java containers with more memory than the default 128MB max heap size: probably 384MB, though 256MB might work. Jens is going to write up his suggestions in more detail and send them to the Globus folks and VDT folks.
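As a hedged example, the firewall and memory settings might look like the following. The UDP variable names are our assumption based on the existing GLOBUS_TCP_* conventions, and GLOBUS_OPTIONS is the usual way to pass JVM flags to the container start script; verify both against the Globus documentation for your release. The port numbers are arbitrary placeholders.

```shell
# Hedged sketch of a GT4 firewall/container setup; names and ranges
# are assumptions to be checked against the Globus firewall docs.
export GLOBUS_TCP_PORT_RANGE=40000,40999    # the old TCP listen range
export GLOBUS_TCP_SOURCE_RANGE=40000,40999  # the old TCP source range
export GLOBUS_UDP_PORT_RANGE=41000,41999    # new in GT4 (notifications)
export GLOBUS_UDP_SOURCE_RANGE=41000,41999  # new in GT4 (notifications)

# Raise the container's max heap above the 128MB default, per Jens:
export GLOBUS_OPTIONS="-Xmx384m"
```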
Alain noticed the mention of UDP and shared two stories from Condor experience. Condor uses UDP for an execution node to notify the central manager (which does matchmaking) that it exists. It's okay if some UDP messages are lost, because a computer isn't removed from the list unless several consecutive messages are missed. We have encountered two problems. First, a site in Korea sending updates to a site at Fermilab could never get UDP packets through Monday through Friday, but weekends worked; for them we had to add TCP updates. Second, some firewalls (like the Windows XP SP2 firewall) seem to have small fragmentation buffers, so we lost packets until we handled fragmentation ourselves.
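The fragmentation workaround amounts to keeping each datagram small enough that the IP layer never has to fragment it. A minimal sketch (not Condor's actual code; the 1400-byte bound is an assumed conservative payload limit for a 1500-byte Ethernet MTU):

```python
# Sketch: split a large update message into UDP-datagram-sized pieces
# so no single datagram exceeds the path MTU and gets fragmented,
# since some firewalls silently drop fragmented UDP.
SAFE_PAYLOAD = 1400  # assumed conservative bound under a 1500-byte MTU

def chunk_message(data: bytes, limit: int = SAFE_PAYLOAD):
    """Yield successive pieces of data, each at most `limit` bytes."""
    for offset in range(0, len(data), limit):
        yield data[offset:offset + limit]

pieces = list(chunk_message(b"x" * 3000))
print([len(p) for p in pieces])  # [1400, 1400, 200]
```

Each piece would then be sent as its own datagram (with enough framing to reassemble them), trading a little protocol complexity for packets that survive MTU-hostile firewalls.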
John Weigand asked how to upgrade VDT components like VOMS and GUMS that rely on data stored in a MySQL database. We talked about how this data is now stored in $VDT_LOCATION/vdt-app-data, so it can easily be carried over to a new installation: create the directory for the new installation, copy over vdt-app-data, and install. In VDT 1.3.5 not all of the VOMS configuration was in vdt-app-data, so some things will need to be transferred by hand.
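The copy step can be sketched as follows, using throwaway temporary directories in place of the real old and new $VDT_LOCATION paths (the file name voms.db is purely illustrative):

```python
# Sketch of migrating vdt-app-data to a new VDT installation.
# Temporary directories stand in for the real $VDT_LOCATION paths.
import shutil
import tempfile
from pathlib import Path

old_vdt = Path(tempfile.mkdtemp())  # stands in for the old $VDT_LOCATION
new_vdt = Path(tempfile.mkdtemp())  # stands in for the new $VDT_LOCATION

# Pretend the old installation has some application data:
(old_vdt / "vdt-app-data").mkdir()
(old_vdt / "vdt-app-data" / "voms.db").write_text("example data")

# The actual migration: copy vdt-app-data into the new location
# before running the normal Pacman install there.
shutil.copytree(old_vdt / "vdt-app-data", new_vdt / "vdt-app-data")

print((new_vdt / "vdt-app-data" / "voms.db").read_text())
```

In real use the copy happens before the install into the new $VDT_LOCATION, and (as noted above) a VDT 1.3.5 source needs some VOMS configuration moved by hand as well.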
John also asked about how to write Pacman files for VOMRS (apparently pronounced V-O-M-R-S). Alain pointed him at some sample Pacman files and offered more help if he would like it.