We tried using WHAT on our brand-new Intel Xserve running Mac OS X Server 10.4, but it killed and ate the machine. So, after having it reimaged, we started putting some critical test infrastructure pieces in place manually. These are notes on the process.
I searched and searched for my old notes on creating user accounts in Mac OS X from the command line, but I didn’t find them until too late. So, I asked Patrick Carlisle (of Apple) for tips and then also found some old code lingering in our WHAT library. I merged the two scripts to get this:
# Change these
USER_NAME=cat
REAL_NAME='Tim Cartwright'
USER_ID=16501
USER_SHELL=/bin/bash

nicl . create /users/$USER_NAME
nicl . create /users/$USER_NAME uid $USER_ID
nicl . create /users/$USER_NAME gid $USER_ID
nicl . create /users/$USER_NAME realname "$REAL_NAME"
nicl . create /users/$USER_NAME passwd ''
nicl . create /users/$USER_NAME shell $USER_SHELL
nicl . create /users/$USER_NAME home /Users/$USER_NAME
nicl . create /users/$USER_NAME _writers_passwd $USER_NAME
nicl . create /groups/$USER_NAME
nicl . create /groups/$USER_NAME gid $USER_ID
nicl . create /groups/$USER_NAME users $USER_NAME
nicl . append /groups/wheel users $USER_NAME
nicl . append /groups/admin users $USER_NAME
mkdir -p /Users/$USER_NAME
ditto '/System/Library/User Template/English.lproj' /Users/$USER_NAME
chown -R $USER_NAME:$USER_NAME /Users/$USER_NAME
passwd $USER_NAME
We added accounts for cat, roy, and kronenfe using this tool.
I had forgotten this about Mac OS X, even though I use it on my own machines: there is no need to edit the sudoers file, because membership in the admin group (the default sudoers grants %admin full access) gives you sudo rights. So we’re done with user accounts for now.
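A quick, generic way to confirm which group a user actually belongs to (a sketch, not part of the original setup; id(1) behaves the same on OS X and Linux, and the group names checked here come from the note above):

```shell
# Check whether a user belongs to the groups that typically grant sudo access.
user=$(id -un)
for group in admin wheel; do
    if id -Gn "$user" | tr ' ' '\n' | grep -qx "$group"; then
        echo "$user is in $group"
    fi
done
```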
Mac OS X Server 10.4 has Python 2.3.5 installed at /usr/bin/python and /usr/bin/python2.3, which is good.
I recalled that we did some funky things in the way that we installed Pacman, so I consulted the WHAT module that does so. It’s at what/what/lib/What/Node/Pacman.pm in an SVN checkout of the test framework.
Here are the steps I followed to install Pacman:
Download and unpack Pacman
cd /opt
curl -LO http://physics.bu.edu/pacman/sample_cache/tarballs/pacman-3.20.tar.gz
tar xzf pacman-3.20.tar.gz
rm -f pacman-3.20.tar.gz
Add the executable wrapper script as /opt/pacman-3.20/pacman
#!/bin/sh
export PATH=/opt/pacman-3.20/src:$PATH
export PACMAN_LOCATION=/opt/pacman-3.20
export PYTHONPATH=/opt/pacman-3.20/src
/usr/bin/python $PACMAN_LOCATION/src/pacman "$@"
chmod 0755 /opt/pacman-3.20/pacman
ln -s /opt/pacman-3.20 /opt/pacman
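The wrapper pattern above (export environment, then delegate with "$@") can be exercised generically. In this sketch a /bin/sh one-liner stands in for the real pacman script, just to show that the exported variable and the arguments both reach the delegate intact:

```shell
# Sketch of the env-setting wrapper pattern; the delegate is a stand-in.
tmp=$(mktemp -d)
cat > "$tmp/wrapper" <<'EOF'
#!/bin/sh
export PACMAN_LOCATION=/opt/pacman-3.20
# "$@" passes all arguments through unchanged; a bare $@ would re-split them.
exec /bin/sh -c 'echo "$PACMAN_LOCATION $1"' -- "$@"
EOF
chmod 0755 "$tmp/wrapper"
out=$("$tmp/wrapper" -version)
echo "$out"
rm -rf "$tmp"
```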
The new machine needs to know about our funky private network, since there is no DHCP. After having Ken update the root/etc/hosts file in our WHAT repository, I copied that file to the Mac:
scp 'email@example.com:/p/condor/workspaces/cat/svn/tests/trunk/root/etc/hosts' /etc/hosts
It turned out that this wasn’t sufficient to make Mac OS X aware of the new host definitions. Ken found some information that said we had to slurp the hosts file into NetInfo:
niload -v -m hosts . < /etc/hosts
In retrospect, I think that that may have been unnecessary. In the man page for lookupd, I found this information (emphasis is mine):
When lookupd searches for information about an item, it queries agents in a specific order until the item is found or until all sources of information have been consulted without finding the desired item. By default, lookupd first queries its cache agent, then NetInfo, then the Directory Services agent. If the item is a host or network, lookupd will query the cache, the Flat File agent, then the DNS agent, then NetInfo, and Directory Services last. The default search order for services, protocols, and rpc protocols uses the cache, then the Flat File agent, NetInfo, and then Directory Services.
I think we may have been able to send a SIGHUP to lookupd and pick up the new values from /etc/hosts. We can try this out next time we change the hosts file. In any case, we can ping other VDT test machines from vdt-macosx4-amd64 now.
I set up a Globus server on vdt-rhas4-ia32 and started it up. Then, from the Mac, I got a grid proxy and tried running a job:
vdt-macosx4-amd64: globus-job-run vdt-rhas4-ia32.cs.wisc.edu/jobmanager-fork /bin/echo '-n hello world!'
GRAM Job submission failed because the job manager failed to open stderr (error code 74)
Alain pointed me at a very helpful troubleshooting page, which suggested trying a basic authentication command:
vdt-macosx4-amd64: globusrun -a -r vdt-rhas4-ia32.cs.wisc.edu
GRAM Authentication test successful
Then, Alain pointed me at a Globus troubleshooting page, which — under a section for the error I am getting — suggested running globus-hostname:
vdt-macosx4-amd64: globus-hostname
vdt-macosx4-amd64.local
Ken and Ross will work on fixing the self-reported hostname. In the meantime, Alain told me that I can set GLOBUS_HOSTNAME in the environment to get around this.
vdt-macosx4-amd64: export GLOBUS_HOSTNAME=vdt-macosx4-amd64.cs.wisc.edu
vdt-macosx4-amd64: globus-job-run vdt-rhas4-ia32.cs.wisc.edu/jobmanager-fork /bin/hostname
t0:p19994: Fatal error: tcp_init(): globus_io_tcp_create_listener() failed
/Users/cat/Documents/1.8.1_Globus-Client_cat_test/globus/bin/globus-job-run: line 1: 19994 Abort trap \
    /Users/cat/Documents/1.8.1_Globus-Client_cat_test/globus/bin/globusrun -q -o -r \
    "vdt-rhas4-ia32.cs.wisc.edu/jobmanager-fork" -f /tmp/globus_job_run.cat.rsl.19954
A bit of Googling reminded me that I had not yet set up the host and service certificates.
Alain had sent me an email a while ago with the certificate information:
Certificates for vdt-macosx4-amd64 are located in:
If you prefer the tarball:
Installation is a snap (as root, of course):
mkdir -p /etc/grid-security
cd /etc/grid-security
scp 'firstname.lastname@example.org:/p/vdt/workspace/grid-security/grid-security-vdt-macosx4-amd64.tar.gz' .
tar xzf grid-security-vdt-macosx4-amd64.tar.gz
chown -R root:wheel /etc/grid-security
chown -R daemon:daemon /etc/grid-security/ldap
chown -R daemon:daemon /etc/grid-security/http
find /etc/grid-security -name '*cert.pem' | xargs -t chmod 0644
find /etc/grid-security -name '*key.pem' | xargs -t chmod 0400
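A quick way to verify the resulting permissions afterward. This is a generic sketch run against a temporary directory standing in for /etc/grid-security, with dummy file names; only the chmod/find pattern comes from the install steps above:

```shell
# Apply the cert/key permission pattern to a scratch directory and verify it.
dir=$(mktemp -d)
touch "$dir/hostcert.pem" "$dir/hostkey.pem"
find "$dir" -name '*cert.pem' -exec chmod 0644 {} \;
find "$dir" -name '*key.pem'  -exec chmod 0400 {} \;
# Keys must end up readable by the owner only; certs stay world-readable.
locked=$(find "$dir" -name '*key.pem' -perm 0400 | wc -l)
echo "locked-down keys: $locked"
rm -rf "$dir"
```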
With the hostname fixed (after a reboot) and the certificates in place, I got a job to run:
vdt-macosx4-amd64: globus-job-run vdt-rhas4-ia32.cs.wisc.edu/jobmanager-fork /bin/hostname
vdt-rhas4-ia32.cs.wisc.edu
vdt-macosx4-amd64: globus-job-run vdt-rhas4-ia32.cs.wisc.edu/jobmanager-fork /bin/echo '-n hello world!'
hello world!
vdt-macosx4-amd64: globus-job-run vdt-rhas4-ia32.cs.wisc.edu/jobmanager-condor /bin/echo '-n hello world!'
hello world!
vdt-macosx4-amd64: globus-url-copy file:///Users/cat/Documents/1.8.1_Globus-Client_cat_test/vdt-install.log \
    gsiftp://vdt-rhas4-ia32.cs.wisc.edu/tmp/tim-vdt-test.txt
vdt-macosx4-amd64: globus-url-copy gsiftp://vdt-rhas4-ia32.cs.wisc.edu/tmp/tim-vdt-test.txt \
    file:///Users/cat/Documents/1.8.1_Globus-Client_cat_test/copied-vdt-install.log
vdt-macosx4-amd64: diff vdt-install.log copied-vdt-install.log
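The round-trip check used here (copy the file out, copy it back under a new name, diff the two) is worth keeping as a pattern. A generic sketch, with plain cp standing in for globus-url-copy:

```shell
# Round-trip verification pattern: transfer a file out and back, then diff.
# Plain cp stands in for the real transfer tool (globus-url-copy, gsiscp).
src=$(mktemp)
echo "sample contents" > "$src"
remote=$(mktemp)      # stands in for the file on the remote host
returned=$(mktemp)
cp "$src" "$remote"
cp "$remote" "$returned"
if diff -q "$src" "$returned" > /dev/null; then
    result="round trip OK"
else
    result="round trip FAILED"
fi
echo "$result"
rm -f "$src" "$remote" "$returned"
```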
I had not installed uberftp previously, so I had to do that first.
/opt/pacman/pacman -get http://vdt.cs.wisc.edu/vdt_181_cache:UberFTP -trust-all-caches -v download pac
Then the tests:
vdt-macosx4-amd64: uberftp -P 2811 -a GSI vdt-rhas4-ia32.cs.wisc.edu \
    'cd /tmp; put /Users/cat/Documents/1.8.1_Globus-Client_cat_test/vdt-install.log tim-uberftp-test.txt'
220 vdt-rhas4-ia32.cs.wisc.edu GridFTP Server 2.5 (gcc32dbg, 1182369948-63) ready.
230 User cat logged in.
/Users/cat/Documents/1.8.1_Globus-Client_cat_test/vdt-install.log: 12951 bytes in 0.00 seconds. 2728.25 KB/sec
vdt-macosx4-amd64: uberftp -P 2811 -a GSI vdt-rhas4-ia32.cs.wisc.edu \
    'cd /tmp; get tim-uberftp-test.txt /Users/cat/Documents/1.8.1_Globus-Client_cat_test/copied-log-2.log'
220 vdt-rhas4-ia32.cs.wisc.edu GridFTP Server 2.5 (gcc32dbg, 1182369948-63) ready.
230 User cat logged in.
tim-uberftp-test.txt: 12951 bytes in 0.00 seconds. 2952.13 KB/sec
I had not installed GSI-OpenSSH previously, so I had to do that first. The service is not registered with the system, so I had to start it myself.
/opt/pacman/pacman -get http://vdt.cs.wisc.edu/vdt_181_cache:GSIOpenSSH -trust-all-caches -v download pac
export GSIOPENSSH_PORT=12346
$VDT_LOCATION/globus/sbin/sshd -p $GSIOPENSSH_PORT -e
Then the tests:
vdt-macosx4-amd64: gsiscp -o BatchMode=yes -P 12346 \
    vdt-install.log vdt-rhas4-ia32.cs.wisc.edu:/tmp/tim-gsiscp-test.txt
vdt-install.log                               100%   14KB  13.7KB/s   00:00
vdt-macosx4-amd64: gsiscp -o BatchMode=yes -P 12346 \
    vdt-rhas4-ia32.cs.wisc.edu:/tmp/tim-gsiscp-test.txt copied-by-gsiscp.log
tim-gsiscp-test.txt                           100%   14KB  13.7KB/s   00:00
vdt-macosx4-amd64: diff vdt-install.log copied-by-gsiscp.log
vdt-macosx4-amd64: gsissh -o BatchMode=yes -p 12346 vdt-rhas4-ia32.cs.wisc.edu echo foo
foo
I started a new client installation, this time of VOMS-Client, which picked up all the bits I needed.
Then I had to create a vomses file for the RHEL 4 machine in glite/etc/vomses:
"VDT" "vdt-rhas4-ia32.cs.wisc.edu" "15000" "/DC=org/DC=doegrids/OU=Services/CN=http/vdt-rhas4-ia32.cs.wisc.edu" "VDT" "40"
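As I understand the vomses format, each line is a series of double-quoted fields (alias, host, port, server DN, and VO name, plus an optional trailing field). Since the DN contains spaces, a naive word count gets it wrong; counting quoted fields works. A generic sketch using the line above:

```shell
# Count the double-quoted fields in a vomses line.
line='"VDT" "vdt-rhas4-ia32.cs.wisc.edu" "15000" "/DC=org/DC=doegrids/OU=Services/CN=http/vdt-rhas4-ia32.cs.wisc.edu" "VDT" "40"'
fields=$(printf '%s\n' "$line" | grep -o '"[^"]*"' | wc -l)
echo "fields: $fields"
```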
After that, a voms-proxy-init seemed to work:
vdt-macosx4-amd64: voms-proxy-init -voms VDT
Cannot find file or dir: /Users/condor/execute/dir_12660/userdir/glite/etc/vomses
Enter GRID pass phrase:
Your identity: /DC=org/DC=doegrids/OU=People/CN=Timothy A. Cartwright 949973
Cannot find file or dir: /Users/condor/execute/dir_12660/userdir/glite/etc/vomses
Creating temporary proxy ............................................... Done
Contacting vdt-rhas4-ia32.cs.wisc.edu:15000 [/DC=org/DC=doegrids/OU=Services/CN=http/vdt-rhas4-ia32.cs.wisc.edu] "VDT" Done
Creating proxy .................................... Done
Your proxy is valid until Tue Aug 14 09:17:29 2007
However, voms-proxy-info failed horribly:
vdt-macosx4-amd64: voms-proxy-info -all
Bus error
After a bunch of debugging, I discovered that merely creating /etc/grid-security/vomsdir was sufficient to allow voms-proxy-info to work. This is documented in an LCG Savannah bug ticket.
vdt-macosx4-amd64: sudo mkdir /etc/grid-security/vomsdir
vdt-macosx4-amd64: voms-proxy-info -all
WARNING: Unable to verify signature! Server certificate possibly not installed.
         Error: Cannot find certificate of AC issuer for vo VDT
subject   : /DC=org/DC=doegrids/OU=People/CN=Timothy A. Cartwright 949973/CN=proxy
issuer    : /DC=org/DC=doegrids/OU=People/CN=Timothy A. Cartwright 949973
identity  : /DC=org/DC=doegrids/OU=People/CN=Timothy A. Cartwright 949973
type      : proxy
strength  : 512 bits
path      : /tmp/x509up_u16501
timeleft  : 16:53:18
=== VO VDT extension information ===
VO        : VDT
subject   : /DC=org/DC=doegrids/OU=People/CN=Timothy A. Cartwright 949973
issuer    : /DC=org/DC=doegrids/OU=Services/CN=http/vdt-rhas4-ia32.cs.wisc.edu
attribute : /VDT/Role=NULL/Capability=NULL
timeleft  : 16:53:18
Also, I noticed that the “timeleft” information was too large. I investigated and ultimately submitted another LCG Savannah bug ticket.
To test our new Mac-generated VOMS proxy against a non-Mac server, I installed the entire VDT on vdt-rhas4-ia32. Many extra steps were needed to get the gatekeeper ready to use VOMS, GUMS, and PRIMA for authentication. In the account that follows, I have omitted the details of many errors and missteps I made in getting the server to work.
To be sure that I was using PRIMA and GUMS, I removed my DN from the grid-mapfile. Without that entry, my jobs would not run.
Mostly, I got these steps from our test script, prima.t. The first time through, I enabled PRIMA-GT4, which turned out to be a mistake: it replaces one of the conf files with one that names a special copy of the host certificate, called the containercert. But configure_globus_ws is the script that actually creates the containercert, and apparently it had not been run. In the end, it was simpler to follow the GT2 PRIMA steps:
cp post-install/*-authz.conf /etc/grid-security
mkdir -p /etc/grid-security/vomsdir
cp /etc/grid-security/http/httpcert.pem /etc/grid-security/vomsdir/`hostname -f`.pem
chmod 0644 /etc/grid-security/vomsdir/vdt-rhas4-ia32.cs.wisc.edu.pem
Next up, configuring GUMS. First, I added my own DN as a GUMS administrator. I’m not sure that this was strictly necessary, but it made using the command-line and web tools possible.
tomcat/v55/webapps/gums/WEB-INF/scripts/addMySQLAdmin \ '/DC=org/DC=doegrids/OU=People/CN=Timothy A. Cartwright 949973'
The major change to GUMS was in its gums.config XML file. After a few major headaches, I ended up with this document:
<gums version='1.2'>
  <persistenceFactories>
    <hibernatePersistenceFactory name='mysql' description=''
        hibernate.connection.driver_class='com.mysql.jdbc.Driver'
        hibernate.dialect='net.sf.hibernate.dialect.MySQLDialect'
        hibernate.connection.autoReconnect='true'
        hibernate.c3p0.min_size='3'
        hibernate.c3p0.max_size='20'
        hibernate.c3p0.timeout='180'
        hibernate.connection.username='gums'
        hibernate.connection.url='jdbc:mysql://vdt-rhas4-ia32.cs.wisc.edu:49151/GUMS_1_1'
        hibernate.connection.password='ELSUEIGLJFF'/>
  </persistenceFactories>
  <vomsServers>
    <vomsServer name='vomsadmin' persistenceFactory='mysql'
        baseUrl='https://vdt-rhas4-ia32.cs.wisc.edu:8443/voms/VDT/services/VOMSAdmin'
        sslCAFiles='/home/cat/1.8.0_VDT_root_macserver/globus/TRUSTED_CA/*.0'
        sslCertfile='/etc/grid-security/http/httpcert.pem'
        sslKey='/etc/grid-security/http/httpkey.pem'/>
  </vomsServers>
  <userGroups>
    <manualUserGroup name='admins' description='' persistenceFactory='mysql' access='write'/>
    <vomsUserGroup name='vomstest' vomsServer='vomsadmin' voGroup='/VDT'
        matchFQAN='group' acceptProxyWithoutFQAN='true'/>
  </userGroups>
  <accountMappers>
    <groupAccountMapper name='vdttest' accountName='vdttest'/>
  </accountMappers>
  <groupToAccountMappings>
    <groupToAccountMapping name='vomstest' accountingVoSubgroup='vomstest'
        accountingVo='VDT' userGroups='vomstest' accountMappers='vdttest'/>
  </groupToAccountMappings>
  <hostToGroupMappings>
    <hostToGroupMapping cn='vdt-rhas4-ia32.cs.wisc.edu' groupToAccountMappings='vomstest'/>
  </hostToGroupMappings>
</gums>
I started up some services and ran jobs:
vdt-macosx4-amd64: globus-job-run vdt-rhas4-ia32.cs.wisc.edu/jobmanager-fork /bin/echo '-n hello world!'
hello world!
vdt-macosx4-amd64: globus-job-run vdt-rhas4-ia32.cs.wisc.edu/jobmanager-condor /bin/echo '-n hello world!'
hello world!