Hadoop Distributed File System (HDFS) is a scalable reliable distributed file system developed in the Apache project. It is based on map-reduce framework and design of the Google file system. The VDT distribution of Hadoop includes all components needed to operate a multi-terabyte storage site. Included are an SRM interface for grid access, GridFTP-HDFS as transport layer, and a FUSE interface for localized POSIX access.
The VDT packaging and distribution of Hadoop is based on YUM. All components are packaged as RPMs. Two YUM repositories are available: (a) Stable repository for wider deployments and production usage; (b) Testing repository for limited deployments and pre-release evaluation.
Please note that this distribution is independent of the Pacman packaging of the VDT: it is separately versioned, and separately packaged. We expect future releases to eventually be common with the rest of the VDT, as the "rest" of the VDT begins to be packaged as RPMs. For now, the VDT distribution of Hadoop is distinct from the rest of the VDT.
The stable YUM repository is enabled by default through the osg-hadoop RPM, and contains the 'golden release' supported by OSG for LHC operations.