| Project Name | Particle Physics Data Grid |
| --- | --- |
| Application Description | |
| PI's | |
| Sites Involved | |
| Networks Used | ESnet, NTON, Abilene |
| Bandwidth Requirements | |
| Latency Requirements | |
| QoS Requirements (bandwidth, jitter, etc.) | |
| Other (multicast? multiple streams? etc.) | |
| Systems used and their locations (e.g., HPSS, T3E, PC cluster, Cave, etc.) | |
| WAN POP Contact (1 per site) | |
| LAN Contact (1 per site) | |
| Application contact (persons responsible for making the applications work well in an NGI environment) | |
Software: PPDG Middleware (diagram), PPDG System Overview (diagram), PPDG Use Case Idea
Hardware: GIOD Testbed (diagram)
Networks: LAN Connectivity in CACR (diagram), WAN Connectivity to CACR (diagram), PPDG Network Diagram (diagram)
The Particle Physics Data Grid has two long-term objectives: first, the delivery of an infrastructure for very widely distributed analysis of particle physics data at multi-petabyte scales by hundreds to thousands of physicists; and second, the acceleration of the development of network and middleware infrastructure aimed broadly at data-intensive collaborative science.
The specific research will design, develop, and deploy a network and middleware infrastructure capable of supporting data analysis and data flow patterns common to the many particle physics experiments represented among the proposers. Application-specific software will be adapted to operate in this wide-area environment and to exploit this infrastructure. The result of these collaborative efforts will be the instantiation and delivery of an operating infrastructure for distributed data analysis by participating physics experiments. While the architecture must address the needs of particle physics, its components will be designed from the outset for applicability to any discipline with large-scale distributed data needs.
Among the hypotheses to be tested are these:
The experimental high energy physics program at Caltech began with the construction of the Caltech Electron Synchrotron, built and used by R. Bacher, M. Sands, R. Walker and A. Tollestrup. Now Caltech, like other university groups, performs its research through participation in large experiments on major facilities away from the campus.
The Caltech administration has traditionally given strong support to the high energy program, and within physics, activities in astrophysics and fundamental particle physics have had the highest priority. Institutional support has been given in the form of faculty appointments, special grants for developing or purchasing equipment, start-up funds for new faculty, funds for visitors, flexible teaching arrangements, special support for postdoctoral fellows, substantial support for computing, an upgrade of our electronics shop, forward financing for equipment projects, etc.
The HEP experimental program is made up of an active group of faculty, a strong post-doctoral research staff and a number of very bright graduate students. The experimental activities consist of a mix of productive ongoing physics, plus significant effort toward future upgrades or new projects. The ongoing program is centered on detectors at Beijing, CERN, Cornell, and the Gran Sasso. In addition, the department is deeply involved in the SLAC B factory detector, plus other activities toward future experiments.
The GIOD project, a joint effort between Caltech, CERN and Hewlett Packard Corporation, has been investigating WAN-distributed Object Database and Mass Storage systems for use in the next generation of particle physics experiments. In the project, we have been building a prototype system in order to test, validate and develop the strategies and mechanisms that will make construction of these massive distributed systems possible.
We have adopted several key technologies that seem likely to play significant roles in the LHC computing systems: OO software (C++ and Java), a commercial OO database management system (ODBMS, Objectivity/DB), a hierarchical storage management system (HPSS), and fast networks (ATM LAN and OC12 regional links). We are building a large (~1 Terabyte) object database containing ~1,000,000 fully simulated LHC events, and we use this database in all our tests. We have investigated scalability and clustering issues in order to understand the performance of the database for physics analysis. These tests included making replicas of portions of the database by moving objects over the WAN, executing analysis and reconstruction tasks on servers remote from the database, and exploring schemes for speeding up the selection of small sub-samples of events. The tests touch on the challenging problem of deploying a multi-petabyte object database for physics analysis.
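As an illustration of the kind of sub-sample selection scheme explored in these tests, the sketch below scans a compact collection of "event tag" summary objects locally and fetches only the matching full events from the remote store, so most of the ~1 TB database is never touched. The `EventTag` and `EventStore` names are hypothetical stand-ins for illustration, not the Objectivity/DB API used in GIOD.

```java
// Illustrative sketch only: EventTag and EventStore are hypothetical
// stand-ins, not the Objectivity/DB classes used in the GIOD prototype.
import java.util.ArrayList;
import java.util.List;

/** Lightweight summary object that is scanned locally instead of the full event. */
class EventTag {
    final long eventId;
    final double totalEnergy;   // summary quantity stored with the tag
    final int nTracks;

    EventTag(long eventId, double totalEnergy, int nTracks) {
        this.eventId = eventId;
        this.totalEnergy = totalEnergy;
        this.nTracks = nTracks;
    }
}

/** Stand-in for the remote object database holding the full event data. */
interface EventStore {
    byte[] fetchFullEvent(long eventId);   // expensive WAN access
}

class TagSelection {
    /**
     * Scan the compact tag collection first and fetch only the matching
     * full events, avoiding a full pass over the large remote database.
     */
    static List<byte[]> selectEvents(List<EventTag> tags, EventStore store,
                                     double minEnergy, int minTracks) {
        List<byte[]> selected = new ArrayList<>();
        for (EventTag tag : tags) {
            if (tag.totalEnergy >= minEnergy && tag.nTracks >= minTracks) {
                selected.add(store.fetchFullEvent(tag.eventId));
            }
        }
        return selected;
    }
}
```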
So far, the results have been most promising. For example, we have demonstrated excellent scalability of the ODBMS for up to 250 simultaneous clients, and reliable replication of objects across transatlantic links from CERN to Caltech. In addition, we have developed portable physics analysis tools that operate with the database, such as a Java3D event viewer. Such tools are powerful indicators that the planned systems can be made to work.
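The multi-client scalability figure quoted above comes from the GIOD test setup itself; the sketch below only shows, schematically, how such a measurement can be driven: a pool of concurrent client threads each issues a fixed number of read transactions and the aggregate throughput is reported. `DatabaseClient` is a hypothetical interface, not the actual ODBMS binding.

```java
// Schematic multi-client load driver; DatabaseClient is a hypothetical stand-in.
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

interface DatabaseClient {
    void readEvent(long eventId);   // one read transaction against the database
}

class ScalabilityTest {
    /** Run nClients concurrent readers, each issuing readsPerClient reads,
     *  and return the aggregate throughput in reads per second. */
    static double measureThroughput(int nClients, int readsPerClient,
                                    DatabaseClient client) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(nClients);
        AtomicLong completed = new AtomicLong();
        long start = System.nanoTime();
        for (int c = 0; c < nClients; c++) {
            pool.submit(() -> {
                for (int i = 0; i < readsPerClient; i++) {
                    client.readEvent(ThreadLocalRandom.current().nextLong(1_000_000));
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        double seconds = (System.nanoTime() - start) / 1e9;
        return completed.get() / seconds;
    }
}
```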
Future GIOD work includes deployment and tests of terabyte-scale databases at a few US universities and laboratories participating in the LHC program. In addition to providing a source of simulated events for evaluating the design and discovery potential of the CMS experiment, the distributed system of object databases will be used to explore and develop effective strategies for distributed data access and analysis at the LHC. These tests are foreseen to use local and regional (CalREN-2) networks, as well as the Internet2 (I2) backbone nationally, to explore how the distributed system will work and which strategies are most effective.
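One simple way to frame the strategy question is a back-of-the-envelope comparison of total transfer volume for "replicate once and analyze locally" versus "read the selected fraction remotely on every analysis pass." The sketch below is only an illustration of that trade-off; the sizes, fractions and pass counts are assumptions, not GIOD measurements.

```java
// Back-of-the-envelope comparison of two data access strategies.
// All numbers in the example are illustrative assumptions.
class AccessStrategy {
    /**
     * Returns true if replicating a data set of sizeGB to the local site
     * moves less data in total than reading the selected fraction remotely
     * on each of nAnalyses passes.
     */
    static boolean replicateIsCheaper(double sizeGB, double selectedFraction, int nAnalyses) {
        double replicateVolume = sizeGB;                              // ship once, reuse locally
        double remoteVolume = sizeGB * selectedFraction * nAnalyses;  // ship the selection each time
        return replicateVolume < remoteVolume;
    }

    public static void main(String[] args) {
        // Example: a 100 GB sample, 10% of which is read in each of 20 analysis passes.
        System.out.println(replicateIsCheaper(100.0, 0.10, 20));  // prints true
    }
}
```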
Center for Advanced Computing Research
CACR is dedicated to the pursuit of excellence in the field of high-performance computing, communication, and data engineering. Major activities include carrying out large-scale scientific and engineering applications on parallel supercomputers and coordinating collaborative research projects on high-speed network technologies, distributed computing and database methodologies, and related topics. CACR's goal is to help further the state-of-the-art in scientific computing.
Current Major Computational Resources at CACR