Disclaimer: This document is first-cut and VERY DRAFTY! Many of the sections are just placeholders for information to follow: lack of content in a particular section does not reflect the importance of that topic. The document is based on material provided by: Brian Carpenter, Dave Foster, Sverre Jarp, Chris Jones, Les Robertson, George Smyris, .... and on material from
  • the Libraries for HEP Computing working group home page.
  • the RD45 home page.

    It is foreseen that CN will organise a set of Technology Tracking Teams shortly, whose mandate will be to prepare detailed prediction documents that can be referenced by the Computing Technical Proposals of the LHC experiments. The teams will cover (at least) the topics outlined in this document.

  • Introduction

    HEP computing is no longer at the leading edge of the field (except perhaps in some special DAQ areas). Commodity computing for the games and home PC market is increasingly characterised by the need for fast and high capacity disks (software complexity, multimedia data), fast CPUs (interactivity, audio/video streaming), fast and plentiful memory (ditto), fat network pipes (WWW, remote interaction), high resolution screens (games, desktop publishing), etc. These commodity needs are driving down prices and driving up performance, i.e. exactly in the directions we need.

    On the other hand, the commodity market is geared towards individual users today (although there is a growing trend towards cooperative computing), whereas HEP needs are (should be) focussed on World-wide interworking of computing devices. The most important aspect of computing to concentrate on developing for the LHC era is thus networking between the collaborating institutes. This will require both technological and financial advances.

    Perhaps another area that is "special" is that of data storage, or rather, information storage and retrieval. Although LHC event volumes appear staggering today, they may not be so special when compared with future needs in commercial digital multimedia (digital masters for movies etc.).

    In this document we should focus both on what will sell well in the market, and therefore be plentiful for HEP, and on what we perceive as fringe requirements that are likely to throttle deployment of HEP facilities ...


    Software


    Common Software Libraries for HEP

    The CERN Program Library will rapidly move to a completely new scheme, characterised by:
  • Adherence to industry standards (de facto or de jure)
  • Modular Components
  • Out-sourcing of first-line support and installation
  • Concentration on HEP specific problem solutions

    The new Program Library, tentatively called LHC++ (for Libraries for HEP Computing ++), will thus comprise the following layers:
  • Physics applications and libraries. These include GEANT4 (the RD44 Project), JETSET, ISAJET and PYTHIA.
  • ODMG (Object Database Management Group) compliant Object Database Management Systems, and associated applications
  • Graphics, data visualisation and associated applications. In particular we target OpenGL, OpenInventor and Explorer as key software toolkits in this area. OpenGL is identified as the most likely future standard for 3D graphics. OpenInventor is an OO toolkit for developing interactive 3D graphics applications, and is the basis for the VRML (Virtual Reality Modelling Language) standard. Explorer tools include those from SGI (now marketed by NAG) and IBM (Data Explorer). We target NAG Explorer as it is based on OpenInventor and OpenGL.
  • Statistical and mathematical libraries. We have identified the Numerical Algorithms Group (NAG) double precision Fortran libraries with C++ wrappers that allow them to be used from C++ applications (a minimal wrapper sketch follows this list), LAPACK++ (a partial rewrite and partial encapsulation of the well-known industry standard Linear Algebra library), and some parts of CLHEP (those not obsoleted by the standard C++ libraries), with additional parts (for example Pseudo Random Number generator classes) to be provided.
  • Standard C++ libraries.

    Implicit in this model is the use of C++ in the LHC collaborations. The areas in the above list that are not industry standards are exactly those areas where HEP has specific requirements.
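
    As an illustration of the wrapper approach mentioned above, the following is a minimal sketch of how a double precision Fortran routine might be encapsulated behind a C++ class. The routine shown (DGEMV, a standard BLAS matrix-vector product) and the f77-style calling convention (trailing underscore, arguments passed by pointer, column-major storage) are used purely for illustration and are not taken from any actual NAG product.

        // Hypothetical sketch: calling a double precision Fortran routine from C++.
        #include <vector>

        extern "C" {
            // DGEMV: y := alpha*A*x + beta*y (double precision BLAS, column-major A).
            void dgemv_(const char* trans, const int* m, const int* n,
                        const double* alpha, const double* a, const int* lda,
                        const double* x, const int* incx,
                        const double* beta, double* y, const int* incy);
        }

        // Thin C++ wrapper hiding the Fortran calling convention from users.
        class MatrixVectorProduct {
        public:
            // a is an m x n matrix stored column-major, as Fortran expects;
            // y must already be sized to hold m elements.
            static void apply(int m, int n, const std::vector<double>& a,
                              const std::vector<double>& x, std::vector<double>& y)
            {
                const double alpha = 1.0, beta = 0.0;
                const int one = 1;
                dgemv_("N", &m, &n, &alpha, &a[0], &m,
                       &x[0], &one, &beta, &y[0], &one);
            }
        };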

    Operating Systems

    We believe that there will be a convergence towards mainly two operating systems: Windows/NT (or its successor) for the desktop and a "standard" Unix for high-end specialised servers. Chris: World-wide revenues from these two ends of the market are expected to be the same in 2000 (high volume, low cost for NT; low volume, high cost for Unix). Dave: Revenues from NT will exceed those from Unix in 1998... by 1997 NT will already have 30% of the Web server market (Unix 50%) ... and the NT growth rate is phenomenal. The importance of the OS itself is questionable, as emphasis in the commodity market is increasingly placed on the applications.

    Aside: We lack any evaluation of Windows/NT as a viable desktop for Physicists.

    Databases

    There are two major categories of database that will be used in the LHC era:

    Object Oriented

    In High Energy Physics, we typically deal with things such as histograms, calibration data, detector geometry descriptions, production control information, meta-data concerning collections of events and, of course, the event data itself. In today's environment, we use various data structures to represent the above information and data structure managers to handle these data structures both in memory and on persistent storage. In an Object Oriented environment, these data structures and the algorithms that act on them would be replaced by objects. Although modern programming languages provide many powerful facilities for managing and manipulating objects, an area that is typically weak is that of object persistence. In the commercial world, solutions such as Object Oriented Databases are emerging which combine object oriented features with traditional database facilities, including the provision of persistence.

    An important consideration regarding the HEP environment is that of scale. The exact data volumes that will be recorded at the LHC are still uncertain, but event data in the Petabyte range (10**15 bytes) and calibration data in the 100 GB range (10**11 bytes) are currently foreseen. This is dramatically different from the existing domains to which OO techniques have been applied, and is expected to have implications for the suitability of commercial solutions and for the design and implementation of any HEP specific solution.

    Nevertheless, commercial products are now starting to offer many of the features that are required for HEP persistent storage systems, and it is important that we understand their advantages and limitations. Current research in other HEP software projects indicates that a suitable architecture is appearing based on emerging "standards" (often de facto, rather than de jure), but that the components themselves may not be capable of supporting HEP's requirements in terms of scale and/or performance. We strongly emphasise the need to adopt the interfaces outlined by the various standards so that we can take full advantage of standard components for a persistent object data manager (PODM) should they exist or come to market between now and the running period of the LHC.
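
    To give a flavour of the ODMG C++ binding referred to above, the following is a minimal sketch of how a calibration object might be made persistent. The classes d_Database, d_Transaction and d_Ref come from the ODMG binding on paper; the header name, the CalibrationConstant class and the exact form of the persistent new are illustrative assumptions, and real products (and their schema tools) differ in detail.

        // Illustrative sketch of ODMG-style object persistence (not product-specific code).
        #include <odmg.h>   // assumed header name; in practice this is vendor specific

        // A persistence-capable class; vendors typically require such classes to be
        // processed by their schema tools and to derive from d_Object.
        class CalibrationConstant : public d_Object {
        public:
            int    channelId;
            double pedestal;
            double gain;
        };

        int main() {
            d_Database db;
            db.open("calibration");      // open a named database

            d_Transaction txn;
            txn.begin();

            // Create a persistent object inside the open database.
            d_Ref<CalibrationConstant> c =
                new(&db, "CalibrationConstant") CalibrationConstant;
            c->channelId = 42;
            c->pedestal  = 3.1;
            c->gain      = 1.07;

            txn.commit();                // the object now lives in the database
            db.close();
            return 0;
        }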

    Relational

    Relational databases will probably still have a role in Management Information Services (MIS).

    Distributed Objects

    The distinction between objects and components is purposefully left fuzzy in this document.

    We predict a distributed application environment in which distributed components sit close to the data but allow operations on the data through client requests from remote devices. This environment has the benefit that components are only invoked when actually required by the end user's application. There are implications here for licensing costs: as an example, we can imagine that it will no longer be necessary for a user to install a local copy of Word; the user will instead call up the Word component running on the departmental or institutional server. This model also has benefits in problem areas such as where to store the collaboration's documents whilst allowing many different authors access to the document database.
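
    As a concrete illustration of this component model, the sketch below shows the kind of interface a query component running close to the event data might present to a remote client. The names are invented for illustration, and the distribution mechanism (CORBA, OLE/COM or otherwise) is deliberately left open; a real system would generate the client-side proxy from an interface definition.

        #include <cstddef>
        #include <string>
        #include <vector>

        // Abstract interface seen by the (possibly remote) client; the implementation
        // runs on the server that holds the event database.
        class EventQueryComponent {
        public:
            virtual ~EventQueryComponent() {}

            // Run a selection close to the data; only the (small) list of matching
            // event identifiers crosses the network, not the events themselves.
            virtual std::vector<long> selectEvents(const std::string& predicate) = 0;

            // Fetch one event's summary data on demand.
            virtual std::vector<double> eventSummary(long eventId) = 0;
        };

        // Client-side usage: the proxy forwards each call to the component running
        // on the departmental or institutional server.
        void plotHighPtMuons(EventQueryComponent& events) {
            std::vector<long> ids = events.selectEvents("nMuons > 0 && muonPt > 20.0");
            for (std::size_t i = 0; i < ids.size(); ++i) {
                std::vector<double> summary = events.eventSummary(ids[i]);
                // ... fill local histograms from the small summary records ...
                (void)summary;
            }
        }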

    Careful attention has to be given to the placement of the collaboration's components in the network. The following factors must be considered:

  • fault tolerance (of particular importance in a networked environment)
  • data locality and efficiency of access (e.g. database queries should be made close to the data, but the results should be available to the client in a timely fashion)
  • performance
  • ownership/licensing (ownership of the computing resource, licensing for commercial component software)
  • end user authentication (e.g. Kerberos)

    In the last category, we see the possibility of "renting" access to a component on the network.

    Distributed Filesystems

    Wide area network distributed filesystems are a niche market. AFS is still not widely supported, and DFS is painfully slow in arriving. The lack of a properly supported global filesystem threatens to inhibit World-wide collaboration in HEP in the long term, unless other solutions can be found that allow transparent access to HEP data without the concept of a global file system. One such solution, based on distributed objects, has already been described above.

    In summary, distributed file systems have had their chance: we are sceptical that they will be fully deployed in the future.

    Hierarchical Storage Management Systems

    Hierarchical storage management will become a fringe requirement as we move to distributed components. It may still be needed at CERN, but the requirement for file-based storage management is made obsolete if use of OO databases is widespread.

    Browsers

    Advent of Java, distributed components, etc. Some believe that the Operating Systems of the future will be browsers: one will always work in the context of the browser. Others say that browsers will become the "DOS" of the 90's: a limited single-threaded environment with severe restrictions on usability. Browsers are in any case constrained to run on a particular OS, and make use of the OS functions.

    Visual Programming

  • Object Paradigm (MS Visual x, Borland Delphi)
  • Data Flow Paradigm (Explorer)
  • Traditional (PAW)
  • Self Documenting Development Environments

    Collaboration-wide programming should be governed by the following factors:
  • Keep complexity under control
  • Develop applications within available manpower
  • Focus on application usability
  • Enforce modularity and loose coupling

    Formal methods are in general too heavyweight. They cut productivity and are a source of confusion. The Microsoft approach is to split the application domain into components. The methodology is then concerned with the linking together of components. Component development conventions and rules are stored in a common database. The component development tools then help to enforce the adoption of the conventions and rules by integrating them in, for example, Class Wizards. One benefit that comes for free with this approach is the possibility of enforcing correct levels of code documentation at development time.

    It is crucial that conventions and rules be applied in as lightweight and user-friendly a way as possible, and certainly not as an after-the-event activity undertaken by a code librarian, which provokes all sorts of practical and sociological problems...

    Standards

  • Programming Languages (C++, F90 ...)
  • Graphics (OpenGL, OpenInventor, VRML, PostScript)

    Industry graphics are driven by domains such as movies, Computer Aided Design (CAD) and games.

    HEP is moving away from home-grown standards (HIGZ etc.) towards industry standards. OpenGL is already "free" with Windows/NT and Irix, available (at license cost) for AIX, Solaris and OSF, and available shortly for HP/UX etc. VRML (Virtual Reality Modelling Language) and OpenInventor are consistent with RD44 plans, amongst others. OpenInventor has been chosen by the BaBar experiment. X/Motif is unlikely to persist beyond the next 5 years. PostScript is the de facto standard for print-ready formats, and is unlikely to be superseded in the short/medium term. GKS is now abandoned.
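
    To give a flavour of the OpenGL programming model targeted here, the sketch below draws a single line segment in 3D (a "track", say) using the immediate-mode API. GLUT is assumed for window creation and the event loop; the example is a minimal illustration and not part of any HEP library.

        #include <GL/glut.h>

        // Draw one line segment in 3D using immediate-mode OpenGL calls.
        static void display(void) {
            glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
            glColor3f(1.0f, 1.0f, 0.0f);            // a yellow "track"
            glBegin(GL_LINES);
            glVertex3f(0.0f, 0.0f, 0.0f);
            glVertex3f(0.6f, 0.4f, -0.3f);
            glEnd();
            glFlush();
        }

        int main(int argc, char** argv) {
            glutInit(&argc, argv);
            glutInitDisplayMode(GLUT_SINGLE | GLUT_RGB | GLUT_DEPTH);
            glutInitWindowSize(400, 400);
            glutCreateWindow("OpenGL sketch");
            glutDisplayFunc(display);
            glutMainLoop();                         // hand control to the event loop
            return 0;
        }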

  • Query Languages (SQL, Objectivity? ...)
  • Software Licenses

    Today, everyone expects to have to buy software licenses for compilers and other OS-dependent tools for their workstation. The average license costs are perhaps around 2000 CHF per workstation, with yearly maintenance of around 200 CHF.

    Aside: A move towards Windows/NT will result in a sharp drop in commercial software costs: occasionally by a factor of 2 for the "same" product.

    As explained elsewhere in this document, we advocate a move towards commercial, industry standard software where appropriate for HEP, so that, in the future, the number of licenses required will increase from just compiler licenses to include licenses for standard libraries and application development tools. This naturally involves a financial commitment, both for licenses and for support. There is a real cost saving to be had if this route is followed, simply because it frees manpower from supporting home-grown software where better commercial software exists.

    We estimate that the LHC collaborations should each foresee yearly fees of around 100 kCHF for collaboration-wide software licenses and software maintenance.


    Hardware


    Devices used directly by Physicists

    Computing Devices

    We identify the following categories of devices that will be used by physicists for their everyday computational tasks.

  • Portable and Desktop

    Relative processing power = 1. Relative disk capacity = 1.

    The distinction between portable and desktop will blur. Improved screen technology will be important: current screens are too bulky, too power hungry and of poor resolution. Our prediction for a "good" PC in 2005 is a machine with 2 processors of 500 CERN Units each (2000 dollars), 2.6 GBytes of memory (1000 dollars) and a 100 GByte disk (less than 1000 dollars). We justify this prediction in the sections that follow.

  • Departmental

    Relative processing power = 10. Relative disk capacity = 10.

    The portable or desktop device is "connected" to the departmental device for sharing the departmental facilities. The departmental device connects to the institutional device for laboratory wide facilities.

  • Institutional

    Relative processing power = 100. Relative disk capacity = 100.

  • National and Continental ?

    Aside: We have no feeling for how applications will make best use of the GByte memories that will be available at this time. The opportunities for caching whole event samples are evident ...

    Interaction Devices

    Interactivity will continue to be a "hot potato" in the next few years. The games market is what drives the technology, and is seen as a multi-billion dollar opportunity in the sector. Sound input will only become really useful when the host can "converse" with the user pseudo-intelligently.
  • Mice, Pens, Tablets, TouchPads, TouchScreens, Microphones, (Brain Adapters)
  • Video (Graphics cards, Frame grabbers)
  • Sound (Integrated sound/modem)
  • Scanners
  • Virtual Reality - Engineering etc.

    Aside: We hope that in the not too distant future physicists will be able to sit around and view a fully rendered 3D image of events in a detector. We consider this kind of "overlaying" of virtual scenes on real-world views to be of more use than "total immersion" virtual reality (visors, full-body suits etc.).

    Processors

    General consensus seems to be that availability of "enough" processing power will not be a problem for LHC computing.

    Processor powers are doubling every 1.5 years. Today's average PCs are around 10 CERN Units: at this rate, by 2005, they will be around 600 CERN Units. Probably a conservative estimate. As the vast majority of people in each LHC experiment will have desktop devices, this represents a significant compute capacity for each experiment by itself.
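
    The arithmetic behind this figure is simple compound doubling; the sketch below makes the assumptions (10 CERN Units in 1996, a doubling time of 1.5 years) explicit.

        #include <cmath>
        #include <cstdio>

        int main() {
            const double unitsNow     = 10.0;        // assumed average PC rating in 1996 (CERN Units)
            const double doublingTime = 1.5;         // years per doubling
            const double years        = 2005 - 1996; // extrapolation horizon

            // 9 years / 1.5 years per doubling = 6 doublings, i.e. a factor of 64.
            double units2005 = unitsNow * std::pow(2.0, years / doublingTime);
            std::printf("Predicted average PC rating in 2005: ~%.0f CERN Units\n", units2005);
            return 0;
        }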

    Technological factors governing microprocessor evolution include: number of gates (transistors) per unit volume, maximum clock frequencies, lithography methods (Optical, DUV, X-Ray), interconnects (insulator thickness, running down to 100 Angstroms) and power dissipation. The commercial factors include the raw cost of materials and the cost of fabrication plants. The yield of satisfactory chips is also in the equation.

    In our prediction of where processor technology will be early in the next Millennium, we have assumed that CERN Unit ratings will double at least every 2 years due to the use of superscalar, Very Long Instruction Word (VLIW), massively parallel or MT (?) designs.

    It is probably unimportant which of the following categories will dominate the market, although it seems unlikely that MPP will go anywhere (except perhaps in the area of parallel database query engines):

  • CISC
  • RISC - still better at floating point than CISC (is floating point performance of major importance to us?)
  • SMP
  • MPP

    By 2008 we predict processors will operate at 1.2 GHz, with 20-way parallel instruction issue. Such processors will rate at about 2000 CERN Units at a cost of around 1 dollar per CERN Unit.

    Memory

    Memory prices are rather stable. This is despite technology advances permitting higher gate densities in silicon, and is probably due to the heavy requirement for memory resources typical of newer operating systems and applications.

    By 2008 we predict feature sizes of 0.1 microns, 20 thousand million gates per chip, and prices of around 0.1 dollars per MByte (around 250 times cheaper than 1996 prices, which implies roughly 25 dollars per MByte today).

    Memory is a potential bottleneck in the long term, since there are fundamental limits to gate density. GaAs memories are technically difficult to produce (a materials science problem), but may offer a solution in the long term.

    Bus Architectures

    The development of bus architectures tends to be driven towards satisfying the I/O speeds of end devices.

    Devices for Data Storage

  • Magnetic Disks - Highly cost-effective. Trends can be extrapolated well into the next millennium. Miniaturisation for portable devices is driving densities and rotation speeds higher. Chris: Capacity per dollar is roughly quadrupling every four years. Sverre: Capacity per dollar is roughly tripling every two years (see the extrapolation sketch after this list). Desktop/portable PCs will thus sport local disks of at minimum 20 GB by 2005. Standard disk form factors will move from 3.5" to 0.9". Access speeds will reach 200 MBytes/sec by 2008.
  • Optical Disks (RO and RW)
  • Magnetic Tapes (Digital and Analogue) - This is a potential problem area as the market is very much a fringe one. If online systems expect to store directly onto magnetic tape, then some major developments need to be made in densities, recording methods, R/W speeds and/or tape striping. There is doubt that current technology (suitably extrapolated) could satisfy the ALICE DAQ requirements, for example (very intense data rates for short periods).
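
    The two disk growth estimates quoted above bracket the predictions made elsewhere in this document (the 20 GB minimum, and the 100 GByte disk of the "good" 2005 PC). The sketch below makes the arithmetic explicit; the 1 GByte desktop disk assumed for 1996, and the assumption of constant spend on the disk, are illustrative only.

        #include <cmath>
        #include <cstdio>

        int main() {
            const double disk1996 = 1.0;             // GBytes; an assumed 1996 desktop disk
            const double years    = 2005 - 1996;     // extrapolation horizon

            // Chris: capacity per dollar roughly quadruples every four years.
            double conservative = disk1996 * std::pow(4.0, years / 4.0);   // ~23 GBytes
            // Sverre: capacity per dollar roughly triples every two years.
            double optimistic   = disk1996 * std::pow(3.0, years / 2.0);   // ~140 GBytes

            std::printf("2005 desktop disk, conservative estimate: ~%.0f GBytes\n", conservative);
            std::printf("2005 desktop disk, optimistic estimate:   ~%.0f GBytes\n", optimistic);
            return 0;
        }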

    Networking Components

    This is one of the most difficult areas in which to predict trends. It is complicated by peculiar national PTT politics and tariff strategies, particularly in Europe. It is also an area in which the cost of services is likely to be the factor limiting what can be achieved for LHC computing.

    The key technology is ATM (Asynchronous Transfer Mode). The standard data rates available are 155 Mbits/sec and 622 Mbits/sec and, later, 2.4 Gbits/sec and 10 Gbits/sec. (We expect to see 25 Mbits/sec desktop ATM interfaces for 150 dollars in 1996.) This is a switching technology which gives point-to-point applications an assigned data rate. The advantage is that the available bandwidth can be shared between the applications using the ATM network according to the needs of each. The ATM market is already highly buoyant, and we predict major growth of ATM networks into the next Millennium.

    The CERN network backbone will probably be at least 622 Mbits/sec ATM by 2005. Although 155 Mbits/sec desktop interfaces are available, there is little or no current demand for them. There is no market pressure at the moment for anything like 622 Mbits/sec to the desktop, and some say that 10 Mbits/sec switched is quite enough.

  • Switches
  • Routers
  • Fibres
  • Satellite for developing countries
  • GSM (Grindingly Slow Modem)
  • Radio LANs
  • PTT equipment

  • Julian Bunn, Julian.Bunn@caltech.edu