Disclaimer: This document is first-cut and VERY DRAFTY! Many of
the sections are just placeholders for information to follow: the lack of
content in a particular section does not reflect the importance
of that topic. The document is
based on material provided by Brian Carpenter, Dave Foster, Sverre Jarp,
Chris Jones, Les Robertson, George Smyris, ... and on material from ...
Introduction
HEP computing is no longer at the leading edge of the field (except perhaps in
some special DAQ areas). Commodity computing for the games and home PC market
is increasingly characterised by the need for fast, high-capacity disks (software
complexity, multimedia data), fast CPUs (interactivity, audio/video streaming),
fast and plentiful memory (ditto), fat network pipes (WWW, remote interaction),
high-resolution screens (games, desktop publishing), and so on. These commodity needs
are driving down prices and driving up performance, i.e. exactly in the
directions we need.
On the other hand, the commodity market is today geared towards individual users
(although there is a growing trend towards cooperative computing),
whereas HEP needs are (or should be) focussed on World-wide interworking of
computing devices. The most important aspect of computing to concentrate on
developing for the LHC era is thus networking between the collaborating
institutes. This will require both technological and financial advances.
Perhaps another area that is "special" is that of data storage, or rather,
information storage and retrieval. Although LHC event volumes appear
staggering today, they may not be so special when compared with future
needs in commercial digital multimedia (digital masters for movies etc.).
In this document we therefore focus both on what will sell well in the market,
and hence be plentiful and cheap for HEP, and on what we perceive as
fringe requirements that are likely to throttle the deployment of HEP
facilities ...
Software
Common Software Libraries for HEP
The CERN Program Library will rapidly move to a completely new scheme,
characterised by:
The new Program Library, tentatively called LHC++ (for Libraries for
HEP Computing ++), will thus comprise layers:
Implicit in this model is the use of C++ in the LHC collaborations.
The areas in the above list that are not industry standards are exactly
those areas where HEP has specific requirements.
Operating Systems
We believe that there will be a convergence towards mainly two operating systems:
Windows/NT (or its successor) for the desktop and a "standard" Unix
for high-end specialised servers. Chris: World-wide revenues from these
two ends of the market are expected to be equal in 2000 (high volume,
low cost for NT; low volume, high cost for Unix). Dave: revenues from
NT will exceed those from Unix in 1998; by 1997 NT will already
have 30% of the Web server market (Unix 50%), and the NT growth
rate is phenomenal.
The importance of the operating system itself is questionable, as emphasis
in the commodity market is increasingly placed on the applications.
Aside: We lack any evaluation of Windows/NT as a viable desktop for
Physicists.
DataBases
There are two major categories of database that will be used in the LHC era:
Distributed Objects
The distinction between objects and components is purposefully left fuzzy in this
document.
We predict a distributed application environment in which components sit
close to the data but allow operations on the data to be driven
by client requests from remote devices. This environment has the benefit that
components are only invoked when actually required by the end user's
application. There are implications here for licensing costs: as an example,
we can imagine that it will no longer be necessary for a user to install
a local copy of Word; the user will instead call up the Word component
running on the departmental or institutional server. This model also helps
in problem areas such as where to store the collaboration's documents whilst
allowing many different authors access to the document database.
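As a purely illustrative sketch (the class and method names DocumentStore, find, fetch and submit are invented for this example and correspond to no existing product), such a component might expose a C++ interface of the following kind, with clients at remote institutes holding only a proxy to it:

    #include <string>
    #include <vector>

    // Hypothetical interface of a document-store component that runs on the
    // departmental or institutional server, close to the data, and is driven
    // entirely by remote client requests.
    class DocumentStore {
    public:
        virtual ~DocumentStore() {}
        // Return the identifiers of all documents matching a query string.
        virtual std::vector<std::string> find(const std::string& query) = 0;
        // Fetch the current revision of one document.
        virtual std::string fetch(const std::string& docId) = 0;
        // Submit an edited revision; the component arbitrates concurrent authors.
        virtual bool submit(const std::string& docId,
                            const std::string& newText) = 0;
    };

A remote client would obtain an object implementing this interface through some object broker rather than installing the application locally; only the component on the server touches the document database.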
Careful attention has to be given to the placement of the collaboration's
components in the network. The following factors must be considered:
In the last category, we see the possibility of "renting" access to a component
on the network.
Distributed FileSystems
Wide-area distributed filesystems are a niche market: AFS is still not
widely supported, and DFS is painfully slow in arriving.
The lack of a properly supported global filesystem threatens to
inhibit World-wide collaboration in HEP in the long term, unless other solutions
can be found that allow transparent access to HEP data without
the concept of a global file system. One such solution, based on distributed
objects, has already been described above.
In summary, distributed file systems have had their chance: we are sceptical
that they will be fully deployed in the future.
Hierarchical Storage Management Systems
Hierarchical storage management will become a fringe requirement as we move
to distributed components. It may still be needed at CERN, but the requirement
for file-based storage management disappears if OO databases are widely used.
Browsers
With the advent of Java, distributed components and the like, some believe
that the operating systems of the future will be browsers: one will always
work in the context of the browser. Others say that browsers will
become the "DOS" of the 90's: a limited, single-threaded environment
with severe restrictions on usability. Browsers are in any case
constrained to run on a particular OS, and make use of the OS
functions.
Visual Programming
Self Documenting Development Environments
Collaboration-wide programming should be governed by
the following factors:
Formal methods are in general too heavyweight. They cut productivity
and are a source of confusion. The Microsoft approach is to split the
application domain into components. The methodology is then concerned
with the linking together of components. Component development
conventions and rules are stored in a common database. The component
development tools then help to enforce the adoption of the conventions
and rules by integrating them in, for example, Class Wizards.
One benefit that comes for free with this approach is the possibility
of enforcing correct levels of code documentation at development time.
It is crucial
that conventions and rules be applied in as lightweight and user-friendly
a way as possible, and certainly not as an after-the-event activity undertaken
by a code librarian, which provokes all sorts of practical and sociological
problems...
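As a purely illustrative sketch of what documentation enforced at development time might look like, a wizard-generated class skeleton could carry a mandatory banner of the following kind (the class name and field layout are invented for this example):

    // -----------------------------------------------------------------
    // Component : TrackFitter            (name invented for this example)
    // Author    : filled in automatically by the development tool
    // Purpose   : one-line statement, required before check-in is allowed
    // Depends on: list of the components this one uses
    // -----------------------------------------------------------------
    class TrackFitter {
    public:
        // Every public method must carry a one-line description;
        // the tool refuses to generate a stub without one.
        double fitQuality() const;
    };

The point is that the convention lives in the tool, not in a manual that nobody reads.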
Standards
Software Licenses
Today, everyone expects to have to buy software licenses for compilers
and other OS-dependent tools for their workstation. Average license costs
are perhaps around 2000 CHF per workstation, with yearly maintenance of
around 200 CHF.
Aside: A move towards Windows/NT will result in a sharp drop in commercial
software costs: occasionally a factor of 2 for the "same" product.
As explained elsewhere in this document, we advocate a move towards
commercial, industry-standard software where appropriate for HEP,
so that, in the future, the licenses required will grow from
compiler licenses alone to include licenses for standard libraries and application
development tools. This
naturally involves a financial commitment, both for licenses and for
support. There is a real cost saving if this route is followed, simply because
it frees manpower from supporting home-grown software where better
commercial software exists.
We estimate that the LHC collaborations should each foresee yearly fees
of around 100 kCHF for collaboration-wide software licenses and software
maintenance.
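A rough scale check, if the per-workstation figures quoted above (2000 CHF purchase, 200 CHF/year maintenance) are taken as typical of collaboration-wide products:

    200 / 2000  =  maintenance at 10% of the purchase price per year
    100 kCHF/year  ~  maintenance alone on roughly 1 MCHF of licensed software,
                      or correspondingly less once new license purchases are included

These are illustrative figures only; actual per-product prices will vary widely.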
Hardware
Devices used directly by Physicists
Computing Devices
We identify the following categories of devices that
will be used by physicists for their everyday computational
tasks.
Aside: We have no feeling for how applications will make best use
of the GByte memories that will be available at this time. The
opportunities for caching whole event samples are evident ...
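As a purely illustrative figure, if a reconstructed LHC event occupies of order 1 MByte (an assumption, not an agreed number), then:

    1 GByte of memory / 1 MByte per event  ~  1000 events resident in RAM

i.e. a useful analysis sample could sit entirely in memory on a single desktop machine of that era.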
Interaction Devices
Interactivity will continue to be a "hot potato" in the next
few years. The games market is what drives the technology,
and is seen as a multi-billion-dollar opportunity in the sector.
Sound input will only become really useful when the host can
"converse" with the user pseudo-intelligently.
Processors
The general consensus seems to be
that the availability of "enough" processing power will not
be a problem for LHC computing.
Processor powers are doubling every 1.5 years.
Today's average PCs
rate around 10 CERN Units: at this rate, by 2005, they will rate
around 600 CERN Units, probably a conservative estimate. As the
vast majority of people in each LHC experiment will have desktop devices, this
represents a significant compute capacity for each experiment by itself.
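The arithmetic behind this estimate, starting from 1996:

    10 CERN Units x 2^(9 years / 1.5 years)  =  10 x 2^6  =  640 CERN Units by 2005

which rounds to the figure of around 600 quoted above.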
Technological factors governing microprocessor evolution include:
number of gates (transistors) per unit volume, maximum clock
frequencies, lithography methods (Optical, DUV, X-Ray), interconnects
(insulator thickness, running down to 100 Angstroms) and power
dissipation. The commercial factors include the raw cost of materials
and the cost of fabrication plants; the yield of satisfactory chips
also enters the equation.
In our prediction of where processor technology will be early in the
next Millennium, we have assumed that CERN Unit ratings will double at
least every 2 years through the use of superscalar execution, Very Long
Instruction Word architectures, massive parallelism or multithreading (?).
It is probably unimportant which of the following categories
will dominate the market, although it seems unlikely
that MPP will go anywhere (except perhaps in the area of parallel
database query engines):
By 2008 we predict processors will operate at 1.2 GHz, with 20-way
parallel instruction issue. Such processors will rate at about
2000 CERN Units at a cost of around 1 dollar per CERN Unit.
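As a consistency check against the doubling rates assumed above, taking today's average 10 CERN Unit PC processor as the baseline:

    2000 / 10  =  200-fold growth over the 12 years to 2008
    implied doubling time  =  12 / log2(200)  ~  1.6 years

which is comfortably within the assumption that ratings double at least every 2 years.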
Memory
Memory prices are rather stable. This is despite technology
advances permitting higher gate densities in silicon, and is
probably due to the heavy requirement for memory resources
typical of newer operating systems and applications.
By 2008 we predict feature sizes of 0.1 microns, 20,000 million
gates per chip, and prices of around 0.1 dollars per MByte (around
250 times cheaper than 1996 prices).
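For reference, the starting point implied by that extrapolation:

    0.1 dollars/MByte x 250  ~  25 dollars/MByte in 1996
    so a 1 GByte memory falls from ~25,000 dollars today to ~100 dollars by 2008.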
Memory is a potential bottleneck in the long term, since there
are fundamental limits to the gate density. GaAs memories are
technically difficult to produce (a materials science problem),
but may offer a solution in the long term.
Bus Architectures
The development of bus architectures tends to be driven
towards satisfying the I/O speeds of end devices.
Devices for Data Storage
Networking Components
This is one of the most difficult areas in which to predict trends. It is
complicated by peculiar national PTT politics and tariff strategies, particularly
in Europe, and it is an area in which the cost of services is likely to be
the factor limiting what can be achieved for LHC computing.
The key technology is ATM (Asynchronous Transfer Mode). The standard
data rates available are 155 Mbits/sec and 622 Mbits/sec and, later, 2.4 Gbits/sec
and 10 Gbits/sec.
(We expect to see 25 Mbits/sec desktop ATM interfaces for 150 dollars in 1996.)
ATM is a switching technology that gives point-to-point
applications an assigned data rate. The advantage is that the available
bandwidth can be shared between the applications using the ATM network according to
the needs of each. The ATM market is already highly buoyant, and we predict
major growth of ATM networks into the next Millennium.
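To give a feel for these rates, the time to move a 1 GByte data sample (an invented but plausible figure) over a dedicated ATM circuit, ignoring protocol overheads:

    1 GByte  =  8 x 10^9 bits
    at 155 Mbits/sec :  ~52 seconds
    at 622 Mbits/sec :  ~13 seconds
    at 2.4 Gbits/sec :  ~3 seconds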
The CERN network backbone will probably be at least 622 Mbits/sec ATM by 2005.
Although 155 Mbits/sec desktop interfaces are available,
there is little or no current demand for them.
There is no market pressure at the moment for anything like 622 Mbits/sec
to the desktop, and some say that 10 Mbits/sec switched is quite enough.
Julian Bunn, Julian.Bunn@caltech.edu