
Data processing flow in particle experiments

 

In the late sixties and early seventies a large fraction of particle physicists were active in bubble chamber physics. The number of events they treated varied from a few hundred (neutrino experiments) to several tens of thousands (e.g. strong interaction spectroscopy). Users would normally reduce their raw ``measurement'' tapes after event reconstruction onto Data Summary Tapes (DST) and extract from these the mini and micro DSTs, which were then used for analysis. In those days the statistical analysis program SUMX [4] would read each event and compile information into histograms, two-dimensional scatter diagrams and `ordered lists'. Facilities were provided (via data cards) to select subsets of events according to given criteria, and the user could add routines for computing, event by event, quantities not immediately available.

Although the idea of specifying cuts and selection criteria in such a formal way was attractive, the computer technology of those days only allowed the data to be analysed in batch mode on CDC or IBM mainframes. It was therefore not always practical to run several times through the data, and a more lightweight system, HBOOK, easier to learn and use, was soon developed.

It was in the middle seventies, when larger proton and electron accelerators became available, that counter experiments definitively superseded bubble chambers, and with them the amount of data to be treated moved into the multi-megabyte range. Thousands of raw data tapes would be written, and huge reconstruction programs would extract the interesting data from those tapes and transfer them to DSTs. Then, to make the analysis more manageable, various physicists would write their own mini-DSTs, containing a reduced fraction of the information on the DST. They would run these mini and micro DSTs through HBOOK, whose functionality had increased substantially in the meantime. Typically several tens of one- or two-dimensional histograms would be booked in the initialization phase, and the interesting parameters would be read sequentially from the DST and binned into the histograms or scatter plots, as sketched below. This approach was very efficient memory-wise (although two-dimensional histograms could still be very costly), but of course all correlations not explicitly plotted were lost.
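A minimal sketch of such a batch histogramming job, in Fortran 77, is given below. The HBOOK calls (HLIMIT, HBOOK1, HBOOK2, HF1, HF2, HISTDO) are standard, but the surrounding program is illustrative only: READST is a hypothetical user routine returning the momentum P and polar angle THETA of the next DST event, and the histogram limits are arbitrary.

*     Sketch of a classical batch DST analysis: book histograms once,
*     then fill them event by event.  READST is a hypothetical routine
*     reading the next event from the mini-DST.
      PROGRAM HDSTAN
      COMMON /PAWC/ PAW(50000)
      CALL HLIMIT(50000)
*     Booking phase: one 1-dim and one 2-dim histogram
      CALL HBOOK1(10,'Momentum (GeV/c)',100,0.,50.,0.)
      CALL HBOOK2(20,'P vs theta',50,0.,50.,50,0.,3.1416,0.)
*     Event loop: read the parameters sequentially and bin them
   10 CALL READST(P,THETA,IERR)
      IF(IERR.NE.0) GO TO 20
      CALL HF1(10,P,1.)
      CALL HF2(20,P,THETA,1.)
      GO TO 10
*     Termination: print the accumulated histograms
   20 CALL HISTDO
      END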

HBOOK in those days still had its own memory management, but with version 4 [9], which became available in 1984, the ZEBRA data memory manager was introduced. This not only allowed the use of all memory management facilities of ZEBRA, but at the same time made it possible to use the sequential (FZ) and random access (RZ) [10] input-output possibilities of that system, so that ``histograms'' can be saved and transferred to other systems in an easy way. At about the same time Ntuples, somewhat similar in functionality to the ``events'' written on a mini-DST, were implemented. With them the complete correlation matrix between the various Ntuple elements can be reconstructed at will. The first Ntuple implementation can be thought of as a static large two-dimensional array, one dimension representing the number of events and the other the number of characteristics (floating point numbers) stored for each event. With the present version of HBOOK, Ntuples can contain complex substructures of different data types, which allow a certain dynamicity; a small example is sketched at the end of this section.

In the last few years multi-Mflop machines have become available on the desktop, and ``farms'' of analysis machines are being set up to ``interactively'' reconstruct events directly from the raw data as registered in the experimental setup, hence bypassing the ``batch'' reconstruction step. Moreover, tools have been developed to dynamically share data between various processes (Unix) or via global sections (VMS). This now makes it possible to sample events as they are registered in the experimental setup or, when the computing power is available, to reconstruct, visualise and analyse events in real time as they are recorded in the experimental apparatus. It is expected that this will progressively eliminate the intermediate batch/DST analysis step and allow, with the help of Monte Carlo events and calibration data, an (almost) immediate response to the data-taking needs of a large experiment.
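The sketch below, under the same assumptions as before (hypothetical READST routine, illustrative file name mydst.hbook and buffer sizes), shows how a simple row-wise Ntuple can be booked, filled event by event and written to an RZ file using the standard HBOOK 4 calls HROPEN, HBOOKN, HFN, HROUT and HREND. Such a file can later be read back, e.g. with PAW, to study the correlations between the stored quantities.

*     Sketch of booking a row-wise Ntuple and saving it to an RZ file.
*     READST is again a hypothetical routine reading the next event.
      PROGRAM HNTUP
      COMMON /PAWC/ PAW(100000)
      CHARACTER*8 TAGS(2)
      REAL X(2)
      DATA TAGS /'P','THETA'/
      CALL HLIMIT(100000)
*     Open an RZ file and attach it to the top directory MYDST
      CALL HROPEN(1,'MYDST','mydst.hbook','N',1024,ISTAT)
*     Book an Ntuple with two floating point variables per event
      CALL HBOOKN(100,'Mini-DST ntuple',2,'MYDST',5000,TAGS)
*     Event loop: store one row per event
   10 CALL READST(P,THETA,IERR)
      IF(IERR.NE.0) GO TO 20
      X(1) = P
      X(2) = THETA
      CALL HFN(100,X)
      GO TO 10
*     Write the Ntuple to the RZ file and close it
   20 CALL HROUT(100,ICYCLE,' ')
      CALL HREND('MYDST')
      END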


