Geant4 simulation production in LHC experiments

Status of January 2005

 

23 Mar 2005, 15:00 CET

Editor: J. Apostolakis

Contributors: M. Stavrianakou (CMS), G. Corti (LHCb), A. Rimoldi (ATLAS)

Abstract

Detector simulation programs based on Geant4 were put into production in 2003-2004 in three LHC experiments.  This note documents key aspects of their entry into production use, and the current status of these Geant4-based simulation programs.

Introduction and Executive Summary

Geant4-based detector simulation programs entered production between November 2003 and May 2004 in CMS, ATLAS and LHCb.  The simulation of an LHC experiment is an important element in understanding the experimental conditions and the performance of the detector, both in the design and optimization phase and during future data taking.

 

Geant4 development, by the RD44 (1994-1998) and Geant4 (1999-present) collaborations, has provided a toolkit for detector simulation at LHC and other HEP facilities, and for applications in other areas including space and medicine. ATLAS, CMS and LHCb were amongst the first participants and users, and have closely followed this development.  They have been utilizing the toolkit in test beam comparisons, for the validation of the physics modeling, and in the creation of full detector simulation programs, pursuing improvements in robustness and performance to obtain production quality programs.

 

The three Geant4-based experiment simulation programs during 2003-2004 have demonstrated very low crash rates (less than one crash per ten thousand events) and computing performance comparable to Geant3 (within a factor of 1.5 to 2).  The considerable set of physics validations in test beam setups has provided a measure of the physics performance achieved and is continuing to provide a yardstick. The creation and entry into production of these simulation programs have benefited strongly from the interaction and support of the Geant4 collaboration.

 

Continued progress is underway, improving these programs and utilizing more recent releases of the Geant4 toolkit.  In addition, the widespread use of the simulation programs in current productions is now enabling their further validation under the conditions of realistic physics studies.

1. Geant4 simulation production in CMS

 

The first complete and fully physics-validated version of the Geant4-based CMS OSCAR simulation program was delivered for production in mid-November 2003. This was the result of an iterative process, starting with an already technically complete version in June 2003, subjected to O(10K) physics event testing in production, followed by a second iteration with enhanced functionality in August 2003 and production-scale tests of 0.5M single particles and 1M physics events by the end of September 2003, and a final iteration of extensive detector and physics validation by the CMS Physics Reconstruction and Selection (PRS) groups by November 2003. At that point, OSCAR 2.4.5 based on Geant4 5.2.p02 was officially deployed. CMSIM, its Geant3-based predecessor, was phased out after having delivered more than 50M physics events in the course of the CMS DC04 preproduction.

 

Throughout this process, CMS has profited from an excellent collaboration with the Geant4 experts, addressing new requirements and tracking and solving problems in a variety of areas. Issues resolved ranged from the tuning of electro-nuclear processes, to detailed technical analyses in terms of software integrity and performance. Developments that addressed new emerging needs included non-trivial requirements for changing the identity of particles propagated and the handling of Monte Carlo truth.

Current status

OSCAR 2.4.5 has been the longest-lived version of CMS software in production, and in the course of 12 months has delivered more than 40M physics events out of a total of 90M for DC04.  We note that OSCAR 2.4.5 is now being replaced by the latest release, OSCAR 3.6.5 based on Geant4 6.2.p01.

 

Regarding robustness, OSCAR 2.4.5 showed a low rate of one crash per ten thousand proton-proton events (mainly in baryon decays and photo-nuclear interactions).  This appears to have dropped even further with OSCAR 3.6.5, as indicated by robustness tests with over 1M single particles and 0.5M physics events. Moreover, event reproducibility, achieved after detailed memory debugging in collaboration with the Geant4 experts, together with major improvements in the Geant4 debugging facilities, will enable faster discovery and fixing of problems, even in the case of rare crashes.

 

In terms of memory, OSCAR 2.4.5 showed on average a footprint of 220 MB compared to 100 MB for the Geant3 simulation.  Improvements in the course of 2004, largely pioneered by the Geant4 team in the handling of electromagnetic processes, have resulted in a much reduced footprint of 110 MB for pp events (~600 MB for Heavy Ion events). This is largely stable throughout the simulation of physics event samples of at least 1K events. The small memory footprint also makes it possible to run several production jobs per machine node without performance penalties due to swapping.

 

In terms of performance, OSCAR 2.4.5 has been roughly a factor 2 slower than its Geant3-based predecessor. We note that this comparison is not straightforward, due to a number of other differences between the programs: a more sophisticated scheme, the choice of conservative values for range cuts in OSCAR, the use of physics models with higher overall quality (more detailed descriptions, new processes such as the hadron-induced electro-nuclear processes and synchrotron radiation) and the superior tracking, in which, by choice, no tracking cuts are applied. Performance tuning in Geant4 and CMS code during 2004 has improved the speed by 10-20%, depending on the physics channel. Additional tuning, aided by detailed profiling studies and technical upgrades in the CMS magnetic field access code, is underway and is expected to improve it further.

 

OSCAR has now been largely validated in CMS for use in the simulation of Heavy Ion collisions, after a successful test processing 100 events.  Unlike its Geant3-based predecessor, in which events were processed in slices of 100 tracks due to ZEBRA limitations, OSCAR is able to process entire heavy ion events (more than 50K particles impacting the detector).

As of OSCAR 3.4.0, CPU performance is improved: an average event consumes 180 CPU minutes on a 2.8 GHz P4, compared to 230 CPU minutes for the Geant3-based program. Improvements are foreseen to bring down the memory footprint, which, depending on the multiplicity, can rise to over 600 MB.

Underway and future

OSCAR 2.4.5 is now being replaced by the latest release, OSCAR 3.6.5 based on Geant4 6.2.p01.

 

Production-scale tests of OSCAR 3.6.0 demonstrate better performance behavior than OSCAR 2.4.5: the time per event has a narrower distribution for the same physics channel. OSCAR 2.4.5 showed long tails in the time per event. These tails made it difficult to spot potential problem events (in which tracks experienced infinite loops in tracking), and, on occasion, caused disproportionate inefficiencies in large-scale productions. The improved behavior observed with the Geant4 6.2.p01 based release will translate into a higher efficiency for CMS production.

 

More recently, tests have started on the use of shower parameterization to improve performance.  The parameterization of the electromagnetic parts of a shower uses the techniques of the GFLASH package for Geant3, and was developed by CMS, extending a first implementation by ATLAS members. CPU comparisons between full simulation and GFLASH-like EM shower parameterization indicate that performance gains ranging from a factor of 2-3 at 1-10 GeV to a factor of 20-60 at 100-300 GeV can be achieved for a single electron, while maintaining good agreement in terms of energy deposits and profiles. The resulting package for shower parameterization has been productized in collaboration with Geant4 experts, and will be available as of the Geant4 7.0 release.

 

OSCAR 3.6.5, based on Geant4 6.2.p01, is now going into production for the CMS physics TDR, for which the goal is to deliver no less than 10M physics events per month through the entire chain of generation-simulation-digitization-reconstruction and DST production.
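To give a feeling for what the target of 10M events per month implies, the short sketch below converts it into a sustained daily and per-second rate. The 30-day month is an assumption made only for this illustration, and the estimate ignores downtime and ramp-up.

```python
# Target quoted in the text for the CMS physics TDR production.
events_per_month = 10_000_000

# Assumption for illustration only: a 30-day month of continuous running.
days_per_month = 30
seconds_per_day = 86_400

events_per_day = events_per_month / days_per_month
events_per_second = events_per_day / seconds_per_day

print(f"Sustained rate: about {events_per_day:,.0f} events per day")
print(f"Equivalent to about {events_per_second:.1f} events per second")
```

Under these assumptions the full chain has to sustain roughly a third of a million events per day.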

 

2. Putting in place Geant4 as LHCb’s production simulation engine

General

The primary goal of the LHCb experiment is to make precision measurements of CP violating decays and other rare phenomena in the b-system. These enable detailed tests of the Standard Model description of CP violation, and investigation of possible new physics, profiting from the copious amount of b hadrons to be produced at the LHC.

The studies performed for the LHCb TDRs produced through the end of 2003 were based on the Geant3 simulation package. The FORTRAN based simulation has been in use since the Technical Proposal and included the detectors as designed at that time.

However, no further development was foreseen in the FORTRAN based simulation package.

As LHCb software uses OO technology and the C++ programming language, it was natural to switch to the OO Geant4 toolkit for the simulation application of LHCb. New developments and active support are provided by the Geant4 collaboration, from which the LHCb simulation can profit.

 

Gauss

Gauss, the LHCb Object-Oriented (OO) simulation application, was put into production in spring 2004 and has been used for the LHCb Data Challenge performed during 2004 (DC04).

 

Gauss mimics what will happen in the spectrometer. It integrates the generation of proton-proton collisions and decays of b hadrons (e.g. Pythia and EvtGen) and the subsequent tracking of these primary events in the experimental setup (Geant4).

 These two independent phases are normally run in a single job but can be also executed separately. Particles in the HepMC generic format are  produced by the generator phase and transformed into Geant4 primary particles  for further processing.

Gauss, like all LHCb applications, is based on the Gaudi framework and uses common core software for the LHCb detector geometry (Detector Descriptions) and event data (LHCb event model).

Geant4 interacts in Gauss with these LHCb components via a set of interfaces and converters encapsulated in dedicated software (GiGa), which allows the conversion of the LHCb geometry into the Geant4 description. It also allows the conversion of the output of Geant4, in the form of hits produced in the sensitive detectors as well as the Monte Carlo truth history, into the LHCb event model. The behavior of the Geant4 simulation engine, in terms of the detectors to simulate and the physics models to use, is set up at run time via job options configurations.

The development of GiGa, the interface between Gaudi and Geant4, started in 1999 and a first prototype was available in 2000. In parallel, first standalone studies to explore the feasibility of using Geant4 to simulate the RICH detectors were performed in 2000.

 RICH test beam setups have been simulated and analyzed since then, to refine the RICH simulation in the Gauss application.

Also in 2000, comparisons of ECAL (Electromagnetic calorimeter) test beam data with Geant3 and Geant4 were performed. Similar comparisons were performed at that time for HCAL (Hadron calorimeter) test beam data.

 The Gauss project to provide a complete LHCb simulation application based on Gaudi and on Geant4 was started at the end of 2001.

Bringing Gauss into production

 In July 2003 LHCb had a fully functional Gauss program. In order to consider it fully operational Gauss had to be complete, simulating all detectors and providing all the information needed for further processing.

 Detailed comparisons with the Geant3 based simulation were carried out by the different sub-detectors, comparing occupancies, momentum distributions, energy depositions to validate Gauss as a replacement of the Geant3 based simulation.

In addition, Gauss is under continuous validation with test beam data to tune the physics settings; the work in the different sub-detectors is evolving at different paces, depending on their needs and data.

In this context LHCb also collaborates with the LCG physics validation project and is investigating the use of Geant4 for radiation studies.

The simulation program was required to be stable in terms of CPU time consumption and to have a low crash rate in order to be used for massive production. Test productions, of the order of 100k events, began in autumn 2003 to validate this aspect.

The good collaboration with the Geant4 team made it possible to improve the stability of the code.

 

As mentioned before, Gauss has been used in the LHCb DC04 for massive data production, starting from the 3rd of May 2004. This has made it possible to further test its robustness. The version of Geant4 used is 6.1. Four versions of Gauss have been deployed, to fix bugs identified in production. The failure rate in Geant4 in production is currently below 1 per mille of the runs. In each run 1500 events are produced.
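The per-run failure rate can be translated into an approximate per-event figure, which is easier to compare with crash rates quoted per event elsewhere. The sketch below is only an illustration: it assumes that each failed run is caused by a single crashing event and that events fail independently, neither of which is stated in the production reports.

```python
import math

# Figures quoted in the text.
run_failure_rate = 1e-3   # upper bound: below 1 per mille of the runs
events_per_run = 1500

# Assumption for illustration: events fail independently with probability p,
# and a run fails as soon as one event crashes, so that
#   P(run fails) = 1 - (1 - p)**events_per_run
# Solving for p (log1p/expm1 keep precision for such small numbers):
per_event_failure_rate = -math.expm1(math.log1p(-run_failure_rate) / events_per_run)

print(f"Per-event failure rate below about {per_event_failure_rate:.1e}")
```

Under these assumptions the bound corresponds to fewer than one crash per million events.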

 

 Gauss has been used to produce signal B events of interest for the physics program of the experiment as well as generic B and minimum bias events for trigger studies and consolidation of the physics performance studies performed in 2003.

Current status

In a first stage of DC04 around 250M events were produced. A second stage, started in December 2004, had produced over 80M events by mid-January 2005, at a rate of around 1.5M events per day.

 

 The memory usage of Gauss is around 220 MBytes, without significant memory leaks. The CPU performance varies depending on the complexity of the primary events to simulate ranging from 22 s/event for minimum bias events to 65 s/event for b events (on a 2.4 GHz Pentium IV with gcc 3.2 -O2 compilation).
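Combining the quoted production rate of around 1.5M events per day with the CPU time per event gives a rough idea of the computing capacity involved. The sketch below is an order-of-magnitude estimate only: it assumes that every CPU is equivalent to the 2.4 GHz Pentium IV reference machine, that only simulation time counts, and that a whole day's sample consists of a single event type.

```python
SECONDS_PER_DAY = 86_400

# Figures quoted in the text.
events_per_day = 1.5e6
cpu_seconds_per_event = {
    "minimum bias": 22.0,  # s/event on a 2.4 GHz Pentium IV
    "b events": 65.0,
}

# Number of reference CPUs that must run continuously to sustain the rate,
# if the whole sample were of the given type (assumption for illustration).
busy_cpus = {
    sample: events_per_day * seconds / SECONDS_PER_DAY
    for sample, seconds in cpu_seconds_per_event.items()
}

for sample, n_cpus in busy_cpus.items():
    print(f"{sample}: about {n_cpus:.0f} CPUs kept busy")
```

The two values bracket the capacity needed for a realistic mixture of event types.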

 

The data produced with Gauss in DC04 will be intensively scrutinized in the coming months by many physicists in the LHCb collaboration. This will make it possible to better understand and improve the simulation itself, and to identify new needs.

 

More realism and detail are foreseen to be introduced in Gauss, drawing on a better understanding of both the simulation and the test beam data.

New versions of Geant4 will be adopted and validated as they become available, both to try relevant new features and to provide feedback and new requirements to Geant4. Alternatives to the choices currently adopted in Gauss, for example in the physics lists, will also be explored.

 

3. Commissioning the full simulation program for the ATLAS detector

Introduction

The ATLAS detector simulation programs were heavily based on the GEANT3 detector simulation infrastructure from the inception of the experiment and for about 10 years. These programs have been used in the preparation of the Letter of Intent, in detector optimization studies and for the sub-detector Technical Design Reports. With the implementation of the GEANT4 toolkit, ATLAS prepared to move to the OO paradigm for its simulation suite as of 2000.

GEANT3 and GEANT4 were run together in order to validate the new suite against the previous one. The switchover happened in 2003, in preparation for the upcoming Data Challenge 2 (DC-2). ATLAS GEANT3 has now been discontinued, and all new developments are carried out in the new environment.

Key requirements for the G4ATLAS program were to enable dynamic loading and action-on-demand. All user-requested functionality is added by means of plug-in modules.

The simulation suite is currently fully deployed for both detector and test beam simulations. A single executable is used in all cases, with the detector configuration chosen at run time. This ensures consistency throughout all applications.

G4ATLAS has been intensively exploited in running production tests, simulations for the ATLAS Combined Test-beam facility, heavy ion simulations and ultimately in DC-2, which took place in spring-summer 2004.

Continuous comparisons with experimental results and previous simulations, together with tests, constant improvements and upgrades, have contributed to making G4ATLAS a highly dependable and robust tool, which will be utilized for the data taking preparation activities.

Physics validation activity and data challenges

 

A thorough validation programme was undertaken already in the early stages of the preproduction tests (2001) and throughout the Data Challenges (2003-2004).

This validation of the simulation suite was a multi-step process. Initial tests were run on simplified geometry layouts, validating and refining the sub-detector simulation packages.

More extensive tests were done during the Data Challenges (0 and 1), with the main emphasis shifting to complicated physics event samples.

Data Challenge 2 is fully GEANT4-based. Its target is to produce event samples for large scale physics analyses and for testing the ATLAS computing model: more than 12 million physics events were to be produced, reconstructed and analyzed in several laboratories around the world.

 

During initial tests in a short ramp-up phase, a job failure rate of 10% was observed for single particle samples and 30% for full physics events. Fixes were found and introduced with the help of the Geant4 team. The simulation programs then became quite stable and robust, with rather good performance. The full event sample was processed over a four month period with only one job failing due to a bug in GEANT4. As an example, the whole NorduGrid event sample (3.5 million events subdivided into 35,000 jobs) was processed without a single reported failure.
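A sample processed without any failure still only bounds the failure probability from above. As an illustration, the sketch below computes the upper limit implied by the NorduGrid sample, assuming that jobs fail independently with a common probability; the 95% confidence level is a choice made here for the example and is not taken from the ATLAS studies.

```python
# Figures quoted in the text for the NorduGrid sample.
n_events = 3_500_000
n_jobs = 35_000
events_per_job = n_events // n_jobs

# Assumption for illustration: independent jobs with a common failure
# probability p. Observing zero failures in n_jobs has probability
# (1 - p)**n_jobs; the 95% CL upper limit is the p for which this equals 5%.
confidence_level = 0.95
upper_limit = 1.0 - (1.0 - confidence_level) ** (1.0 / n_jobs)

print(f"Events per job: {events_per_job}")
print(f"Upper limit on job failure probability: {upper_limit:.1e}")
```

For large samples this is close to the familiar "rule of three" estimate of 3 divided by the number of jobs.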

 

An extensive physics validation programme has been undertaken since 2000 to exploit the physics models implemented in GEANT4 and to ensure that the GEANT4 simulation would meet the expected precision targets, by comparing with test beam results (whenever available).

In all cases, extensive comparison with experimental data from beam tests shows very good agreement (normally at the 1% level or better for electromagnetic observables) and predictive power.

The sub-detectors

The ATLAS sub-detectors are all simulated in detail. In the Inner Detector, for example, a detailed implementation for the three technologies (Pixel, Silicon and TRT) is in place.  The detector response is calibrated on test-beam results and used to feed the ATLAS simulation.

 

The calorimeter simulation is very detailed for each component (barrel and forward calorimeters, both electromagnetic and hadronic).  Extensive tests are investigating and optimizing the choice of physics models from GEANT4, and the geometry, in order to reduce the memory used at runtime. Studies using shower parameterization are also under way, to further optimize the performance.

 

The muon spectrometer, which is the outermost detector of ATLAS, is carefully simulated in order to reproduce not only all the detector details but also its peculiar asymmetries. A final/initial layout switch is implemented (as in the ID), to compare performances and to check the earliest discoveries at experiment start-up.

 

Pre-production and validation

Preliminary simulation validation started at the end of 2003. Comparisons with Geant3 using common event samples and with the same geometry were performed. Hits and digits for all the ATLAS sub-detectors were collected. Jobs ran on the LSF batch / Castor facility at CERN and outside. At different event/run phases the performance and memory usage were measured. The generated samples were single particles at different energies, SUSY events, H→4 leptons, Z→2 leptons (e, μ, τ), dijets and minimum bias events.

 

The initial failure rate was ~10% for single particle jobs and 30% for physics events, where many problems arose. Correcting geometry problems and physics problems (GEANT4 physics lists) resulted in a final failure rate of around 0%, excluding AFS or Castor problems. All jobs now run reliably to the end. The hit size is less than a factor of 1.5 larger than that of GEANT3 (700 KB for G3 compared with 900 KB for G4), while the execution time is ~1.5 times that of G3.
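The quoted hit sizes can be checked against the stated factor with a one-line calculation; the sketch below uses only the two sizes given above, and the variable names are illustrative.

```python
# Hit sizes quoted in the text, in KB.
hit_size_g3_kb = 700.0
hit_size_g4_kb = 900.0

size_ratio = hit_size_g4_kb / hit_size_g3_kb
print(f"G4 / G3 hit size ratio: {size_ratio:.2f}")

# The text states that the ratio stays below a factor of 1.5.
print("Below 1.5:", size_ratio < 1.5)
```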

 

Runtime inspection of memory and CPU usage allows users to monitor the performance, to detect memory leaks during data production, and to evaluate the cost of new features. Contributions to total process size at different processing steps have been tabulated.

Timing for different event samples

The average CPU time per event was measured using different event samples (results are expressed in seconds on a Pentium4 2.4 GHz machine). Single particles at pT = 50 GeV were shot in different pseudorapidity intervals (depending on the relative detector coverage). Samples of full SUSY/SUGRA events, H→4l and Z→2l were generated and the CPU time consumption evaluated.

A small sample of heavy ion events (Pb-Pb collisions at 5.5 TeV and |η| < 6 or 3.2) was also evaluated.

Conclusion

The GEANT4-based simulation was successfully tested in ATLAS and it now replaces the GEANT3-based one. Extensive measurements of the performance and robustness of the new simulation show that it is a great success. Further improvements in CPU performance from the parameterization of calorimeter response are expected.

Acknowledgements

 

The work described in this short report is the result of collaborative efforts between the simulation teams of the experiments, other experiment groups, the Geant4 team in PH/SFT and the Geant4 collaboration at large.

Part 3 of this report is adapted from Reference [A], which documents the bringing into production of the G4ATLAS simulation program by the ATLAS detector simulation team.

References

 

[A] Tests of the full simulation program for the ATLAS detector: An Overview of performance and robustness, A. Rimoldi, University of Pavia & INFN, Italy; A. Dell’Acqua, M. Gallas, A. Nairz, CERN, Geneva, Switzerland; J. Boudreau, V. Tsulaia, University of Pittsburgh, USA; D. Costanzo, LBL, USA