AF meeting 11 June 2009
Attendees
Oliver Keeble, Vincenzo Innocente, Pere Mato, Rene Brun, Peter Kelemen,
Stefan Roiser, Marco Cattaneo, Witek Pokorski, David Quarrie, Gabriele
Cosmo, Paolo Calafiura, Liz Sexton-Kennedy [remote], Predrag Buncic,
Pete Elmer [remote]
Agenda
- Line management issues
- Action Items
- Proposal for a common heap analysis tool
- Status of releases
- Platform and external libraries issues
- Experiments feedback
- R&D project status
- Topics for AA meetings
- AoB
version 0 (12/06/09 @ 11:00)
Line management issues (Pere)
- The SFT group will receive the Portuguese trainee previously
working in IT-DM. He will be working on the Math work package of
ROOT.
- The group will also host a new Fellow for the Linear Collider detector studies, starting in September.
- The DCRB decisions for the two people in the group have been postponed to July.
Action items
- Update on the schedule for the deployment of SLC5 at CERN (Peter)
- Peter reported that the SLC5 computing capacity at CERN is
ramping up according to the plans approved by the WLCG-MB. There is now
more capacity in the form of SLC5 than SLC4 (which is very much
underused).
- The current schedule for switching the lxplus alias to the SLC5
interactive service is the end of July, as announced by Tony at the last
GDB. This is not what the experiments have requested on several
occasions.
- Peter also reported that the SLA for the "build servers" was
agreed by the WLCG-MB since they are considered by IT-FIO as
VOBoxes.
- Follow-up of the proposal of disabling SELinux (Pere)
- There is general consensus in the GDB that sites should
configure SL5 worker nodes with SELinux partially disabled. There is
still an issue with the interactive services (i.e. lxplus), which
should have the same kind of configuration as the batch/worker nodes;
otherwise end-user programs cannot be developed and tested. This needs
to be followed up --> Action Pere.
Proposal for a common heap analysis tool
- Paolo explained the motivation for an LHC-wide tool to analyze the
contents of our applications' heap, in particular which objects it
contains, how they are positioned, who allocated them, and which
objects are fragmenting the heap. Every experiment has developed a tool
that gathers the raw data needed to analyze heap contents (e.g.
Hephaestus, igProf, and TMemStat), similar to kcachegrind but
much faster. He is convinced that there is room for doing something in
common.
- What is realistic is to have some discussions among experts to
exchange ideas and come up with a definition of a common format for the
collected data, so that we can then build the tool for doing the
analysis.
- Rene explained the idea behind TMemStat. It makes it possible to
monitor only a part (e.g. a package) of a complete program. It simply
collects the information associated with each malloc call (time
stamp, size, call stack, ...) and stores it in an optimized TTree.
There are tools available to visualize and analyze the results.
- Pete argued that all the information should be collected
and the filtering done off-line. The way it is done in igProf
implies only a 30% overhead. Collecting unfiltered data allows heap
profiling and leak checking at the same time.
- There was general agreement that the kind of information
collected by the different tools is similar, so it should not be
difficult to achieve a common format. Rene said that the agreement
should be on what to collect and on the filtering criteria. Paolo
suggested having a look at the format used by the Massif tool.
- There was agreement to set up a group of experts to discuss the
details. The CMS and ATLAS experiments can easily provide the names of
their experts and organize the meetings.
- Marco proposed to have an AA meeting dedicated to these issues, and this was endorsed.
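A minimal sketch, in Python, of the kind of per-allocation record mentioned above (time stamp, size, call stack) and of off-line filtering and leak checking on unfiltered data. The names AllocEvent, live_allocations, filter_by_frame and bytes_by_caller are hypothetical and do not correspond to the actual format of Hephaestus, igProf, TMemStat or Massif. The address and is_free fields are assumptions, added so that frees can be matched to allocations.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class AllocEvent:
    """One recorded malloc/free call (hypothetical common record)."""
    timestamp: float             # time of the call
    address: int                 # assumption: pointer returned by malloc or passed to free
    size: int                    # bytes requested (0 for a free)
    callstack: Tuple[str, ...]   # innermost frame first
    is_free: bool = False        # assumption: frees are recorded too


def live_allocations(events: Iterable[AllocEvent]) -> List[AllocEvent]:
    """Leak check: replay events in time order, return allocations never freed."""
    live: Dict[int, AllocEvent] = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.is_free:
            live.pop(ev.address, None)
        else:
            live[ev.address] = ev
    return list(live.values())


def filter_by_frame(events: Iterable[AllocEvent], prefix: str) -> List[AllocEvent]:
    """Off-line filtering: keep events whose call stack contains a frame starting with prefix."""
    return [ev for ev in events if any(f.startswith(prefix) for f in ev.callstack)]


def bytes_by_caller(events: Iterable[AllocEvent]) -> Dict[str, int]:
    """Heap profile: total bytes requested, grouped by the innermost frame."""
    totals: Dict[str, int] = {}
    for ev in events:
        if ev.is_free or not ev.callstack:
            continue
        caller = ev.callstack[0]
        totals[caller] = totals.get(caller, 0) + ev.size
    return totals


# Made-up sample data, for illustration only.
sample_events = [
    AllocEvent(0.0, 0x1000, 64, ("TList::Add", "main")),
    AllocEvent(1.0, 0x2000, 128, ("Track::Track", "Reco::run", "main")),
    AllocEvent(2.0, 0x1000, 0, ("TList::Delete", "main"), is_free=True),
]
leaks = live_allocations(sample_events)
```

Because every call is recorded, the same list of events serves both the profile (bytes_by_caller) and the leak check (live_allocations), and restricting the analysis to one package becomes a filter applied afterwards rather than a decision taken at collection time.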
Status of Releases
- ROOT version 5.24 will be released at the end of June.
- It will include new versions of TMVA, RooStats, xrootd, the DCache plugin (to get rid of the client), ...
- There is currently a 'blocking' problem with the TDCache plugin.
With the same client version (1.8.0p1), one of two different versions
of ROOT fails and the other does not. It is claimed that the new version of
dcache_client fixes the problem. LCG_56b will be made with the new
release of the client (to be made available very soon).
- It was agreed that testing for all the different data storage
plugins will be made available in the LCG nightlies. Currently only
castor is tested.
- A new (very recent) version of gccxml is needed to fix some problems with TList::operator=. It will be included in the LCG_56 series too.
- There was a 'beta' release of Geant4 last week. It includes new
physics lists to be tried out by the experiments. CMS is already
doing it.
Platform and external libraries issues
- Additional SLC5 issues and feedback from yesterday's GDB
- The GNU gcc 4.3 installation will be distributed by the experiments
- Stefan's proposal to change the name of the compiler to avoid conflicts with other installations was not accepted.
- The response from RH was to use gcc 4.4 (which is again a technology-preview quality release).
- Stefan also proposed to produce a stripped-down installation,
removing unneeded components such as Ada, Objective-C, Java, etc. This
was agreed.
- The meta-RPM for deployment of the compatibility libraries was accepted.
- As of yesterday the list of libraries in the meta-RPM was almost complete.
- The RPM will be installed first on one worker node and Stefan
will run some verification. Then the installation will be done at a
number of sites for the experiments to test.
- David requested that the new version of tcsh 6.1.6 (with fewer
limits on the command/environment length) be installed in the
contrib area, since it cannot be made part of SL5.
- Marco mentioned the existence of an 'SLC5 reference machine' on which the latest RPMs are installed.
Experiments feedback
- ATLAS:
- The 15.2 production release is being validated. They have been having problems with multi-platform installation at sites.
- The validation of native SL5 is starting outside CERN. They still had some problems with some MC generators (SELinux related).
- CMS:
- Nothing to add with respect to the statement for the GDB.
- LHCb:
- They have produced a complete release of the software stack for
SLC5. It is in use locally but there is no deployment and validation on
the Grid yet.
- ALICE:
R&D projects
- The agenda for the Workshop
on adapting applications and computing services to multi-core and
virtualization is now complete.
- Multi-core (Vincenzo)
- There was a meeting last week with ALICE (new DOCT
student) on the work they are starting on adapting the simulation and
AliROOT to multi-core.
- There has been a discussion with Intel concerning matrix expansion and vectorization in Larrabee.
- Danielle has finalized the inclusion of perfmon calls in a module of CMSSW.
- Virtualization (Predrag)
- Development of CernVM version 1.3.0 for the LC people has been finalized.
- There have been a number of workshop preparatory discussions and meetings with IT groups (batch systems, network, ...).
- CVMFS development. Complete rewrite of the code (transparent
to the end-users). A lot of new optimizations (compression,
pre-fetching, signing) for the read-only version. For the read/write
version the catalog will be automatically created.
AA meetings topics
- Suggested to schedule a 'Heap monitoring tools' session to present the findings and proposals of the new working group.
AoB
- The next meeting clashes with the workshop; it was therefore decided to cancel it.