A Computational and Data Scheduling Architecture for HEP Applications
Michael Ernst, Deutsches Elektronen-Synchrotron, DESY Hamburg, michael.ernst@desy.de
Patrick Fuhrmann, Deutsches Elektronen-Synchrotron, DESY Hamburg, patrick.fuhrmann@desy.de
Alexander Papaspyrou, Robotics Research Institute, alexander.papaspyrou@udo.edu
Martin Radicke, Deutsches Elektronen-Synchrotron, DESY Hamburg, martin.radicke@desy.de
Lars Schley, Robotics Research Institute, lars.schley@udo.edu
Ramin Yahyapour, Robotics Research Institute, ramin.yahyapour@udo.edu
This paper discusses an architectural approach to enhancing job scheduling for data-intensive applications in HEP computing. First, we give a brief introduction to the current grid system based on LCG/gLite, identify its bottlenecks, and describe possible extensions. We then propose an extended scheduling architecture that adds a scheduling framework on top of existing compute and storage elements. The goal is better coordination between data management and workload management. This includes more precise planning and prediction of file availability before jobs are allocated to compute elements, as well as tighter integration of local job and data scheduling to improve response times and throughput. Subsequently, the underlying components are presented. The computing element is built from standard grid components. The storage element is based on the dCache software package, which provides scalable storage and data access and is enhanced so that it can interact with scheduling services. To promote broader acceptance of the scheduling solution in Grid communities beyond High Energy Physics, we give an outlook on how the scheduling framework can be adapted to other application scenarios, such as those of the climate community.

The project is funded by the German Ministry of Education and Science as part of the national e-science initiative D-Grid and is carried out jointly by IRF-IT of the University of Dortmund and DESY.