Kernel Interfaces for Goal Oriented Workload Management

The eWLM project[1] is motivated by IBM's autonomic computing vision[2]. IBM has considerable experience managing heterogeneous workloads (e.g., web services, database services, transaction coordination and management, as well as traditional batch) running on homogeneous clusters of large servers[3]. Resources (CPU, I/O priority, etc.) are dynamically allocated to important work (e.g., stock trades above some monetary threshold) that is not meeting its goals (e.g., sub-second response time), in preference to work that is meeting its goals or is "less important" (e.g., balance queries for small accounts). We wish to apply this experience to similar workloads on a distributed collection of heterogeneous servers. As a first step, we needed to provide instrumentation for measuring internal kernel activity that is analogous, if not completely equivalent, to what is available on mainframes.
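
The allocation policy described above can be sketched as follows. This is a minimal illustration, not the eWLM or mainframe algorithm: it assumes each class of work has a numeric importance (1 = most important) and a response-time goal, and it uses a simple performance index, measured response time divided by goal, where a value above 1.0 means the goal is being missed. The names `ServiceClass`, `pick_receiver` and `pick_donor` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServiceClass:
    """Hypothetical class of work with a response-time goal."""
    name: str
    importance: int        # assumed convention: 1 = most important
    goal_seconds: float    # response-time goal
    actual_seconds: float  # measured average response time

    @property
    def performance_index(self) -> float:
        # Assumed metric: > 1.0 means the class is missing its goal.
        return self.actual_seconds / self.goal_seconds


def pick_receiver(classes):
    """Return the most important class that is missing its goal, or None."""
    missing = [c for c in classes if c.performance_index > 1.0]
    if not missing:
        return None
    # Most important first; among equals, the one furthest from its goal.
    return min(missing, key=lambda c: (c.importance, -c.performance_index))


def pick_donor(classes, receiver):
    """Return a class that can give up resources to the receiver, or None.

    A candidate is either meeting its own goal or less important
    (larger importance number) than the receiver.
    """
    candidates = [
        c for c in classes
        if c is not receiver
        and (c.performance_index <= 1.0 or c.importance > receiver.importance)
    ]
    if not candidates:
        return None
    # Prefer the least important class, then the one with the most slack.
    return max(candidates, key=lambda c: (c.importance, -c.performance_index))
```

For example, a class of large stock trades averaging 0.8 s against a 0.5 s goal (index 1.6) would be chosen to receive resources, and a less important class of small-account balance queries averaging 1.0 s against a 2.0 s goal (index 0.5) would be chosen to donate them.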

Data is collected from applications instrumented with libraries that implement the Application Response Measurement (ARM) standards[4]. Different types of work have different performance goals, so any data collected must be attributable to a particular class of work. On each server, a local agent collects data about application response time, resource usage, and reasons for delay (waiting for CPU, I/O, network, etc.). An eWLM manager gathers this data from the local agents and uses it to build a view of the configuration's topology (i.e., how individual "transactions" flow between servers), to identify where bottlenecks can arise, and to determine what can be done to meet the desired performance goals.
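
To make per-class attribution concrete, the sketch below shows how a local agent might turn raw samples into a delay profile for each class of work. It assumes, hypothetically, that each sample is a `(work_class, state)` pair recorded for one thread at one sampling tick, with state names such as `using_cpu`, `cpu_wait`, `io_wait` and `network_wait`; the real agent's data format is not specified here, and `delay_profile` and `dominant_delay` are illustrative names.

```python
from collections import Counter, defaultdict

# Hypothetical sampled states; a real agent would obtain these from the kernel.
USING_CPU = "using_cpu"
CPU_WAIT = "cpu_wait"
IO_WAIT = "io_wait"
NETWORK_WAIT = "network_wait"


def delay_profile(samples):
    """Aggregate (work_class, state) samples into per-class state fractions.

    Returns {work_class: {state: fraction_of_that_class's_samples}}.
    """
    per_class = defaultdict(Counter)
    for work_class, state in samples:
        per_class[work_class][state] += 1
    profile = {}
    for work_class, counts in per_class.items():
        total = sum(counts.values())
        profile[work_class] = {state: n / total for state, n in counts.items()}
    return profile


def dominant_delay(class_profile):
    """Return the most frequently sampled delay state (anything but using
    the CPU) for one class, or None if the class was never seen waiting."""
    delays = {s: f for s, f in class_profile.items() if s != USING_CPU}
    return max(delays, key=delays.get) if delays else None
```

A profile like this is what lets the manager say not just that a class is missing its goal, but why (for instance, that half of its sampled time is spent waiting for the CPU).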

The ARM 4.0 standard[5] defines APIs to

as well as others. In addition to implementing the ARM APIs, which associate transactions and classes of work with kernel entities, the kernel must provide interfaces through which the local agent can collect "sampling" data. The local agents collect from the kernel information on the resources used and the delays experienced by individual transactions, classes of transactions, and applications, and forward it to the eWLM manager. We describe the data structures and APIs designed for prototype implementations of the ARM APIs in two different UNIX kernels, and compare (qualitatively) the data collected by the UNIX-style instrumentation with that of the more mature mainframe workload manager[3]. We restrict our attention mainly to the requirements of local agents on individual servers, but comment on some issues in the collection and assimilation of data by the eWLM manager. A key issue is providing adequate (sub-second) data collection granularity without imposing unacceptable overhead, whether actual or perceived.
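
The association between transactions and kernel entities, and the sampling interface built on it, might look roughly like the following. This is a hypothetical user-space model, not the prototype's data structures: `TransactionRegistry`, `bind`, `unbind` and `sample` are illustrative names, and threads are identified by plain integer IDs. Binding attributes a thread's subsequent samples to a class of work; unbound threads are simply not attributed.

```python
import threading


class TransactionRegistry:
    """Hypothetical table associating threads with the class of work of
    the transaction they are currently serving."""

    def __init__(self):
        # The lock stands in for whatever synchronization a kernel would use.
        self._lock = threading.Lock()
        self._bound = {}  # thread id -> work class name

    def bind(self, tid, work_class):
        """Attribute thread `tid` to `work_class` until it is unbound."""
        with self._lock:
            self._bound[tid] = work_class

    def unbind(self, tid):
        """Stop attributing thread `tid`; unbinding an unknown tid is a no-op."""
        with self._lock:
            self._bound.pop(tid, None)

    def sample(self, thread_states):
        """Take one sample.

        `thread_states` maps tid -> current scheduler state. Returns a list
        of (work_class, state) pairs for bound threads only.
        """
        with self._lock:
            return [
                (self._bound[tid], state)
                for tid, state in thread_states.items()
                if tid in self._bound
            ]
```

The sampling interval is where the granularity/overhead trade-off shows up: for example, sampling 1,000 bound threads every 100 ms yields 10,000 samples per second, and halving the interval doubles that cost.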

Investigators:

All of IBM T.J. Watson Research. This poster session reports on work done in collaboration with many others in various product divisions. The emphasis is on the exploratory work done in the Research Division, as distinct from the design and development work in the product divisions that the exploratory work supported.

Notes:

  1. http://www.research.ibm.com/thinkresearch/pages/2002/20020529_ewlm.shtml
  2. http://www.research.ibm.com/autonomic/research/
  3. "Adaptive Algorithms for Managing a Distributed Data Processing Workload," J. Aman, C.K. Eilert, D. Emmes, P. Yocom and D. Dillenberger, IBM Systems Journal, Vol. 36, No. 2, p. 242, 1997, http://www.research.ibm.com/journal/sj/362/aman.html
  4. Application Response Measurement (ARM), The Open Group, http://www.opengroup.org/tech/management/arm/
  5. ARM 4.0 C Binding, Final Ballot Draft (PDF), The Open Group, http://www.opengroup.org/tech/management/arm/doc.tpl?CALLER=index.tpl&gdid=3600