Kernel Interfaces for Goal Oriented Workload Management

The eWLM project[1] is motivated by IBM's autonomic computing vision[2]. IBM has considerable experience with managing heterogeneous workloads (e.g. web services, database services, transaction coordination and management as well as traditional batch) running on homogeneous clusters of large servers[3]. Resources (CPU, I/O priority, etc.) are dynamically allocated to important work (e.g. stock trades in excess of some monetary threshold) that is not meeting goals (sub-second response time) in preference to work which is meeting goals or is "less important" (e.g. balance queries for small accounts). We wish to apply this experience to similar workloads on a distributed collection of heterogeneous servers. As a first step we needed to provide instrumentation for measuring internal kernel activity that is analogous, if not completely equivalent, to what is available for mainframes.
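The allocation rule described above (favor important work that is missing its goal over work that is meeting its goal or is less important) can be sketched as follows. This is only an illustrative sketch, not the eWLM algorithm: the `WorkClass` type, the `pick_receiver` function, the numeric importance scale (1 = most important) and the ratio of observed to goal response time are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class WorkClass:
    """Hypothetical description of one class of work and its goal."""
    name: str
    importance: int          # assumed convention: 1 is the most important
    goal_seconds: float      # response-time goal
    observed_seconds: float  # measured response time

    @property
    def performance_index(self) -> float:
        # Assumed metric: a value above 1.0 means the goal is being missed.
        return self.observed_seconds / self.goal_seconds


def pick_receiver(classes: Iterable[WorkClass]) -> Optional[WorkClass]:
    """Return the class that should receive resources next, or None."""
    missing = [c for c in classes if c.performance_index > 1.0]
    if not missing:
        return None  # every class is meeting its goal
    # Most important class first; among equals, the one furthest from its goal.
    return min(missing, key=lambda c: (c.importance, -c.performance_index))


workload = [
    WorkClass("large_trades", importance=1, goal_seconds=0.5, observed_seconds=0.9),
    WorkClass("balance_queries", importance=3, goal_seconds=1.0, observed_seconds=2.5),
    WorkClass("batch", importance=5, goal_seconds=60.0, observed_seconds=30.0),
]
receiver = pick_receiver(workload)
```

In this sketch the balance queries are further from their goal than the large trades, but the trades are chosen because importance is compared first.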

Data is collected on applications instrumented using libraries implementing the Application Response Measurement (ARM) standards[4]. Different types of work have different performance goals, and any data collected must be attributable to a particular class of work. On each server, a local agent collects data about application response time, resource usage and reasons for delay (waiting for CPU, I/O, network, etc.). This data is collected from the local agents by an eWLM manager which uses the data to form a view of the topology of the configuration (i.e. how individual "transactions" flow between servers), where bottlenecks can arise and what can be done to meet desired performance goals.
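As a rough illustration of how a local agent might keep collected data attributable to a class of work, the sketch below aggregates response times, CPU usage and delay samples per class. The names (`LocalAgent`, `ClassStats`, `record_completion`, `record_delay_sample`, `report`) and the shape of the report are hypothetical and are not taken from eWLM or from the ARM standard.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class ClassStats:
    """Hypothetical per-class accumulator kept by a local agent."""
    completed: int = 0
    total_response: float = 0.0
    cpu_seconds: float = 0.0
    delay_samples: Counter = field(default_factory=Counter)

    def mean_response(self) -> float:
        return self.total_response / self.completed if self.completed else 0.0


class LocalAgent:
    def __init__(self) -> None:
        # One accumulator per class of work, created on first use.
        self.stats = defaultdict(ClassStats)

    def record_completion(self, work_class: str, response_seconds: float,
                          cpu_seconds: float) -> None:
        s = self.stats[work_class]
        s.completed += 1
        s.total_response += response_seconds
        s.cpu_seconds += cpu_seconds

    def record_delay_sample(self, work_class: str, reason: str) -> None:
        # reason is a label such as "cpu", "io" or "network"
        self.stats[work_class].delay_samples[reason] += 1

    def report(self) -> dict:
        """Snapshot that a manager could collect from this agent."""
        return {
            name: {
                "completed": s.completed,
                "mean_response": s.mean_response(),
                "cpu_seconds": s.cpu_seconds,
                "delays": dict(s.delay_samples),
            }
            for name, s in self.stats.items()
        }


agent = LocalAgent()
agent.record_completion("stock_trade", response_seconds=0.2, cpu_seconds=0.05)
agent.record_completion("stock_trade", response_seconds=0.4, cpu_seconds=0.05)
agent.record_delay_sample("stock_trade", "io")
snapshot = agent.report()
```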

The ARM 4.0 standard[5] defines APIs to

- register applications
- register transactions
  - "classes" of transactions, as opposed to individual transactions, which are instances of a transaction class
- report the start of a transaction
- report the completion of a transaction
- associate/disassociate a transaction with/from an individual thread
  - i.e. a pthread in the case of a UNIX-like OS
- report that a transaction is "blocked"

as well as others. In addition to the implementation of the ARM APIs, which associate transactions and classes of work with kernel entities, interfaces are required for the local agent to collect "sampling" data from the kernel. Information on resources used and delays experienced by individual transactions and/or classes of transactions and applications is collected by the local agents from the kernel and forwarded to the eWLM manager. We describe the data structures and APIs designed for a prototype kernel implementation of the ARM APIs for two different UNIX kernels, and compare (qualitatively) the data collected by the UNIX-style instrumentation with that available from the more mature mainframe workload manager[3]. We restrict our attention mainly to the requirements of local agents on individual servers, but comment on some issues concerning the collection and assimilation of data by the eWLM manager. A key issue is providing adequate data collection granularity (sub-second) without imposing unacceptable overhead (actual or perceived).
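The sequence of operations listed above, together with the sampling interface, might be modeled in user space roughly as follows. This is a toy sketch and not the ARM 4.0 bindings or the prototype kernel interfaces: the class `ArmSketch` and all of its method names are hypothetical, and a real implementation would keep the thread association inside the kernel rather than in a Python dictionary.

```python
import itertools
import threading
import time


class ArmSketch:
    """Hypothetical in-memory model of the ARM-style call sequence."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.apps = {}          # application id -> name
        self.tran_classes = {}  # transaction class id -> (app id, name)
        self.active = {}        # transaction handle -> state
        self.bindings = {}      # thread id -> transaction handle

    def register_application(self, name: str) -> int:
        app_id = next(self._ids)
        self.apps[app_id] = name
        return app_id

    def register_transaction(self, app_id: int, name: str) -> int:
        if app_id not in self.apps:
            raise KeyError("unknown application id")
        class_id = next(self._ids)
        self.tran_classes[class_id] = (app_id, name)
        return class_id

    def start_transaction(self, class_id: int) -> int:
        handle = next(self._ids)
        self.active[handle] = {"class": class_id,
                               "start": time.monotonic(),
                               "blocked": False}
        return handle

    def bind_thread(self, handle: int) -> None:
        # Associate the calling thread with the transaction instance.
        self.bindings[threading.get_ident()] = handle

    def unbind_thread(self, handle: int) -> None:
        if self.bindings.get(threading.get_ident()) == handle:
            del self.bindings[threading.get_ident()]

    def set_blocked(self, handle: int, blocked: bool) -> None:
        self.active[handle]["blocked"] = blocked

    def stop_transaction(self, handle: int):
        rec = self.active.pop(handle)
        # Drop any thread associations that still point at this transaction.
        self.bindings = {t: h for t, h in self.bindings.items() if h != handle}
        return self.tran_classes[rec["class"]][1], time.monotonic() - rec["start"]

    def sample_current_thread(self):
        """What a sampler would attribute to the calling thread, if anything."""
        handle = self.bindings.get(threading.get_ident())
        if handle is None:
            return None
        rec = self.active[handle]
        state = "blocked" if rec["blocked"] else "running"
        return self.tran_classes[rec["class"]][1], state


arm = ArmSketch()
app = arm.register_application("broker")
trade = arm.register_transaction(app, "stock_trade")
handle = arm.start_transaction(trade)
arm.bind_thread(handle)
first_sample = arm.sample_current_thread()
arm.set_blocked(handle, True)
second_sample = arm.sample_current_thread()
arm.set_blocked(handle, False)
arm.unbind_thread(handle)
class_name, elapsed = arm.stop_transaction(handle)
```

The point of the thread association is visible in `sample_current_thread`: a sampler that observes only a thread can still attribute what it sees to a transaction class.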

Investigators:

Matt Thoennes
Donna Dillenberger
Josh Knight

All of IBM T.J. Watson Research. This poster session reports on work done in collaboration with many others in various product divisions. The emphasis is on the exploratory work done in the Research Division as distinguished from the product division design and development work in support of which the exploratory work was done.
