
30 OCTOBER 2006

Speaker: Barton Miller, Professor, Computer Sciences Department, University of Wisconsin – Madison
Title: Binary Code Patching: An Ancient Art Refined for the 21st Century
Host School: NCSU
Duke Hosts: Xiaobai Sun (xiaobai at and Paolo Bientinesi (pauldj at 
UNC Host: Rob Fowler (rjf at
NCSU Host: Frank Mueller (mueller at

Patching binary code dates back to some of the earliest computer systems. Binary code patching allows a program to be modified without access to its source code, obviating the need to recompile, re-link, and, in the dynamic case, re-execute.

In the early days, it was a bold technique used by serious programmers to avoid the long recompile/reassemble and link steps. Code patching required an intimate knowledge of the instruction set and its binary representation. Great advances have been made in simplifying the use of code patching, making it less error prone and more flexible. “Binary rewriters” were a great advance in the technology for modifying a binary before its execution. These tools, such as OM, EEL, and Vulcan, have enabled the building of tools for tracing, simulation, testing, and sandboxing.
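The underlying mechanism is simply overwriting instruction bytes at a known offset in a code image. A minimal sketch, using a synthetic byte buffer rather than a real executable format (the offsets and the patched value are invented for illustration):

```python
# A tiny x86-64 function image: push rbp; mov rbp,rsp; mov eax,42; pop rbp; ret
code = bytearray(b"\x55\x48\x89\xe5\xb8\x2a\x00\x00\x00\x5d\xc3")

def patch(image, offset, new_bytes):
    """Overwrite instruction bytes in place, as a static patcher would."""
    assert offset + len(new_bytes) <= len(image), "patch out of range"
    image[offset:offset + len(new_bytes)] = new_bytes

# Change the immediate of `mov eax, 42` (opcode 0xb8 at offset 4) to 7.
patch(code, 5, (7).to_bytes(4, "little"))
print(code[5])  # 7
```

A real rewriter must also understand the executable format and fix up any code or data whose addresses the patch disturbs, which is exactly the complexity tools like OM and EEL hide.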

Moving beyond static patching, we developed “dynamic instrumentation,” the ability to patch code into a running program. Dynamic instrumentation provided the ability to adapt the code to the immediate need and to dynamically control overhead costs. This technology has been applied to both user programs and operating system kernels. Examples of dynamic instrumenters include Dyninst and Kerninst. This technology forms the foundation of the Paradyn Performance Tools.
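The flavor of dynamic instrumentation can be mimicked at a much higher level in any language with first-class functions: wrap a live function with entry "snippets" without touching its source, while the program keeps running. This is only a conceptual analogy, not the Dyninst API (which rewrites machine code in a separate process):

```python
import functools

def instrument(func, on_entry):
    """Return a patched version of `func` that runs a snippet on entry."""
    @functools.wraps(func)
    def patched(*args, **kwargs):
        on_entry(func.__name__)
        return func(*args, **kwargs)
    return patched

calls = {}

def count_entry(name):
    calls[name] = calls.get(name, 0) + 1

def work(x):
    return x * x

# "Patch" at runtime: rebind the name to the instrumented version.
work = instrument(work, on_entry=count_entry)

for i in range(3):
    work(i)

print(calls)  # {'work': 3}
```

The instrumentation can be added, changed, or removed while the program runs, which is what lets a tool like Paradyn adapt its measurement to the immediate need and bound its overhead.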

Dynamic code patching continues to get more aggressive. The latest work in the area, which we call “self-propelled instrumentation,” inserts instrumentation code that propagates itself along the program’s control flow as the program executes. At its best, this technique can provide very low-overhead, detailed instrumentation. Examples of such systems include spTrace, DIOTA, FIT, and DynamoRIO.

Key to both static and dynamic patching are the interfaces. There is a difficult balance between providing an interface that abstracts the details of the code, often using control- and data-flow graphs and instruction categories, and one that exposes the details of the instruction set.

In this talk, I will discuss the development of code patching over the years, with examples from the various technologies (including our tools) and present results from our latest work in self-propelled instrumentation. I will also discuss interface abstractions and our work towards the goal of multi-platform interfaces and tools.

Some related readings from our project:
The original Dyninst paper:
Instrumenting threaded programs:
Details of dynamic instrumentation:
Dynamic kernel instrumentation (Kerninst):
Self-propelled instrumentation:

Barton Miller is Professor of Computer Sciences at the University of Wisconsin, Madison. He directs the Paradyn Parallel Performance Tool project, which is investigating performance and instrumentation technologies for parallel and distributed applications and systems. He also co-directs the WiSA security project. His research interests include tools for high-performance computing systems, binary code analysis and instrumentation, computer security, and scalable distributed systems.

Miller co-chaired the SC2003 Technical Papers program, was Program co-Chair of the 1998 ACM/SIGMETRICS Symposium on Parallel and Distributed Tools, and was General Chair of the 1996 ACM/SIGMETRICS Symposium on Parallel and Distributed Tools. He also twice chaired the ACM/ONR Workshop on Parallel and Distributed Debugging. Miller has served on the editorial boards of IEEE Transactions on Parallel and Distributed Systems and the Int’l Journal of Parallel Processing, and is currently on the boards of Concurrency and Computation: Practice and Experience and Computing Systems. He has chaired numerous workshops, has served on numerous conference program committees, and is a member of the IEEE Technical Committee on Parallel Processing.

Miller is a member of the Los Alamos National Laboratory Computing, Communications and Networking Division Review Committee, IDA Center for Computing Sciences Program Review Committee, and has been on the U.S. Secret Service Electronic Crimes Task Force (Chicago Area), the Advisory Committee for Tuskegee University’s High Performance Computing Program, and the Advisory Board for the International Summer Institute on Parallel Computer Architectures, Languages, and Algorithms in Prague. Miller is an active participant in the European Union APART performance tools initiative.

Miller received his Ph.D. degree in Computer Science from the University of California, Berkeley in 1984. He is a Fellow of the ACM.


13 NOVEMBER 2006
Speaker: Hank Levy, Chairman and Wissner-Slivka Chair, Department of Computer Science and Engineering, University of Washington
Title: Reliability Challenges for Commodity Operating Systems
Host School: Duke
Duke Hosts: Jeff Chase (chase at, Landon Cox (lpcox at
UNC Host: Kevin Jeffay (jeffay at
NCSU Hosts: Ed Gehringer (efg at, Douglas Reeves (reeves at, Peng Ning (pning at

This talk will describe two research efforts that explore internal and external challenges to operating system reliability.

First, I will talk about the major internal challenge to reliability, namely operating system extensions, i.e., code written by third parties that is loaded dynamically into the OS. Surprisingly, extensions such as device drivers now account for an enormous portion of privileged code. For example, drivers make up 70% of Linux kernel code, while there are over 35,000 drivers for Windows/XP! Unfortunately, these extensions cause the vast majority of system crashes. I will describe the “Nooks” approach to solving the extension/reliability problem. Nooks allows the system and its applications to continue running in the face of extension failures, while requiring little or no change to either the extensions or the OS.
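The recovery idea behind this approach can be caricatured in a few lines: interpose on every call into the extension, catch failures, and reload the extension so the rest of the system keeps running. All names below are hypothetical, and the real Nooks interposes on kernel/driver interfaces with hardware protection domains, not language exceptions:

```python
class ExtensionFailure(Exception):
    pass

fault = {"pending": True}  # one transient fault waiting to fire

class FlakyDriver:
    """A stand-in 'extension' that crashes once, then behaves."""
    def read(self):
        if fault["pending"]:
            fault["pending"] = False
            raise ExtensionFailure("driver crashed")
        return "data"

class IsolatedExtension:
    """Catch extension failures and restart the extension so the
    'kernel' and its applications keep running."""
    def __init__(self, factory):
        self.factory = factory
        self.ext = factory()
        self.recoveries = 0
    def call(self, method):
        try:
            return getattr(self.ext, method)()
        except ExtensionFailure:
            self.ext = self.factory()  # recover with a fresh instance
            self.recoveries += 1
            return getattr(self.ext, method)()

drv = IsolatedExtension(FlakyDriver)
print(drv.call("read"), drv.recoveries)  # data 1
```

The point of the design is the wrapper's transparency: neither the caller nor the extension code had to change for the failure to be survived.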

Second, I will talk about a major external challenge to reliability, namely Web-borne threats, such as spyware. I’ll discuss the problem of spyware and an Internet study we’ve conducted to try to quantify the spyware problem. As well, I’ll briefly discuss some new techniques we’ve prototyped for protecting against spyware and other malicious code.

Henry M. Levy holds the Wissner-Slivka Chair in Computer Science and Engineering at the University of Washington. Hank’s research projects focus on computer operating systems, distributed and parallel computing, the world-wide web, and computer architecture.

He is author of two books and numerous papers on computer systems, including eight best-paper selections from the top OS conferences (SOSP and OSDI). He is former chair of ACM SIGOPS (the Special Interest Group on Operating Systems), and former program chair of the ACM Symposium on Operating Systems Principles (SOSP), the ACM Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), and the IEEE Workshop on Hot Topics in Operating Systems (HOTOS). Before coming to Washington, Hank was a Consulting Engineer with Digital Equipment Corporation, where he worked on operating systems and architectures for distributed systems and workstations. He is a member of the Technical Advisory Boards of Isilon Systems, Mercury, and Madrona Venture Group, and was a co-founder of Performant, Inc. (acquired by Mercury in 2003). Hank is a Fellow of the ACM (Association for Computing Machinery), a Fellow of the IEEE (Institute of Electrical and Electronics Engineers), and recipient of a Fulbright Research Scholar Award.

For more details, see


Speaker: David Karger, MIT
Title: Why everyone should be their own database administrator, UI designer, application developer, and web site builder, and how they can.
Host School: Duke
Duke Host: Kamesh Munagala (kamesh at
UNC Host: Leonard McMillan (mcmillan at
NCSU Hosts: Jon Doyle (Jon_Doyle at, Rudra Dutta (dutta at

Although computers are touted as tools that help us manage information, they often seem to hinder us instead. They fail to display, or even to record, some aspect of the information that we need. They clutter their presentations with distracting inessential aspects. Information is fragmented over multiple applications and sites, making it hard to record, visualize, or navigate important connections. The operation we want to apply to data locked inside one site or application is only available at a different one.

End users can fix many of these problems themselves, if they are given the right levers for reshaping information management tools and repositories to suit their needs. In this talk, I will survey my group’s explorations of three such levers: a simple, structured-but-sloppy, information model for holding whatever information a particular user considers important; a user-interface framework that can flex to present that arbitrary information; and tools that let end users ‘edit’ (rather than program) and share information views and workspaces that are appropriate for the data they want to record and the tasks they want to perform. Our approach offers a way to build personalized information management applications for users’ own data, and to create useful aggregations and visualizations of information dispersed over the standard and Semantic Web.
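A toy rendition of the "structured-but-sloppy" model described above: items are schema-free property bags, and a view is something a user edits rather than programs. Everything here (field names, the view helper) is illustrative, not Haystack's actual API:

```python
# Items carry whatever fields the user considers important; no fixed schema.
items = [
    {"type": "paper", "title": "Dyninst overview", "read": False},
    {"type": "contact", "name": "Alice", "phone": "555-0100"},
    {"type": "paper", "title": "Nooks paper", "read": True, "rating": 5},
]

def view(items, where, columns):
    """An end-user 'edited' view: a filter plus chosen columns,
    tolerating missing fields instead of rejecting the item."""
    return [tuple(item.get(c, "?") for c in columns)
            for item in items
            if all(item.get(k) == v for k, v in where.items())]

result = view(items, {"type": "paper"}, ["title", "rating"])
print(result)  # [('Dyninst overview', '?'), ('Nooks paper', 5)]
```

The key contrast with a conventional database is that the first paper's missing rating does not break the view; sloppiness in the data is absorbed by the presentation layer rather than forbidden by a schema.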


David R. Karger is a Professor of Electrical Engineering and Computer Science at MIT’s Computer Science and Artificial Intelligence Laboratory. His interests have ranged broadly, starting in theoretical computer science and combinatorial optimization, and moving on to information retrieval, machine learning, computer networking and systems (particularly peer-to-peer systems), and Human-Computer Interaction. His interest in efficient distributed systems led to research on distributed Web caching and to participation at the founding of Akamai technologies, while his interest in signal processing brought him to the technical advisory board of Vanu Inc.

A constant interest, however, has been personal information management, and particularly the question of how tools can be built that will actually let users do what they want with their information, instead of what their applications demand. He organized the Haystack group, and built a system by the same name, to explore this question.


29 JANUARY 2007
Speaker: Edmund Clarke, FORE Systems Professor of Computer Science and Professor of Electrical and Computer Engineering, Carnegie Mellon University
Title: Model Checking: From Hardware To Software And Back Again
Host School: UNC
Duke Host: Alan Biermann (awb at
UNC Host: Montek Singh (montek at
NCSU Host: Purush Iyer (purush at

The first major successes of model checking were in hardware verification. Symbolic model checking with BDDs and later SAT-based bounded model checking made it possible to handle hardware designs of realistic complexity. Combining counterexample-guided abstraction refinement and predicate abstraction with these symbolic techniques led to a new generation of powerful software model checkers (SLAM, BLAST, MAGIC, CBMC, etc.). Recently, we have used software model checking techniques to verify programs in hardware description languages like Verilog. Most model checkers for hardware verification currently compile the design into a gate-level description or netlist before verification. As a result, much of the structure of the design is lost and the state explosion problem is exacerbated. By analyzing the hardware description language using software model checking techniques, we are able to avoid the translation into netlist form and handle even larger designs.
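The flavor of bounded model checking can be sketched in a few lines: unroll the transition relation to a fixed depth and search for a violation of a safety property. Real BMC encodes the unrolling as a SAT formula and hands it to a solver; this explicit-state toy (the system and property are invented for illustration) shows the same idea:

```python
def bmc(init, step, bad, k):
    """Return a path from an initial state to a 'bad' state reachable
    within k transitions, or None if no violation exists in the bound."""
    frontier = [[s] for s in init]
    for path in frontier:
        if bad(path[-1]):
            return path
    for _ in range(k):
        next_frontier = []
        for path in frontier:
            for s2 in step(path[-1]):
                new_path = path + [s2]
                if bad(s2):
                    return new_path  # counterexample trace
                next_frontier.append(new_path)
        frontier = next_frontier
    return None

# Toy system: a mod-8 counter that is claimed never to reach 6 -- it can.
init = [0]
step = lambda s: [(s + 1) % 8]
bad = lambda s: s == 6

cex = bmc(init, step, bad, 10)
print(cex)  # [0, 1, 2, 3, 4, 5, 6]
```

What the SAT encoding buys over this explicit search is exactly what the abstract emphasizes: the solver can exploit the design's structure instead of enumerating states, which explicit exploration cannot survive at realistic sizes.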



Speaker: Daphne Koller, Associate Professor, Department of Computer Science, Stanford University
Title: Probabilistic Models for Structured Domains: From Cells to Bodies
Host School: Duke
Duke Host: Ron Parr (parr at
UNC Host: 

NCSU Host:

Many domains in the real world are richly structured, containing a diverse set of objects, related to each other in a variety of ways. For example, a living cell contains a rich network of interacting genes that come together to perform key functions. A robot scan of a physical environment contains classes of objects such as people, vehicles, trees, or buildings, each of which might itself be a structured object. However, most applications of machine learning aim to simplify the problem by considering objects in the domain as independent instances from a single distribution. In this talk, I aim to show that one can gain from modeling both the dependencies arising from the relationships between objects, and the rich structure of the similarities and differences between them. The first part of the talk will describe a rich language, based on probabilistic graphical models, which allows us to model the rich network of dependencies between related objects; we show how to learn such models from data and how to use the learned model both for knowledge discovery and for reasoning about new instances. The second part of the talk focuses on methods for learning the similarities and differences between related yet diverse classes of objects (such as different types of animals), so as to allow information learned for one class to transfer to another. I will describe applications of this framework to two main tasks: modeling objects in the physical world, and recognizing them in laser range scans and in images; and inferring a network of regulatory interactions in a cell, and how this network is perturbed by individual genotype.
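At its smallest, a probabilistic graphical model is just a factored joint distribution plus inference. The two-node sketch below (a regulator gene R influencing a target T, with made-up probabilities) shows inference by enumeration, the simplest case of the reasoning the talk scales up to whole networks:

```python
# P(R): prior that the regulator is active; P(T | R): target's response.
P_R = {True: 0.3, False: 0.7}
P_T_given_R = {True: 0.9, False: 0.1}

def p_joint(r, t):
    """Joint probability from the factored model: P(R) * P(T | R)."""
    pt = P_T_given_R[r]
    return P_R[r] * (pt if t else 1.0 - pt)

# Posterior that the regulator is active, given the target is observed on:
num = p_joint(True, True)         # 0.3 * 0.9 = 0.27
den = num + p_joint(False, True)  # + 0.7 * 0.1 = 0.34
posterior = num / den
print(round(posterior, 3))  # 0.794
```

Observing the target nearly triples the regulator's probability (0.3 to about 0.79); the gain from modeling dependencies, which the talk argues for, is exactly this kind of information flow between related objects.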

Daphne Koller received her BSc and MSc degrees from the Hebrew University of Jerusalem, Israel, and her PhD from Stanford University in 1993. After a two-year postdoc at Berkeley, she returned to Stanford, where she is now an Associate Professor in the Computer Science Department. Her main research interest is in creating large-scale systems that reason and act under uncertainty, using techniques from probability theory, decision theory and economics. Daphne Koller is the author of over 100 refereed publications, which have appeared in venues spanning Science, Nature Genetics, the Journal of Games and Economic Behavior, and a variety of conferences and journals in AI and Computer Science. She was the co-chair of the UAI 2001 conference, and has served on numerous program committees and as associate editor of the Journal of Artificial Intelligence Research and of the Machine Learning Journal. She was awarded the Arthur Samuel Thesis Award in 1994, the Sloan Foundation Faculty Fellowship in 1996, the ONR Young Investigator Award in 1998, the Presidential Early Career Award for Scientists and Engineers (PECASE) in 1999, the IJCAI Computers and Thought Award in 2001, the Cox Medal for excellence in fostering undergraduate research at Stanford in 2003, and the MacArthur Foundation Fellowship in 2004.


19 FEBRUARY 2007
Speaker: Dan Gusfield, Professor, Department of Computer Science, University of California – Davis
Title: Algorithms for estimating and reconstructing recombination in populations
Host School: NCSU
Duke Host: Herbert Edelsbrunner (edels at
UNC Host: Wei Wang (weiwang at
NCSU Host: Steffen Heber (sheber at

The work discussed in this talk falls into the emerging area of Population Genomics. I will first introduce that area and then talk about specific problems and results involved in the inference of recombination from population data.

A phylogenetic network (or Ancestral Recombination Graph) is a generalization of a tree, allowing structural properties that are not tree-like. With the growth of genomic and population data (coming, for example, from the HapMap project), much of which does not fit ideal tree models, and with the increasing appreciation of the genomic role of such phenomena as recombination (crossing-over and gene-conversion), recurrent and back mutation, horizontal gene transfer, and mobile genetic elements, there is a greater need to understand the algorithmics and combinatorics of phylogenetic networks.

In this talk I will survey a range of our recent results on phylogenetic networks with recombination and show applications of these results to several issues in Population Genomics: Association Mapping; Finding Recombination Hotspots in genotype sequences; Imputing the values of missing haplotype data; Determining the extent of recombination in the evolution of LPL sequences; Distinguishing the role of crossing-over from gene-conversion in Arabidopsis; Characterizing some aspects of the haplotypes produced by the program PHASE; Studying the effect of using genotype data in place of haplotype data, etc.

Various parts of this work are joint work with Satish Eddhu, Chuck Langley, Dean Hickerson, Yun Song, Yufeng Wu, V. Bansal, V. Bafna, and Zhihong Ding. Papers and associated software can be accessed at

Professor Gusfield’s background is in Combinatorial Optimization, and various applications of Combinatorial Optimization. He has worked extensively on problems of network flow, matroid optimization, statistical data security, stable marriage and matching, string algorithms and sequence analysis, phylogenetic tree inference, haplotype inference, and inference of phylogenetic networks with homoplasy and recombination. He received his Ph.D. in 1980 from UC Berkeley, working with Richard Karp, and was an Assistant Professor at Yale University from 1980 to 1986.

Professor Gusfield moved to UC Davis in July 1986. Since then, he has mostly addressed problems in Computational Biology and Bioinformatics. He first addressed questions about building evolutionary trees, and then problems in molecular sequence analysis. He presently focuses mostly on optimization problems related to population genetics and population-scale genomics, two particular problems being haplotype inference and inference of historical recombination. His work on computational biology and bioinformatics was initially supported by the Department of Energy Human Genome Project through the Lawrence Berkeley Labs Human Genome Center, then directly by the DOE Human Genome Project; since then, it has been funded by the NSF. His book, “Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology,” has helped to define the intersection of computer science and bioinformatics; it has been translated into Russian, and a South Asian edition has recently been published. Professor Gusfield serves on the editorial board of the Journal of Computational Biology, and is the founding Editor-in-Chief of the IEEE/ACM Transactions on Computational Biology and Bioinformatics, which received an “honorable mention” for Best New Journal in 2004 from the American Association of Publishers. Other notable service to the Computational Biology community includes serving as Program Chair for the 2004 RECOMB conference.

At UCD, Professor Gusfield was chair of the Computer Science Department for four years, and wrote the bioinformatics section (one of three) of the Genomics/Bioinformatics initiative proposal that resulted in the creation of the UCD Genomics Center (which has hired 17 new faculty); he continues to serve on its internal Steering Committee. He is currently co-chair of the UCD campus initiative on “Computational Characterization and Exploitation of Biological Networks” (see, which will hire seven new faculty in this area over the next three years.


26 FEBRUARY 2007
Speaker: Harry Shum, Managing Director, Microsoft Research Asia and Distinguished Engineer, Microsoft Corporation
Title: Prior, Context and Interactive Computer Vision
Host School: UNC
Duke Host: Jun Yang (junyang at
UNC Hosts: Marc Pollefeys (marc at
NCSU Host:

For many years, computer vision researchers have worked hard chasing elusive goals such as “can the robot find a boy in the scene” or “can your vision system automatically segment the cat from the background”. These tasks require a lot of prior knowledge and contextual information. How to incorporate prior knowledge and contextual information into vision systems, however, is very challenging. In this talk, we propose that many difficult vision tasks can only be solved with interactive vision systems, by combining powerful and real-time vision techniques with intuitive and clever user interfaces. I will show two interactive vision systems we developed recently, Lazy Snapping (Siggraph 2004) and Image Completion (Siggraph 2005): Lazy Snapping cuts out an object with a solid boundary using graph cut, while Image Completion recovers an unknown region with belief propagation. A key element in designing such interactive systems is how we model the user’s intention using the conditional probability (context) and likelihood associated with user interactions. Given how ill-posed most image understanding problems are, I am convinced that interactive computer vision is the paradigm on which we should focus today’s vision research.
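Graph-cut segmentation frames the problem as minimizing an energy with unary terms (how well each pixel fits a foreground or background model) plus pairwise smoothness terms, and finds the exact minimum via max-flow. For a tiny 1-D "image" the same kind of energy can be minimized by brute force; the intensity model and weights below are illustrative, not Lazy Snapping's actual terms:

```python
from itertools import product

pixels = [0.9, 0.8, 0.75, 0.2, 0.1]  # intensities; foreground is bright here
LAMBDA = 0.5                         # smoothness weight

def unary(intensity, label):
    # Cost of labeling a pixel foreground (1) or background (0):
    # distance to an assumed bright-fg / dark-bg model.
    return (1.0 - intensity) if label == 1 else intensity

def energy(labels):
    e = sum(unary(p, l) for p, l in zip(pixels, labels))
    # Pairwise term: penalize label changes between neighboring pixels.
    e += LAMBDA * sum(l1 != l2 for l1, l2 in zip(labels, labels[1:]))
    return e

best = min(product([0, 1], repeat=len(pixels)), key=energy)
print(best)  # (1, 1, 1, 0, 0)
```

The smoothness term is what yields the "solid boundary": a single clean cut beats a labeling that flickers between foreground and background. In the real system the user's strokes adjust the unary terms, which is how interaction steers the optimization.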

I will also give a quick overview of Microsoft Research Asia, called “the world’s hottest computer lab” by MIT Technology Review (June 2004). I will review our activities in basic research, technology transfer, product incubation, university relations, and our internship program.

Dr. Shum is a Distinguished Engineer of Microsoft Corporation. He received his Ph.D. in robotics from the School of Computer Science, Carnegie Mellon University. After graduating, he worked as a researcher at Microsoft Research Redmond. In 1999, he moved to Microsoft Research Asia (Beijing, China), where he is currently the Managing Director. His research interests include computer vision, graphics, human-computer interaction, statistical learning, and robotics. He is on the editorial boards of IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) and the International Journal of Computer Vision (IJCV). He is Program Co-Chair of the Eleventh International Conference on Computer Vision (ICCV 2007, Brazil). He is a Fellow of the IEEE and a Fellow of the ACM.


26 MARCH 2007
Speaker: Jennifer Widom, Professor Departments of Computer Science and Electrical Engineering, Stanford University
Title: Trio: A System for Data, Uncertainty, and Lineage
Host School: Duke
Duke Host: Shivnath Babu (shivnath at and Jun Yang (junyang at
UNC Host: Jan Prins (prins at
NCSU Host:

Trio is a new type of database system that manages uncertainty and lineage of data as first-class concepts, along with the data itself. Uncertainty and lineage arise in a variety of data-intensive applications, including scientific and sensor data management, data cleaning and integration, and information extraction systems. This talk will survey our ongoing work in the Trio project: the new extended-relational “ULDB” model upon which the Trio system is based, Trio’s SQL-based query language (TriQL) including formal and operational semantics, and a selection of new theoretical challenges and results. We will explain and demonstrate Trio’s initial prototype implementation, and describe our planned research directions.
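To make the model concrete, here is a toy rendition in Python of uncertain relations in Trio's spirit: each "x-tuple" holds mutually exclusive alternatives with confidences, and queries operate on alternatives. The relation, names, and confidence arithmetic are illustrative only; real TriQL semantics also track lineage through queries:

```python
# saw(witness, vehicle, confidence); each inner list is one x-tuple whose
# alternatives are mutually exclusive. Confidences summing below 1 leave
# room for "none of the alternatives holds".
saw = [
    [("Amy", "sedan", 0.6), ("Amy", "truck", 0.4)],
    [("Bob", "truck", 0.9)],
]

def select(relation, pred):
    """Selection keeps matching alternatives with their confidences."""
    out = []
    for xtuple in relation:
        alts = [a for a in xtuple if pred(a)]
        if alts:
            out.append(alts)
    return out

trucks = select(saw, lambda alt: alt[1] == "truck")

# Confidence that at least one witness saw a truck (x-tuples independent):
p_none = 1.0
for xtuple in trucks:
    p_none *= 1.0 - sum(conf for (_, _, conf) in xtuple)
p_any = 1.0 - p_none
print(round(p_any, 2))  # 1 - (1 - 0.4) * (1 - 0.9) = 0.94
```

Making uncertainty first-class means the query result carries these confidences automatically, instead of the application recomputing them by hand as above.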

Trio web site:

Jennifer Widom is a Professor in the Computer Science and Electrical Engineering Departments at Stanford University. She received her Bachelor’s degree from the Indiana University School of Music in 1982 and her Computer Science Ph.D. from Cornell University in 1987. She was a Research Staff Member at the IBM Almaden Research Center before joining the Stanford faculty in 1993. Her research interests span many aspects of nontraditional data management. She is an ACM Fellow and a member of the National Academy of Engineering, was a Guggenheim Fellow, and has served on a variety of program committees, advisory boards, and editorial boards.


9 APRIL 2007
Speaker: Tuomas Sandholm, Professor and Director, Agent-Mediated Electronic Marketplaces Lab, Carnegie Mellon University
Title: Algorithms for solving sequential imperfect information games, and application to poker
Host School: NCSU
Duke Host: Vincent Conitzer (conitzer at
UNC Host: Montek Singh (montek at
NCSU Host: Peter Wurman (wurman at

In many settings, most notably two-person zero-sum games, game theory provides particularly robust solution concepts. However, in most interesting AI applications, it has been prohibitively complex to find such solutions because the state space is extremely large. In this talk I will discuss our work over the last three years on taming this complexity, focusing on sequential incomplete-information games. First, I introduce automated abstraction algorithms. I will cover lossless (i.e., equilibrium-preserving) algorithms, as well as algorithms that are lossy but yield much smaller games that retain the most important features of the original game. Second, I present algorithms for finding approximate equilibria. They are based on recent theoretical advances in non-smooth convex optimization, and provide drastic improvements over prior (linear programming-based) algorithms for finding approximate equilibria. Combining these two streams, we enable the application of game theory to games that are many orders of magnitude larger than previously solvable. In particular, we find near-optimal solutions for a full four-round model of (Heads-Up Limit) Texas Hold’em poker, and demonstrate that the resulting player beats prior computer poker players.
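The talk's algorithms rest on non-smooth convex optimization; as a far simpler stand-in for the same end, approximating an equilibrium of a zero-sum game, the sketch below runs regret-matching self-play on rock-paper-scissors. The game, the seeded asymmetry, and the thresholds are illustrative, not from the talk:

```python
ACTIONS = 3
PAYOFF = [[0, -1, 1],   # row player's payoff: rock vs rock/paper/scissors
          [1, 0, -1],   # paper
          [-1, 1, 0]]   # scissors

def strategy(regrets):
    # Play in proportion to positive regret; uniform if none accumulated.
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1.0 / ACTIONS] * ACTIONS

def row_payoffs(opp):
    return [sum(PAYOFF[i][j] * opp[j] for j in range(ACTIONS)) for i in range(ACTIONS)]

def col_payoffs(opp):
    return [sum(-PAYOFF[i][j] * opp[i] for i in range(ACTIONS)) for j in range(ACTIONS)]

regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # seed asymmetry so play moves
sums = [[0.0] * ACTIONS, [0.0] * ACTIONS]
for _ in range(100000):
    s0, s1 = strategy(regrets[0]), strategy(regrets[1])
    u0, u1 = row_payoffs(s1), col_payoffs(s0)
    v0 = sum(p * u for p, u in zip(s0, u0))
    v1 = sum(p * u for p, u in zip(s1, u1))
    for a in range(ACTIONS):
        regrets[0][a] += u0[a] - v0
        regrets[1][a] += u1[a] - v1
        sums[0][a] += s0[a]
        sums[1][a] += s1[a]

avg = [[x / sum(s) for x in s] for s in sums]
# Exploitability: how much a best responder could gain against the averages.
expl = max(col_payoffs(avg[0])) + max(row_payoffs(avg[1]))
print(expl < 0.05)  # True: the average strategies approach (1/3, 1/3, 1/3)
```

Regret matching's average regret shrinks as O(1/sqrt(T)), which bounds the exploitability of the average strategies; the poker-scale methods in the talk pursue the same quantity with much faster convex-optimization machinery, after abstraction shrinks the game.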

This is joint work with Andrew Gilpin, and parts are also joint with Troels Bjerre Sørensen, Javier Pena, and Samid Hoda.

Tuomas Sandholm is Professor in the Computer Science Department at Carnegie Mellon University. He received the Ph.D. and M.S. degrees in computer science from the University of Massachusetts at Amherst in 1996 and 1994. He earned an M.S. (B.S. included) with distinction in Industrial Engineering and Management Science from the Helsinki University of Technology, Finland, in 1991. He has published over 250 technical papers on electronic commerce; game theory; artificial intelligence; multiagent systems; auctions and exchanges; automated negotiation and contracting; coalition formation; voting; safe exchange; normative models of bounded rationality; resource-bounded reasoning; machine learning; networks; and combinatorial optimization. He has 17 years of experience building optimization-based electronic marketplaces, and several of his systems have been commercially fielded. He is also Founder, Chairman, and Chief Scientist of CombineNet, Inc., which has fielded over 400 large-scale generalized combinatorial auctions, with over $25 billion in total spend and over $3.2 billion in generated savings. He received the National Science Foundation Career Award in 1997, the inaugural ACM Autonomous Agents Research Award in 2001, the Alfred P. Sloan Foundation Fellowship in 2003, and the IJCAI Computers and Thought Award in 2003.