7th Undergraduate Research Symposium (2013)
Held 26 April 2013 in 011 Sitterson Hall
Open Source Design Versus Proprietary Solutions in an Ad Hoc System: Case Study
Zach Cross, supervised by Ketan Mayer-Patel
Each year, UNC’s School of Government produces a dashboard on utilities in several states including North Carolina. Specifically, the Environmental Finance Center collects data from hundreds of utility agencies and processes it to produce this dashboard visualization. Their team does not have dedicated technical staff. They rely on proprietary business-to-business software called Xcelsius to produce a Flash-based dashboard from Excel spreadsheet data. The process of preparing their data for visualization is full of overhead, arguably limiting their ability to scale. In this project, I analyzed their existing system in order to reproduce it using more modern and maintainable technologies, with a preference for open source. The result was the recreation of the entire pipeline except for one step of data normalization, which proved to be prohibitively complex for the scope of this study. Nonetheless, I was able to recreate the visualization as a proof of concept, in addition to significant portions of the back-end web application logic. The overarching goal was to demonstrate the viability of open source web development techniques for these classes of application and organization.
Zach Cross is a senior Computer Science major from Pamlico County, North Carolina. Zach interned last summer as a software engineer at SpanishDict, a Spanish translation and language learning startup in Washington, D.C. After interning in IBM’s Extreme Blue program this upcoming summer, he will return to the department as part of the BS/MS program. Zach enjoys going to movies, most sports involving water, and biking when he can.
NoSQL Databases and their Applications
Ian Kim, supervised by Diane Pozefsky
We present a case study of the application of a NoSQL database to an existing web application, ChemoText. ChemoText is a platform for discovering connections between chemicals, proteins, and diseases. Drawing from the public health database PubMed, it uses annotations in PubMed articles to link chemicals, proteins, and diseases together. A prior implementation of ChemoText used a PHP and MySQL backend that had unacceptably slow performance. We demonstrate a new system using a NoSQL database, specifically Neo4j, that displays response times orders of magnitude faster than the original implementation.
Ian Kim is a senior Computer Science and Linguistics double major from Apex, North Carolina. He is currently enrolled in the Computer Science BS/MS program and will return next year for his Master’s in computer science. His academic interests include cognitive linguistics, natural language processing, and machine learning. In his spare time, he enjoys board games, karaoke, and fine writing instruments.
Off-Line Scheduling of Mixed Criticality Job Sets
Alexandra French, supervised by Sanjoy Baruah
We present an algorithm to schedule job sets of mixed criticality in a hard real-time processing environment. First, we formalize the concept of criticality-correctness. Then, we derive a scheduling algorithm which will produce a deadline-correct and criticality-correct schedule for a given job set, provided that some correct schedule exists. Finally, we prove the correctness of our algorithm and show that it executes in efficient polynomial time.
Alexandra French is a senior from Pilot Mountain, North Carolina, majoring in Computer Science and Mathematics. She spends her free time drawing and knitting. After graduation, Alex will begin working at Research Triangle Park as a software engineer for IBM.
Improved Parallelization in Job Management for Chembench
Samuel Gass, supervised by Diane Pozefsky
Parallelization is an increasingly important method of improving the performance of modern software. However, parallelizing code is not an easy task; it requires a careful algorithmic examination to subdivide code into distinct, atomically executable sections. Here, we present an analysis of job execution on the chemistry modeling server Chembench and the steps taken to improve its performance via increased parallelization, specifically focusing on the job queueing code.
Samuel Gass is a senior Mathematics and Computer Science double major from Durham, North Carolina. After graduating this May, he’ll be returning to the UNC Computer Science department for the Master’s program. In his spare time, haha who are we kidding, he doesn’t have spare time anymore.
Multithreading and Parallelization for Improved Performance with Chembench QSAR Model Generation
Nicholas Bartlett, supervised by Diane Pozefsky
QSAR model generation in the field of Chemistry is an extensive operation, using large amounts of data as input. Chembench is a web service which allows for this type of model generation, but since these calculations can be quite complex, performance may be slow. We examine parallelization and the use of multithreading in multiple contexts, and analyze whether they could improve the performance of Chembench’s QSAR model generation.
Nicholas Bartlett is a senior Computer Science and Mathematics double major from Kings Mountain, North Carolina. In his spare time, Nick enjoys basketball and racquetball. Nick is currently an intern at IBM in RTP, where he will be working as he begins the Master’s program this fall.
Implementation of an Efficient Model of Large-scale Neural Networks using GPUs
Mohsin Ali, supervised by Jan Prins
Many studies have shown that much of brain processing relies heavily on the behavioral dynamics unique to the large-scale networks of neurons found throughout the nervous system, but study of these dynamics using conventional biological techniques is limited because most of these techniques can observe only a few neurons at once. These neural networks can instead be studied through modeling. However, these models are highly computationally intensive and as a result have been limited in the size of the networks they simulate. Therefore, to be feasible tools for studying large-scale neural networks, these models need to perform better. To address this issue, others have implemented these models using specialized parallel architectures; however, the accessibility of the required hardware is limited. Here, we took a model of a large-scale network of 200,000 spiking neurons, which was originally implemented to run on a conventional processor, and ported it to run on off-the-shelf Graphics Processing Units (GPUs) using NVIDIA’s Compute Unified Device Architecture (CUDA) API. GPUs offer the advantage of being widely accessible while still providing a powerful parallel architecture, but their concurrency introduces nondeterminacy into the results of the simulation. Nevertheless, we were able to develop a deterministic implementation of the simulator on a GPU while still taking advantage of its parallel architecture, and as a result reduced the runtime significantly. In addition to allowing for an increase in the size of the network simulated, this faster model also allows for other possible applications, such as connecting to and receiving input from a live cortical slice.
Mohsin Ali is a senior from Harrisburg, North Carolina majoring in Computer Science and Biology with a minor in Chemistry. For the past year, he has worked under the mentorship of both Dr. Flavio Frohlich from the Psychiatry department and Dr. Jan Prins from the Computer Science department on a collaborative research project in Computational Neuroscience. Much to his parents’ disappointment he still hasn’t decided exactly what he’ll be doing after graduation (don’t worry mom and dad, I’ll move out eventually).
Nutrition Nation: Monitoring Our Food Environment via Crowd Sourcing
Dennis Given, supervised by Hye-Chung Kum
Monitoring and measuring the foods and nutrients bought and consumed in the United States have a significant impact on the nation’s health. However, existing efforts to collect such data have severe limitations in keeping up with dynamic changes in the food environment, as well as issues with the quality of the data. In this project, we explore a crowd sourcing system for collecting up-to-date nutritional information on an ongoing basis, effectively and efficiently, using a smart phone. The system relies on a simple application that anyone can use to take pictures of the Nutrient Fact Panel (NFP), which are then transmitted to a centralized server. This presentation explores how Optical Character Recognition (OCR), a software technology that extracts text from an image, is used on the server to convert the NFP images into a rich database on the US food environment. We implement a wrapper, consisting of both pre-processing and post-processing methods, around Google’s Tesseract OCR engine to improve the translation. We ran extensive experiments on NFP images from matte surface boxes to find the optimum configuration. The best configuration achieved 80%-90% accuracy.
Dennis Given is a junior majoring in Computer Science with a minor in Japanese from Raleigh, North Carolina. Dennis will be interning with a bank in New York City this summer, working in the Application Architecture & Development department.
Mixed-Initiative Friend-list Creation
Isabella (Haoyang) Huang, supervised by Prasun Dewan
Our research focuses on reducing users’ efforts in creating Facebook friend-lists. This is important in the field of social networks since the friend-lists reflect how users view and perform group-specific information sharing. We have developed a friend-list creation system that incorporates computer recommendations as well as user input. The computer recommendations are generated based on the user’s Facebook information (friendship, events, posts, etc.) collected through the Facebook Graph API. We have also developed a user interface (a Facebook app) for users to edit their recommended lists.
Isabella (Haoyang) Huang is a senior double majoring in Computer Science and Business Administration from Beijing, China. She has worked under the mentorship of Professor Prasun Dewan for two years. Isabella has interned with the Microsoft Unified Communications Team (Lync) for the past two summers, where she worked on the client side of Lync to develop a social add-on to Lync 2010 and on the server side of Lync, shipping her code as part of Office 2013. After graduation, she will return to Microsoft as a full-time Software Development Engineer.
A More Secure and Transparent Solution to Password Authentication
Winston Howes, supervised by David Stotts
We describe a browser extension, GoPhish, that transparently and unobtrusively generates a unique password for each website a user may visit. This improves web password security and defends against several attacks, specifically phishing scams and common password attacks. Because the browser extension applies a cryptographic hash to each plaintext password, a stolen hashed password will provide no information as to what the user’s password is. Additionally, we have implemented a mechanism that attempts to log in to websites in the background and thus informs the user of whether or not they are on a phishing website. We have implemented this scheme with two goals in mind: transparency for the user and no necessary server changes. We currently have a Firefox implementation of our browser extension, although we plan to expand GoPhish to other modern browsers and devices. We describe the challenges associated with implementing GoPhish and our resolutions with the goal of being both secure and transparent.
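The abstract does not say how GoPhish derives its per-site passwords, so the following is only a minimal sketch of the general idea, assuming a keyed hash (HMAC-SHA256) of the site's domain under the user's plaintext password. The function name `site_password`, the choice of HMAC, and the 16-character output length are all hypothetical and are not taken from the project.

```python
import base64
import hashlib
import hmac


def site_password(master_password: str, domain: str, length: int = 16) -> str:
    """Derive a site-specific password (hypothetical scheme, not GoPhish's own).

    The plaintext password keys an HMAC over the lower-cased domain, so the
    same password yields a different derived value on every site.
    """
    digest = hmac.new(
        master_password.encode("utf-8"),
        domain.lower().encode("utf-8"),
        hashlib.sha256,
    ).digest()
    # Encode the 32-byte digest as text and truncate to the requested length.
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]


if __name__ == "__main__":
    print(site_password("correct horse", "example.com"))
    print(site_password("correct horse", "examp1e.com"))  # look-alike domain
```

Under this assumption, a look-alike domain receives a different derived password from the genuine site, and a derived password stolen from one site does not directly expose the plaintext password the user typed.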
Winston Howes is a sophomore Computer Science major at UNC. He developed his anti-phishing research project under Professor Stotts, and has received several awards for the technology.
Offline TarHeel Reader Using HTML5 AppCache
Ameem Shaik, supervised by Gary Bishop
The HTML5 Application Cache (AppCache) interface enables web applications to run offline. The interface allows users to specify resources that will be available offline in a cache manifest file. From this manifest file, the browser will load and cache the specified resources. Using the HTML5 AppCache, I implemented offline functionality for the online library www.tarheelreader.org. With this new functionality, users can now select books and read them offline.
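As a rough illustration of what a cache manifest looks like, the sketch below builds manifest text for a set of resources. The helper `build_manifest`, the version comment, and the example paths are hypothetical; the abstract does not describe how the project generates its manifests.

```python
def build_manifest(version: str, cached_urls: list[str]) -> str:
    """Return AppCache manifest text listing resources to keep offline."""
    lines = [
        "CACHE MANIFEST",        # required first line of every manifest
        f"# version {version}",  # editing this comment makes browsers re-fetch
        "",
        "CACHE:",                # resources stored for offline use
    ]
    lines.extend(cached_urls)
    # Everything not listed above is fetched from the network when online.
    lines += ["", "NETWORK:", "*"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical paths for one selected book.
    print(build_manifest("1", ["/book/42/page1.html", "/book/42/page1.jpg"]))
```

A page opts in by naming the manifest in the `manifest` attribute of its `<html>` element, and the manifest itself is served with the `text/cache-manifest` MIME type.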
Ameem Shaik is a senior Computer Science major. His interests lie mainly in web development and networks. After graduation he plans to complete his Master’s in computer science at UNC.
A Comparison of Natural Language Processing Methods for the Purpose of Authorship Attribution
Keethan Kleiner, supervised by Stanley Ahalt
In order to determine the author of a text, the ‘test’ file, it is compared to a collection of texts of known authorship, the ‘train’ files. Each train file is a collection of texts from an author. The closest match from these comparisons indicates the author of the test file. The simplest method is a comparison of word frequencies between the texts. Another method is compression. Each train file is compressed, with the compressed file’s size recorded. The test file is then added to each of the train files. These new files are compressed, and their sizes are compared to the train files’ compressed sizes. The closest match determines the author of the test file. Several compression algorithms can be tested to see which gives the best results. These methods were used on emails and forum posts.
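The compression method described above can be sketched as follows. This is a minimal illustration that assumes zlib as the compressor and takes "closest match" to mean the smallest growth in compressed size; the function names and the toy texts are hypothetical and do not come from the study.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def attribute_author(train_texts: dict[str, str], test_text: str) -> str:
    """Return the author whose train text absorbs the test text most cheaply."""
    best_author = None
    best_increase = None
    for author, train in train_texts.items():
        base = compressed_size(train)
        # Append the test text to the train text and compress the result.
        combined = compressed_size(train + "\n" + test_text)
        increase = combined - base
        if best_increase is None or increase < best_increase:
            best_author, best_increase = author, increase
    return best_author


if __name__ == "__main__":
    train = {
        "A": "the quick brown fox jumps over the lazy dog. " * 20,
        "B": "lorem ipsum dolor sit amet consectetur adipiscing elit. " * 20,
    }
    print(attribute_author(train, "the quick brown fox jumps over the lazy dog. "))
```

Swapping `zlib` for another standard-library compressor such as `bz2` or `lzma` is one way to compare compression algorithms, as the abstract suggests.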
Keethan Kleiner is a senior Computer Science major with a double minor in Physics & Astronomy and Mathematics. He has worked with Dr. Stanley Ahalt for the past two years. In his spare time, he enjoys playing video and board games with friends and snowboarding. He will return to UNC next year as a Master’s degree student.
Optimal Control of Focus in a 12-way Parallel Microscope System
Patrick Heenan, supervised by Russ Taylor
Panoptes is a 12-way parallel microscope being developed by the Center for Integrated Microscopy and Manipulation (CISMM) at Chapel Hill. Currently, the drivers of the lenses, which operate at high voltage, interfere with each other. The voltages driving adjacent channels contribute significantly to their neighbors’ voltage amplitudes; nearest neighbors were found to contribute up to 25% of their signal to each other. It is desired that these lenses be independent for optimum imaging. In order to prevent crosstalk between the drivers of the lenses, this project produced an algorithm which phase locks the sinusoidal drivers, maintains a consistent frequency between the drivers, and implements a PI feedback loop to maintain the desired control voltage and hence focus. Using microcontrollers with pulse width modulators, interrupts, and analog-to-digital converters, the desired root mean squared (RMS) voltage in the range 0-60 Volts was maintained within an error of 30 mV RMS. The overhead of a single iteration of the interrupt-driven PI loop is approximately 41 microseconds, with a settling time of approximately 10 iterations.
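A discrete PI loop of the kind mentioned above can be sketched in a few lines. This is only an illustration: the gains `kp` and `ki`, the static plant gain of 0.8, and the names `PIController` and `simulate` are hypothetical, and the real system runs on microcontrollers with PWM outputs and ADC readings rather than in a software simulation.

```python
class PIController:
    """Discrete proportional-integral controller (illustrative gains only)."""

    def __init__(self, kp: float, ki: float, setpoint: float) -> None:
        self.kp = kp
        self.ki = ki
        self.setpoint = setpoint
        self.integral = 0.0

    def update(self, measured: float) -> float:
        """Return the next drive value given the latest measurement."""
        error = self.setpoint - measured
        self.integral += error  # accumulated error removes steady-state offset
        return self.kp * error + self.ki * self.integral


def simulate(setpoint_volts: float, steps: int, plant_gain: float = 0.8) -> float:
    """Run the loop against a toy plant where measured = plant_gain * drive."""
    controller = PIController(kp=0.2, ki=0.9, setpoint=setpoint_volts)
    measured = 0.0
    for _ in range(steps):
        drive = controller.update(measured)
        measured = plant_gain * drive  # hypothetical stand-in for the lens driver
    return measured


if __name__ == "__main__":
    print(simulate(30.0, 50))  # approaches the 30 V setpoint
```

The integral term is what lets the loop hold the setpoint even though the toy plant delivers only a fraction of the commanded drive.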
Patrick Heenan is a senior double majoring in Physics & Astronomy and Computer Science. Patrick plans on attending graduate school in Physics at CU Boulder. His hobbies include baking bread, making instruments, and hiking.