Erik Saule's Homepage

Available Projects

I am interested in advising students on various projects. They can benefit you in multiple ways: for instance, you can do one (or more) for academic credit as part of a Bachelor's or Master's thesis, a senior project, or an independent study. Contact me for details.

Fine Visualization of Modern Hardware Platforms

With the advent of modern processors, it has become difficult to know which aspect of the computing system is under stress. Questions like "Is the memory bandwidth saturated? Is the instruction pipeline saturated?" are difficult to answer by just looking at code. Modern hardware platforms are no longer the black boxes of the 90s. Nowadays, all computing systems have reporting features that give some idea of what is happening in the system. In particular, modern processors have hardware counters that can be used to infer which aspects of the processor are being used. The central question of the project is: can we use these hardware counters to obtain a graphical depiction of the system and of the load on its components?

Ideal skillset: C or C++ programming and system programming. Good understanding of hardware systems. Ability to build a simple graphical user interface.

Designing Engaging Assignments for Early CS Students

Assignments and projects in early CS education tend not to be particularly engaging. For instance, a typical data structures assignment might read something like "write code to add and remove integers from a binary search tree. Push 1000 integers into that tree." Can we come up with more interesting assignments?

There are strategies to make assignments more interesting. Can we frame the problem as a game (e.g., using trees to represent the state of a game and increase FPS)? Can we use the main point of the assignment to solve a real-world problem (e.g., using binary search trees to organize a contact list)? Can we use the main point of the assignment to address a societal problem (e.g., can we help fight cancer with binary search trees)? Can we use real data (e.g., using trees to quickly identify earthquakes in a region)?

The project is meant to be exploratory. We will try to identify interesting assignments that could be designed, and we will fully design a few of them.

Ideal skillset: Ability to program in C++ or Java.

Designing Engaging Assignments for Parallel Computing

Assignments and projects in CS education tend not to be particularly engaging. For instance, a typical data structures assignment might read something like "write code to add and remove integers from a binary search tree. Push 1000 integers into that tree." In particular, the classes ITCS 3145 and ITCS 5145 tend to fall into that trap. Can we come up with more interesting assignments?

Here the challenge is to find replacements for classic parallel computing assignments that are more interesting but do not detract much from the class itself. Some ideas: building a real heat diffusion model of a room of Woodward Hall that can be computed in parallel; investigating what Tiny Titan is doing; looking at variants of numerical integration, maybe something Monte Carlo based like poker analysis; large-scale textual analysis.

The project is meant to be exploratory. We will try to identify interesting assignments that could be designed, and we will fully design a few of them.

Ideal skillset: Good at C++. A firm grasp of parallel computing and performance problems. Having taken ITCS 3145, ITCS 4182, ITCS 5182, or ITCS 5145 is preferred.

Visualizing Academic Search Results

In the past we have developed a web service to perform academic literature search. We are now looking at smarter ways to visualize the results. While we have code to perform some of the analysis, the results of the analysis cannot currently be visualized dynamically by a user, only through a cumbersome process.

The purpose of the project is to bring the various analyses that have been developed to the users of the web service in a useful and intuitive way.

Ideal skillset: Understanding of, and ability to produce, standard web technology and visualization. Client side: JavaScript, D3, jQuery. Server side: PHP, RESTful APIs. Back end: C++.

Compression of Streaming Centrality Analyses

More and more of the graph data that we see nowadays is streaming data. As time passes, vertices and edges are added to and removed from graphs. Think of the friendship graph of Facebook: new users come every day, and new friendships are recorded every day. Developing smart and fast analytics on these streaming graphs is important.

In this particular project, we are interested in how well one can compress the results of streaming graph analytics. When new connections are made, the importance of most nodes may not change. Focusing on one particular analytic, closeness centrality, we will investigate how to compress closeness centrality efficiently.

Ideal skillset: Knowledge of basic compression techniques. Understanding of how graphs are processed.

High Performance Shortest Path Algorithms

Given a graph, computing the shortest path from one vertex to the rest of the graph is a common problem in analyses that span from routing cars to identifying important actors in a social network. With the size of the graphs in this Big Data era, it is necessary to have fast implementations of these algorithms. When the edges of the graph are unweighted, the BFS algorithm can be used and has been extensively investigated on parallel high-performance platforms. But when edges are weighted, computing shortest paths on a parallel system is challenging since, in the worst case, the edges must be processed sequentially.

This project is about analyzing the literature for modern approaches (such as delta-stepping) and developing novel techniques for computing shortest paths on weighted graphs in parallel.

Ideal skillset: Solid C or C++ programming. Some Parallel Computing experience. Solid algorithm background.

Deploying Fast Centrality Algorithms in Gephi

Finding the center of a network is a key problem in social networks, biological networks, and even traffic networks. This discovery is often powered by metrics called centrality. Recently, techniques to compute centrality metrics quickly were developed. To broaden the impact of these techniques, we would like to integrate them into a well-known graph analysis software package, such as Gephi.

Ideal skillset: Solid C or C++ programming. Some Java programming experience.

Hardware Accelerated Detection of Clusters of Social Media Events

Social media brought us the ability to quickly discuss various topics and events. A fundamental question is to understand where people are discussing topics and when they are talking about them; fortunately, social media gives us the location and time at which topics are discussed. These can be represented as 3D points (latitude, longitude, time). We are interested in building a space-time kernel density map of these points in real time. It is a voxelized 3D map that essentially indicates the number of nearby points.

The computation of these 3D maps can be quite slow when considering large numbers of points. In this project, we will consider CPU and GPU hardware acceleration to help build these space-time kernel density maps as fast as possible.

Ideal skillset: C or C++ programming. CUDA, OpenCL, or OpenMP programming.

Finding the Center of a Network and Vectorization

Finding the center of a network is a key problem in social networks, biological networks, and even traffic networks. This discovery is often powered by a metric called betweenness centrality. Unfortunately, computing the betweenness centrality of all nodes in a network is an expensive operation. Recently, new techniques to compute betweenness centrality have been developed on GPUs. They rely on computing multiple graph traversals simultaneously to reduce the pressure on the memory bandwidth and to leverage coalescing. This project is about porting this technique to CPU architectures and studying its efficacy.

Ideal skillset: Solid C or C++ programming. Some Parallel Computing experience. Solid algorithm background.

Temporal Graph Analysis

More and more of the data that gets collected and analyzed is organized as a graph. Entities are represented as vertices and related to each other. (Think, for instance, of Facebook friendships, which can be represented as a graph.) The relations between entities are also typically located in time, and can come and go. (There is a time when two people became friends on Facebook, and a time when they were no longer friends.) The graphs that represent this information are called temporal graphs. These rich data can be used to get a better understanding of social dynamics (for instance, someone moving may trigger a wave of new friendships that can be detected).

The purpose of this project is to understand the scope of analyses that can be done on temporal graphs. We have some datasets for temporal graphs, for instance around academic networks and Hollywood movies, and we will identify more. We will also perform standard analyses on snapshots of the graph at different points in time (such as PageRank, centrality, clustering) and investigate how we can recognize important events from these standard analyses.

Ideal skillset: Programming skills (Python, Java, or C++).