Large-Scale Graph Computing: From "Think Like a Vertex" to "Think Like a Task"

May 9, 2023
1:00 PM – 2:00 PM
Building 241
Conference Room D172
Also Online

Speaker: Da Yan, University of Alabama at Birmingham (UAB)

CS Seminar

Abstract: Big graphs are ubiquitously used for data modeling in modern applications, such as online social networks, knowledge graphs and biological networks. Unlike relational databases where queries can be formulated in a uniform manner with relational algebra, operations on graphs are highly diversified. For example, PageRank and SimRank that are based on random walks, shortest distance and reachability that are based on path calculation, subgraph matching and dense subgraph mining that are based on pattern matching, frequent subgraph pattern mining that are based on Apriori-based data mining, graph neural networks and graph transformers that are based on backpropagation. Currently, the mainstream frameworks for big graph processing are vertex-centric (i.e., following a think-like-a-vertex computing model). While vertex-centric systems are easy to program, they are dominantly designed for IO-bound execution and are only suitable for graph problems with a low time complexity. However, a lot of graph mining problems such as subgraph matching and finding dense/frequent subgraph structures usually have a very high time complexity, and when IO-bound systems are applied, the performance is a catastrophe with CPU cores severely underutilized.

To address the above problem, I proposed a new parallel programming framework called T-thinker, or think like a task. T-thinker is able to effectively utilize the available CPU resources in a computing cluster to achieve near ideal speedup, and a number of graph processing systems have been developed on top such as (1) G-thinker for finding qualified subgraphs from a big data graph with applications such as dense subgraph mining and subgraph matching; (2) PrefixFPM for mining general frequent patterns from a transaction database, such as subsequences, subtrees, subgraphs and submatrices; (3) T-FSM for mining frequent patterns from a big data graph. We have also developed and are currently developing many other T-thinker based systems beyond the domain of graph processing, towards broader applications including spatial data processing and general data mining. Finally, I will briefly summarize my other research directions such as Data Science and Deep Learning.

Bio: Da Yan is currently an Assistant Professor of the Department of Computer Sciences at the University of Alabama at Birmingham (UAB).

Argonne National Laboratory

Large-Scale Graph Computing: From “Think Like a Vertex” to “Think Like a Task”

Argonne National Laboratory