Files

Abstract

Various forms of real-world data, such as social, financial, and biological networks, can be represented using graphs. An efficient method of analysing this type of data is to extract subgraph patterns, such as cliques, cycles, and motifs, from graphs. For instance, finding cycles in financial graphs enables the detection of financial crimes such as money laundering and circular stock trading. In addition, extracting cliques from social network graphs enables the detection of communities and could help predict the spread of epidemics. However, extracting such patterns can be time-consuming, especially in larger graphs, because the number of patterns can grow exponentially with the graph size. Therefore, fast and scalable parallel algorithms are required to make the enumeration of these subgraph patterns tractable for real-world graphs. This thesis presents fast parallel algorithms for the enumeration of maximal cliques and simple cycles. We focus on accelerating the asymptotically-optimal sequential algorithms for enumerating the aforementioned patterns by parallelising them on manycore CPUs. To enable scalable parallelisation of clique and cycle enumeration algorithms, the algorithms presented in this thesis rely on fine-grained parallelisation, in which recursive calls are executed independently of each other using several software threads. However, simply applying the fine-grained parallelisation method to the aforementioned asymptotically-optimal algorithms leads to suboptimal solutions. Parallelising maximal clique enumeration using this method results in increased overhead caused by multithreaded memory management and task scheduling, as well as increased dynamic memory usage. In addition, the asymptotically-optimal algorithms for simple cycle enumeration rely on strict depth-first traversal of their recursion tree, making the fine-grained parallelisation of these algorithms challenging. This thesis addresses these problems and presents parallel algorithms that lead to an almost linear scaling of performance with the number of CPU cores utilised. As a result, the parallel algorithms presented in this thesis are able to achieve an order of magnitude speedup compared to the state-of-the-art parallel algorithms when executed on manycore CPU systems. To demonstrate the applicability of our accelerated algorithms, this thesis presents the Graph Feature Preprocessor library, which can be used to detect financial crime. This library expands the feature set of financial transactions by enumerating well-known money laundering and financial fraud subgraph patterns, such as simple cycles, in financial transaction graphs. When used in combination with gradient-boosting-based machine learning models, the expanded feature set produced by the library enables significant improvements in prediction accuracy for the problems of money laundering and phishing detection. Furthermore, the efficiency of the subgraph pattern mining algorithms presented in this thesis enables this library to operate in real time. As financial fraud schemes become more complex, fast algorithms that can detect suspicious behaviour are required. This thesis demonstrates that the parallel graph pattern mining algorithms introduced here can be used to enable fast and accurate detection of such suspicious behaviour.

Details

PDF