Using Cloud Functions as Accelerator for Elastic Data Analytics

Bian, Haoqiong; Sha, Tiannan; Ailamaki, Anastasia

doi:10.1145/3589306

2023

Abstract

Cloud function (CF) services, such as AWS Lambda, have been adopted as a new computing infrastructure for implementing analytical query engines. For bursty and sparse workloads, a CF-based query engine is more elastic than traditional query engines running on servers, i.e., virtual machines (VMs), and may provide a higher performance/price ratio. However, it remains controversial whether CF services are a good fit for general analytical workloads, given the limitations of CFs in storage, network, and lifetime, as well as their much higher resource unit prices compared with VMs. In this paper, we first present micro-benchmark evaluations comparing the features of CFs and VMs. We reveal that for query processing, although CFs are more elastic than VMs, they are less scalable and more expensive for continuous workloads. Then, to get the best of both worlds, we propose Pixels-Turbo, a hybrid query engine that processes queries in a scalable VM cluster by default and invokes CFs to accelerate the processing of unpredictable workload spikes. In the query engine, we propose several optimizations to improve the performance and scalability of the CF-based operators, as well as a cost-based optimizer that selects the appropriate algorithm and parallelism for the physical query plan. Evaluations on TPC-H and a real-world workload show that our query engine achieves a 1-2 orders of magnitude higher performance/price ratio than state-of-the-art serverless query engines for sustained workloads, without compromising elasticity for workload spikes.
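The hybrid idea summarized above (run queries on the VM cluster by default, and spill to cloud functions only when a spike exceeds the cluster's capacity) can be illustrated with a minimal sketch. This is not the Pixels-Turbo implementation: the class `VmCluster`, its fixed `slots` capacity, and the function `route_query` are hypothetical names, and the "free slot" test is an assumed stand-in for whatever load signal the real engine uses.

```python
from dataclasses import dataclass


@dataclass
class VmCluster:
    """Hypothetical model of a VM cluster with a fixed number of concurrent query slots."""
    slots: int
    running: int = 0

    def has_capacity(self) -> bool:
        # Assumption: capacity is simply "fewer running queries than slots".
        return self.running < self.slots

    def finish_query(self) -> None:
        # Release a slot when a VM-executed query completes.
        if self.running > 0:
            self.running -= 1


def route_query(cluster: VmCluster) -> str:
    """Assumed routing policy: prefer the VM cluster; fall back to cloud functions on a spike."""
    if cluster.has_capacity():
        cluster.running += 1
        return "vm"
    # The cluster is saturated, so this query is offloaded to cloud functions.
    return "cloud_function"


if __name__ == "__main__":
    cluster = VmCluster(slots=2)
    # A burst of four queries: the first two fill the VM slots, the rest go to CFs.
    print([route_query(cluster) for _ in range(4)])
```

Running the script prints `['vm', 'vm', 'cloud_function', 'cloud_function']`. Keeping VMs as the default path reflects the abstract's observation that CFs are more expensive for continuous workloads, while the fallback branch preserves elasticity for bursts.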

Details

Title Using Cloud Functions as Accelerator for Elastic Data Analytics

Author(s) Bian, Haoqiong ; Sha, Tiannan ; Ailamaki, Anastasia

Published in SIGMOD '23: Companion of the 2023 International Conference on Management of Data

Pagination 27

Conference ACM SIGMOD/PODS International Conference on Management of Data, Seattle, Washington, USA, June 18-23, 2023

Date 2023-06-20

Publisher New York, ACM

ISBN 978-1-4503-9507-6

Keywords

OLAP; QaaS; serverless; query processing; query optimization; cloud databases; data lake; data warehouse; FaaS; cloud function; column store; cloud storage; elasticity; cost efficiency

DOI https://doi.org/10.1145/3589306

Laboratories DIAS

Record Appears in Scientific production and competences > I&C - School of Computer and Communication Sciences > IINFCOM > DIAS - Data-Intensive Applications and Systems Laboratory
Peer-reviewed publications
Conference Papers
Work produced at EPFL

Record creation date 2023-07-12