E-Scan: Consuming Contextual Data with Model Plugins

Sanca, Viktor; Ailamaki, Anastasia

2023

Abstract

Extracting value and insights from increasingly heterogeneous data sources involves multiple systems combining and consuming the data. With multi-modal and context-rich data such as strings, text, videos, or images, the problem of standardizing the data model and format for interchangeable use is further exacerbated by the non-uniform ways of processing, extracting, and preserving content and context from the data. This makes data movement, reuse, and exchange between different systems a non-composable, manual process. On the other hand, increasingly powerful and popular machine learning-driven data representation models map the input data into uniform high-dimensional vector embeddings for further processing, informed by particular models. However, using models is expensive, and manual integration effort can add unnecessary cost. Thus, we propose E-Scan, a contextual data exchange plugin for using, exchanging, and caching context-rich data. We outline the need for a common interface that separates concerns and allows smooth and cost-effective data exchange. First, because vector embeddings are context-less, the model information is saved to preserve the context and preprocessing steps. Next, a lightweight vector engine lazily caches and stores the uniform intermediate data representation to lower the cost of transformation, data access, exchange, and retrieval. Finally, a pull-based interface allows uniform data consumption between components under a common plugin interface. This way, various context-rich data types are stored, processed, and exchanged in a standardized way while allowing plugin-based customization for subsequent context interpretation.
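
The three ideas in the abstract (model information kept with each embedding, lazy caching, and pull-based consumption) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `ModelInfo`, `Embedding`, `ModelPlugin`, `LazyVectorCache`, and `e_scan` are hypothetical, and `toy_embed` is a stand-in for a real, expensive embedding model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class ModelInfo:
    """Context a bare vector would lose: which model and preprocessing produced it."""
    name: str
    dim: int
    preprocessing: Tuple[str, ...]


@dataclass(frozen=True)
class Embedding:
    vector: Tuple[float, ...]
    model: ModelInfo


class ModelPlugin:
    """Hypothetical plugin: an embedding function bundled with its ModelInfo."""

    def __init__(self, info: ModelInfo, embed_fn: Callable[[str], List[float]]):
        self.info = info
        self._embed_fn = embed_fn
        self.calls = 0  # counts model invocations (the expensive step)

    def embed(self, item: str) -> Embedding:
        self.calls += 1
        return Embedding(tuple(self._embed_fn(item)), self.info)


class LazyVectorCache:
    """Embeds an item only on first request, keyed by (model name, item)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], Embedding] = {}

    def get(self, plugin: ModelPlugin, item: str) -> Embedding:
        key = (plugin.info.name, item)
        if key not in self._store:
            self._store[key] = plugin.embed(item)
        return self._store[key]


def e_scan(source: Iterable[str], plugin: ModelPlugin,
           cache: LazyVectorCache) -> Iterator[Tuple[str, Embedding]]:
    """Pull-based scan: nothing is embedded until the consumer asks for an item."""
    for item in source:
        yield item, cache.get(plugin, item)


def toy_embed(text: str) -> List[float]:
    # Stand-in for a real model: lowercase, then count characters in 4 buckets.
    vec = [0.0] * 4
    for ch in text.lower():
        vec[ord(ch) % 4] += 1.0
    return vec


plugin = ModelPlugin(ModelInfo("toy-char-buckets", 4, ("lowercase",)), toy_embed)
cache = LazyVectorCache()
rows = ["red car", "blue car", "red car"]

scan = e_scan(rows, plugin, cache)
first_item, first_emb = next(scan)  # the consumer pulls one item
calls_after_first = plugin.calls    # only that item has been embedded
rest = list(scan)                   # the repeated "red car" is served from the cache
```

In this sketch the consumer drives the work: pulling one item triggers exactly one model call, and the repeated input reuses the cached embedding, which still carries the `ModelInfo` needed to interpret it later.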

Details

Title E-Scan: Consuming Contextual Data with Model Plugins

Author(s) Sanca, Viktor ; Ailamaki, Anastasia

Published in Joint Workshops at 49th International Conference on Very Large Data Bases (VLDBW’23)

Pagination 8

Presented at Second International Workshop on Composable Data Management Systems (CDMS’23), Vancouver, Canada, August 28 - September 1, 2023

Date 2023-08-01

Keywords (free)

Vector Data Management; Embeddings; ML for DB; Data Movement; Caching; Context-Rich Data; Composability

Laboratories DIAS

The document appears in Scientific production and competences > I&C - School of Computer and Communication Sciences > IINFCOM > DIAS - Data-Intensive Applications and Systems Laboratory
Peer-reviewed publications
Conference papers
Work produced at EPFL

Record creation date 2023-08-14
