HyperMixer: An MLP-based Low Cost Alternative to Transformers

Mai, Florian; Pannatier, Arnaud; Fehr, Fabio; Chen, Haolin; Marelli, Francois; Fleuret, Francois; Henderson, James

2023

Abstract

Transformer-based architectures are the model of choice for natural language understanding, but they come at a significant cost, as they have quadratic complexity in the input length, require a lot of training data, and can be difficult to tune. In the pursuit of lower costs, we investigate simple MLP-based architectures. We find that existing architectures such as MLPMixer, which achieves token mixing through a static MLP applied to each feature independently, are too detached from the inductive biases required for natural language understanding. In this paper, we propose a simple variant, HyperMixer, which forms the token mixing MLP dynamically using hypernetworks. Empirically, we demonstrate that our model performs better than alternative MLP-based models, and on par with Transformers. In contrast to Transformers, HyperMixer achieves these results at substantially lower costs in terms of processing time, training data, and hyperparameter tuning.
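
The contrast the abstract draws between a static token-mixing MLP and one formed dynamically by a hypernetwork can be illustrated with a small sketch. This is not the authors' implementation: the function names, the layer sizes, the use of a single generated weight matrix for both layers, the GELU activation, and the omission of position information are all assumptions made for illustration. It uses only the Python standard library.

```python
import math
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    if len(A[0]) != len(B):
        raise ValueError("shape mismatch")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def gelu(A):
    """Elementwise GELU (tanh approximation)."""
    c = math.sqrt(2.0 / math.pi)
    return [[0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3))) for x in row]
            for row in A]

def random_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def static_token_mixing(X, W1, W2):
    """Static mixing: W1 and W2 are fixed (N, d_hidden) parameters,
    applied to each feature column of X independently."""
    return matmul(W2, gelu(matmul(transpose(W1), X)))

def hyper_token_mixing(X, A, B):
    """Dynamic mixing: a small MLP (hypothetical parameters A and B) maps each
    token, i.e. each row of X, to one row of the mixing weights W."""
    W = matmul(gelu(matmul(X, A)), B)                # (N, d_hidden)
    return matmul(W, gelu(matmul(transpose(W), X)))  # (N, d)

rng = random.Random(0)
d, d_hyper, d_hidden = 4, 8, 6          # illustrative sizes only
A = random_matrix(rng, d, d_hyper)
B = random_matrix(rng, d_hyper, d_hidden)

short = random_matrix(rng, 5, d)        # 5 tokens
long = random_matrix(rng, 9, d)         # 9 tokens, same parameters A and B
out_short = hyper_token_mixing(short, A, B)
out_long = hyper_token_mixing(long, A, B)
print(len(out_short), len(out_short[0]))  # 5 4
print(len(out_long), len(out_long[0]))    # 9 4
```

In this sketch the static variant is tied to one sequence length, because its weight matrices have one row per token position, whereas the generated weights adapt to any number of tokens. Each matrix product also involves the sequence length N only once, so the cost of the sketch grows linearly with N rather than quadratically.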

Details

Title HyperMixer: An MLP-based Low Cost Alternative to Transformers

Author(s) Mai, Florian ; Pannatier, Arnaud ; Fehr, Fabio ; Chen, Haolin ; Marelli, Francois ; Fleuret, Francois ; Henderson, James

Published in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023): Long Papers, Vol 1

Editor(s) Rogers, A ; Boyd-Graber, J ; Okazaki, N

Pages 15632-15654

Presented at 61st Annual Meeting of the Association for Computational Linguistics (ACL), July 9-14, 2023, Toronto, Canada

Date 2023-01-01

Publisher Stroudsburg, Association for Computational Linguistics (ACL)

ISBN 978-1-959429-72-2

Laboratories LIDIAP

The document appears in Scientific production and competences > STI - Faculté des sciences et techniques de l'ingénieur > IEM - Institute of Electrical and Micro Engineering > LIDIAP - Laboratoire de l'IDIAP
Scientific production and competences > Euler Center for Signal Processing
Peer-reviewed publications
Conference papers
Work produced at EPFL
Published

Grant Swiss National Science Foundation under the project LAOS: 200021_178862
Swiss Innovation Agency Innosuisse: 32432.1 IP-ICT
Swiss National Centre of Competence in Research (NCCR): 51NF40_180888
Swiss National Science Foundation under the project NAST: 185010
Swiss National Science Foundation under the project COMPBIO: 179217

Record creation date 2024-05-01