Andrea Gesmundo
Authored Publications
Flexible Multi-task Networks by Learning Parameter Allocation
Krzysztof Maziarz
Jesse Berent
ICLR 2021 Workshop on Neural Architecture Search (2021)
Multi-task neural networks, when trained successfully, can learn to leverage related concepts from different tasks by using weight sharing. Sharing parameters between highly unrelated tasks can hurt both of them, so a strong multi-task model should be able to control the amount of weight sharing between pairs of tasks and flexibly adapt it to their relatedness. In recent work, routing networks have shown strong performance in a variety of settings, including multi-task learning. However, optimization difficulties often prevent routing models from unlocking their full potential. In this work, we propose a novel routing method, specifically designed for multi-task learning, in which routing is optimized jointly with the model parameters by standard backpropagation. We show that it can discover related pairs of tasks and improve accuracy over strong baselines. In particular, on multi-task learning for the Omniglot dataset, our method reduces the state-of-the-art error rate by 17%.
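One common way to let a discrete task-to-component allocation be "optimized jointly with the model parameters by standard backpropagation" is a Gumbel-sigmoid (relaxed Bernoulli) gate per (task, component) pair. The sketch below assumes that relaxation; it is not taken from the paper, and `route`, `hard_allocation`, the temperature, and the toy components are hypothetical. NumPy only shows the forward pass; in a real model the same computation would run under an autodiff framework so gradients reach `allocation_logits`.

```python
import numpy as np

def gumbel_sigmoid(logits, temperature, rng):
    """Relaxed (differentiable) Bernoulli sample in [0, 1] for every entry of `logits`."""
    u1 = rng.uniform(1e-9, 1.0, size=logits.shape)
    u2 = rng.uniform(1e-9, 1.0, size=logits.shape)
    # The difference of two Gumbel(0, 1) samples is Logistic(0, 1) noise.
    noise = -np.log(-np.log(u1)) + np.log(-np.log(u2))
    z = np.clip((logits + noise) / temperature, -50.0, 50.0)  # avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-z))

def route(x, task_id, allocation_logits, components, temperature, rng):
    """Hypothetical routing step: weight each shared component's output by a
    relaxed 0/1 gate sampled from the task's row of allocation logits."""
    gates = gumbel_sigmoid(allocation_logits[task_id], temperature, rng)
    outputs = np.stack([f(x) for f in components])  # (num_components, *x.shape)
    return np.tensordot(gates, outputs, axes=1)      # gate-weighted sum of outputs

def hard_allocation(allocation_logits):
    """At inference, one simple (assumed) choice: keep components with a positive logit."""
    return allocation_logits > 0

# Usage: 3 tasks sharing 4 toy components.
rng = np.random.default_rng(0)
logits = np.zeros((3, 4))
components = [lambda x, k=k: (k + 1.0) * x for k in range(4)]
y = route(np.ones(5), 1, logits, components, 0.5, rng)
```

Because each task has its own row of logits, two related tasks can learn to open the same components (sharing weights), while unrelated tasks can learn disjoint ones.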
Routing Networks with Co-training for Continual Learning
Mark Patrick Collier
Jesse Berent
ICML 2020 Workshop on Continual Learning (to appear)
Many continual learning methods can be characterized as either altering the learning algorithm in a fixed capacity neural network or dynamically growing the capacity of the network to handle new tasks. We propose to use fixed capacity sparse routing networks for continual learning. We retain the advantages of architectural solutions to the continual learning problem, in that different paths through the network can be learned for different tasks. However, we stay within the regime of fixed capacity networks, which are more realistic for real-world use cases. We find it is necessary to develop a new training method for routing networks, which we call co-training; it avoids poorly initialized experts when new tasks are presented. In initial experiments, when combined with a small episodic memory replay buffer, sparse routing networks with co-training outperform densely connected networks on the MNIST-Permutations and MNIST-Rotations benchmarks.
Temporal coding in spiking neural networks with alpha synaptic function
Krzysztof Potempa
Luca Versari
Thomas Fischbacher
arXiv:1907.13223 (2019)
The timing of individual neuronal spikes is essential for biological brains to make fast responses to sensory stimuli. However, conventional artificial neural networks lack the intrinsic dimension of temporal coding present in biological networks. We propose a spiking neural network model that encodes information in the relative timing of individual neuron spikes. An image can be encoded in this manner by an input layer where each neuron spikes at a time proportional to the brightness of an individual pixel. In classification tasks, the output of the network is indicated by the first neuron to spike in the output layer. By encoding information in time in this manner, we are able to train the network to perform supervised learning with backpropagation, using exact derivatives of the postsynaptic spike times with respect to presynaptic spike times. The network operates using a biologically plausible alpha synaptic transfer function. Additionally, we use trainable synchronisation pulses that provide bias, add flexibility during training, and allow the network to exploit the decay part of the alpha function. We show that such spiking networks can be trained successfully on noisy temporal Boolean logic problems. Moreover, they perform better than comparable spiking models on the MNIST benchmark when encoded in time. During training, we find that the network spontaneously discovers two operating regimes: a slow regime, where a decision is taken after all hidden neurons have spiked and accuracy is very high, and a fast regime, where a decision is taken quickly but accuracy is lower. These results demonstrate the computational power of spiking networks with biological characteristics that encode information in the timing of individual neuron spikes. By studying temporal coding in spiking networks, we aim to create building blocks towards energy-efficient, state-based and more complex biologically inspired neural architectures.
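The encoding, the alpha synaptic function and first-to-spike decoding described above can be sketched as follows. This is a simplified forward pass only, assuming the standard alpha kernel K(t) = t·exp(−decay·t), a membrane potential V(t) = Σᵢ wᵢ·K(t − tᵢ), and a firing threshold; the `decay`, `threshold`, `t_max` and grid values are illustrative assumptions, and the spike time is found by a grid search rather than the exact solution the abstract refers to.

```python
import numpy as np

def alpha_kernel(t, decay=1.0):
    """Alpha function K(t) = t * exp(-decay * t) for t > 0, and 0 otherwise."""
    t = np.maximum(t, 0.0)
    return t * np.exp(-decay * t)

def encode_brightness(pixels, t_max=1.0):
    """Input layer: each neuron spikes at a time proportional to its pixel's brightness."""
    return t_max * np.asarray(pixels, dtype=float)

def first_spike_time(input_times, weights, decay=1.0, threshold=1.0, t_end=10.0, dt=1e-3):
    """Earliest grid time at which V(t) = sum_i w_i K(t - t_i) reaches threshold (inf if never)."""
    input_times = np.asarray(input_times, dtype=float)
    weights = np.asarray(weights, dtype=float)
    t = np.arange(0.0, t_end, dt)
    potential = (weights[:, None] * alpha_kernel(t[None, :] - input_times[:, None], decay)).sum(axis=0)
    crossed = np.nonzero(potential >= threshold)[0]
    return t[crossed[0]] if crossed.size else np.inf

def classify(input_times, weight_matrix, **kwargs):
    """Predicted class = index of the first output neuron to spike (0 if none spike)."""
    times = [first_spike_time(input_times, w, **kwargs) for w in weight_matrix]
    return int(np.argmin(times))
```

For a single input spike at t = 0, V(t) = w·t·exp(−t) peaks at w/e, so with threshold 1 an output neuron with weight 0.5 never fires while one with weight 5.0 fires at about t ≈ 0.26; `classify([0.0], [[0.5], [5.0]])` therefore returns 1.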
Parameter Efficient Transfer Learning for NLP
Neil Houlsby
Andrei Giurgiu
Stanisław Kamil Jastrzębski
Bruna Halila Morrone
Mona Attariyan
Sylvain Gelly
ICML (2019)
Fine-tuning large pretrained models is an effective transfer mechanism in NLP. However, in the presence of many downstream tasks, fine-tuning is parameter inefficient: an entire new model is required for every task. As an alternative, we propose transfer with adapter modules. Adapter modules yield a compact and extensible model; they add only a few trainable parameters per task, and new tasks can be added without revisiting previous ones. The parameters of the original network remain fixed, yielding a high degree of parameter sharing. To demonstrate the effectiveness of adapters, we transfer the recently proposed BERT Transformer model to 26 diverse text classification tasks, including the GLUE benchmark. Adapters attain near state-of-the-art performance whilst adding only a few parameters per task. On GLUE, we attain within 0.8% of the performance of full fine-tuning while adding only 3.6% more parameters per task. By contrast, fine-tuning trains 100% of the parameters per task.
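A minimal sketch of a bottleneck adapter, assuming the common form: project the hidden state from size d down to a small size m, apply a nonlinearity, project back up to d, and add a residual connection. The ReLU, the initialisation scale and the class name are illustrative assumptions. The small initialisation keeps the adapter close to the identity, so inserting it barely perturbs the frozen pretrained network at the start of training; only these adapter parameters would be trained per task.

```python
import numpy as np

class Adapter:
    """Bottleneck adapter: h + W_up f(W_down h + b_down) + b_up (hypothetical sketch)."""

    def __init__(self, d, m, rng, init_scale=1e-3):
        # Small random weights so the module starts as a near-identity map.
        self.w_down = rng.normal(0.0, init_scale, size=(d, m))
        self.b_down = np.zeros(m)
        self.w_up = rng.normal(0.0, init_scale, size=(m, d))
        self.b_up = np.zeros(d)

    def __call__(self, h):
        z = np.maximum(h @ self.w_down + self.b_down, 0.0)  # ReLU (assumed nonlinearity)
        return h + z @ self.w_up + self.b_up                 # residual connection

    def num_params(self):
        """Trainable parameters per adapter: 2*d*m + d + m."""
        return sum(p.size for p in (self.w_down, self.b_down, self.w_up, self.b_up))

rng = np.random.default_rng(0)
adapter = Adapter(d=768, m=64, rng=rng)   # 768 = BERT-base hidden size
h = rng.normal(size=(2, 768))
out = adapter(h)
```

With d = 768 and m = 64, one adapter has 2·768·64 + 768 + 64 = 99,136 parameters; the bottleneck size m is the knob that trades per-task parameter count against capacity.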
Fast Task-Aware Architecture Inference
Anja Hauth
Jesse Berent
https://arxiv.org/abs/1902.05781 (2019)
Neural architecture search has been shown to hold great promise towards the automation of deep learning. However, in spite of its potential, neural architecture search remains quite costly. To address this, we propose a novel gradient-based framework for efficient architecture search that shares information across several tasks. We start by training many model architectures on several related (training) tasks. When a new, unseen task is presented, the framework performs architecture inference in order to quickly identify a good candidate architecture, before any model is trained on the new task. At the core of our framework lies a deep value network that can predict the performance of input architectures on a task by utilizing task meta-features and the previous model training experiments performed on related tasks. We adopt a continuous parametrization of the model architecture, which allows for efficient gradient-based optimization. Given a new task, an effective architecture is quickly identified by maximizing the estimated performance with respect to the model architecture parameters with simple gradient ascent. Note that our goal is to achieve reasonable performance at the lowest possible cost. We provide experimental results showing that the framework is effective while remaining highly computationally efficient.
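The inference step above, maximizing estimated performance with respect to continuous architecture parameters by gradient ascent, can be sketched as follows. The trained deep value network is replaced here by a hypothetical closed-form stand-in, `predicted_performance`, whose gradient is written out by hand; in the actual framework the predictor is learned and its gradient would come from automatic differentiation. `W`, `steps` and `lr` are illustrative assumptions.

```python
import numpy as np

def predicted_performance(arch, task_features, W):
    """Hypothetical stand-in for the value network (higher is better).
    It peaks when arch equals tanh(W @ task_features)."""
    target = np.tanh(W @ task_features)
    return -np.sum((arch - target) ** 2)

def grad_wrt_arch(arch, task_features, W):
    """Gradient of predicted_performance with respect to the architecture parameters."""
    target = np.tanh(W @ task_features)
    return -2.0 * (arch - target)

def infer_architecture(task_features, W, steps=200, lr=0.1):
    """Gradient ascent on the continuous architecture parameters for a new task.
    No model is trained on the new task; only the predictor is queried."""
    arch = np.zeros(W.shape[0])
    for _ in range(steps):
        arch += lr * grad_wrt_arch(arch, task_features, W)
    return arch

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 3))        # 6 architecture parameters, 3 task meta-features
features = rng.normal(size=3)
arch = infer_architecture(features, W)
```

The resulting continuous vector would still need to be mapped back to a concrete architecture (for example by thresholding or taking an argmax per choice); that mapping depends on the parametrization and is not shown.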
Ranking architectures using meta-learning
Alina Dubatovka
Jesse Berent
NeurIPS Workshop on Meta-Learning (MetaLearn 2019) (to appear)
Neural architecture search has recently attracted considerable research effort, as it promises to automate the manual design of neural networks. However, it requires a large amount of computing resources. To alleviate this, a performance prediction network has recently been proposed that enables efficient architecture search by forecasting the performance of candidate architectures instead of relying on actual model training. The performance predictor is task-aware, taking as input not only the candidate architecture but also task meta-features, and it has been designed to learn collectively from several tasks. In this work, we introduce a pairwise ranking loss for training a network that ranks candidate architectures for a new, unseen task, conditioned on its task meta-features. We present experimental results showing that the ranking network is more effective in architecture search than the previously proposed performance predictor.
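As an illustration of what a pairwise ranking loss over candidate architectures can look like, the sketch below uses the logistic (RankNet-style) form: for every pair where architecture i actually performed better than j, it penalizes the network unless score sᵢ exceeds sⱼ. The exact loss used in the paper may differ; the function name and inputs are assumptions.

```python
import numpy as np

def pairwise_ranking_loss(scores, performances):
    """Mean of log(1 + exp(-(s_i - s_j))) over all pairs with performances[i] > performances[j]."""
    scores = np.asarray(scores, dtype=float)
    perf = np.asarray(performances, dtype=float)
    diff = scores[:, None] - scores[None, :]     # s_i - s_j for every pair
    better = perf[:, None] > perf[None, :]       # pairs where i should rank above j
    if not better.any():
        return 0.0
    # logaddexp(0, x) = log(1 + exp(x)), computed stably.
    return float(np.mean(np.logaddexp(0.0, -diff[better])))

# Scores predicted for 3 candidates whose measured accuracies are [0.9, 0.8, 0.7].
good = pairwise_ranking_loss([3.0, 2.0, 1.0], [0.9, 0.8, 0.7])
bad = pairwise_ranking_loss([1.0, 2.0, 3.0], [0.9, 0.8, 0.7])
ranking = np.argsort(-np.array([3.0, 2.0, 1.0]))  # best-first order of candidates
```

Only the relative order of scores matters to this loss, which fits architecture search: picking the best candidate requires ranking correctly, not predicting exact accuracies.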
We reduce the computational cost of Neural AutoML with transfer learning. AutoML relieves human effort by automating the design of ML algorithms. Neural AutoML has become popular for the design of deep learning architectures; however, this method has a high computational cost. To address this, we propose Transfer Neural AutoML, which uses knowledge from prior tasks to speed up network design. We extend RL-based architecture search methods to support parallel training on multiple tasks and then transfer the search strategy to new tasks. On language and image classification data, Transfer Neural AutoML reduces convergence time over single-task training by over an order of magnitude on many tasks.
Ask the Right Questions: Active Question Reformulation with Reinforcement Learning
Neil Houlsby
Wei Wang
Sixth International Conference on Learning Representations (2018)
We frame Question Answering (QA) as a Reinforcement Learning task, an approach that we call Active Question Answering. We propose an agent that sits between the user and a black box QA system and learns to reformulate questions to elicit the best possible answers. The agent probes the system with potentially many natural language reformulations of an initial question and aggregates the returned evidence to yield the best answer. The reformulation system is trained end-to-end to maximize answer quality using policy gradient. We evaluate on SearchQA, a dataset of complex questions extracted from Jeopardy!. The agent outperforms a state-of-the-art base model, which plays the role of the environment, as well as other benchmarks. We also analyze the language that the agent has learned while interacting with the question answering system. We find that successful question reformulations look quite different from natural language paraphrases. The agent is able to discover non-trivial reformulation strategies that resemble classic information retrieval techniques such as term re-weighting (tf-idf) and stemming.
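The policy-gradient idea can be illustrated with REINFORCE on a deliberately simplified policy: a softmax over a fixed list of candidate reformulations instead of a sequence-to-sequence generator. `reward_fn` is a hypothetical stand-in for the black-box QA system's answer quality, and the learning rate, step count and moving-average baseline are illustrative assumptions, not details from the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # shift for numerical stability
    return e / e.sum()

def reinforce(reward_fn, num_candidates, steps=500, lr=0.1, seed=0):
    """Learn a softmax policy over a fixed set of candidate reformulations.

    reward_fn(a) is a hypothetical black-box score (e.g. answer quality)
    for the answer returned when the QA system is sent candidate a.
    """
    rng = np.random.default_rng(seed)
    logits = np.zeros(num_candidates)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        a = rng.choice(num_candidates, p=probs)   # sample a reformulation
        advantage = reward_fn(a) - baseline
        grad_log_pi = -probs                      # d log pi(a) / d logits = onehot(a) - probs
        grad_log_pi[a] += 1.0
        logits += lr * advantage * grad_log_pi    # policy-gradient ascent step
        baseline += 0.1 * advantage               # moving-average baseline reduces variance
    return softmax(logits)

# Toy environment: only candidate 2 elicits a good answer.
probs = reinforce(lambda a: 1.0 if a == 2 else 0.0, num_candidates=4)
```

The baseline does not change the expected gradient; it only lowers its variance, which matters when rewards come from a noisy black box.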
Analyzing Language Learned by an Active Question Answering Agent
Neil Houlsby
Wei Wang
Emergent Communication Workshop @ NIPS (2017)
We analyze the language learned by an agent trained with reinforcement learning as a component of the ActiveQA system [Buck et al., 2017]. In ActiveQA, question answering is framed as a reinforcement learning task in which an agent sits between the user and a black box question-answering system. The agent learns to reformulate the user's questions to elicit the best answers. It probes the system with many versions of a question that are generated via a sequence-to-sequence question reformulation model, then aggregates the returned evidence to find the best answer. This process is an instance of machine-machine communication. The question reformulation model must adapt its language to increase the quality of the answers returned, matching the language of the question answering system. We find that the agent does not learn transformations that align with semantic intuitions; instead, it discovers classical information retrieval techniques such as tf-idf re-weighting and stemming.
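For readers unfamiliar with tf-idf re-weighting, the sketch below scores each question term by its term frequency times a smoothed inverse document frequency computed over a small corpus, then keeps the highest-weighted terms. The smoothing formula, whitespace tokenisation and the `reweight_question` helper are illustrative assumptions; this shows the classical technique the agent's behaviour resembles, not the agent itself.

```python
import math
from collections import Counter

def tfidf_weights(question, corpus):
    """Weight each question term by tf * idf, with idf = log((1 + N) / (1 + df)) + 1."""
    docs = [set(doc.lower().split()) for doc in corpus]
    n = len(docs)
    tf = Counter(question.lower().split())
    weights = {}
    for term, count in tf.items():
        df = sum(term in d for d in docs)        # number of documents containing the term
        weights[term] = count * (math.log((1 + n) / (1 + df)) + 1.0)
    return weights

def reweight_question(question, corpus, keep=3):
    """Hypothetical reformulation: keep only the `keep` highest-weighted terms."""
    w = tfidf_weights(question, corpus)
    return " ".join(sorted(w, key=w.get, reverse=True)[:keep])

corpus = ["the cat sat", "the dog ran", "the bird flew"]
reformulated = reweight_question("the cat flew", corpus, keep=2)  # "cat flew"
```

Common words such as "the" appear in every document and receive the lowest weight, so they are dropped first; rarer, more discriminative terms survive.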
Projecting the Knowledge Graph to Syntactic Parsing
Keith Hall
EACL 2014: 15th Conference of the European Chapter of the Association for Computational Linguistics