With my recent excitement for abstract neural machines (see e.g. the related NAMPI workshop), I skimmed the long list of submission of the ICRL conference for interesting papers.
(I might have missed one or the other people [please let me know], but this is what I found.)
I will post a follow-up post as I worked my way through all the papers.
Accepted Papers
Making Neural Programming Architectures Generalize via Recursion
Abstract: Empirically, neural networks that attempt to learn programs from data have exhibited poor generalizability. Moreover, it has traditionally been difficult to reason about the behavior of these models beyond a certain level of input complexity. In order to address these issues, we propose augmenting neural architectures with a key abstraction: recursion. As an application, we implement recursion in the Neural Programmer-Interpreter framework on four tasks: grade-school addition, bubble sort, topological sort, and quicksort. We demonstrate superior generalizability and interpretability with small amounts of training data. Recursion divides the problem into smaller pieces and drastically reduces the domain of each neural network component, making it tractable to prove guarantees about the overall system’s behavior. Our experience suggests that in order for neural architectures to robustly learn program semantics, it is necessary to incorporate a concept like recursion.Lie-Access Neural Turing Machines
Abstract: Recent work has demonstrated the effectiveness of employing explicit external memory structures in conjunction with deep neural models for algorithmic learning (Graves et al. 2014; Weston et al. 2014). These models utilize differentiable versions of traditional discrete memory-access structures (random access, stacks, tapes) to provide the variable-length storage necessary for computational tasks. In this work, we propose an alternative model, Lie-access memory, that is explicitly designed for the neural setting. In this paradigm, memory is accessed using a continuous head in a key-space manifold. The head is moved via Lie group actions, such as shifts or rotations, generated by a controller, and soft memory access is performed by considering the distance to keys associated with each memory. We argue that Lie groups provide a natural generalization of discrete memory structures, such as Turing machines, as they provide inverse and identity operators while maintain differentiability. To experiment with this approach, we implement several simplified Lie-access neural Turing machine (LANTM) with different Lie groups. We find that this approach is able to perform well on several algorithmic experiments, and outperforms RNN-based methods.Program Synthesis for Character Level Language Modeling
Abstract: We propose a statistical model applicable to character level language modeling and show that it is a good fit for both, program source code and English text. The model is parameterized by a program from a domain-specific language (DSL) that allows expressing non-trivial data dependencies. Learning is done in two phases: (i) we synthesize a program from the DSL, essentially learning a good representation for the data, and (ii) we learn parameters from the training data - the process is done via counting, as in simple language models such as n-gram. Our experiments show that the precision of our model is comparable to that of neural networks while sharing a number of advantages with n-gram models such as fast query time and the capability to quickly add and remove training data samples. Further, the model is parameterized by a program that can be manually inspected, understood and updated, addressing a major problem of neural networks.
From a software-engineering point of view, I feel that this approach is very promising, as it offers somewhat interpretable output in form of a domain-specific language:
Neuro-Symbolic Program Synthesis
Abstract: Recent years have seen the proposal of a number of neural architectures for the problem of Program Induction. Given a set of input-output examples, these architectures are able to learn mappings that generalize to new test inputs. While achieving impressive results, these approaches have a number of important limitations: (a) they are computationally expensive and hard to train, (b) a model has to be trained for each task (program) separately, and (c) it is hard to interpret or verify the correctness of the learnt mapping (as it is defined by a neural network). In this paper, we propose a novel technique, Neuro-Symbolic Program Synthesis, to overcome the above-mentioned problems. Once trained, our approach can automatically construct computer programs in a domain-specific language that are consistent with a set of input-output examples provided at test time. Our method is based on two novel neural modules. The first module, called the cross correlation I/O network, given a set of input-output examples, produces a continuous representation of the set of I/O examples. The second module, the Recursive-Reverse-Recursive Neural Network (R3NN), given the continuous representation of the examples, synthesizes a program by incrementally expanding partial programs. We demonstrate the effectiveness of our approach by applying it to the rich and complex domain of regular expression based string transformations. Experiments show that the R3NN model is not only able to construct programs from new input-output examples, but it is also able to construct new programs for tasks that it had never observed before during training.
Neural Program Lattices
Abstract: We propose the Neural Program Lattice (NPL), a neural network that learns a hierarchical program structure from a mixture of strong supervision and weak supervision. Our starting point is the recent work of Neural Programmer-Interpreters (NPI), which can only learn from strong supervision (full program execution traces). NPLs can additionally learn from weak supervision consisting of flat sequences of elementary operations. We demonstrate the capability of NPL to learn to perform long-hand addition and arrange blocks in a grid-world environment. Experiments show that it performs on par with NPI while using weak supervision in place of most of the strong supervision, thus indicating its ability to impute latent program abstraction structure from examples containing only weak supervision.
Rejected Papers
Getting rejected from a conference is by no means a sign that the paper doesn't have good ideas; it might just not fit the reviewers' taste, or the advances it proposes might be more on a level of engineering than conceptually. The NIPS experience also revealed a large amount of randomness in the decision-making progress.
Thanks to the open review process, everyone can see what was criticized and led to the rejection.
Abstract: In this paper, we extend neural Turing machine (NTM) into a dynamic neural Turing machine (D-NTM) by introducing a trainable memory addressing scheme. This addressing scheme maintains for each memory cell two separate vectors, content and address vectors. This allows the D-NTM to learn a wide variety of location-based addressing strategies including both linear and nonlinear ones. We implement the D-NTM with both continuous, differentiable and discrete, non-differentiable read/write mechanisms. We investigate the mechanisms and effects for learning to read and write to a memory through experiments on Facebook bAbI tasks using both a feedforward and GRU-controller. The D-NTM is evaluated on a set of Facebook bAbI tasks and shown to outperform NTM and LSTM baselines. We also provide further experimental results on sequential MNIST, associative recall and copy tasks.
Abstract: Memory networks are neural networks with an explicit memory component that can be both read and written to by the network. The memory is often addressed in a soft way using a softmax function, making end-to-end training with backpropagation possible. However, this is not computationally scalable for applications which require the network to read from extremely large memories. On the other hand, it is well known that hard attention mechanisms based on reinforcement learning are challenging to train successfully. In this paper, we explore a form of hierarchical memory network, which can be considered as a hybrid between hard and soft attention memory networks. The memory is organized in a hierarchical structure such that reading from it is done with less computation than soft attention over a flat memory, while also being easier to train than hard attention over a flat memory. Specifically, we propose to incorporate Maximum Inner Product Search (MIPS) in the training and inference procedures for our hierarchical memory network. We explore the use of various state-of-the art approximate MIPS techniques and report results on SimpleQuestions, a challenging large scale factoid question answering task.
Not about Neural Abstract Machines, but related Papers
This is not a typical neural abstract machine, but it follows the idea of using differentiable elements to make things trainable that otherwise wouldn't be:Neural Functional Programming
Abstract: We discuss a range of modeling choices that arise when constructing an end-to-end differentiable programming language suitable for learning programs from input-output examples. Taking cues from programming languages research, we study the effect of memory allocation schemes, immutable data, type systems, and built-in control-flow structures on the success rate of learning algorithms. We build a range of models leading up to a simple differentiable functional programming language. Our empirical evaluation shows that this language allows to learn far more programs than existing baselines.
This is slightly different, but since I had quite some exposure to graph transformation and related formalisms in my undergrad, I find this fascinating, too:
Learning Graphical State Transitions
Abstract: Graph-structured data is important in modeling relationships between multiple entities, and can be used to represent states of the world as well as many data structures. Li et al. (2016) describe a model known as a Gated Graph Sequence Neural Network (GGS-NN) that produces sequences from graph-structured input. In this work I introduce the Gated Graph Transformer Neural Network (GGT-NN), an extension of GGS-NNs that uses graph-structured data as an intermediate representation. The model can learn to construct and modify graphs in sophisticated ways based on textual input, and also to use the graphs to produce a variety of outputs. For example, the model successfully learns to solve almost all of the bAbI tasks (Weston et al., 2016), and also discovers the rules governing graphical formulations of a simple cellular automaton and a family of Turing machines.