Artificial Intelligence Preprint | 2019-07-05
Artificial Intelligence 
 ProBO: Versatile Bayesian Optimization Using Any Probabilistic Programming Language (1901.11515v2)
Willie Neiswanger, Kirthevasan Kandasamy, Barnabas Poczos, Jeff Schneider, Eric Xing
2019-01-31
Optimizing an expensive-to-query function is a common task in science and engineering, where it is beneficial to keep the number of queries to a minimum. A popular strategy is Bayesian optimization (BO), which leverages probabilistic models for this task. Most BO today uses Gaussian processes (GPs), or a few other surrogate models. However, there is a broad set of Bayesian modeling techniques that could be used to capture complex systems and reduce the number of queries in BO. Probabilistic programming languages (PPLs) are modern tools that allow for flexible model definition, prior specification, model composition, and automatic inference. In this paper, we develop ProBO, a BO procedure that uses only standard operations common to most PPLs. This allows a user to drop in a model built with an arbitrary PPL and use it directly in BO. We describe acquisition functions for ProBO, and strategies for efficiently optimizing these functions given complex models or costly inference procedures. Using existing PPLs, we implement new models to aid in a few challenging optimization settings, and demonstrate these on model hyperparameter and architecture search tasks.
On Validating, Repairing and Refining Heuristic ML Explanations (1907.02509v1)
Alexey Ignatiev, Nina Narodytska, Joao Marques-Silva
2019-07-04
Recent years have witnessed a fast-growing interest in computing explanations for Machine Learning (ML) models predictions. For non-interpretable ML models, the most commonly used approaches for computing explanations are heuristic in nature. In contrast, recent work proposed rigorous approaches for computing explanations, which hold for a given ML model and prediction over the entire instance space. This paper extends earlier work to the case of boosted trees and assesses the quality of explanations obtained with state-of-the-art heuristic approaches. On most of the datasets considered, and for the vast majority of instances, the explanations obtained with heuristic approaches are shown to be inadequate when the entire instance space is (implicitly) considered.
A global constraint for the capacitated single-item lot-sizing problem (1907.02405v1)
Grigori German, Hadrien Cambazard, Jean-Philippe Gayon, Bernard Penz
2019-07-04
The goal of this paper is to set a constraint programming framework to solve lot-sizing problems. More specifically, we consider a single-item lot-sizing problem with time-varying lower and upper bounds for production and inventory. The cost structure includes time-varying holding costs, unitary production costs and setup costs. We establish a new lower bound for this problem by using a subtle time decomposition. We formulate this NP-hard problem as a global constraint and show that bound consistency can be achieved in pseudo-polynomial time and when not including the costs, in polynomial time. We develop filtering rules based on existing dynamic programming algorithms, exploiting the above mentioned time decomposition for difficult instances. In a numerical study, we compare several formulations of the problem: mixed integer linear programming, constraint programming and dynamic programming. We show that our global constraint is able to find solutions, unlike the decomposed constraint programming model and that constraint programming can be competitive, in particular when adding combinatorial side constraints.
Streaming Adaptation of Deep Forecasting Models using Adaptive Recurrent Units (1906.09926v2)
Prathamesh Deshpande, Sunita Sarawagi
2019-06-24
We present ARU, an Adaptive Recurrent Unit for streaming adaptation of deep globally trained time-series forecasting models. The ARU combines the advantages of learning complex data transformations across multiple time series from deep global models, with per-series localization offered by closed-form linear models. Unlike existing methods of adaptation that are either memory-intensive or non-responsive after training, ARUs require only fixed sized state and adapt to streaming data via an easy RNN-like update operation. The core principle driving ARU is simple --- maintain sufficient statistics of conditional Gaussian distributions and use them to compute local parameters in closed form. Our contribution is in embedding such local linear models in globally trained deep models while allowing end-to-end training on the one hand, and easy RNN-like updates on the other. Across several datasets we show that ARU is more effective than recently proposed local adaptation methods that tax the global network to compute local parameters.
Experience Management in Multi-player Games (1907.02349v1)
Jichen Zhu, Santiago Ontañón
2019-07-04
Experience Management studies AI systems that automatically adapt interactive experiences such as games to tailor to specific players and to fulfill design goals. Although it has been explored for several decades, existing work in experience management has mostly focused on single-player experiences. This paper is a first attempt at identifying the main challenges to expand EM to multi-player/multi-user games or experiences. We also make connections to related areas where solutions for similar problems have been proposed (especially group recommender systems) and discusses the potential impact and applications of multi-player EM.
Hierarchical Deep Multiagent Reinforcement Learning with Temporal Abstraction (1809.09332v2)
Hongyao Tang, Jianye Hao, Tangjie Lv, Yingfeng Chen, Zongzhang Zhang, Hangtian Jia, Chunxu Ren, Yan Zheng, Zhaopeng Meng, Changjie Fan, Li Wang
2018-09-25
Multiagent reinforcement learning (MARL) is commonly considered to suffer from non-stationary environments and exponentially increasing policy space. It would be even more challenging when rewards are sparse and delayed over long trajectories. In this paper, we study hierarchical deep MARL in cooperative multiagent problems with sparse and delayed reward. With temporal abstraction, we decompose the problem into a hierarchy of different time scales and investigate how agents can learn high-level coordination based on the independent skills learned at the low level. Three hierarchical deep MARL architectures are proposed to learn hierarchical policies under different MARL paradigms. Besides, we propose a new experience replay mechanism to alleviate the issue of the sparse transitions at the high level of abstraction and the non-stationarity of multiagent learning. We empirically demonstrate the effectiveness of our approaches in two domains with extremely sparse feedback: (1) a variety of Multiagent Trash Collection tasks, and (2) a challenging online mobile game, i.e., Fever Basketball Defense.
Hill Climbing on Value Estimates for Search-control in Dyna (1906.07791v3)
Yangchen Pan, Hengshuai Yao, Amir-massoud Farahmand, Martha White
2019-06-18
Dyna is an architecture for model-based reinforcement learning (RL), where simulated experience from a model is used to update policies or value functions. A key component of Dyna is search-control, the mechanism to generate the state and action from which the agent queries the model, which remains largely unexplored. In this work, we propose to generate such states by using the trajectory obtained from Hill Climbing (HC) the current estimate of the value function. This has the effect of propagating value from high-value regions and of preemptively updating value estimates of the regions that the agent is likely to visit next. We derive a noisy projected natural gradient algorithm for hill climbing, and highlight a connection to Langevin dynamics. We provide an empirical demonstration on four classical domains that our algorithm, HC-Dyna, can obtain significant sample efficiency improvements. We study the properties of different sampling distributions for search-control, and find that there appears to be a benefit specifically from using the samples generated by climbing on current value estimates from low-value to high-value region.
Proximal algorithms for large-scale statistical modeling and optimal sensor/actuator selection (1807.01739v3)
Armin Zare, Hesameddin Mohammadi, Neil K. Dhingra, Mihailo R. Jovanović, Tryphon T. Georgiou
2018-07-04
Several problems in modeling and control of stochastically-driven dynamical systems can be cast as regularized semi-definite programs. We examine two such representative problems and show that they can be formulated in a similar manner. The first, in statistical modeling, seeks to reconcile observed statistics by suitably and minimally perturbing prior dynamics. The second seeks to optimally select a subset of available sensors and actuators for control purposes. To address modeling and control of large-scale systems we develop a unified algorithmic framework using proximal methods. Our customized algorithms exploit problem structure and allow handling statistical modeling, as well as sensor and actuator selection, for substantially larger scales than what is amenable to current general-purpose solvers. We establish linear convergence of the proximal gradient algorithm, draw contrast between the proposed proximal algorithms and alternating direction method of multipliers, and provide examples that illustrate the merits and effectiveness of our framework.
Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation (1806.07377v6)
Shani Gamrian, Yoav Goldberg
2018-05-31
Despite the remarkable success of Deep RL in learning control policies from raw pixels, the resulting models do not generalize. We demonstrate that a trained agent fails completely when facing small visual changes, and that fine-tuning---the common transfer learning paradigm---fails to adapt to these changes, to the extent that it is faster to re-train the model from scratch. We show that by separating the visual transfer task from the control policy we achieve substantially better sample efficiency and transfer behavior, allowing an agent trained on the source task to transfer well to the target tasks. The visual mapping from the target to the source domain is performed using unaligned GANs, resulting in a control policy that can be further improved using imitation learning from imperfect demonstrations. We demonstrate the approach on synthetic visual variants of the Breakout game, as well as on transfer between subsequent levels of Road Fighter, a Nintendo car-driving game. A visualization of our approach can be seen in
What deep learning can tell us about higher cognitive functions like mindreading? (1803.10470v2)
Jaan Aru, Raul Vicente
2018-03-28
Can deep learning (DL) guide our understanding of computations happening in biological brain? We will first briefly consider how DL has contributed to the research on visual object recognition. In the main part we will assess whether DL could also help us to clarify the computations underlying higher cognitive functions such as Theory of Mind. In addition, we will compare the objectives and learning signals of brains and machines, leading us to conclude that simply scaling up the current DL algorithms will most likely not lead to human level Theory of Mind.