Artificial Intelligence Preprint | 2019-03-15

Deep Reinforcement Learning with Feedback-based Exploration (1903.06151v1)

Jan Scholten, Daan Wout, Carlos Celemin, Jens Kober

2019-03-14

Deep Reinforcement Learning has enabled the control of increasingly complex and high-dimensional problems. However, the need for vast amounts of data before reasonable performance is attained prevents its widespread application. We employ binary corrective feedback as a general and intuitive means of incorporating human intuition and domain knowledge into model-free machine learning. The uncertainty in the policy and the corrective feedback are combined directly in the action space as probabilistic conditional exploration. As a result, most of the otherwise uninformed learning process can be avoided. We demonstrate the proposed method, Predictive Probabilistic Merging of Policies (PPMP), in combination with DDPG. In experiments on continuous control problems from the OpenAI Gym, we achieve drastic improvements in sample efficiency, final performance, and robustness to erroneous feedback, for both human and synthetic feedback. Additionally, we show that the learned solutions can go beyond the demonstrated knowledge.
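
As an illustration of the core idea, combining policy uncertainty with binary corrective feedback directly in the action space, here is a minimal sketch. The function name, the scaling rule, and the constants are hypothetical; the paper's actual predictor and merging scheme are more involved.

```python
import numpy as np

def merged_action(policy_action, policy_std, feedback, correction_scale=0.3):
    """Blend a policy's action with binary human feedback (-1, 0, +1).

    The correction is scaled by the policy's own uncertainty, so early in
    training (high std) feedback dominates, while a confident policy is
    barely perturbed. Exploration noise is drawn from the same uncertainty,
    making exploration conditional on how certain the policy is.
    """
    correction = feedback * correction_scale * policy_std  # feedback-driven shift
    noise = np.random.normal(0.0, policy_std)              # conditional exploration
    return np.clip(policy_action + correction + noise, -1.0, 1.0)

# Example: an uncertain policy receives the feedback "increase the action" (+1)
print(merged_action(policy_action=0.1, policy_std=0.5, feedback=+1))
```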

Generating and Sampling Orbits for Lifted Probabilistic Inference (1903.04672v2)

Steven Holtzen, Todd Millstein, Guy Van den Broeck

2019-03-12

Lifted inference scales to large probability models by exploiting symmetry. However, existing exact lifted inference techniques do not apply to general factor graphs, as they require a relational representation. In this work we provide a theoretical framework and algorithm for performing exact lifted inference on symmetric factor graphs by computing colored graph automorphisms, as is often done for approximate lifted inference. Our key insight is to represent variable assignments directly in the colored factor graph encoding. This allows us to generate representatives and compute the size of each orbit of the symmetric distribution. In addition to exact inference, we use this encoding to implement an MCMC algorithm that explores the space of orbits quickly by uniform orbit sampling.
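
For intuition, here is a minimal sketch (hypothetical, not the authors' code) of one building block: given generators of a colored-graph automorphism group, the orbits of individual variables are exactly the connected components induced by the generators. Note the paper works with orbits of full variable assignments, which this simplification does not capture.

```python
def orbits(n_vars, generators):
    """Partition variables 0..n_vars-1 into orbits under permutation generators.

    Each generator is a permutation given as a list: generators[g][i] is the
    image of variable i. Union-find merges variables reachable from one
    another, yielding the orbit partition of the generated group's action.
    """
    parent = list(range(n_vars))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for perm in generators:
        for i, j in enumerate(perm):
            union(i, j)

    groups = {}
    for i in range(n_vars):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Three exchangeable variables: swapping (0,1) and (1,2) generates S3,
# so all three variables fall into a single orbit.
print(orbits(3, [[1, 0, 2], [0, 2, 1]]))  # -> [[0, 1, 2]]
```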

Inferring Personalized Bayesian Embeddings for Learning from Heterogeneous Demonstration (1903.06047v1)

Rohan Paleja, Matthew Gombolay

2019-03-14

For assistive robots and virtual agents to achieve ubiquity, machines will need to anticipate the needs of their human counterparts. The field of Learning from Demonstration (LfD) has sought to enable machines to infer predictive models of human behavior for autonomous robot control. However, humans exhibit heterogeneity in decision-making, which traditional LfD approaches fail to capture. To overcome this challenge, we propose a Bayesian LfD framework that learns an integrated representation of all human task demonstrators by inferring human-specific embeddings, thereby distilling their unique characteristics. We validate that our approach outperforms state-of-the-art techniques on both synthetic and real-world data sets.
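
The paper's framework is Bayesian, inferring a posterior over embeddings; the sketch below only illustrates the underlying conditioning idea with point estimates learned by gradient descent. All names and sizes are hypothetical assumptions.

```python
import torch
import torch.nn as nn

# A shared policy network is conditioned on a learned per-demonstrator
# embedding; training fits both the shared weights and each person's vector.
n_people, embed_dim, state_dim, action_dim = 5, 8, 10, 4

embeddings = nn.Embedding(n_people, embed_dim)  # one vector per demonstrator
policy = nn.Sequential(
    nn.Linear(state_dim + embed_dim, 64), nn.ReLU(),
    nn.Linear(64, action_dim),
)
opt = torch.optim.Adam(list(policy.parameters()) + list(embeddings.parameters()))

def loss_on_demo(person_id, states, actions):
    """Behavioral-cloning loss on one person's (state, action) demonstrations."""
    z = embeddings(torch.full((states.shape[0],), person_id, dtype=torch.long))
    logits = policy(torch.cat([states, z], dim=-1))
    return nn.functional.cross_entropy(logits, actions)

states = torch.randn(32, state_dim)
actions = torch.randint(0, action_dim, (32,))
opt.zero_grad(); loss_on_demo(2, states, actions).backward(); opt.step()
```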

Incremental Learning of Discrete Planning Domains from Continuous Perceptions (1903.05937v1)

Luciano Serafini, Paolo Traverso

2019-03-14

We propose a framework for learning discrete deterministic planning domains. In this framework, an agent learns the domain by observing the action effects through continuous features that describe the state of the environment after the execution of each action. In addition, the agent learns its perception function, i.e., a probabilistic mapping between state variables and sensor data, represented as a vector of continuous random variables called perception variables. We define an algorithm that updates the planning domain and the perception function by (i) introducing new states, either by extending the possible values of state variables or by weakening their constraints; (ii) adapting the perception function to fit the observed data; and (iii) adapting the transition function on the basis of the executed actions and the effects observed via the perception function. The framework is able to deal with exogenous events that happen in the environment.
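
To make the incremental flavor concrete, here is a toy sketch (the class, thresholds, and one-dimensional setup are invented simplifications of the paper's framework): each discrete value of a state variable is modeled as a Gaussian over one perception variable, and a new state value is introduced whenever no existing one explains an observation.

```python
import numpy as np

class PerceptionModel:
    """Toy perception function: each discrete value of a state variable is a
    Gaussian over one continuous perception variable. A new state value is
    introduced when no existing value explains an observation."""

    def __init__(self, threshold=3.0):
        self.means, self.vars, self.counts = [], [], []
        self.threshold = threshold  # z-score beyond which a new state is added

    def observe(self, x):
        if self.means:
            z = [abs(x - m) / (v ** 0.5) for m, v in zip(self.means, self.vars)]
            k = int(np.argmin(z))
            if z[k] < self.threshold:          # fits an existing state: update it
                self.counts[k] += 1
                lr = 1.0 / self.counts[k]
                self.means[k] += lr * (x - self.means[k])
                return k
        self.means.append(x); self.vars.append(1.0); self.counts.append(1)
        return len(self.means) - 1             # new state value introduced

pm = PerceptionModel()
for obs in [0.1, 0.2, 5.0, 5.1, -0.1]:
    print(obs, "-> state", pm.observe(obs))
```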

Reinforcement Learning with Dynamic Boltzmann Softmax Updates (1903.05926v1)

Ling Pan, Qingpeng Cai, Qi Meng, Longbo Huang, Tie-Yan Liu

2019-03-14

Value function estimation is an important task in reinforcement learning, i.e., prediction. The operator commonly used for prediction in Q-learning is the hard max operator, which always commits to the maximum action-value according to the current estimate. Such a 'hard' updating scheme results in pure exploitation and may lead to misbehavior due to noise in stochastic environments. Thus, it is critical to balance exploration and exploitation in value function estimation. The Boltzmann softmax operator has a greater capability for exploring potential action-values. However, it does not satisfy the non-expansion property, and its direct use may fail to converge even in value iteration. In this paper, we propose to update the value function with the dynamic Boltzmann softmax (DBS) operator, which has good convergence properties in both the planning and learning settings. Moreover, we prove that dynamic Boltzmann softmax updates can eliminate the overestimation phenomenon introduced by the hard max operator. Experimental results on GridWorld show that the DBS operator enables convergence and a better trade-off between exploration and exploitation in value function estimation. Finally, we propose the DBS-DQN algorithm, which generalizes the dynamic Boltzmann softmax update to the deep Q-network and outperforms DQN substantially in 40 out of 49 Atari games.
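
A minimal sketch of the operator itself, assuming a simple increasing schedule such as beta_t = t for illustration (the paper's analysis and specific schedules are not reproduced here):

```python
import numpy as np

def boltzmann_softmax(q_values, beta):
    """Boltzmann softmax operator: a softmax-weighted average of action values.

    For small beta it approaches the mean over actions (exploration); as
    beta -> infinity it approaches the hard max (exploitation). Computed
    stably by shifting the values before exponentiation.
    """
    q = np.asarray(q_values, dtype=float)
    w = np.exp(beta * (q - q.max()))        # shift for numerical stability
    return float(np.sum(w * q) / np.sum(w))

q = [1.0, 2.0, 3.0]
for t in [1, 10, 100, 1000]:
    print(t, boltzmann_softmax(q, beta=t))  # converges toward max(q) = 3.0
```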

Can Robot Attract Passersby without Causing Discomfort by User-Centered Reinforcement Learning? (1903.05881v1)

Yasunori Ozaki, Tatsuya Ishihara, Narimune Matsumura, Tadashi Nunobiki

2019-03-14

The aim of our study was to develop a method by which a social robot can greet passersby and get their attention without causing them discomfort. A number of customer services have recently come to be provided by social robots rather than people, including serving as receptionists, guides, and exhibitors. Robot exhibitors, for example, can explain products being promoted by the robot owners. However, a sudden greeting by a robot can startle passersby and cause them discomfort. Social robots should thus adapt their mannerisms to the situation they face with passersby. We developed a method for meeting this requirement on the basis of the results of related work. Our proposed method, user-centered reinforcement learning, enables robots to greet passersby and get their attention without causing discomfort. The results of a field experiment at an office entrance demonstrated that our method meets this requirement (p < 0.01).
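
For illustration, here is a toy sketch of user-centered reward shaping with tabular Q-learning; the actions, outcome probabilities, and weights are invented, not the paper's. The robot earns reward for gaining attention and a larger penalty for causing discomfort, so the learned policy trades the two off per situation.

```python
import random
from collections import defaultdict

ACTIONS = ["stay_quiet", "wave", "greet_softly", "greet_loudly"]
Q = defaultdict(float)
alpha, epsilon = 0.1, 0.1  # one-step episodes: no bootstrapping needed

def simulate_passerby(state, action):
    """Toy stand-in for the real interaction: louder greetings attract more
    attention but are also more likely to startle a nearby passerby."""
    p_attend = {"stay_quiet": 0.0, "wave": 0.3, "greet_softly": 0.5, "greet_loudly": 0.8}
    p_startle = {"stay_quiet": 0.0, "wave": 0.0, "greet_softly": 0.1, "greet_loudly": 0.6}
    startle = p_startle[action] * (2.0 if state == "near" else 0.5)
    return random.random() < p_attend[action], random.random() < startle

for episode in range(5000):
    state = random.choice(["near", "far"])
    action = (random.choice(ACTIONS) if random.random() < epsilon
              else max(ACTIONS, key=lambda a: Q[(state, a)]))
    attended, startled = simulate_passerby(state, action)
    reward = 1.0 * attended - 2.0 * startled   # discomfort weighted more heavily
    Q[(state, action)] += alpha * (reward - Q[(state, action)])

for s in ["near", "far"]:
    print(s, "->", max(ACTIONS, key=lambda a: Q[(s, a)]))
```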

Axiomatizations of inconsistency indices for triads (1801.03355v2)

László Csató

2018-01-10

Pairwise comparison matrices often exhibit inconsistency, so many indices have been suggested to measure their deviation from a consistent matrix. A set of axioms has been proposed recently that any reasonable inconsistency index is required to satisfy. As an example illustrates, this set seems not to be exhaustive, hence we expand it by adding two new properties. All axioms are considered on the set of triads, pairwise comparison matrices with three alternatives, which constitute the simplest case of inconsistency. We select the logically independent properties and prove that they characterize, that is, uniquely determine, the inconsistency ranking induced by most inconsistency indices, which coincide on this restricted domain. Since triads play a prominent role in a number of inconsistency indices, our results can also contribute to the measurement of inconsistency for pairwise comparison matrices with more than three alternatives.
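
As a concrete example of an inconsistency index on triads, here is Koczkodaj's index, which is zero exactly when the triad is consistent. This is one well-known index chosen for illustration; the paper's axioms are not reproduced here.

```python
def koczkodaj_index(a, b, c):
    """Koczkodaj's inconsistency index for the triad

        ( 1    a    b )
        ( 1/a  1    c )
        ( 1/b  1/c  1 )

    which is consistent exactly when b = a*c; the index is 0 in that case
    and grows toward 1 with the deviation from consistency.
    """
    return min(abs(1 - b / (a * c)), abs(1 - (a * c) / b))

print(koczkodaj_index(2, 6, 3))   # consistent: 6 = 2*3 -> 0.0
print(koczkodaj_index(2, 4, 3))   # inconsistent: 4 != 6 -> ~0.333
```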

Deep learning for time series classification: a review (1809.04356v3)

H. Ismail Fawaz, G. Forestier, J. Weber, L. Idoumghar, P. Muller

2018-09-12

Time Series Classification (TSC) is an important and challenging problem in data mining. With the increasing availability of time series data, hundreds of TSC algorithms have been proposed. Among these methods, only a few have considered Deep Neural Networks (DNNs) to perform this task. This is surprising, as deep learning has seen very successful applications in recent years. DNNs have indeed revolutionized the field of computer vision, especially with the advent of novel deeper architectures such as Residual and Convolutional Neural Networks. Apart from images, sequential data such as text and audio can also be processed with DNNs to reach state-of-the-art performance for document classification and speech recognition. In this article, we study the current state-of-the-art performance of deep learning algorithms for TSC by presenting an empirical study of the most recent DNN architectures for TSC. We give an overview of the most successful deep learning applications in various time series domains under a unified taxonomy of DNNs for TSC. We also provide an open-source deep learning framework to the TSC community, in which we implemented each of the compared approaches and evaluated them on a univariate TSC benchmark (the UCR/UEA archive) and 12 multivariate time series datasets. By training 8,730 deep learning models on 97 time series datasets, we present the most exhaustive study of DNNs for TSC to date.
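
To give a flavor of the architectures such a benchmark compares, here is a minimal fully convolutional network for univariate TSC in PyTorch, a sketch in the spirit of the FCN baseline rather than the authors' code:

```python
import torch
import torch.nn as nn

class FCN(nn.Module):
    """Minimal fully convolutional network for univariate TSC: three
    conv-BN-ReLU blocks, global average pooling over time, linear head."""

    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 128, kernel_size=8, padding="same"), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 256, kernel_size=5, padding="same"), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Conv1d(256, 128, kernel_size=3, padding="same"), nn.BatchNorm1d(128), nn.ReLU(),
        )
        self.head = nn.Linear(128, n_classes)

    def forward(self, x):                      # x: (batch, 1, series_length)
        h = self.features(x).mean(dim=-1)      # global average pooling over time
        return self.head(h)

model = FCN(n_classes=5)
logits = model(torch.randn(16, 1, 100))        # 16 series of length 100
print(logits.shape)                            # torch.Size([16, 5])
```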

MILDNet: A Lightweight Single Scaled Deep Ranking Architecture (1903.00905v2)

Anirudha Vishvakarma

2019-03-03

Multi-scale deep CNN architectures [1, 2, 3] successfully capture both fine and coarse level image descriptors for visual similarity tasks, but they come with expensive memory overhead and latency. In this paper, we propose a competing novel CNN architecture, called MILDNet, whose merit is being vastly more compact (about 3 times smaller). Inspired by the fact that successive CNN layers represent the image with increasing levels of abstraction, we compressed our deep ranking model to a single CNN by coupling activations from multiple intermediate layers along with the last layer. Trained on the well-known Street2shop dataset [4], our approach performs as well as current state-of-the-art models with only one third of the parameters, model size, and training time, and a significant reduction in inference time. The significance of intermediate layers for image retrieval has also been demonstrated on the popular Holidays, Oxford, and Paris datasets [5]. So even though our experiments are conducted in the e-commerce domain, the approach is applicable to other domains as well. We further performed an ablation study to validate our hypothesis by checking the impact of adding each intermediate layer. With this, we also present two more useful variants of MILDNet: a mobile model (12 times smaller) for on-edge devices and a compactly featured model (512-d feature embeddings) for systems with less RAM and to reduce the ranking cost. Furthermore, we present an intuitive way to automatically create a tailored in-house triplet training dataset, which is very hard to create manually. This solution can also be deployed as an all-inclusive visual similarity solution. Finally, we present our entire production-level architecture, which currently powers visual similarity at Fynd.
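
The central idea, pooling several intermediate layers together with the last one into a single embedding, can be sketched as follows; the backbone, layer choices, and sizes here are hypothetical, not MILDNet's actual configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLayerEmbedding(nn.Module):
    """Pool the activations of several CNN stages (intermediate plus last)
    and concatenate them into one embedding, so a single network captures
    both fine and coarse descriptors for ranking."""

    def __init__(self):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU()),
        ])

    def forward(self, x):
        pooled = []
        for stage in self.stages:
            x = stage(x)
            pooled.append(x.mean(dim=(2, 3)))    # global average pool per stage
        emb = torch.cat(pooled, dim=1)           # couple fine + coarse features
        return F.normalize(emb, dim=1)           # unit length for ranking

net = MultiLayerEmbedding()
print(net(torch.randn(2, 3, 64, 64)).shape)      # torch.Size([2, 224])
```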

Simulating Emergent Properties of Human Driving Behavior Using Multi-Agent Reward Augmented Imitation Learning (1903.05766v1)

Raunak P. Bhattacharyya, Derek J. Phillips, Changliu Liu, Jayesh K. Gupta, Katherine Driggs-Campbell, Mykel J. Kochenderfer

2019-03-14

Recent developments in multi-agent imitation learning have shown promising results for modeling the behavior of human drivers. However, it is challenging to capture the emergent traffic behaviors observed in real-world datasets. Such behaviors arise from the many local interactions between agents, which are not commonly accounted for in imitation learning. This paper proposes Reward Augmented Imitation Learning (RAIL), which integrates reward augmentation into the multi-agent imitation learning framework and allows the designer to specify prior knowledge in a principled fashion. We prove that convergence guarantees for the imitation learning process are preserved under the application of reward augmentation. The method is validated in a driving scenario where an entire traffic scene is controlled by driving policies learned with our proposed algorithm. Further, we demonstrate improved performance compared to traditional imitation learning algorithms, both in the local actions of a single agent and in the emergent properties of behavior in complex, multi-agent settings.
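
A minimal sketch of reward augmentation on top of adversarial imitation learning; the penalty terms, weights, and the GAIL-style reward convention below are illustrative assumptions, not the paper's exact formulation:

```python
import math

def augmented_reward(discriminator_score, collided, off_road,
                     w_collision=1.0, w_off_road=0.5):
    """Imitation reward shaped by designer-specified penalties (a sketch).

    discriminator_score: probability the discriminator assigns to the
    state-action pair being expert-like (one common GAIL-style convention;
    sign conventions vary across implementations). The penalties encode
    prior knowledge, e.g. that collisions are bad even when a trajectory
    otherwise resembles expert driving.
    """
    imitation = -math.log(max(1.0 - discriminator_score, 1e-8))
    penalty = w_collision * float(collided) + w_off_road * float(off_road)
    return imitation - penalty

# Matching the expert (score ~1) pays off, but unsafe actions are penalized
# regardless of how expert-like they look to the discriminator.
print(augmented_reward(0.9, collided=False, off_road=False))
print(augmented_reward(0.9, collided=True, off_road=False))
```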


