Artificial Intelligence Preprint | 2019-03-20

Hyper-Parameter Sweep on AlphaZero General (1903.08129v1)

Hui Wang, Michael Emmerich, Mike Preuss, Aske Plaat

2019-03-19

Since AlphaGo and AlphaGo Zero achieved groundbreaking successes in the game of Go, the programs have been generalized to solve other tasks. Subsequently, AlphaZero was developed to play Go, Chess and Shogi. In the literature the algorithms are explained well, but AlphaZero contains many parameters, and for AlphaGo, AlphaGo Zero, and AlphaZero alike there is insufficient discussion of how to set their values. Therefore, in this paper we choose 12 parameters in AlphaZero and evaluate how these parameters contribute to training, focusing on three objectives (training loss, time cost, and playing strength). For each parameter, we train 3 models using 3 different values (minimum, default, maximum). We use the game of 6x6 Othello, on the AlphaZeroGeneral open-source re-implementation of AlphaZero. Overall, the experimental results show that different values can lead to different training results, demonstrating the importance of such a parameter sweep. We categorize these 12 parameters into time-sensitive and time-friendly parameters. Moreover, through multi-objective analysis, the paper provides an insightful basis for further hyper-parameter optimization.
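
The sweep design the abstract describes is simple to state in code: each parameter is varied over its minimum, default, and maximum while all other parameters stay at their defaults. Below is a minimal sketch of that loop; the parameter names and values are illustrative assumptions, not the paper's actual choices, and `train_and_evaluate` stands in for a real AlphaZeroGeneral training run.

```python
# One-at-a-time sweep: vary each parameter over {min, default, max}
# while holding every other parameter at its default value.
# Parameter names/values below are illustrative, not the paper's exact ones.

DEFAULTS = {"num_iterations": 100, "num_mcts_sims": 25, "epochs": 10}
SWEEP = {
    "num_iterations": (50, 100, 200),   # (min, default, max), assumed
    "num_mcts_sims": (10, 25, 100),
    "epochs": (5, 10, 20),
}

def train_and_evaluate(config):
    """Placeholder: a real run would train AlphaZeroGeneral with `config`
    and return the three objectives (training_loss, time_cost, strength)."""
    return (0.0, 0.0, 0.0)   # dummy values

def sweep():
    results = {}
    for param, values in SWEEP.items():
        for value in values:
            config = dict(DEFAULTS)      # all other parameters at defaults
            config[param] = value        # vary only this one
            results[(param, value)] = train_and_evaluate(config)
    return results
```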

A New Perspective on Machine Learning: How to do Perfect Supervised Learning (1901.02046v3)

Hui Jiang

2019-01-07

In this work, we introduce the concept of bandlimiting into the theory of machine learning, because all physical processes, including real-world machine learning tasks, are bandlimited by nature. Once the bandlimiting constraint is taken into account, our theoretical analysis shows that all practical machine learning tasks are asymptotically solvable in a perfect sense. Furthermore, the key to this solvability relies almost solely on two factors: i) a sufficiently large number of training samples, beyond a threshold determined by a difficulty measure of the underlying task; ii) a sufficiently complex, bandlimited model. Moreover, for some special cases we derive new error bounds for perfect learning, which quantify the difficulty of learning. These generalization bounds are not only asymptotically convergent but also independent of model complexity. Our new results on generalization provide a new perspective for explaining the recent successes of large-scale supervised learning with complex models such as neural networks.
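
The abstract does not spell out what "bandlimited" means; for reference, the standard definition it builds on is that a function's Fourier transform has bounded support:

```latex
% A function f is bandlimited with band B if its Fourier transform
% \hat{f} vanishes outside the finite interval [-B, B]:
\[
  \hat{f}(\omega) = \int_{-\infty}^{\infty} f(x)\, e^{-i \omega x}\, dx,
  \qquad
  \hat{f}(\omega) = 0 \quad \text{for } |\omega| > B .
\]
```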

Hierarchy-based Image Embeddings for Semantic Image Retrieval (1809.09924v4)

Björn Barz, Joachim Denzler

2018-09-26

Deep neural networks trained for classification have been found to learn powerful image representations, which are often also used for other tasks such as comparing images w.r.t. their visual similarity. However, visual similarity does not imply semantic similarity. In order to learn semantically discriminative features, we propose to map images onto class embeddings whose pair-wise dot products correspond to a measure of semantic similarity between classes. Such an embedding not only improves image retrieval results, but could also facilitate integrating semantics into other tasks, e.g., novelty detection or few-shot learning. We introduce a deterministic algorithm for computing the class centroids directly from prior world knowledge encoded in a class hierarchy such as WordNet. Experiments on CIFAR-100, NABirds, and ImageNet show that our learned semantic image embeddings improve the semantic consistency of image retrieval results by a large margin.
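
To make the core idea concrete, here is an illustrative sketch (not the authors' exact algorithm): derive pairwise class similarities from a toy hierarchy via lowest-common-ancestor depth, then compute class embeddings whose dot products reproduce those similarities by eigendecomposing the similarity matrix. The toy hierarchy and the LCA-based similarity are assumptions for illustration only.

```python
import numpy as np

# Toy hierarchy: child -> parent (assumed for illustration).
PARENT = {"cat": "mammal", "dog": "mammal", "shark": "fish",
          "mammal": "animal", "fish": "animal", "animal": None}

def ancestors(c):
    path = []
    while c is not None:
        path.append(c)
        c = PARENT[c]
    return path

def similarity(a, b, max_depth=2):
    """Similarity from the depth of the lowest common ancestor (one common
    choice of hierarchy-based semantic similarity)."""
    pa, pb = ancestors(a), ancestors(b)
    lca_depth = max(len(ancestors(x)) - 1 for x in pa if x in pb)
    return lca_depth / max_depth

classes = ["cat", "dog", "shark"]
S = np.array([[similarity(a, b) if a != b else 1.0 for b in classes]
              for a in classes])

# From S = U diag(w) U^T, the rows of U * sqrt(w) have pairwise dot products
# equal to S exactly whenever S is positive semi-definite.
w, U = np.linalg.eigh(S)
E = U * np.sqrt(np.clip(w, 0, None))   # one embedding row per class
print(np.allclose(E @ E.T, S))          # -> True
```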

Truly Proximal Policy Optimization (1903.07940v1)

Yuhui Wang, Hao He, Xiaoyang Tan

2019-03-19

Proximal policy optimization (PPO) is one of the most successful deep reinforcement learning methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, its optimization behavior is still far from fully understood. In this paper, we show that PPO can neither strictly restrict the probability ratio, as it attempts to, nor enforce a well-defined trust-region constraint, which means it may still suffer from performance instability. To address this issue, we present an enhanced PPO method, named Trust Region-based PPO with Rollback (TR-PPO-RB), with two critical improvements: 1) it adopts a new clipping function that supports a rollback behavior to restrict the ratio between the new policy and the old one; 2) the triggering condition for clipping is replaced with a trust-region-based one, which is theoretically justified by the trust region theorem. By adhering more truly to the "proximal" property of restricting the policy within the trust region, the new algorithm improves on the original PPO in both stability and sample efficiency.
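
The rollback idea can be contrasted with standard clipping in a few lines. In vanilla PPO the surrogate is flat (zero gradient) once the ratio leaves the clip range, so nothing actively pushes the ratio back; a rollback objective replaces the flat region with a negative slope. The sketch below shows this contrast in simplified form; it is not the paper's exact objective (in particular, the trust-region-based triggering condition is omitted), and the slope parameter `alpha` is an assumed hyper-parameter.

```python
import numpy as np

def ppo_clip_objective(r, adv, eps=0.2):
    """Standard PPO clipped surrogate: flat outside [1-eps, 1+eps]."""
    return np.minimum(r * adv, np.clip(r, 1 - eps, 1 + eps) * adv)

def rollback_objective(r, adv, eps=0.2, alpha=0.3):
    """Rollback variant (simplified): outside the clip range the surrogate
    has slope -alpha, so its gradient pulls the ratio back toward 1.
    The constant terms keep the function continuous at the boundaries."""
    f = np.where(r > 1 + eps, -alpha * r + (1 + alpha) * (1 + eps),
        np.where(r < 1 - eps, -alpha * r + (1 + alpha) * (1 - eps), r))
    return np.minimum(r * adv, f * adv)   # pessimistic min, as in PPO
```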

Translation of Algorithmic Descriptions of Discrete Functions to SAT with Applications to Cryptanalysis Problems (1805.07239v3)

Alexander Semenov, Ilya Otpuschennikov, Irina Gribanova, Oleg Zaikin, Stepan Kochemazov

2018-05-17

In the present paper we describe a technology for translating algorithmic descriptions of discrete functions to SAT, aimed at applications in algebraic cryptanalysis. We describe how cryptanalysis instances are reduced to SAT in a way that should be perceived as natural by the cryptographic community. In the theoretical part of the paper we therefore justify the main principles of a general reduction to SAT for discrete functions from a class containing the majority of functions employed in cryptography. Based on these principles we describe the Transalg software system, developed with the specifics of SAT-based cryptanalysis in mind. We present the results of applying Transalg to construct a number of attacks on various cryptographic functions, some of which are state of the art. We also compare the functional capabilities of the proposed system with those of other software systems that can be used to reduce cryptanalysis instances to SAT, and with the CBMC system widely employed in symbolic verification. Finally, we present extensive experimental data obtained using the SAT solvers that took first places at the SAT competitions of recent years.
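
As a tiny illustration of the kind of propositional encoding such translators produce, here is a Tseitin-style CNF for a single AND gate, the basic building block from which circuit-level descriptions of discrete functions are composed. Transalg automates this for full algorithmic descriptions; this is not its actual input or output format.

```python
# Tseitin-style CNF encoding of the gate constraint y <-> (x1 AND x2).
X1, X2, Y = 1, 2, 3   # DIMACS-style variable numbering

def and_gate(x1, x2, y):
    """Clauses (lists of signed literals) for y <-> (x1 AND x2)."""
    return [
        [-x1, -x2, y],   # x1 & x2 -> y
        [x1, -y],        # y -> x1
        [x2, -y],        # y -> x2
    ]

clauses = and_gate(X1, X2, Y)
# Emit DIMACS CNF, the format SAT solvers consume:
print(f"p cnf 3 {len(clauses)}")
for clause in clauses:
    print(" ".join(map(str, clause)), 0)
```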

Turing-Completeness of Dynamics in Abstract Persuasion Argumentation (1903.07837v1)

Ryuta Arisaka

2019-03-19

Abstract Persuasion Argumentation (APA) is a dynamic argumentation formalism that extends Dung argumentation with persuasion relations. In this work, we show, through an encoding of two-counter Minsky machines, that APA dynamics is Turing-complete.
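
For reference, a two-counter Minsky machine is the standard minimal Turing-complete model targeted by such encodings: two unbounded counters and a program of increment and test-and-decrement instructions. The interpreter below is purely illustrative of what must be simulated; the APA encoding itself is in the paper.

```python
# Two-counter Minsky machine interpreter. A program is a list of:
#   ("inc", c, j)     -> increment counter c, jump to instruction j
#   ("dec", c, j, k)  -> if counter c > 0: decrement, jump to j; else jump to k

def run(program, c0=0, c1=0, max_steps=10_000):
    counters, pc, steps = [c0, c1], 0, 0
    while pc < len(program) and steps < max_steps:
        ins = program[pc]
        if ins[0] == "inc":
            _, c, j = ins
            counters[c] += 1
            pc = j
        else:  # "dec"
            _, c, j, k = ins
            if counters[c] > 0:
                counters[c] -= 1
                pc = j
            else:
                pc = k
        steps += 1
    return counters

# Example: move the contents of counter 0 into counter 1.
prog = [("dec", 0, 1, 2),   # 0: if c0 > 0, decrement and go to 1; else halt
        ("inc", 1, 0)]      # 1: increment c1, loop back to 0
print(run(prog, c0=3))      # -> [0, 3]
```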

Diversity-Promoting Deep Reinforcement Learning for Interactive Recommendation (1903.07826v1)

Yong Liu, Yinan Zhang, Qiong Wu, Chunyan Miao, Lizhen Cui, Binqiang Zhao, Yin Zhao, Lu Guan

2019-03-19

Interactive recommendation, which models the explicit interactions between users and the recommender system, has attracted a lot of research attention in recent years. Most previous interactive recommendation systems focus only on optimizing recommendation accuracy while overlooking other important aspects of recommendation quality, such as the diversity of recommendation results. In this paper, we propose a novel recommendation model, named Diversity-promoting Deep Reinforcement Learning (DRL), which encourages diverse recommendation results in interactive recommendation. More specifically, we adopt a Determinantal Point Process (DPP) model to generate diverse yet relevant item recommendations. A personalized DPP kernel matrix is maintained for each user, constructed from two parts: a fixed similarity matrix capturing item-item similarity, and the relevance of items dynamically learnt through an actor-critic reinforcement learning framework. We performed extensive offline experiments as well as simulated online experiments with real-world datasets to demonstrate the effectiveness of the proposed model.
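
The DPP machinery in the abstract can be sketched directly. The kernel below uses the standard quality-diversity decomposition L[i,j] = r_i * S[i,j] * r_j, matching the abstract's "fixed similarity matrix" plus "dynamically learnt relevance"; the actor-critic component that would learn `relevance` is omitted, and greedy log-determinant selection is one common approximation to DPP MAP inference, assumed here rather than taken from the paper.

```python
import numpy as np

def dpp_kernel(S, relevance):
    """Personalized kernel: L[i, j] = r_i * S[i, j] * r_j."""
    r = np.asarray(relevance)
    return r[:, None] * S * r[None, :]

def greedy_dpp_map(L, k):
    """Greedily pick k items maximizing log det of L restricted to the
    selected set -- a standard approximation to DPP MAP inference."""
    selected = []
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in range(L.shape[0]):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            gain = logdet if sign > 0 else -np.inf
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
    return selected

S = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.2],
              [0.1, 0.2, 1.0]])    # items 0 and 1 are near-duplicates
r = [1.0, 0.95, 0.8]               # item 0 slightly more relevant
print(greedy_dpp_map(dpp_kernel(S, r), k=2))   # -> [0, 2]: diverse yet relevant
```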

Agent Embeddings: A Latent Representation for Pole-Balancing Networks (1811.04516v4)

Oscar Chang, Robert Kwiatkowski, Siyuan Chen, Hod Lipson

2018-11-12

We show that it is possible to reduce a high-dimensional object like a neural network agent to a low-dimensional vector representation with semantic meaning, which we call an agent embedding, akin to word or face embeddings. This can be done by collecting examples of existing networks, vectorizing their weights, and then learning a generative model over the weight space in a supervised fashion. We investigate a pole-balancing task, Cart-Pole, as a case study and show that multiple new pole-balancing networks can be generated from their agent embeddings without direct access to training data from the Cart-Pole simulator. In general, the learned embedding space is helpful for mapping out the space of solutions for a given task. In the case of Cart-Pole we make the surprising observation that good agents make different decisions despite learning similar representations, whereas bad agents make similar (bad) decisions while learning dissimilar representations. Linearly interpolating between the latent embeddings of a good agent and a bad agent yields an agent embedding that generates a network with intermediate performance, where the performance can be tuned via the interpolation coefficient. Linear extrapolation in the latent space also yields performance boosts, up to a point.
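
The interpolation experiment is easy to sketch end to end. Below, a linear PCA model stands in for the paper's learned generative model over weight space (an assumption for brevity); the random `weights` matrix stands in for flattened parameter vectors of trained Cart-Pole agents, and the embedding dimensionality is an assumed choice.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(50, 200))   # 50 agents x 200 parameters (toy data)

# Fit a linear "generative model" over weight space via PCA/SVD.
mean = weights.mean(axis=0)
U, s, Vt = np.linalg.svd(weights - mean, full_matrices=False)
components = Vt[:8]                    # 8-dim embedding space (assumed)

def encode(w):
    return (w - mean) @ components.T   # weight vector -> agent embedding

def decode(z):
    return z @ components + mean       # agent embedding -> weight vector

# Interpolate between a "good" and a "bad" agent in latent space.
z_good, z_bad = encode(weights[0]), encode(weights[1])
for t in (0.0, 0.5, 1.0):
    w_new = decode((1 - t) * z_good + t * z_bad)
    # w_new would be reshaped into network layers and evaluated in Cart-Pole.
    print(t, w_new.shape)
```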

Lemotif: Abstract Visual Depictions of your Emotional States in Life (1903.07766v1)

Devi Parikh

2019-03-18

We present Lemotif. Lemotif generates a motif for your emotional life. You tell Lemotif a little bit about your day -- what the salient events or aspects were and how they made you feel. Lemotif then generates a lemotif -- a creative abstract visual depiction of your emotions and their sources. Over arbitrary periods of time, Lemotif can create visual motifs that summarize your emotional states, making patterns in your emotions and their sources apparent, presenting opportunities to take action, and measuring their effectiveness. The underlying principles of Lemotif are that a lemotif should (1) separate out the sources of the emotions, (2) depict these sources visually, (3) depict the emotions visually, and (4) have a creative aspect. We verify via human studies that each of these factors contributes to the proposed lemotifs being favored over corresponding baselines.

Deep Reinforcement Learning with Decorrelation (1903.07765v1)

Borislav Mavrin, Hengshuai Yao, Linglong Kong

2019-03-18

Learning an effective representation of high-dimensional data is a challenging problem in reinforcement learning (RL). Deep reinforcement learning (DRL) methods such as Deep Q-Networks (DQN) achieve remarkable success in computer games by learning deeply encoded representations from convolutional networks. In this paper, we propose a simple yet very effective method for representation learning with DRL algorithms. Our key insight is that features learned by DRL algorithms are highly correlated, which interferes with learning. By adding a regularized loss that penalizes correlation in latent features (at only slight computational cost), we decorrelate the features represented by deep neural networks incrementally. On 49 Atari games, with the same regularization factor, our decorrelation algorithms outperform DQN in terms of human-normalized scores. In particular, ours performs better than DQN on 39 games, with 4 close ties, and loses only slightly on the remaining games. Empirical results also show that the decorrelation method applies to Quantile Regression DQN (QR-DQN) and significantly boosts its performance. Further experiments on the losing games show that our decorrelation algorithms can beat DQN and QR-DQN with a fine-tuned regularization factor.
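
A decorrelation penalty of the kind described can be written in a few lines: penalize the squared off-diagonal entries of the batch covariance of the latent features, and add it to the usual TD loss with a regularization factor. This is a minimal sketch of the idea; the paper's exact regularizer may differ in normalization and details.

```python
import numpy as np

def decorrelation_loss(features):
    """features: (batch, d) latent activations, e.g. from the last hidden
    layer. Returns a scalar penalty that is zero iff the features are
    uncorrelated across the batch."""
    f = features - features.mean(axis=0, keepdims=True)   # center per feature
    cov = f.T @ f / features.shape[0]                     # batch covariance
    off_diag = cov - np.diag(np.diag(cov))                # drop variances
    return np.sum(off_diag ** 2)

# Used as: total_loss = td_loss + reg_factor * decorrelation_loss(features)
feats = np.random.default_rng(0).normal(size=(32, 16))
print(decorrelation_loss(feats))
```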


