Challenging the memory of RL agents

Reinforcement learning agents are usually trained to maximize their reward by taking actions in an environment modeled as a Markov Decision Process (MDP). A Markov Decision Process describes an environment in terms of its states, actions, rewards, and the probabilities of transitioning to future states. The key point is that agents act on information from the present and can approximately predict …
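
For reference, the standard formulation (general notation, not taken from the post itself) writes an MDP as a tuple and makes the Markov property explicit:

```latex
% An MDP as a tuple: states, actions, transition kernel,
% reward function, and discount factor.
\[
\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad
P(s' \mid s, a) = \Pr(S_{t+1} = s' \mid S_t = s, A_t = a)
\]
% Markov property: the next state depends only on the current
% state and action, not on the full history.
\[
\Pr(S_{t+1} \mid S_t, A_t, S_{t-1}, A_{t-1}, \dots) = \Pr(S_{t+1} \mid S_t, A_t)
\]
```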

Exploring Transformer Model for Reinforcement Learning

MLPs are widely used in RL to implement a learnable agent in a given environment, trained according to a specific algorithm. Recent work in NLP has already shown that the Transformer can replace and outperform earlier architectures on most tasks, which has led to its adoption in areas outside of NLP such as Computer Vision. However, in RL the Transformer architecture is still not widely adopted, and agents …
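
To make the idea concrete, here is a minimal sketch of replacing an MLP policy head with a small Transformer encoder over a window of recent observations; the class name, sizes, and shapes are illustrative assumptions, not the architecture used in the post:

```python
import torch
import torch.nn as nn

class TransformerPolicy(nn.Module):
    """Toy policy network: encodes a window of past observations
    with a Transformer encoder instead of a plain MLP."""
    def __init__(self, obs_dim, n_actions, d_model=64, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, obs_window):           # (batch, time, obs_dim)
        h = self.encoder(self.embed(obs_window))
        return self.head(h[:, -1])           # action logits from last step

# One forward pass on dummy data: a window of 16 observations of size 8.
logits = TransformerPolicy(obs_dim=8, n_actions=4)(torch.randn(1, 16, 8))
```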

Speed benchmark einsum vs matmul in XL-Attention

The original Transformer can only attend to a fixed, limited segment of the input when computing attention. The major drawback of this design is that no information can flow across separate segments, which prevents the Transformer from modeling long-term dependencies. Transformer-XL is an enhancement of the vanilla Transformer that enables it to store the most recent hidden states in a fixed-length memory …
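
The comparison itself is easy to reproduce; the sketch below times torch.einsum against torch.matmul on an attention-shaped score computation (the shapes are illustrative, not the ones benchmarked in the post; on GPU, add torch.cuda.synchronize() around the timings):

```python
import time
import torch

q = torch.randn(8, 12, 512, 64)   # (batch, heads, seq_len, head_dim)
k = torch.randn(8, 12, 512, 64)

def bench(fn, n=100):
    fn()                                    # warm-up run
    t0 = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - t0) / n

# The same attention scores, computed two ways.
t_einsum = bench(lambda: torch.einsum('bhqd,bhkd->bhqk', q, k))
t_matmul = bench(lambda: torch.matmul(q, k.transpose(-2, -1)))
print(f'einsum: {t_einsum * 1e3:.2f} ms | matmul: {t_matmul * 1e3:.2f} ms')
```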

Visualizing Loss Landscape of GAIL

This post aims to visualize the loss landscape of some imitation policies (IL policies) trained with GAIL, together with their discriminators, in three common environments: CartPole, LunarLander, and Walker2d from MuJoCo. The expert policy for CartPole and LunarLander is a simple Double DQN, while the expert for Walker2d, which uses continuous actions, is a DDPG policy. The imitation policies are the same policies employed by their …
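
A common recipe for such plots, in the spirit of Li et al.'s "Visualizing the Loss Landscape of Neural Nets" (assumed here as a plausible method, not confirmed by the post), evaluates the loss on a 2-D grid spanned by two random directions in parameter space:

```python
import torch

def loss_surface(model, loss_fn, batch, steps=21, span=1.0):
    """Hypothetical helper: evaluate loss_fn(model, batch) on a grid
    around the trained weights along two random directions."""
    base = [p.detach().clone() for p in model.parameters()]
    d1 = [torch.randn_like(p) for p in base]
    d2 = [torch.randn_like(p) for p in base]
    axis = torch.linspace(-span, span, steps)
    surface = torch.zeros(steps, steps)
    with torch.no_grad():
        for i, a in enumerate(axis):
            for j, b in enumerate(axis):
                for p, w, u, v in zip(model.parameters(), base, d1, d2):
                    p.copy_(w + a * u + b * v)   # perturb the weights
                surface[i, j] = loss_fn(model, batch)
        for p, w in zip(model.parameters(), base):
            p.copy_(w)                           # restore the weights
    return surface                               # plot e.g. as a contour map
```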

Learning to imitate: using GAIL to imitate PPO

Usually, in reinforcement learning, the agent receives a reward for each action it takes in the environment, and its goal is to maximize its cumulative reward over multiple steps. Actions are selected according to observations that the agent has to learn to interpret. In this post, we are going to explore a different paradigm called imitation learning: the agent …
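
The core mechanism of GAIL fits in a few lines: a discriminator is trained to separate expert state-action pairs from the imitator's, and the imitator is rewarded for fooling it. A minimal sketch follows (function names are ours, not a library API; D maps a batch of state-action pairs to logits):

```python
import torch
import torch.nn.functional as F

def discriminator_loss(D, expert_sa, policy_sa):
    """Train D to output 1 on expert pairs and 0 on imitator pairs."""
    loss_expert = F.binary_cross_entropy_with_logits(
        D(expert_sa), torch.ones(len(expert_sa), 1))
    loss_policy = F.binary_cross_entropy_with_logits(
        D(policy_sa), torch.zeros(len(policy_sa), 1))
    return loss_expert + loss_policy

def gail_reward(D, policy_sa):
    """Surrogate reward -log(1 - D(s, a)): large when the imitator's
    transitions look expert-like to the discriminator."""
    with torch.no_grad():
        return -torch.log1p(-torch.sigmoid(D(policy_sa)) + 1e-8)
```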

Automatic code generator for training Reinforcement Learning policies

Generate custom template code to train your reinforcement learning policy using a simple web UI built with Streamlit. It includes different environments and can be extended to support multiple policies and frameworks, with highly flexible hyperparameter customization. The generated code can be downloaded as a .py file or a Jupyter Notebook, so you can immediately start training your model or use it as a …
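
To give a flavor of how little code such a UI needs (a minimal sketch, not the generator's actual source), Streamlit reduces the form-and-download loop to a handful of calls:

```python
import streamlit as st

st.title('RL training code generator (sketch)')
env = st.selectbox('Environment', ['CartPole-v1', 'LunarLander-v2'])
policy = st.selectbox('Policy', ['DQN', 'PPO'])
lr = st.number_input('Learning rate', value=3e-4, format='%e')

# Fill a code template with the chosen hyperparameters.
code = (
    '# auto-generated training script\n'
    f'ENV_ID = {env!r}\nPOLICY = {policy!r}\nLEARNING_RATE = {lr}\n'
)
st.code(code, language='python')
st.download_button('Download train.py', code, file_name='train.py')
```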

How Genify used a Transformer-based model to build a recommender system that outperforms industry benchmarks

The rapid rise of AI, and more recently of deep learning, has brought a succession of breakthroughs in the field of computer science. These have had a profound impact on both the academic and the business world. In particular, modern deep learning techniques applied to the pre-existing concept of recommender systems have given birth to a new, superior class of neural recommender systems, which are …

Genify’s experience testing Amazon Personalize: learnings and limitations

Challenges of machine learning: machine learning is a complex field that borrows elements from different areas such as computer science, algebra, and statistics. Hence, it is not straightforward, even for experts in the field, to build strong machine learning models for a predefined task. Furthermore, those models should also be optimized with a time-consuming and repetitive hyper-parameter search in order to find the best set …

SeqGAN: text generation with generative models

In this post, we propose to review the recent history of research on Natural Language Generation (NLG), one of the core tasks of the Natural Language Processing domain. Realistic, human-like language generation has long been a challenge for researchers and has recently come into greater focus with the release of large neural models for NLP such as GPT and BERT. Here, we focus on …

Adversarial policies: attacking TicTacToe multi-agent environment

In a previous post we discussed the possibility for an attacker to fool image classification models by injecting adversarial noise directly into the input images. Similarly, in this post we are going to see how it is possible to attack deep reinforcement learning agents in multi-agent environments (where two or more agents interact within the same environment) such that one or more agents are …
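
In the spirit of adversarial-policy attacks (the general recipe, not necessarily this post's exact setup), the victim policy is frozen and the attacker is trained with ordinary RL against it; the environment API below is a made-up two-player, Gym-like interface:

```python
# Skeleton of an adversarial-policy attack. The victim is frozen;
# only the adversary learns. env, adversary, and victim are
# hypothetical objects with a loosely Gym-style interface.
def train_adversary(env, adversary, victim, episodes=1000):
    for _ in range(episodes):
        obs_adv, obs_vic = env.reset()
        done = False
        while not done:
            a_adv = adversary.act(obs_adv)       # learning attacker
            a_vic = victim.act(obs_vic)          # frozen victim policy
            (obs_adv, obs_vic), r_adv, done = env.step(a_adv, a_vic)
            adversary.store(obs_adv, a_adv, r_adv, done)
        adversary.update()                       # any standard RL update
    return adversary
```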