Pranav Agarwal

I am a Ph.D. candidate at École de technologie supérieure and Mila working on Deep Reinforcement Learning for robotic applications and character animation. I am fascinated by the potential of leveraging human knowledge to guide reinforcement learning policies for solving complex real-world applications.

Previously, I was a student researcher at Inria, where I collaborated with Natalia Díaz-Rodríguez and Raoul de Charette. I completed my Bachelor's in Electronics and Communication Engineering at IIIT Guwahati, where I was awarded the President's Gold Medal. During my bachelor's, I worked as a research intern at SUTD with Professor Gemma Roig.

[ Email  /  CV  /  GitHub  /  Twitter  /  Google Scholar  /  LinkedIn ]

Transformers in Reinforcement Learning: A Survey
Pranav Agarwal, Aamer Abdul Rahman, Pierre-Luc St-Charles, Simon J.D. Prince, Samira Ebrahimi Kahou
In submission (2023).
[ Paper ]

Transformers have significantly impacted domains like natural language processing, computer vision, and robotics, improving performance compared to other neural networks. This survey explores their use in reinforcement learning (RL), where they address challenges such as unstable training, credit assignment, interpretability, and partial observability. It provides an overview of RL, discusses challenges faced by classical RL algorithms, and examines how transformers are well-suited to tackle these challenges. The survey covers the application of transformers in representation learning, transition and reward function modeling, and policy optimization within RL. It also discusses efforts to enhance interpretability and efficiency through visualization techniques and tailored adaptations for specific applications. Limitations and potential for future breakthroughs are assessed as well.

Empowering Clinicians with MeDT: A Framework for Sepsis Treatment
Aamer Abdul Rahman, Pranav Agarwal, Vincent Michalski, Rita Noumeir, Philippe Jouvet, Samira Ebrahimi Kahou
NeurIPS 2023 Goal-Conditioned Reinforcement Learning Workshop (Spotlight).
[ Paper / Code / Webpage / Slides ]

Offline reinforcement learning is promising for safety-critical tasks like clinical decision support, but faces challenges of interpretability and clinician interactivity. To overcome these, the proposed Medical Decision Transformer (MeDT) utilizes a goal-conditioned RL paradigm for sepsis treatment recommendations. MeDT employs the decision transformer architecture, considering factors like treatment outcomes, patient acuity scores, dosages, and current/past medical states to provide a holistic view of the patient's history. This enhances decision-making by allowing MeDT to generate actions based on user-specified goals, ensuring clinician interactability and addressing sparse rewards. Results from the MIMIC-III dataset demonstrate MeDT's effectiveness in producing interventions that either outperform or compete with existing methods, offering a more interpretable, personalized, and clinician-directed approach.

Automatic Evaluation of Excavator Operators using Learned Reward Functions
Pranav Agarwal, Marek Teichmann, Sheldon Andrews, Samira Ebrahimi Kahou
NeurIPS 2022 Reinforcement Learning for Real Life Workshop.
[ Paper / Code / Video / Slides ]

Training novice users to operate an excavator for learning different skills requires the presence of expert teachers. Considering the complexity of the problem, it is comparatively expensive to find skilled experts, as the process is time-consuming and requires precise focus. Moreover, since humans tend to be biased, the evaluation process is noisy and leads to high variance in the final scores of different operators with similar skills. In this work, we address these issues and propose a novel strategy for the automatic evaluation of excavator operators. We take into account the internal dynamics of the excavator and the safety criterion at every time step to evaluate the performance.

Goal-constrained Sparse Reinforcement Learning for End-to-End Driving
Pranav Agarwal, Pierre de Beaucorps, Raoul de Charette
In submission (2021).
[ Paper / Code / Video ]

Deep reinforcement learning for end-to-end driving is limited by the need for complex reward engineering. Sparse rewards can circumvent this challenge but suffer from long training times and lead to sub-optimal policies. In this work, we explore full-control driving with only a goal-constrained sparse reward and propose a curriculum learning approach for end-to-end driving using only navigation view maps, which benefit from a small virtual-to-real domain gap. To address the complexity of multiple driving policies, we learn concurrent individual policies selected at inference by a navigation system. We demonstrate the ability of our proposal to generalize to unseen road layouts and to drive significantly longer than in training.

Egoshots, an ego-vision life-logging dataset and semantic fidelity metric to evaluate diversity in image captioning models
Pranav Agarwal, Alejandro Betancourt, Vana Panagiotou, Natalia Diaz-Rodriguez
Machine Learning in Real Life (ML-IRL) ICLR 2020 Workshop.
[ Paper / Code / Video / Slides ]

In this paper, we attempt to show the biased nature of existing image captioning models and present a new image captioning dataset, Egoshots, consisting of 978 real-life images with no captions. We further exploit state-of-the-art pre-trained image captioning and object recognition networks to annotate our images and show the limitations of existing works. Furthermore, in order to evaluate the quality of the generated captions, we propose a new image captioning metric, object-based Semantic Fidelity (SF). Existing image captioning metrics can evaluate a caption only in the presence of its corresponding annotations; however, SF allows evaluating captions generated for images without annotations, making it highly useful for real-life generated captions.

Learning to synthesize faces using voice clips for Cross-Modal biometric matching
Pranav Agarwal, Soumyajit Poddar, Anakhi Hazarika, Hafizur Rahaman
2019 IEEE Region 10 Symposium (TENSYMP).
[ Paper / Code ]

In this paper, a framework for cross-modal biometric matching is presented, where faces of an individual are generated from his/her voice clips and the synthesized faces are then tested using a face classification network. We explore the advancements of Convolutional Neural Networks (CNNs) for feature extraction and generative networks for image synthesis. In the experiment, we compare the performance of Variational Autoencoders (VAE), Conditional Generative Adversarial Networks (C-GAN), and Regularized Conditional Generative Adversarial Networks (RC-GAN). We show that RC-GAN, that is, C-GAN with a regularization factor added to its loss, generates faces corresponding to the true identity of the voice clips with the best accuracy of 84.52%, while VAE generates a less noise-prone image with the highest PSNR of 28.276 decibels but with an accuracy of 72.61%.

Research Student
Jan 2022
PhD Student
Mila, Québec
Jan 2022
Research Assistant
Inria, Paris
May 2019 - April 2021
B.Tech ECE
IIIT Guwahati
Aug 2015 - May 2019
Research Intern
Singapore University of Technology and Design
May 2018 - Aug 2018
Research Intern
Indian Institute of Science, Bangalore
May 2017 - Aug 2017