Pranav Agarwal
I am a Ph.D. candidate at Mila, advised by Sheldon Andrews and Samira Ebrahimi Kahou, focusing on reinforcement learning for robotics and character animation. My research explores efficient reinforcement learning algorithms that use improved prior modeling for robotic applications. Toward this goal, I have worked on generative models (Transformers) for sample-efficient reinforcement learning and on automating reward modeling from large sets of offline trajectories for complex robotic applications such as excavator automation.
Previously, I was a student researcher at Inria, where I collaborated with Natalia Díaz-Rodríguez and Raoul de Charette. I completed my Bachelor's in Electronics and Communication Engineering at IIIT Guwahati, where I was awarded the President's Gold Medal. During my bachelor's, I worked as a research intern at SUTD with Professor Gemma Roig.
I'm looking for research positions and open to collaborations. Feel free to reach out!
[ Email / CV / GitHub / Twitter / Google Scholar / LinkedIn / Projects ]
News
Research
Transformers in Reinforcement Learning: A Survey
Pranav Agarwal,
Aamer Abdul Rahman,
Pierre-Luc St-Charles,
Simon J.D. Prince,
Samira Ebrahimi Kahou
Under Review (2023)
Paper /
Webpage
This survey explores the impact of transformers in reinforcement learning, addressing common RL challenges, while examining their applications in representation learning, policy optimization, and interpretability.
Learning to Play Atari in a World of Tokens
Pranav Agarwal,
Sheldon Andrews,
Samira Ebrahimi Kahou
International Conference on Machine Learning (ICML), 2024.
Paper /
Code /
Webpage /
Slides
This work introduces Discrete Abstract Representations for Transformer-based Learning (DART), a sample-efficient method that utilizes discrete representations to improve world modeling and learning behavior in reinforcement learning, achieving superior performance on the Atari 100k benchmark compared to existing methods.
Empowering Clinicians with MeDT: A Framework for Sepsis Treatment
Aamer Abdul Rahman,
Pranav Agarwal,
Vincent Michalski,
Rita Noumeir,
Philippe Jouvet,
Samira Ebrahimi Kahou
NeurIPS 2023 Goal-Conditioned Reinforcement Learning Workshop (Spotlight).
Paper /
Code /
Webpage /
Slides
The Medical Decision Transformer (MeDT) leverages the transformer architecture to enhance offline reinforcement learning for sepsis treatment recommendations, utilizing a goal-conditioned RL paradigm that improves interpretability and clinician interactivity, while achieving competitive results on the MIMIC-III dataset.
TPTO: A Transformer-PPO based Task Offloading Solution for Edge Computing Environments
Niloofar Gholipour,
Marcos Dias de Assuncao,
Pranav Agarwal,
Julien Gascon-Samson,
Rajkumar Buyya
IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS).
Paper /
Code /
Webpage /
Slides
This paper introduces TPTO, a Deep Reinforcement Learning approach that utilizes Transformer and Proximal Policy Optimization to efficiently offload dependent IoT tasks to edge servers, significantly reducing latency for IoT applications compared to state-of-the-art methods.
Automatic Evaluation of Excavator Operators using Learned Reward Functions
Pranav Agarwal,
Marek Teichmann,
Sheldon Andrews,
Samira Ebrahimi Kahou
NeurIPS 2022 Reinforcement Learning for Real Life Workshop.
Paper /
Code /
Video /
Slides
A novel automatic evaluation strategy for excavator operators is proposed, utilizing machine dynamics and safety criteria, which is then validated through reinforcement learning in a simulation, resulting in safer and more realistic excavator maneuvering policies.
Goal-constrained Sparse Reinforcement Learning for End-to-End Driving
Pranav Agarwal,
Pierre de Beaucorps,
Raoul de Charette
In submission (2021).
Paper /
Code /
Video
A curriculum-based deep reinforcement learning approach for end-to-end driving is proposed, using sparse rewards and navigation view maps to achieve generalization on unseen roads and longer distances.
Egoshots, an ego-vision life-logging dataset and semantic fidelity metric to evaluate diversity in image captioning models
Pranav Agarwal,
Alejandro Betancourt,
Vana Panagiotou,
Natalia Díaz-Rodríguez
Machine Learning in Real Life (ML-IRL) ICLR 2020 Workshop.
Paper /
Code /
Video /
Slides
A new image captioning dataset, Egoshots, is introduced alongside a novel evaluation metric, Semantic Fidelity, to address biases in existing models and enable caption assessment without annotations.
Learning to synthesize faces using voice clips for Cross-Modal biometric matching
Pranav Agarwal,
Soumyajit Poddar,
Anakhi Hazarika,
Hafizur Rahaman
2019 IEEE Region 10 Symposium (TENSYMP).
Paper /
Code
A framework for cross-modal biometric matching is proposed, generating faces from voice clips using various generative networks, with RC-GAN achieving the best identity accuracy of 84.52% and VAE producing the highest quality images.