Self-Supervised Representation Learning via Curiosity-Driven Exploration
Alvin Shek, Ellis Brown, Nilay Pande, and David Noursi
Robotics Institute, Carnegie Mellon University, May 2022
16-824: Visual Learning and Recognition
“The performance of machine learning methods is heavily dependent on the choice of data representation” — Bengio et al., 2012.
As machine learning is applied to increasingly complex and consequential tasks, this dependence on data representation will only grow. While current machine learning methods are bottlenecked by representation quality, methods for learning representations are in turn bottlenecked by dataset size. Yet the mainstream practice of creating large static datasets is expensive, time-consuming, and heavily prone to human bias.
Machine learning practitioners have increasingly turned to paradigms such as unsupervised and self-supervised learning to alleviate the cost of supervision when working with larger datasets; however, these methods still suffer from the limitations of static datasets. One promising approach to learning good representations without a fixed dataset is to interact directly with the environment. The visual state space of real environments and simulators can be enormous and intractable to explore exhaustively. In this project, we therefore investigate intelligent, curiosity-driven exploration strategies for learning good representations from a simulator with self-supervised learning objectives, as sketched below. We discuss the effectiveness of different strategies, open issues, and future directions for research in this area.
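As a rough illustration of what “curiosity-driven exploration” means here, the sketch below computes an intrinsic reward from the prediction error of a learned forward dynamics model in latent space, in the spirit of intrinsic-curiosity methods (e.g., Pathak et al., 2017). The module names, dimensions, and placeholder MLP encoder are illustrative assumptions for this sketch, not the project's actual architecture.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps raw observations to a compact latent representation.
    (Placeholder MLP; an image-based agent would use a convolutional encoder.)"""
    def __init__(self, obs_dim, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, latent_dim))

    def forward(self, obs):
        return self.net(obs)

class ForwardModel(nn.Module):
    """Predicts the next latent state from the current latent state and action."""
    def __init__(self, latent_dim=64, action_dim=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim + action_dim, 128), nn.ReLU(),
                                 nn.Linear(128, latent_dim))

    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=-1))

def curiosity_reward(encoder, forward_model, obs, action, next_obs):
    """Intrinsic reward = forward-model prediction error in latent space.
    Transitions the agent cannot yet predict (novel states) receive high reward,
    steering exploration toward unvisited regions of the state space."""
    with torch.no_grad():
        z, z_next = encoder(obs), encoder(next_obs)
    z_next_pred = forward_model(z, action)
    return 0.5 * (z_next_pred - z_next).pow(2).sum(dim=-1)

# Example: intrinsic rewards for a batch of 8 random transitions.
enc, fwd = Encoder(obs_dim=32), ForwardModel()
obs, next_obs = torch.randn(8, 32), torch.randn(8, 32)
action = torch.randn(8, 4)
print(curiosity_reward(enc, fwd, obs, action, next_obs))  # shape: (8,)
```

In this style of method, the exploration bonus is added to (or replaces) the extrinsic reward, so the policy is driven toward states whose representations are still poorly modeled, which is exactly where new data is most useful for self-supervised representation learning.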