Learning to Drive by Differentiating through Scenes

differentiable-programming autonomous-driving

Building differentiable renderers lets us use differentiable programming to train agents to navigate through simulations. In this post, we explore a differentiable self-driving car environment and demonstrate learning driving policies without relying on reinforcement learning.

Avik Pal https://avik-pal.github.io (IIT Kanpur)
2019-08-23

Differentiable Programming (M. Innes 2019) is a recent trend in the world of machine learning. It marks a shift towards simpler models (smaller neural networks) as opposed to the heavily parameterized models used in deep learning. M. Innes et al. (2019) touch upon a variety of applications that demonstrate the robustness of this paradigm. Differentiable Programming (DP) exploits the idea that many popular models from domains like scientific computing (ODEs, PDEs, etc.) and computer graphics (ray tracers, rasterizers, etc.) can be expressed as differentiable functions. This lets the already known model do the heavy lifting, and as a result the neural network components end up with far fewer parameters.
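To make this concrete, here is a tiny, self-contained example of differentiating an ordinary Julia function with Zygote.jl, the AD framework used throughout this post. The function itself is an illustrative toy, not part of the pipeline described later.

using Zygote

# An ordinary Julia function (illustrative only, not part of the pipeline below).
kinetic_energy(m, v) = 0.5f0 * m * v^2

# Zygote differentiates it like any other "model".
dm, dv = Zygote.gradient(kinetic_energy, 2.0f0, 3.0f0)
# dm == 4.5f0 (v^2 / 2) and dv == 6.0f0 (m * v)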

Innes, Joy, and Karmali (2019) demonstrated the use of differentiable programming in simple control tasks like the cartpole and the inverted pendulum. It has also been used in more challenging settings where the controller involves an ODE solver, as in the case of the trebuchet. All these examples portray the advantage of the small models used in DP over the heavily parameterized models common in Reinforcement Learning (RL). This work is, in a way, a step in the same direction, only now we combine all those pieces and make the “differentiable” program considerably more complicated than a few ODEs.

Duckietown: A Test-Bed for Autonomous Driving

Simulators play an important role in training agents to perform a variety of tasks. They give us access to practically unlimited data and circumstances, which allows our agent to learn a very robust policy for the task. However, policies learnt in a simulated environment are generally not directly transferable to the real world. This has motivated a very active area of research, sim2real transfer, but that is not our focus here. Our primary goal is to show that we can have a completely differentiable pipeline and use it to train an agent to drive.

First, let me briefly introduce Duckietown. Duckietown (Paull et al. 2017) is a simulator that allows us to train RL agents to drive autonomously. We chose Duckietown because its tasks are challenging despite a really simple internal structure. It offers a variety of tasks, and we chose the lane following task as a proof of concept. Making the simulator differentiable is a different ordeal altogether. Thanks to Tejan Karmali, who led the development of the differentiable Duckietown simulator and also collaborated on the training of the agent (more on that later).

Rendering of the Duckietown Environment

Duckietown provides two modules that are central to our differentiable pipeline. The first is the differentiable physics engine. In Duckietown, the physics of the environment is much simpler than in the examples mentioned previously (like the trebuchet model, which involved ODEs). Here, the equations are as simple as calculating angular momentum and torque and then computing the new position of the agent. The other useful thing Duckietown can do is differentiate the effect of a change in the scene dynamics with respect to the motion of the agent. These points will become clearer in the upcoming sections.
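To give a flavor of how simple such dynamics are, below is a minimal differential-drive style update written as a plain Julia function. This is only an illustrative sketch, not the actual Duckietown.jl physics code; the function name step_pose, the wheel distance, and the time step are assumptions.

# A minimal differential-drive update (illustrative; not the exact
# Duckietown.jl implementation). `vl` and `vr` are the left/right wheel
# velocities predicted by the policy.
function step_pose(x, y, θ, vl, vr; wheel_dist = 0.1f0, Δt = 1f0 / 30f0)
    v  = (vl + vr) / 2           # forward speed of the agent
    ω  = (vr - vl) / wheel_dist  # angular velocity
    θ′ = θ + ω * Δt
    x′ = x + v * cos(θ′) * Δt
    y′ = y + v * sin(θ′) * Δt
    return x′, y′, θ′            # every operation is smooth, hence differentiable
end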

Differentiable RayTracing with RayTracer.jl

Our central algorithm, which is described in the following section, relies heavily on the fact that the renderer we use is differentiable. For people familiar with ray tracing, it might not be obvious why it should be differentiable; as a matter of fact, there are a lot of points in 3D space at which the gradients are not well defined. But before getting into those details, let us have a brief look at what ray tracing is.

Rendering is a technique in which a 2D projection image is generated from a 3D scene description. This projection can be photo-realistic or non-photo-realistic, and depending on the type of image needed, a variety of techniques exist. Ray tracing is one such technique for generating photo-realistic images from 3D scenes. It takes into account the complex interactions a light ray has with the objects it encounters along its path. The rendering pipeline begins with representing the 3D scene in the framework's format and specifying the parameters according to which the rendering needs to be done. In the code snippet below, we use the RayTracer.jl framework, which was designed particularly to tackle the issues with differentiable rendering.

using RayTracer

# Load the scene geometry and set up the light and the camera.
# `screen_size` simply holds the output resolution (assumed here).
screen_size = (w = 512, h = 512)

scene = load_obj("teapot.obj")

light = DistantLight(
    color     = Vec3(1.0f0),
    intensity = 100.0f0,
    position  = Vec3(0.0f0, 1.0f0, 0.0f0)
)

origin, direction = get_primary_rays(Camera(
    lookfrom = Vec3(1.0f0, 10.0f0, -1.0f0),
    lookat   = Vec3(0.0f0),
    vup      = Vec3(0.0f0, 1.0f0, 0.0f0),
    vfov     = 45.0f0,
    focus    = 1.0f0,
    width    = screen_size.w,
    height   = screen_size.h
))

color = raytrace(origin, direction, scene, light, origin, 2)
Rendered Teapot

A lot of prior work has been done in the area of differentiable rendering. Some of the most notable works in recent years include Liu et al. (2019), Li et al. (2018), and Loper and Black (2014). However, these techniques are quite application specific, and none of them exactly catered to our use case. For example, Soft Rasterizer learns a 3D mesh reconstruction but can only output a fixed mesh; hence, to the best of our knowledge, it does not extend to complicated scenes like a full autonomous driving map. Likewise, techniques like edge sampling, even though very novel, make the rendering step quite slow and would bottleneck our training. This motivated our design of a differentiable ray tracer.

Our design of a differentiable ray tracer (renderer) was built on the idea that we should not make any fundamental change to the ray tracing algorithm. So we built a general purpose renderer and differentiate through it using the source-to-source automatic differentiation (AD) framework Zygote.jl. This allows us to extend our framework easily, and anyone interested only in rendering can use the software without incurring any penalty. The code snippet below demonstrates how simple it is to get the gradients with respect to an arbitrary scene parameter.

# Differentiate the rendered image with respect to the light parameters.
# `loss_function` and `target` are assumed to be defined elsewhere.
Zygote.gradient(light) do light
    color = raytrace(origin, direction, scene, light, origin, 2)
    return loss_function(color, target)
end

However, this design principle comes with a few challenges. The biggest drawback is that there are several places in the 3D scene where the rendering function is not differentiable; simply consider a pixel that captures an edge shared by two triangles. Our technique still returns a gradient at such points, but those gradients do not make much sense and are, at best, a coarse approximation of the actual value. Hence, for applications that are very sensitive to gradient quality, like some cases of inverse rendering (something we will visit in a moment), our method can fail to work. However, this does not affect convergence in the case of neural networks, which is, after all, our main area of focus.

Brief Overview of the Inverse Rendering Problem

One of the interesting problems we can tackle with differentiable rendering is inverse rendering. Even though our rendering technique is immensely valuable for this problem, it should be noted that we cannot handle a lot of the problems which fall under this category. A detailed discussion on this topic is available in Pal (2020). Here we shall only take up a very simple subproblem and show that our method indeed works.

Let us look into a problem called inverse lighting. In inverse rendering we try to infer the scene parameters from a given view (a 2D image) of the scene; in inverse lighting, the parameter of interest is the lighting present in the scene. In this example we do not know the exact location of the light source in 3D space, nor its intensity.
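A minimal sketch of how such an optimization could look with RayTracer.jl and Zygote.jl is given below. It assumes the scene, origin, direction, and loss_function variables from the earlier rendering snippet, a target_img rendered under the true (unknown) lighting, and a plain gradient descent update with a made-up learning rate; the actual experiments behind the figures below may use different settings.

# Sketch of inverse lighting by gradient descent on the light parameters.
function optimize_light(light, target_img; η = 0.01f0, iters = 100)
    for _ in 1:iters
        gs = Zygote.gradient(light) do light
            img = raytrace(origin, direction, scene, light, origin, 2)
            return loss_function(img, target_img)   # pixel-space loss
        end
        Δ = gs[1]
        # Plain gradient descent on the parameters we are trying to infer
        # (illustrative update rule, not the exact one used for the figures).
        light = DistantLight(
            color     = light.color,
            intensity = light.intensity - η * Δ.intensity,
            position  = light.position  - η * Δ.position
        )
    end
    return light
end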

Starting Configuration Optimizing Lighting Parameters

As can be seen, our method optimizes the parameters quite fast. However, this brings me to what our proposed method cannot handle. The method is restricted to problems that are fairly static in nature. For example, if we were to disorient the camera and try to reach the optimum for that orientation, it would be very difficult. This is primarily due to the loss function we are using: optimizing in pixel space might not be the ideal thing to do in this case. A better and more disciplined approach is demonstrated in Li et al. (2018).

Training an Autonomous Agent to Drive in Duckietown

Now that we are done talking about the individual components of our work, let us see how all these subparts finally come together. Our algorithm draws some inspiration from the traditional way RL agents are trained; for example, we too have the notion of episodes in training. We further divide one episode into multiple subepisodes. The intuition is that the agent should learn to optimize its trajectory taking into account future states, not just the current state. We could also introduce a discount factor into the reward function, but in our case it did not affect training much. The following figure shows the training method for one subepisode.

Proposed method to train a policy

Let us now go over the entire pipeline once. The Duckietown environment provides the scene configuration (3D meshes, camera, and lighting parameters) in a form that can be sent to the renderer. The differentiable renderer converts this scene into a 2D image. A very simple convolutional neural network reads this image and outputs two scalars, which set the velocities of the left and right wheels of the car. This action is sent to the differentiable physics engine, which updates the position of the car and the dynamics of the other objects in the scene. Finally, we also get a reward from the physics engine itself. The reward needs to be tuned for the task at hand; for our experiments we focused on the lane following task.
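The sketch below captures one such subepisode in code. The convolutional policy uses Flux.jl, while render_scene, step_physics, and reward_fn are placeholder names standing in for the differentiable Duckietown.jl renderer, physics step, and reward; they are not the actual API, and the architecture and input size are illustrative.

using Flux, Zygote

# A small convolutional policy mapping the rendered image to the two wheel
# velocities. The architecture and the 64×64 RGB input size are illustrative.
policy = Chain(
    Conv((5, 5), 3 => 8, relu),
    MaxPool((2, 2)),
    Conv((5, 5), 8 => 16, relu),
    MaxPool((2, 2)),
    Flux.flatten,
    Dense(16 * 13 * 13, 2)   # => (v_left, v_right) for a 64×64×3×1 input
)

# One subepisode: render, act, step the physics, accumulate reward.
function subepisode_loss(policy, state, n_steps)
    total_reward = 0f0
    for _ in 1:n_steps
        img    = render_scene(state)          # differentiable renderer (placeholder)
        action = policy(img)                  # two scalars: wheel velocities
        state  = step_physics(state, action)  # differentiable physics update (placeholder)
        total_reward += reward_fn(state)      # task-specific reward (placeholder)
    end
    return -total_reward                      # minimize the negative reward
end

# Gradients of the subepisode loss flow through the physics, the renderer and
# the network, and can be fed to any Flux optimiser. `state` is the initial
# environment state (assumed).
grads = Zygote.gradient(p -> subepisode_loss(p, state, 10), policy)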

In this case, the reward was simply the dot product of the car's heading direction with the tangent to the Bezier curve of the yellow line at that coordinate. We also had to penalize nearing the edges of the road. Additionally, we had to penalize the outputs of the neural network to constrain them between -1 and 1: the clamping strategy used by Duckietown does not work in our case because it kills the gradients, and the model fails to learn anything worthwhile. Hence, we simply penalized each output with its squared Euclidean distance from 1 or -1 whenever it exceeded those bounds; a sketch of such a reward follows. Finally, the clips below show two examples of the agent driving.
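Such a reward could be sketched as follows, assuming the car's heading and the curve tangent are available as unit vectors and that dist_to_edge gives the distance to the road boundary; the function name, margin, and weights are illustrative, not the exact ones used in our experiments.

# An illustrative lane-following reward. `heading` and `tangent` are assumed
# unit vectors; the margin and weights are made-up values.
function lane_following_reward(heading, tangent, dist_to_edge, action;
                               edge_margin = 0.1f0, edge_weight = 1f0, sat_weight = 1f0)
    alignment    = sum(heading .* tangent)                             # dot product with the curve tangent
    edge_penalty = edge_weight * max(0f0, edge_margin - dist_to_edge)  # penalize getting close to the road edge
    # Squared distance from ±1 whenever an output exceeds those bounds,
    # instead of clamping, so that gradients keep flowing.
    sat_penalty  = sat_weight * sum(abs2, max.(abs.(action) .- 1f0, 0f0))
    return alignment - edge_penalty - sat_penalty
end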

Straight Road Loop
Driving on a straight road Driving on a closed loop

Future Work and Conclusion

The results we have demonstrated are indeed quite primitive, and much work needs to be done to extend this method to more complicated settings, for example with pedestrians or multiple cars on the road. However, we have shown that the core idea of expressing models as differentiable programs, instead of just differentiable neural networks, extends to a variety of problems. These problems might be as simple as the cartpole or the inverted pendulum, or as complex as a full 3D environment with multiple objects. We have also explored how differentiability allows us to solve tricky problems like inverse lighting. Considering the current set of tasks we have been able to accomplish, it would be interesting to extend this idea to more complicated problems (perhaps expressing an entire robot as a differentiable program and training it to do some arbitrary task).

Acknowledgements

I would like to thank my collaborator Tejan Karmali, who designed the differentiable Duckietown environment and also led the prior work on differentiable control problems. Also, thanks to my mentors Dhairya Gandhi and Mike Innes for guiding me through the project. Finally, I would like to thank Google for organising Google Summer of Code and the JuliaCon Organising Committee for funding my travel to JuliaCon 2019.

References

Innes, Mike. 2019. “What Is Differentiable Programming?” https://fluxml.ai/blog/2019/02/07/what-is-differentiable-programming.html.
Innes, Mike, Alan Edelman, Keno Fischer, Chris Rackauckas, Elliot Saba, Viral B Shah, and Will Tebbutt. 2019. “A Differentiable Programming System to Bridge Machine Learning and Scientific Computing.” arXiv Preprint arXiv:1907.07587.
Innes, Mike, Neethu Joy, and Tejan Karmali. 2019. “Differentiable Control Problems.” https://fluxml.ai/2019/03/05/dp-vs-rl.html.
Li, Tzu-Mao, Miika Aittala, Frédo Durand, and Jaakko Lehtinen. 2018. “Differentiable Monte Carlo Ray Tracing Through Edge Sampling.” ACM Transactions on Graphics (TOG) 37 (6): 1–11.
Liu, Shichen, Weikai Chen, Tianye Li, and Hao Li. 2019. “Soft Rasterizer: Differentiable Rendering for Unsupervised Single-View Mesh Reconstruction.” arXiv Preprint arXiv:1901.05567.
Loper, Matthew M, and Michael J Black. 2014. “OpenDR: An Approximate Differentiable Renderer.” In European Conference on Computer Vision, 154–69. Springer.
Pal, Avik. 2020. “RayTracer.jl: A Differentiable Renderer That Supports Parameter Optimization for Scene Reconstruction.” Proceedings of the JuliaCon Conferences 1 (1): 37. https://doi.org/10.21105/jcon.00037.
Paull, Liam, Jacopo Tani, Heejin Ahn, Javier Alonso-Mora, Luca Carlone, Michal Cap, Yu Fan Chen, et al. 2017. “Duckietown: An Open, Inexpensive and Flexible Platform for Autonomy Education and Research.” In 2017 IEEE International Conference on Robotics and Automation (ICRA), 1497–1504. IEEE.


Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Pal (2019, Aug. 23). Avik's Blog: Learning to Drive by Differentiating through Scenes. Retrieved from https://avik-pal.github.io/blog/posts/2019-08-23-learningtodriveduckietownjlraytracerjl/

BibTeX citation

@misc{pal2019learning,
  author = {Pal, Avik},
  title = {Avik's Blog: Learning to Drive by Differentiating through Scenes},
  url = {https://avik-pal.github.io/blog/posts/2019-08-23-learningtodriveduckietownjlraytracerjl/},
  year = {2019}
}