The AI Day livestream ran for three hours. The first half hour was music with no visuals, which is about what you would expect from Tesla and Musk. The remaining two and a half hours covered FSD, Dojo, the Tesla Bot, and an in-depth Q&A.
If you ask me what stuck with me, only one image stands out. At 2 hours and 53 minutes into the livestream, Musk flashily opened his jacket while answering a reporter's question and said:
If you wear a T-shirt with a stop sign, the car will stop.
He meant that if you wear clothing printed with a stop sign, a Tesla will recognize it as a real stop sign and stop.
Taking this remark together with the recent crashes involving a certain brand and the content of the AI Day talks, I came to two conclusions:
- At this stage, every “self-driving” car that ordinary consumers can buy is really a driver-assistance system. Drivers remain responsible for their own lives, and that must never be forgotten.
- The human brain excels at understanding semantic information, while computers excel at precise, repetitive calculation and object tracking.
Tesla’s car computer surprisingly has a metaverse?
Most readers probably already know that Tesla now uses a pure-vision perception approach. Relying solely on eight 1.3-megapixel cameras, it has genuinely reached a new level. A traditional perception pipeline draws a bounding box around each relevant object in every frame from each of the 8 cameras, estimates distance from those boxes, and passes the results on to planning and control.
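As a rough illustration of how a per-camera pipeline can turn a 2D bounding box into a distance, here is a minimal pinhole-camera sketch. It assumes the object's real-world height is known; the function name and all numbers are hypothetical, and the talks did not say which range-estimation method Tesla's earlier pipeline used.

```python
def distance_from_box(focal_px: float, object_height_m: float, box_height_px: float) -> float:
    """Pinhole-camera range estimate: distance = focal length * real height / box height."""
    if box_height_px <= 0:
        raise ValueError("box height must be positive")
    return focal_px * object_height_m / box_height_px


# Hypothetical numbers: 1000 px focal length, a car assumed to be 1.5 m tall,
# whose bounding box is 50 px tall in the image.
d = distance_from_box(1000.0, 1.5, 50.0)
print(f"estimated distance: {d:.1f} m")  # estimated distance: 30.0 m
```

The estimate is only as good as the assumed object height: halving the box height doubles the distance, so a wrong height guess or a sloppy box translates directly into range error.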
The all-new FSD beta, however, uses a technique called “Vector Space”: it stitches the 8 video streams together into a single unified physical model that serves as the basis for route planning. Put simply, Tesla has built a sandbox inside the car that mirrors the real-world environment and in which the car can simulate its own maneuvers.
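The idea of stitching several camera views into one shared space can be illustrated geometrically. In this minimal sketch, each camera's detections are transformed into the vehicle frame using an assumed mounting pose (x, y, yaw), and detections that land close together are merged as the same object. Tesla's talks described a learned neural-network fusion, so this is only an analogy; the poses, merge radius, and function names are hypothetical.

```python
import math


def to_ego(cam_x, cam_y, cam_yaw, px, py):
    """Map a point from one camera's frame (px forward, py left) into the shared vehicle frame."""
    c, s = math.cos(cam_yaw), math.sin(cam_yaw)
    return (cam_x + c * px - s * py, cam_y + s * px + c * py)


def fuse(points, merge_radius_m=0.5):
    """Greedily merge points from different cameras that land close together in the shared frame."""
    clusters = []  # each cluster: [sum_x, sum_y, count]
    for x, y in points:
        for cl in clusters:
            cx, cy = cl[0] / cl[2], cl[1] / cl[2]
            if math.dist((x, y), (cx, cy)) < merge_radius_m:
                cl[0] += x
                cl[1] += y
                cl[2] += 1
                break
        else:
            clusters.append([x, y, 1])
    return [(sx / n, sy / n) for sx, sy, n in clusters]


# Hypothetical mounting poses (x, y, yaw) and detections in each camera's own frame.
detections = [
    to_ego(2.0, 0.0, 0.0, 10.0, 0.0),         # front camera: car 10 m ahead
    to_ego(2.0, 0.0, 0.0, 10.2, 0.1),         # wide front camera: same car, slightly offset
    to_ego(1.0, 0.8, math.pi / 2, 3.0, 0.0),  # left-facing camera: pedestrian 3 m to the side
]
objects = fuse(detections)
print(objects)  # two objects: the merged car and the pedestrian
```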
Vector Space is 4D: it combines the three spatial dimensions (length, width, height) with time. The sandbox is therefore a three-dimensional model with a timeline, and it provides far more information and higher accuracy than traditional 2D perception.
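Why the time axis adds information: a single 3D snapshot gives positions, but two time-stamped snapshots also give velocity. The `Scene4D` class below is a hypothetical data structure for illustration, not Tesla's Vector Space format.

```python
from collections import defaultdict


class Scene4D:
    """Hypothetical 4D store: 3D positions of tracked objects, indexed by timestamp."""

    def __init__(self):
        self.frames = defaultdict(dict)  # t -> {object_id: (x, y, z)}

    def add(self, t, obj_id, xyz):
        self.frames[t][obj_id] = xyz

    def velocity(self, obj_id, t0, t1):
        """Finite-difference velocity, which no single 3D snapshot can provide."""
        (x0, y0, z0), (x1, y1, z1) = self.frames[t0][obj_id], self.frames[t1][obj_id]
        dt = t1 - t0
        return ((x1 - x0) / dt, (y1 - y0) / dt, (z1 - z0) / dt)


scene = Scene4D()
scene.add(0.0, "car_1", (20.0, 3.5, 0.0))
scene.add(0.5, "car_1", (18.0, 3.5, 0.0))
v = scene.velocity("car_1", 0.0, 0.5)
print(v)  # (-4.0, 0.0, 0.0): closing in at 4 m/s
```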
Tesla’s AI Director Andrej Karpathy explains:
We are developing a synthetic animal from the ground up.
This is essentially the difference between “automation” and “intelligence.” FSD aims to move beyond rigid, reactive patterns triggered by specific visual cues toward context-based decision-making that takes the whole surrounding space and all of its objects into account. Because it can remember and anticipate events across space and time, FSD can not only keep track of pedestrians who are temporarily hidden and anticipate their movements but also adjust its route on the fly.
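Remembering an object after it disappears from view can be sketched as a track that survives occlusion for a while and is extrapolated at constant velocity. The class names, the 3-second memory, and the constant-velocity assumption are all hypothetical; FSD's real memory is learned inside its neural networks, which this sketch does not attempt to model.

```python
from dataclasses import dataclass


@dataclass
class Track:
    x: float
    y: float
    vx: float
    vy: float
    last_seen: float


class OcclusionMemory:
    """Keep objects 'alive' for up to max_age_s after they disappear from view."""

    def __init__(self, max_age_s: float = 3.0):
        self.max_age_s = max_age_s
        self.tracks: dict[str, Track] = {}

    def observe(self, obj_id, x, y, vx, vy, t):
        self.tracks[obj_id] = Track(x, y, vx, vy, t)

    def predict(self, t):
        """Constant-velocity guess for every object last seen at most max_age_s ago."""
        out = {}
        for obj_id, tr in self.tracks.items():
            age = t - tr.last_seen
            if age <= self.max_age_s:
                out[obj_id] = (tr.x + tr.vx * age, tr.y + tr.vy * age)
        return out


memory = OcclusionMemory()
# A pedestrian 15 m ahead and 4 m to the right, walking toward the lane at 1.2 m/s,
# is seen at t = 0 s and then hidden behind a parked van (hypothetical numbers).
memory.observe("pedestrian", 15.0, -4.0, 0.0, 1.2, t=0.0)
print(memory.predict(2.0))  # still tracked, predicted at about (15.0, -1.6)
print(memory.predict(5.0))  # {}: forgotten after 3 s without a sighting
```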
The Three Objectives of Intelligent Driving
Frankly, as someone who cares only about the end result, I found many of the AI Day talks hard to follow. Still, it was clear that Tesla's R&D goals for autonomous driving are ultimately driven by user experience.
Reasoning from first principles, a car is a means of transport that moves people from point A to point B. A competent autonomous vehicle must not only reach its destination but also optimize for efficiency, comfort, and safety.
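One way to make “efficiency, comfort, and safety” concrete is a planner cost function. The sketch below is an assumption for illustration, not Tesla's formulation: safety is treated as a hard constraint, while efficiency and comfort are traded off with hypothetical weights. The `Trajectory` fields and all numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    travel_time_s: float    # efficiency: lower is better
    peak_accel_mps2: float  # comfort: smaller magnitude is better
    min_gap_m: float        # safety: closest approach to any obstacle


def cost(traj: Trajectory, w_time=1.0, w_comfort=2.0, min_safe_gap_m=1.0) -> float:
    # Safety as a hard constraint: a trajectory that gets too close is never chosen.
    if traj.min_gap_m < min_safe_gap_m:
        return float("inf")
    # Efficiency and comfort are traded off with (hypothetical) weights.
    return w_time * traj.travel_time_s + w_comfort * abs(traj.peak_accel_mps2)


candidates = {
    "aggressive": Trajectory(10.0, 4.0, 0.5),
    "fast": Trajectory(11.0, 2.5, 2.0),
    "gentle": Trajectory(13.0, 1.0, 3.0),
}
best = min(candidates, key=lambda name: cost(candidates[name]))
print(best)  # gentle
```

Changing the weights shifts the balance: raising `w_time` would eventually make "fast" win, while no weight setting can make the unsafe "aggressive" option win.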
Put simply, Tesla wants your journey to be not only fast but also steady, comfortable, and worry-free; in a word, “smooth.”
The planning and control portion of AI Day was presented by Ashok Elluswamy, a Tesla Autopilot veteran who graduated from the prestigious Carnegie Mellon University in the United States and currently leads Autopilot software.
Smooth driving comes from reasonable planning
According to Elluswamy, path planning faces two challenges: non-convexity and high dimensionality.
Non-convexity describes a function like the one on the right in the following figure, which has multiple local minima, unlike the convex function on the left, which has only one.
Such problems are better suited to discrete search than to continuous optimization. Discrete search evaluates a finite set of candidates, much like a digital process, while continuous optimization follows the derivative toward an extremum. Applied to the non-convex function on the right in the figure above, a derivative-based method can settle in a local minimum and return the wrong answer.
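This failure mode is easy to reproduce. The minimal sketch below uses a hypothetical one-dimensional non-convex function: gradient descent started at x = 2 slides into the nearby local minimum, while a brute-force discrete search over a grid finds the global minimum. The function, step size, and grid are assumptions chosen for illustration.

```python
def f(x):
    # Non-convex: a local minimum near x = 1.13 and the global minimum near x = -1.30
    return x**4 - 3 * x**2 + x


def df(x):
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, lr=0.01, steps=2000):
    """Continuous optimization: repeatedly step against the derivative."""
    x = x0
    for _ in range(steps):
        x -= lr * df(x)
    return x


def grid_search(lo=-3.0, hi=3.0, n=6001):
    """Discrete search: evaluate f at evenly spaced candidates and keep the best."""
    xs = (lo + i * (hi - lo) / (n - 1) for i in range(n))
    return min(xs, key=f)


x_gd = gradient_descent(2.0)
x_grid = grid_search()
print(f"gradient descent: x = {x_gd:.3f}, f = {f(x_gd):.3f}")
print(f"grid search:      x = {x_grid:.3f}, f = {f(x_grid):.3f}")
```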
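High dimensionality means the problem has many variables to optimize at once. Unlike non-convexity, high dimensionality favors continuous optimization, because the number of candidates a discrete search must evaluate grows exponentially with the number of dimensions.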
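Why high dimensionality pushes the other way can also be shown in a few lines. Assuming a simple convex bowl in 50 dimensions (a hypothetical stand-in for a trajectory with many parameters), an exhaustive grid with only 10 points per dimension would need 10^50 evaluations, whereas gradient descent reaches the minimum in a couple of hundred steps.

```python
def grid_evaluations(dims: int, points_per_dim: int) -> int:
    """Function evaluations an exhaustive grid search would need."""
    return points_per_dim ** dims


def gradient_descent(grad, x0, lr=0.1, steps=200):
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


def bowl_grad(x):
    # Gradient of the hypothetical convex bowl f(x) = sum((x_i - 1)^2)
    return [2.0 * (xi - 1.0) for xi in x]


DIMS = 50
x_opt = gradient_descent(bowl_grad, [0.0] * DIMS)
print(f"grid search: {grid_evaluations(DIMS, 10):.1e} evaluations")
print(f"gradient descent: 200 steps, max error {max(abs(xi - 1.0) for xi in x_opt):.1e}")
```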
Whenever it faces one of these decisions, Tesla's planner can run 2,500 simulations in 1.5 milliseconds. At that speed, a scene that feels fast-moving to the human brain is practically frozen from the computer's point of view.
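To put the “practically frozen” claim in numbers, here is back-of-the-envelope arithmetic using the figures above plus one hypothetical input: a speed of 100 km/h.

```python
searches = 2500
budget_s = 1.5e-3
per_search_us = budget_s / searches * 1e6  # time available per simulation, in microseconds

speed_kmh = 100.0                          # hypothetical highway speed
speed_ms = speed_kmh / 3.6                 # about 27.8 m/s
moved_cm = speed_ms * budget_s * 100       # distance covered while all 2,500 simulations run

print(f"{per_search_us:.2f} us per search, car moves {moved_cm:.1f} cm")
# 0.60 us per search, car moves 4.2 cm
```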
In the figure above, the vehicle needs to make two consecutive lane changes to the left before turning left at the next intersection. If it squeezes in between the two nearby vehicles, it cannot avoid braking abruptly; if it accelerates to get ahead of them, it risks missing the intersection. From the many candidate solutions on the right of the figure (each line corresponds to a different control command), FSD picks the one with the steadiest acceleration (comfort) and the shortest path (efficiency) and executes it.
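A toy version of this choice: each candidate below is a hypothetical longitudinal acceleration profile sampled every 0.5 s, plus a path length and a flag for whether it still makes the turn. Candidates that miss the turn are discarded, and the rest are scored on peak acceleration, peak jerk, and path length. The scoring rule, weights, and all numbers are assumptions, not Tesla's actual criteria.

```python
DT = 0.5  # seconds between acceleration samples (assumption)


def comfort_penalty(accels):
    """Peak |acceleration| plus peak |jerk| (change in acceleration per second)."""
    peak_accel = max(abs(a) for a in accels)
    peak_jerk = max((abs(b - a) / DT for a, b in zip(accels, accels[1:])), default=0.0)
    return peak_accel + peak_jerk


def pick(candidates, length_weight=0.05):
    feasible = [c for c in candidates if c["makes_turn"]]  # efficiency: must not miss the turn
    return min(feasible, key=lambda c: comfort_penalty(c["accels"]) + length_weight * c["path_m"])


CANDIDATES = [
    {"name": "cut_in_between", "accels": [0.0, -3.0, -4.0, -1.0, 0.0], "path_m": 120, "makes_turn": True},
    {"name": "speed_ahead", "accels": [0.0, 2.0, 2.5, 2.0, 0.0], "path_m": 110, "makes_turn": False},
    {"name": "smooth_merge", "accels": [0.0, -0.5, -1.0, -0.5, 0.0], "path_m": 125, "makes_turn": True},
]
print(pick(CANDIDATES)["name"])  # smooth_merge
```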
Consideration for others
We all hate conflict, and so do machines. A conflict means at least one party gets hurt, and it is not uncommon for both to lose. The losses range from a car crash to a traffic jam, but all of them stem from conflicts of varying severity.
The road in the picture above is a two-way street in a residential area, but so many cars are parked along it that parts of it are only wide enough for one car. If you like to take chances, you may hope the other car gets out of your way; if you are a conservative driver, you may pull into an open spot and let the other car pass first. Either way, you have to anticipate the other driver's actions to decide your own. Otherwise, if you wait for the other car to stop while it waits for you, two overly conservative autonomous driving systems will end up in a deadlock.
In this situation, Tesla's FSD behaves much like a person: it predicts the oncoming car's path from its speed, acceleration, angular velocity, and angular acceleration (the green trajectory, in which the oncoming car yields, is the less likely outcome). Based on that prediction, the vehicle chooses the strategy least likely to cause a conflict and yields.
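Predicting a path from speed, acceleration, angular velocity, and angular acceleration can be sketched with a simple kinematic rollout. This is an assumed textbook-style model with Euler integration, not Tesla's predictor; the function name, poses, and numbers are hypothetical.

```python
import math


def predict_path(x, y, heading, v, a, yaw_rate, yaw_accel, horizon_s=3.0, dt=0.1):
    """Roll another car's motion forward from its current kinematic state (Euler integration)."""
    path = [(x, y)]
    for _ in range(round(horizon_s / dt)):
        x += v * math.cos(heading) * dt
        y += v * math.sin(heading) * dt
        heading += yaw_rate * dt
        yaw_rate += yaw_accel * dt
        v = max(0.0, v + a * dt)  # simplification: the car may stop but never reverses
        path.append((x, y))
    return path


# Hypothetical oncoming car 30 m ahead, driving toward us at 8 m/s.
keep_going = predict_path(30.0, 0.0, math.pi, 8.0, 0.0, 0.0, 0.0)
# Same car braking and starting to steer toward its curb (the "yield" hypothesis).
pull_over = predict_path(30.0, 0.0, math.pi, 8.0, -2.0, 0.0, -0.2)
print(keep_going[-1], pull_over[-1])
```

Running several such hypotheses and attaching a probability to each is one way to obtain the kind of multi-trajectory prediction shown in the figure.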
Then the unexpected happened: the oncoming car yielded as well. The vehicle quickly responded to the new situation and decisively switched to actively passing. Predicting the routes of other vehicles is therefore not a waste of computing power; it reduces conflicts and improves efficiency. In the end, altruism is also self-interest.
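The switch from yielding to passing can be read as updating a belief and re-planning. The sketch below uses Bayes' rule with made-up likelihoods for “the oncoming car is seen slowing down”; the probabilities, threshold, and function names are hypothetical and not taken from the talk.

```python
def posterior_yield(prior, p_decel_if_yield=0.9, p_decel_if_not=0.2):
    """Bayes' rule after observing the oncoming car decelerate (likelihoods are assumptions)."""
    num = prior * p_decel_if_yield
    return num / (num + (1 - prior) * p_decel_if_not)


def choose_strategy(p_other_yields, threshold=0.5):
    return "proceed" if p_other_yields >= threshold else "yield"


p = 0.3                    # initial prediction: the oncoming car probably will not yield
first = choose_strategy(p)
p = posterior_yield(p)     # the oncoming car is then seen slowing down and pulling over
second = choose_strategy(p)
print(first, second, round(p, 3))  # yield proceed 0.659
```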
Reducing the number of searches
Take parking as an example: the choice of logic and search algorithm has a huge impact on how much computation it takes to find a path.
The following descriptions of A* and Monte Carlo tree search come from Wikipedia:
The A* search algorithm is a pathfinding algorithm that finds the lowest-cost path through a graph of nodes. It is commonly used to compute the movement of NPCs in games and of bots in online games.
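Here is a compact A* sketch on a hypothetical 20x20 open grid that counts how many nodes are expanded with no heuristic (which reduces A* to Dijkstra's algorithm) and with a Euclidean-distance heuristic. It only illustrates why the heuristic affects search counts; it is not Tesla's parking planner, and the grid and function names are assumptions.

```python
import heapq
import math


def astar(width, height, start, goal, heuristic):
    """A* on a 4-connected grid with unit step cost. Returns (path cost, nodes expanded)."""
    # Heap entries are (f = g + h, -g, node); among equal f, the deeper node is popped first.
    open_heap = [(heuristic(start, goal), 0, start)]
    best_g = {start: 0}
    closed = set()
    expanded = 0
    while open_heap:
        _, neg_g, node = heapq.heappop(open_heap)
        if node in closed:
            continue  # stale entry superseded by a cheaper route
        closed.add(node)
        expanded += 1
        g = -neg_g
        if node == goal:
            return g, expanded
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nxt[0] < width and 0 <= nxt[1] < height and nxt not in closed:
                ng = g + 1
                if ng < best_g.get(nxt, math.inf):
                    best_g[nxt] = ng
                    heapq.heappush(open_heap, (ng + heuristic(nxt, goal), -ng, nxt))
    return math.inf, expanded


def zero(a, b):
    return 0.0  # no guidance: A* degenerates into Dijkstra's algorithm


def euclidean(a, b):
    return math.dist(a, b)  # straight-line distance never overestimates on this grid


for name, h in (("no heuristic", zero), ("Euclidean", euclidean)):
    cost, expanded = astar(20, 20, (0, 0), (19, 19), h)
    print(f"{name}: cost {cost}, expanded {expanded} nodes")
```

Both runs find the same optimal cost, but the heuristic lets A* skip part of the grid; a better-informed heuristic (for example, one that knows the route) prunes far more.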
The Monte Carlo tree search used by Google's AlphaGo is a heuristic search algorithm for certain kinds of decision processes. Each iteration consists of four steps:
- Selection: Starting from the root node R, repeatedly select child nodes until a leaf node L is reached. The rule for choosing child nodes, which lets the game tree expand toward the most promising moves, is the essence of Monte Carlo tree search.
- Expansion: Unless L ends the game with a win or loss for either player, create one or more child nodes and choose one of them, C.
- Simulation: Complete one random playout (also called a rollout) from node C.
- Backpropagation: Use the result of the playout to update the information in the nodes on the path from C back to R.
Each node stores the number of wins out of the total number of playouts that passed through it.
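The four steps can be sketched in Python. This follows the classic UCT formulation with random rollouts on a hypothetical toy game (take 1 to 3 stones; whoever takes the last stone wins), not the neural-network-guided variant Elluswamy described; `NimState`, `Node`, and `mcts` are illustrative names.

```python
import math
import random


class NimState:
    """Toy game (hypothetical stand-in): take 1-3 stones per turn; taking the last stone wins."""

    def __init__(self, stones, player_just_moved=2):
        self.stones = stones
        self.player_just_moved = player_just_moved  # so player 1 moves first

    def clone(self):
        return NimState(self.stones, self.player_just_moved)

    def moves(self):
        return [m for m in (1, 2, 3) if m <= self.stones]

    def play(self, move):
        self.stones -= move
        self.player_just_moved = 3 - self.player_just_moved

    def result(self, player):
        # Only called at the end of the game: whoever moved last took the last stone and won.
        return 1.0 if self.player_just_moved == player else 0.0


class Node:
    def __init__(self, state, move=None, parent=None):
        self.move, self.parent = move, parent
        self.children = []
        self.untried = state.moves()
        self.player_just_moved = state.player_just_moved
        self.wins, self.visits = 0.0, 0

    def uct_child(self, c=1.4):
        # UCT: average win rate (exploitation) plus an exploration bonus for rarely tried moves
        return max(
            self.children,
            key=lambda ch: ch.wins / ch.visits + c * math.sqrt(math.log(self.visits) / ch.visits),
        )


def mcts(root_state, iterations, rng):
    root = Node(root_state)
    for _ in range(iterations):
        node, state = root, root_state.clone()
        # 1. Selection: descend through fully expanded nodes using UCT
        while not node.untried and node.children:
            node = node.uct_child()
            state.play(node.move)
        # 2. Expansion: add one untried move as a new child C
        if node.untried:
            move = rng.choice(node.untried)
            node.untried.remove(move)
            state.play(move)
            node.children.append(Node(state, move, node))
            node = node.children[-1]
        # 3. Simulation: random playout (rollout) to the end of the game
        while state.moves():
            state.play(rng.choice(state.moves()))
        # 4. Backpropagation: update win/visit counts from C back up to the root R
        while node is not None:
            node.visits += 1
            node.wins += state.result(node.player_just_moved)
            node = node.parent
    # Play the most-visited move at the root
    return max(root.children, key=lambda ch: ch.visits).move


print(mcts(NimState(5), 3000, random.Random(0)))  # 1: leaves the opponent 4 stones
```

From 5 stones the winning move is to take 1 and leave a multiple of 4, and the search settles on it after a few thousand iterations. Replacing the random rollout with a learned value estimate, and the uniform move choice with a learned policy, is roughly the idea behind the neural-network-guided variant.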
Elluswamy said the team compared three approaches: A* search with a Euclidean-distance heuristic, A* search with a Euclidean-distance heuristic plus navigation-route information, and Monte Carlo tree search guided by a neural network policy and value function.
The three approaches needed 398,320, 22,224, and 288 searches (node expansions), respectively, to find a suitable path into the parking spot, making MCTS (Monte Carlo tree search) by far the most efficient.
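The gap between these numbers is easier to see as ratios. This is simple arithmetic on the three reported counts; the dictionary labels are just descriptive names.

```python
# Search counts reported in the talk for finding a path into the parking spot
searches = {
    "A* (Euclidean distance)": 398_320,
    "A* (Euclidean + navigation info)": 22_224,
    "MCTS (neural policy + value)": 288,
}
baseline = searches["MCTS (neural policy + value)"]
ratios = {name: n / baseline for name, n in searches.items()}
for name, r in ratios.items():
    print(f"{name}: {searches[name]:,} searches, {r:,.0f}x the MCTS count")
```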
To see how important Monte Carlo tree search is for AI, look at how AlphaGo has evolved. Its latest successor, MuZero, not only plays Go, chess, shogi, and Atari games but can also learn to play them without being told the rules, working them out from observation alone, and still beat human players. This opens up a great deal of room to imagine how far AI can generalize; the era of machines that can learn is coming.
The veteran is still the one to beat
Tesla has not actually deployed the parking strategy described above yet, and many of these features will only arrive once the D1 chip is in mass production and Dojo is actually up and running.
One thing is undeniable, though: while the rest of the industry is still debating lidar versus millimeter-wave radar, Tesla, relying purely on vision, has moved the battle from perception hardware to perception algorithms and planning-and-control algorithms, and is working out how to use the massive data generated by its users to train its neural network models more efficiently.
FSD Beta V10 was released not long ago, and driving down San Francisco's Lombard Street at night with zero interventions is indeed a new milestone.
Along the way, Tesla has become not only the company with the richest real-world autonomous driving data but also the only one that has used its own R&D to build the high barriers of in-house autonomous driving chips and a supercomputing center. Of course, that is only possible with strong vehicle sales. Tesla did not even bother to wait for lidar to reach mass production. Its CEO may be a top-notch talker who is quick to change his story, but Tesla is definitely 2-3 years ahead of the industry in its autonomous driving roadmap.
Through massive computation, it finds the smoothest, most energy-efficient, and safest path, improving traffic flow and its negotiations with oncoming cars; it builds a 3D sandbox from camera images; and it keeps a dynamic memory of the vehicles and people around it. Tesla may be closer to autonomous driving than you think.
This article is a translation by ChatGPT of a Chinese report from 42HOW. If you have any questions about it, please email bd@42how.com.