Autonomous driving is a remarkable real-world application built on the culmination of advances in several technologies. The first step in this complex system is perception: the onboard system must understand what is around the car. This perception is achieved by sensors such as cameras, ultrasonic sensors, LiDAR and RADAR. While cameras give a visual perception of the environment, LiDAR and RADAR sensors are critical for depth perception.
The autonomous car that will drive you around in the future will have a powerful onboard computer that perceives the environment and makes decisions. For this environment perception to work properly, deep learning algorithms need to be trained with large amounts of annotated data from multiple sensors. This data could come from driving a car, fitted with all the sensors, for a million kilometers or more!
LiDAR Annotation Challenge
This brings us to two challenges that the ADAS industry faces today. The first is the sheer magnitude of data. For example, 1 million kilometers of driving translates to around 600 million LiDAR frames to be annotated. Assuming around 15 to 20 objects per frame, that is more than 10 billion annotations to be done!
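The arithmetic behind these figures can be sanity-checked in a few lines. The frame rate and average speed below are illustrative assumptions, not figures from the article; they are chosen to be consistent with the quoted 600 million frames:

```python
# Back-of-the-envelope check of the annotation volumes quoted above.
# Assumed (hypothetical) capture setup: a LiDAR running at 10 frames/second
# on a car averaging 60 km/h, i.e. ~600 frames per kilometer driven.
frames_per_km = 600
total_km = 1_000_000

frames = total_km * frames_per_km        # ~600 million LiDAR frames
objects_per_frame = (15, 20)             # low and high estimates
annotations = tuple(frames * n for n in objects_per_frame)
# frames      -> 600,000,000
# annotations -> roughly 9 to 12 billion, i.e. the "10 billion+" above
```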
The second challenge is from the perspective of the human who annotates the objects. Annotating a LiDAR point cloud for objects like cars, trucks and pedestrians is not as intuitively obvious as annotating a car on a video, which makes LiDAR annotation more time consuming. Extrapolate the effort of annotating one LiDAR frame over 600 million frames, and the work not only runs into years but also becomes very expensive.
The industry has come up with a lot of different ways to address this challenge.
The car collecting the data will have multiple cameras installed around it in addition to the LiDAR. Since data from all sensors is collected simultaneously, why not show the annotator the camera images corresponding to the LiDAR frame? Better still, why not show a bounding box around the object on the image the moment a cuboid is drawn around the LiDAR points? This is precisely what sensor fusion does. The annotator gets a visual reference for what they are annotating in every direction, making it easy to annotate objects that have only a few LiDAR points. This requires the intrinsic and extrinsic calibration parameters of the sensors to be provided to the annotation platform.
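At its core, this cuboid-to-image link is a 3D-to-2D projection: each cuboid corner is moved into the camera frame using the extrinsic calibration and projected onto the image using the intrinsics. A minimal sketch with numpy (the calibration values here are made up for illustration):

```python
import numpy as np

def project_lidar_to_image(points_lidar, R, t, K):
    """Project Nx3 LiDAR points into pixel coordinates.

    R, t : extrinsic calibration (LiDAR frame -> camera frame)
    K    : 3x3 intrinsic matrix of the camera (pinhole model)
    """
    pts_cam = points_lidar @ R.T + t       # transform into the camera frame
    pts_cam = pts_cam[pts_cam[:, 2] > 0]   # keep points in front of the camera
    uvw = pts_cam @ K.T                    # apply the pinhole projection
    return uvw[:, :2] / uvw[:, 2:3]        # normalize to pixel coordinates

# Toy calibration: camera axes aligned with the LiDAR, 1000 px focal
# length, principal point at (640, 360).
K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)

# Two corners of a cuboid, 10 m ahead of the car.
corners = np.array([[1.0, 0.0, 10.0], [-1.0, 0.0, 10.0]])
px = project_lidar_to_image(corners, R, t, K)
# A 2D box on the image is then simply the min/max of the projected corners.
```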
When a car is moving, especially on a highway, the relative movement of the different vehicles is easy to calculate and "predict". Why not let the annotator work only on alternate frames (better still, one in every 3 or 4 frames) and have an algorithm interpolate the annotations between those frames based on the perceived relative speeds of the objects? The automated annotations then only need to be audited by annotators for errors and corrected. This reduces the annotation time significantly.
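A simple version of this keyframe interpolation assumes constant velocity between the two annotated frames (real platforms may use richer motion models; the naive linear yaw blend below also ignores angle wraparound):

```python
import numpy as np

def interpolate_cuboid(kf_a, kf_b, frame_a, frame_b, frame):
    """Linearly interpolate a cuboid's pose between two annotated keyframes.

    kf_a, kf_b : dicts with 'center' (x, y, z) and 'yaw' for the same object,
    annotated at frame_a and frame_b. Assumes constant velocity in between.
    """
    alpha = (frame - frame_a) / (frame_b - frame_a)
    center = (1 - alpha) * np.asarray(kf_a["center"]) \
             + alpha * np.asarray(kf_b["center"])
    yaw = (1 - alpha) * kf_a["yaw"] + alpha * kf_b["yaw"]
    return {"center": center, "yaw": yaw}

# A car annotated by hand at frames 0 and 4; frames 1-3 are filled in
# automatically and only audited by the annotator.
a = {"center": (0.0, 0.0, 0.0), "yaw": 0.0}
b = {"center": (8.0, 0.0, 0.0), "yaw": 0.2}
mid = interpolate_cuboid(a, b, frame_a=0, frame_b=4, frame=2)
# mid sits halfway: center (4, 0, 0), yaw 0.1
```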
While interpolation reduces the number of frames that must be annotated from scratch, can the same prediction be used while annotating a frame? The predicted object may not be very accurate, but it allows the annotator to make minor corrections instead of annotating the object all over again. For fixed-size objects like cars in particular, this can be done by simply moving the entire cuboid to fit the point cloud.
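One naive way to "move the entire object to fit the point cloud" is to slide the fixed-size cuboid onto the centroid of the LiDAR returns near its predicted position. This is a sketch for illustration, not any platform's actual algorithm:

```python
import numpy as np

def refit_by_translation(predicted_center, box_size, points):
    """Re-center a fixed-size cuboid on the observed points.

    The box's dimensions stay unchanged; only its position moves. The box
    is treated as axis-aligned here for simplicity; a real tool would also
    account for the cuboid's yaw.
    """
    half = np.asarray(box_size) / 2.0
    inside = np.all(np.abs(points - predicted_center) <= half, axis=1)
    return points[inside].mean(axis=0)   # new center for the same box

# The predicted cuboid sits at x = 10.5 m, but the actual LiDAR returns
# cluster around x = 11 m, so the box slides forward to fit them.
pts = np.array([[10.2, 0.1, 0.0], [11.8, -0.1, 0.0], [11.0, 0.0, 0.4]])
new_center = refit_by_translation(np.array([10.5, 0.0, 0.0]),
                                  box_size=(4.5, 1.8, 1.5), points=pts)
```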
Can a well-defined object like a truck or a sedan be automatically detected when the annotator draws just an approximate box around it? That would fast-track even the few objects that still need to be annotated explicitly.
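The simplest form of this "one rough click" idea is to snap the annotator's approximate box to a tight box around the points it contains. The sketch below uses an axis-aligned box under that assumption; a production detector would be far more sophisticated:

```python
import numpy as np

def tighten_box(points, rough_min, rough_max):
    """Snap a rough hand-drawn box to a tight axis-aligned box around
    the LiDAR points that fall inside it."""
    inside = np.all((points >= rough_min) & (points <= rough_max), axis=1)
    selected = points[inside]
    return selected.min(axis=0), selected.max(axis=0)

# Three returns from a vehicle, plus one distant stray point that the
# annotator's rough box does not cover.
pts = np.array([[2.0, 1.0, 0.0], [6.0, 2.5, 1.6],
                [4.0, 1.8, 0.8], [40.0, 40.0, 0.0]])
tight_min, tight_max = tighten_box(pts,
                                   rough_min=np.array([0.0, 0.0, -0.5]),
                                   rough_max=np.array([10.0, 5.0, 3.0]))
# The stray point at (40, 40, 0) is ignored; the box hugs the vehicle.
```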
All these innovations are happening rapidly in annotation platforms, helping to accelerate annotation and reduce its cost.
While all this automation is exciting, it introduces a few paradigm shifts, because some tasks are now done by algorithms and others by human annotators:
- Difference in the rate of annotations (automated vs human)
- Auditing of automated annotations
- Raw data arrival rates
- Task Homogenization
This complicates the workflow, and ensuring that the annotation team delivers a regular flow of annotated data to the data science team becomes harder. It is reminiscent of a complex manufacturing scenario in which multiple subsystems must be integrated to deliver, say, several cars from a factory!
This calls for an annotation partner who understands how to manage the "rhythm" of such a complex workflow using process transformation principles from the manufacturing domain. We at NextWealth have been involved in multiple projects where we have reduced the handling time of such micro tasks by orders of magnitude!