A Cost-Effective Approach to Random Bin Picking
Contributed By DigiKey's North American Editors
2026-02-10
As the industry moves to increasingly automated manufacturing lines, many complex tasks once reserved for human operators are now performed by machines. Among the most complex of these is random bin picking: the ability to peer into a tray of randomly arranged components, then identify and retrieve the one that matches the line's next task, even when it is half hidden under a stack of other components.
Applications for random bin picking range from machine loading to kitting and sorting, and the technology is widely used in the automotive, electronics, e-commerce, and medical device industries. While the task is relatively straightforward for a person, the robotic arm assigned to the job must leverage high-speed 3D machine vision, pattern recognition, and path-planning algorithms to succeed. More recently, machine learning approaches are also helping to refine the identification and successful retrieval of bin components.
Structured light versus laser scanning
While the use of laser light to methodically scan and map surfaces is well known, most modern random-bin-picking systems leverage “structured light” approaches that are faster, safer, and more cost-effective than laser mapping. Beyond bin picking, structured-light scanning is widely employed in fields such as industrial design, quality control, augmented reality gaming, and medical imaging. Ambient lighting conditions and reflective component surfaces are potentially complicating factors.
Structured light involves rapidly projecting a series of patterns, such as stripes and grids (Figure 1), onto the bin’s contents. From any angle other than the projector’s, the patterns are distorted. These distortions reveal the three-dimensional complexity of the bin’s contents and are captured in a series of still images, which is where the need for high-speed connectivity and high-power computing enters the picture.
Figure 1: By capturing and analyzing the images created by light-and-dark patterns projected into a bin full of components, structured-light scanning reveals the identity, location, and orientation of the various parts within the bin. (Image source: Lattice Semiconductor)
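To make the idea concrete, here is a minimal sketch of how a set of stripe patterns like those in Figure 1 could be generated. It assumes a binary Gray-code stripe scheme with paired positive/negative patterns, which is one common structured-light approach; the article does not specify the exact patterns Lattice projects.

```python
import numpy as np

def gray_code_patterns(width: int, height: int, bits: int):
    """Generate vertical Gray-code stripe patterns plus their inverses.

    Each pattern is a binary (0/255) image; projecting the full set
    lets a camera recover which projector column lit each scene point.
    """
    columns = np.arange(width)
    gray = columns ^ (columns >> 1)          # binary-reflected Gray code
    patterns = []
    for bit in range(bits - 1, -1, -1):      # widest stripes (MSB) first
        row = ((gray >> bit) & 1).astype(np.uint8) * 255
        positive = np.tile(row, (height, 1)) # repeat the stripe down every row
        patterns.append(positive)
        patterns.append(255 - positive)      # paired negative (inverted) pattern
    return patterns

pats = gray_code_patterns(width=1920, height=1080, bits=11)
print(len(pats))  # 11 bit-planes x 2 (positive + negative) = 22 patterns
```

Pairing each pattern with its inverse, as sketched here, is what lets the decoder make a per-pixel light/dark decision that is robust to surface color and ambient illumination.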
FPGAs take on repetitive tasks
Most structured-light solutions are composed of two modules connected through Ethernet: a sensor module and a computing module. The sensor module is connected to a projector and initiates the projection of a series of structured-light patterns into the bin. A camera that is positioned off-axis relative to the projector captures the resultant images. In the case of Lattice Semiconductor's structured-light solution, a series of 41 discrete images is generated, including positive, negative, horizontal, and vertical patterns. The sequence of images captured by the camera comes back to the sensor module over a MIPI Camera Serial Interface (CSI) link.
The sensor module also includes field programmable gate array (FPGA) resources that encode the series of 41 images into a single 10-bit coded image that records, for each camera pixel, the corresponding projector pixel identified across the pattern sequence. This coded image is then passed to the computing module over an Ethernet link. The encoding significantly increases the speed of transmission to the computing module, as well as the responsiveness and performance of the overall system. For example, sending 41 raw images of 1920 x 1080-pixel resolution represents 680 MB of data traffic, whereas the single encoded image represents only 41 MB of data, a roughly 16-to-1 reduction in data volume with a corresponding increase in system performance.
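The principle behind this consolidation can be illustrated with a short decoding sketch. It assumes the captures are Gray-code positive/negative pairs (the actual Lattice encoding is not published in this detail): comparing each positive frame against its inverse yields one bit per pixel, and the accumulated bits collapse the whole image stack into a single code image.

```python
import numpy as np

def encode_pixel_codes(captures):
    """Collapse a stack of positive/negative stripe captures into one
    per-pixel code image (illustrative; assumes a Gray-code scheme).

    captures: list of (positive, negative) image pairs, most-significant
    stripe first. Comparing positive against negative gives a robust
    0/1 decision per pixel regardless of surface brightness.
    """
    h, w = captures[0][0].shape
    gray = np.zeros((h, w), dtype=np.uint16)
    for positive, negative in captures:
        bit = (positive.astype(np.int32) > negative.astype(np.int32))
        gray = (gray << 1) | bit.astype(np.uint16)
    # Convert the accumulated Gray code back to a plain column index
    code = gray.copy()
    for shift in (1, 2, 4, 8):
        code ^= code >> shift
    return code  # one multi-bit value per pixel instead of N full frames
```

The per-pixel comparisons are independent and identical, which is exactly the kind of work a pipelined FPGA implementation handles well.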
Additional FPGA resources in the sensor module can further offload the computing module by generating the pixel-by-pixel depth map that effectively outlines the individual objects in the bin and helps the computing module calculate an optimal pick-point target for the associated robot arm. This is a highly repetitive task that can be performed in parallel for every pixel. Alternatively, designers can leave depth-map generation on the computing module and devote the FPGA resources to added capabilities. Similarly, the FPGA can perform all or part of machine-learning-based object detection and segmentation to further offload the computing module.
Hardware versus software
The reason a combination of FPGAs in the sensor module and CPUs/GPUs in the computing module works so well in this application lies in each platform's complementary strengths. FPGAs excel at highly repetitive tasks, such as the sensor-specific processing and frame-level synchronization required to consolidate 41 images' worth of information into a single encoded image. It's a task well suited to a configurable hardware implementation. Meanwhile, the strength of the CPU/GPU lies in complex, high-level computations, such as optimization and decision-making, which are most readily implemented in software (Figure 2).
Figure 2: By appropriately dividing the computational workload between FPGA and CPU/GPU resources, the Lattice Semiconductor approach to random bin picking both optimizes system performance and reduces system costs from a bill-of-materials perspective. (Image source: Lattice Semiconductor)
In the case of the random-bin-picking application, the local FPGA encoding at the sensor module dramatically reduces the data that must be sent to the computing module, increasing the speed of pick execution. Meanwhile, the FPGA also reduces the calculation demands on the CPU/GPU housed in the computing module, allowing a lower-cost processor to be used.
The small form factor and the low power consumption of the FPGAs also mean that the sensor module can be housed in a relatively small plastic enclosure without the need for power dissipation accommodations, such as a fan or heatsink. The overall net effect is a lower bill of materials for the total solution.
Closing the loop
Once the encoded image is transferred from the sensor module to the computing module, the CPU/GPU uses triangulation to generate a depth image from the encoded image, much like a topographical map of the ocean floor. This depth image is then used for object detection (segmentation) and the subsequent pick-point calculations. While computer vision plays the main guiding role in object identification and the calculation of picking points, in more complex applications, CAD models are sometimes used to facilitate object detection via geometric matching. More recently, machine-learning-based approaches have been developed to handle more complicated scenarios; some leverage deep learning to improve performance based on the results of each successive pick.
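The triangulation step can be sketched in a few lines under a simplifying assumption: the camera and projector are rectified along a horizontal baseline, so each pixel's depth follows from the offset between its camera column and the projector column recorded in the coded image. The focal length `fx` and `baseline_m` values are hypothetical calibration parameters; a real system uses a full calibrated camera/projector model.

```python
import numpy as np

def depth_from_codes(code_image, fx, baseline_m):
    """Turn a per-pixel projector-column image into a depth map by
    triangulation (simplified rectified geometry; fx and baseline_m
    would come from system calibration).
    """
    h, w = code_image.shape
    cam_cols = np.tile(np.arange(w, dtype=np.float64), (h, 1))
    disparity = cam_cols - code_image.astype(np.float64)
    depth = np.full((h, w), np.nan)          # NaN marks pixels with no match
    valid = disparity > 0                    # skip occluded/undecoded pixels
    depth[valid] = fx * baseline_m / disparity[valid]
    return depth
```

As with classic stereo vision, larger disparities correspond to points nearer the camera, which is why parts sitting on top of the pile stand out in the resulting depth image.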
Finally, once the 3D rendering of the bin's contents has been completed and an appropriate pick-point for retrieving the next component has been selected, instructions are communicated to the robot for execution. Once the random pick is completed, the cycle starts again.
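A deliberately naive pick-point heuristic shows the shape of this final step: simply target the closest (topmost) valid point in the depth map. This is an assumption-laden toy; a production planner also weighs part orientation, gripper clearance, collision avoidance, and robot reachability.

```python
import numpy as np

def simple_pick_point(depth_map):
    """Return (row, col, depth) of the closest valid point in the bin.

    Toy 'top of the pile' heuristic only; real pick-point selection
    also considers graspability, collisions, and part pose.
    """
    masked = np.where(np.isnan(depth_map), np.inf, depth_map)
    idx = np.argmin(masked)                  # smallest depth = nearest point
    row, col = np.unravel_index(idx, depth_map.shape)
    return int(row), int(col), float(depth_map[row, col])
```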
Conclusion
Structured light is both safer and higher performing than laser scanning when it comes to random-bin-picking applications. Further, a hybrid approach that leverages both FPGA and CPU/GPU resources performs best and is also the most cost-effective from a bill-of-materials perspective. This is due both to the appropriate division of labor between the two semiconductor technologies and to the relatively low power consumption of the FPGAs in the sensor module, which in turn eliminates the need for auxiliary cooling measures.
Disclaimer: The opinions, beliefs, and viewpoints expressed by the various authors and/or forum participants on this website do not necessarily reflect the opinions, beliefs, and viewpoints of DigiKey or official policies of DigiKey.