Retailers lose billions of dollars in sales due to persistently out-of-stock (OOS) products on their shelves. In this work, we devised a system based on explainability, computer vision, and deep learning to detect out-of-stock labels by processing images of the aisle. The system combines five deep learning models (detecting products and labels, classifying labels into various types, and segmenting shelves to extract fixture information) with heuristic modules based on logic and inference. The system was deployed in 550 Walmart stores across the US, where ~100 aisles from each store were processed and the OOS data was delivered to the store within 15 minutes. This work is part of my PhD thesis, and I will post a link to the thesis once it is done in case anybody is interested in the details.
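To give a flavor of how the heuristic modules can combine the model outputs, here is a minimal sketch of one plausible OOS inference rule, assuming the upstream detectors already emit product and label bounding boxes. The rule, the function names, and the threshold are all hypothetical illustrations, not the deployed system's actual logic:

```python
# Hypothetical sketch of a heuristic OOS step: a shelf label is flagged
# out-of-stock when no detected product box sits in the slot above it.
from dataclasses import dataclass

@dataclass
class Box:
    x1: float; y1: float; x2: float; y2: float  # pixel coords, y grows downward

def horizontal_overlap(a: Box, b: Box) -> float:
    """Width of the horizontal intersection of two boxes."""
    return max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))

def is_label_oos(label: Box, products: list, min_overlap: float = 0.5) -> bool:
    """Flag a label as OOS if no product box overlaps it horizontally
    (by at least `min_overlap` of the label's width) while sitting above it."""
    width = label.x2 - label.x1
    for p in products:
        above = p.y2 <= label.y1 + 0.5 * (label.y2 - label.y1)
        if above and horizontal_overlap(label, p) >= min_overlap * width:
            return False  # a product occupies this label's slot
    return True

products = [Box(0, 0, 50, 40)]                        # one product on the shelf
labels = [Box(10, 45, 40, 55), Box(80, 45, 110, 55)]  # two shelf labels
print([is_label_oos(l, products) for l in labels])    # [False, True]
```

In practice the deployed heuristics also have to reason over fixture segmentation and label types, which is why the shelf segmentation model feeds into this stage.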
In this project, the goal was to extract information from the price labels commonly found in stores. To do so, we adopted a multistage approach in which the initial three stages detect regions of interest, shown by the yellow, red, and green bounding boxes, and the final stage identifies the numbers in the region. We also perform a super-resolution step on the red ROI to enhance its visual quality for identification. I was part of a team of 5 PhD students and 2 master's students.
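The staged cropping above can be sketched as follows. The per-stage detectors are stubbed out with fixed boxes; what the sketch shows is the coordinate bookkeeping needed to map a nested ROI back to absolute coordinates in the original image. All names and values are hypothetical:

```python
# Hypothetical sketch of the multistage ROI cascade: each stage returns a
# bounding box relative to the crop it was given; compose() maps a nested
# box back into the coordinate frame of the original image.
def compose(outer, inner):
    """Map `inner` (x1, y1, x2, y2), expressed relative to `outer`'s crop,
    into the coordinate frame that `outer` lives in."""
    ox1, oy1, _, _ = outer
    ix1, iy1, ix2, iy2 = inner
    return (ox1 + ix1, oy1 + iy1, ox1 + ix2, oy1 + iy2)

# Stub detections for the three stages (yellow -> red -> green), each
# expressed relative to the previous stage's crop.
yellow = (100, 200, 400, 500)   # label region in the full image
red = (20, 30, 220, 130)        # price area within the yellow crop
green = (10, 10, 110, 60)       # digit strip within the red crop

red_abs = compose(yellow, red)
green_abs = compose(red_abs, green)
print(red_abs)    # (120, 230, 320, 330)
print(green_abs)  # (130, 240, 230, 290)
# A super-resolution model would upscale the red crop before the final
# digit-recognition stage reads off the numbers.
```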
A classification model was trained to check whether a person is wearing a helmet and a vest, with the aim of monitoring construction workers for safety. Once the detector localizes the person(s), the bounding-box crops are fed into our model to predict whether each person is wearing a helmet and/or a vest. In the video here, we were doing some internal testing in the lab to check the performance. A big thank you to Fangyi and Han for being good sports and helping me film the video.
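The detect-then-classify flow can be sketched as below. The network itself is stubbed out with precomputed logits; the sketch only illustrates a plausible multi-label decision step (helmet and vest scored independently), with all names and thresholds being assumptions rather than the trained model's actual configuration:

```python
# Hypothetical sketch of the compliance check: each person crop from the
# detector is scored by a two-output (helmet, vest) classifier; here the
# network is replaced by stub logits and only the decision logic is shown.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def compliance(logits, threshold=0.5):
    """Turn a (helmet_logit, vest_logit) pair into a compliance verdict."""
    helmet_p, vest_p = (sigmoid(z) for z in logits)
    return {
        "helmet": helmet_p >= threshold,
        "vest": vest_p >= threshold,
        "compliant": helmet_p >= threshold and vest_p >= threshold,
    }

# One stubbed logit pair per detected person crop.
for logits in [(2.3, 1.1), (-0.8, 1.9)]:
    print(compliance(logits))
```

Treating helmet and vest as independent sigmoid outputs (rather than one softmax over combinations) lets a single crop be flagged for exactly the missing item.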
During my master's at UIC, I worked with Prof. Zefran at the Computer Vision and Robotics Library. This was before the deep learning boom, when large language models hadn't yet taken off. We worked on developing a multi-modal communication interface with dialogue act classification and haptic force feedback from the hand to assist the elderly in their daily activities. In the video, I integrated a dialogue act classifier into the ROS framework and can be heard toying with it through a speech-to-text engine from CMU called PocketSphinx. I used the rviz tool to simulate arbitrary robot responses to each predicted dialogue act tag for the spoken sentence.