
Monitoring employee performance in restaurants is a challenging and resource-intensive task. How can businesses maintain service standards, prevent violations, and reduce the need for constant supervision? The startup EasyVision offers a solution powered by computer vision, automatically detecting key events and alerting managers about potential issues. In an interview with the project’s founder, Maksim Tsygankov, we discussed how this technology works, how it differs from competitors, and how AI is set to revolutionize the HoReCa industry.
How Did the Idea for EasyVision Come About, and What Market Problems Were You Aiming to Solve?
Many businesses face the challenge of monitoring employee performance. For instance, a restaurant manager needs to know how often tables are cleaned, whether hygiene regulations are followed in the kitchen, and whether employees have attempted theft. This can be tracked via security cameras, and artificial intelligence can be integrated to detect such violations automatically.
For a long time, implementing computer vision was complex and expensive: it required assembling a dataset, training a model, and deploying it. Each restaurant also needed a custom-trained system to adapt to its own camera angles. This made the service unprofitable for both restaurants and IT companies.
However, last year, multimodal large models—Video Language Models—began to emerge. These models can recognize actions from just a single frame instead of requiring hundreds of images as before. I realized that the time was finally right to democratize computer vision for small and medium-sized businesses—those who need it but lack the technological and financial means to implement it.
We started promoting our service in the HoReCa sector because that’s where we found our first clients. However, computer vision can also be useful in retail stores, hotels, and other industries.
What Technologies Power EasyVision, and How Does It Work?
EasyVision is built on computer vision technology. Our system leverages zero-shot and few-shot learning models, enabling accurate recognition with minimal training data. We train them on data specific to each restaurant and implement a cascade of different detectors.
For instance, if we need the model to recognize whether an employee is drinking alcohol on the job, we first configure a motion detector, then layer an action detector on top of it, and so on.
Once the model is ready, we integrate it with existing restaurant security cameras, streaming RTSP feeds to our cloud-based processing system. The next morning, managers receive a report with screenshots of detected violations. If a particular screenshot raises concerns, they can follow a link to watch a short video clip. Based on this analysis, the manager can then take appropriate actions, such as scheduling additional training for an employee.
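The reporting step described above can be sketched roughly as follows. This is a minimal illustration, not EasyVision's actual code: the `Detection` record, the `build_report` function, the label names, and the example.com clip links are all hypothetical, chosen only to show how one day's detections might be grouped into a morning report with screenshots and clip links.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime
from typing import Iterable


@dataclass
class Detection:
    """One detected event (hypothetical record layout)."""
    timestamp: datetime
    camera: str
    label: str        # e.g. "dirty_table"; label names are made up here
    screenshot: str   # path to the saved frame
    clip_url: str     # link to the short video clip


def build_report(detections: Iterable[Detection], day: date) -> str:
    """Group one calendar day's detections by label, ordered by time."""
    grouped = defaultdict(list)
    for d in sorted(detections, key=lambda d: d.timestamp):
        if d.timestamp.date() == day:
            grouped[d.label].append(d)

    lines = [f"Report for {day.isoformat()}"]
    for label in sorted(grouped):
        lines.append(f"{label}: {len(grouped[label])} event(s)")
        for d in grouped[label]:
            lines.append(
                f"  {d.timestamp:%H:%M} {d.camera} {d.screenshot} {d.clip_url}"
            )
    return "\n".join(lines)
```

Calling `build_report(detections, date(2024, 1, 2))` would return a plain-text digest for that day; a real system would more likely render the same grouping as an email or dashboard page.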
The technology pipeline varies depending on the type of detection needed. For example:
- We start with a motion detector and then filter the footage to identify specific events.
- In some cases, we first apply a human detector before running an action classifier.
- For tasks like checking for trash in hallways, we analyze the entire image instead of focusing on individual people.
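The cascade idea in these examples can be illustrated with a small sketch. Everything here is an assumption made for illustration: frames are modelled as flat lists of grayscale pixel values, the motion check is a simple frame difference, and `StubActionStage` is a toy stand-in for a real action classifier. The point is only that a cheap stage filters frames before a more expensive one runs.

```python
from typing import Callable, List, Sequence

# A frame is modelled as a flat sequence of grayscale pixel values (0-255).
Frame = Sequence[int]
# A stage looks at (previous_frame, current_frame) and decides whether to keep it.
Stage = Callable[[Frame, Frame], bool]


def motion_stage(threshold: float = 10.0) -> Stage:
    """Cheap first stage: mean absolute pixel difference between frames."""
    def check(prev: Frame, cur: Frame) -> bool:
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / max(len(cur), 1)
        return diff > threshold
    return check


class StubActionStage:
    """Toy stand-in for an expensive action classifier (hypothetical)."""

    def __init__(self, min_bright_pixels: int = 2) -> None:
        self.min_bright_pixels = min_bright_pixels
        self.calls = 0  # counts how often the "expensive" stage actually ran

    def __call__(self, prev: Frame, cur: Frame) -> bool:
        self.calls += 1
        return sum(1 for p in cur if p > 200) >= self.min_bright_pixels


def run_cascade(frames: List[Frame], stages: List[Stage]) -> List[int]:
    """Return indices of frames that pass every stage, checked in order."""
    flagged = []
    for i in range(1, len(frames)):
        # all() short-circuits, so later stages run only if earlier ones pass.
        if all(stage(frames[i - 1], frames[i]) for stage in stages):
            flagged.append(i)
    return flagged
```

Because the stages are evaluated in order and evaluation stops at the first failure, still frames never reach the second stage. A whole-image check, such as the hallway trash example, would simply be a stage that ignores the previous frame.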
How Does EasyVision Compare to Competitors Like Chooch, Viso, and Roboflow?
Our main advantage is the ease of implementation. Many competing services are toolkits rather than ready-to-use solutions, requiring customization and technical expertise. In contrast, EasyVision is a plug-and-play product that does not require an in-house IT team—we work directly with restaurant managers or owners.
Additionally, we specialize in restaurants. By training our model on data from multiple establishments, we continuously refine it for the restaurant industry. For clients, this means:
- Our solution is faster and more cost-effective to deploy.
- It does not require extensive customization or UI adjustments.
- We provide pre-configured dashboards designed specifically for restaurant managers.
- We can even offer guidance on how to optimize monitoring processes.
As a result, our model delivers more accurate and reliable data than competing solutions, making it a smarter, more efficient choice for the restaurant business.
How Did You Define the Key Use Cases for EasyVision, and What Insights Shaped the Final Product?
We gathered insights by engaging directly with potential clients. Our team held numerous meetings with restaurant owners, identifying their pain points and refining the product based on their feedback.
The key use cases revolved around theft prevention and monitoring alcohol consumption in the workplace. We also focused on ensuring compliance with internal standards, such as dress code adherence, workplace cleanliness, and customer service timing. We started with ten restaurants, built an MVP, and adjusted it continuously as new data emerged. This process is ongoing, and our model is still evolving. Right now, we’re working on evaluating individual employee performance, so restaurant owners can better understand who deserves a raise, a promotion, or additional training.
Do You Plan to Make the Product Universal So It Can Be Used Across Different Industries?
For now, restaurants represent a massive market, and we have more than enough opportunities within this sector. In the U.S. alone, there are about 3,000 large restaurants that fit our target audience. We may decide to remain focused solely on this niche.
However, even within one industry, achieving full universality is a challenge. One of our current goals is to develop a ready-to-use solution that doesn’t require customization. Once we achieve that, we can start considering expansion into other industries. We believe we might be able to accomplish this within the next six months.
That being said, our product can be applied to any offline business with security cameras. As I mentioned earlier, this includes retail stores, hotels, and various manufacturing facilities. These businesses would share a similar reporting system that tracks critical events, but some level of customization would still be necessary. Different clients will have unique needs—some may require specific object detection, while others will need integration with existing systems like POS terminals or fire safety systems. Naturally, we will also introduce several pricing plans to accommodate different business requirements.
How Do You Plan to Develop the Product Further? What New Features or Capabilities Will EasyVision Introduce in the Coming Years?
Our main focus is on fully optimizing and automating the backend to ensure even greater accuracy and speed in our system’s performance. Additionally, based on customer feedback, we will refine the user interface—some features may be added, while others may be removed to enhance usability.
We are also considering implementing an onboarding system, so new clients can quickly understand how to use the product without requiring our direct assistance during setup. This would significantly streamline the adoption process.
One of our most ambitious plans is to develop our own Video Management System (VMS) specifically for restaurants. This would enable video archiving while allowing our AI-powered tool to analyze and process footage in real time. By introducing this functionality, we could also collaborate with system integrators to expand the reach of our solution and integrate it into larger business ecosystems.