Network Optix News

Reconfigurable, Prompt-Based Video Detection at the Edge with OpenAI's CLIP and Nx

Written by Joesf Jobert, Senior Software Engineer | May 13, 2025 4:44:19 PM

Vision-based AI has revolutionized traditional security and surveillance, but a major challenge continues to limit its full potential: flexibility.

Today, pre-trained object detection models are widely available and work well for recognizing common objects like “person” or “car.” But what if you need to detect something more specific? A car crash. An uncontrolled fire. A person smoking. Traditionally, creating a model for each of these use cases requires collecting data, training a new model, and deploying it to your system—a process that takes time, budget, and technical expertise. 

But what if you could simply describe what you want to detect, in plain language, and your system would understand? That's exactly what OpenAI's CLIP enables. CLIP (Contrastive Language–Image Pretraining) is a vision-based language model that can evaluate how well an image matches a given text prompt. In simple terms: if you show CLIP an image and provide two options—say, “car crash” and “empty highway”—it can tell you which description best fits the image. No retraining necessary.
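Under the hood, CLIP embeds the image and each candidate prompt into a shared vector space, ranks the prompts by cosine similarity to the image, and turns those similarities into match probabilities with a softmax. The sketch below illustrates just that scoring step, using stand-in vectors in place of CLIP's actual image and text encoders (the embeddings and the "car crash" / "empty highway" labels are illustrative, not real model outputs):

```python
import numpy as np

def clip_score(image_emb, prompt_embs):
    """Rank text prompts against an image the way CLIP does:
    cosine similarity in a shared embedding space, then softmax.
    The inputs stand in for the outputs of CLIP's encoders."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = prompt_embs / np.linalg.norm(prompt_embs, axis=1, keepdims=True)
    sims = txt @ img                      # cosine similarity per prompt
    # CLIP scales similarities by a learned temperature before the softmax;
    # 100.0 is a typical value for the trained model.
    logits = 100.0 * sims
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()

# Stand-in embeddings: the image vector is closest to the first prompt.
image = np.array([1.0, 0.2, 0.0])
prompts = np.array([[0.9, 0.3, 0.1],    # "car crash"
                    [0.0, 1.0, 0.0]])   # "empty highway"
probs = clip_score(image, prompts)      # probs[0] dominates
```

Because the comparison happens in embedding space, swapping in a new detection target is just a matter of encoding a new sentence; no weights change.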

Sounds good, but how do you put it into practice at scale? That's where Nx AI Manager comes in.

CLIP + Nx AI Manager: Natural Language Detection at the Edge

Nx Toolkit's newest addition, Nx AI Manager, is our universal AI inference pipeline designed to help developers deploy, manage, and optimize AI models across a wide range of hardware accelerators. When paired with CLIP, it enables flexible natural language detection directly at the edge.

You can connect numerous cameras to a single server, with each stream assigned a different prompt. One camera might be looking for a crash, another for a person lying down, another for an open flame. You configure it all with natural language—no retraining or coding required. Once CLIP recognizes something that matches a prompt, the Events Rules Engine takes over, enabling the system to automatically trigger an alarm, send an email, or display a notification in real time.
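The flow above amounts to a per-camera mapping from prompt to action, with a confidence threshold deciding when the Event Rules Engine fires. Here is a minimal sketch of that routing logic; the rule structure, camera IDs, and `notify` hook are hypothetical stand-ins, since the real configuration is done through the Nx AI Manager UI and Event Rules Engine rather than code:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class PromptRule:
    prompt: str          # natural-language description to watch for
    threshold: float     # minimum CLIP match probability to fire
    on_match: Callable[[str, float], None]  # alarm / email / notification hook

events = []

def notify(camera_id, prob):
    # Placeholder for an Event Rules Engine action (alarm, email, popup).
    events.append((camera_id, prob))

# One prompt per stream, all expressed in plain language.
rules = {
    "cam-lobby":  PromptRule("a person lying down", 0.75, notify),
    "cam-garage": PromptRule("an open flame", 0.80, notify),
}

def handle_frame(camera_id, match_probability):
    """Called with each frame's CLIP score for that camera's prompt."""
    rule = rules[camera_id]
    if match_probability >= rule.threshold:
        rule.on_match(camera_id, match_probability)

handle_frame("cam-garage", 0.91)   # above threshold: event fires
handle_frame("cam-lobby", 0.40)    # below threshold: ignored
```

Changing what a camera watches for is a one-line prompt edit, which is the whole point of the integration.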


From Real-Time Monitoring to Archive Search

But the value of the integration doesn’t stop there. Many of our clients operate installations with hundreds of cameras, storing months of recorded video. If something goes missing—say, a passport in an airport or a bag in a hotel lobby—security personnel could spend hours poring over footage to find it.

Creating and training a model for every possible missing item isn’t feasible. But with CLIP and Nx AI Manager, you can simply describe what you want to search for using natural language, apply the prompt to your archive, and sit back as the AI model does the hard work for you. Quite the time saver!
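Conceptually, archive search is the same scoring step applied offline: sample frames from recorded video, score each against the prompt, and surface the highest-scoring timestamps for review. A small sketch of that ranking stage, with hypothetical scores standing in for CLIP run over sampled archive frames:

```python
def search_archive(frame_scores, top_k=3):
    """Given (timestamp, clip_match_probability) pairs for one prompt,
    return the timestamps most likely to show the described object."""
    ranked = sorted(frame_scores, key=lambda ts_p: ts_p[1], reverse=True)
    return [ts for ts, _ in ranked[:top_k]]

# Hypothetical scores for a prompt like "a blue passport on a counter".
scores = [("12:01:05", 0.12), ("12:44:30", 0.91),
          ("13:10:02", 0.08), ("14:02:47", 0.77)]
hits = search_archive(scores, top_k=2)
# hits == ["12:44:30", "14:02:47"]
```

Instead of scrubbing through hours of footage, an operator reviews only the handful of moments the model ranked highest.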

A New Era of AI-Driven Video Insight

This CLIP integration moves us closer to a future where AI models aren't just deployed—they’re directed. The ability to dynamically reconfigure what your system looks for across live and recorded streams, using natural language, breaks down one of the biggest hurdles to practical AI adoption. Paired with Nx AI Manager’s ability to deploy and scale AI models across nearly any hardware, and manage it all through the cloud, it enables a new era of intelligent video applications that are more accessible, more adaptable, and more efficient than ever before.

 

Want to see it in action?
Join us at Computex in Taipei or Embedded Vision Summit in Santa Clara, CA later this month for an in-person demo.
 
Looking to build your own scalable, AI-powered video application?
The CLIP integration will be available for developer testing later this year. Until then, get started with Nx EVOS and Nx Toolkit: