Opening the Black Box of Generative Vision AI

Modern manufacturing increasingly relies on visual inspection, quality control, and process monitoring, yet most AI vision systems remain opaque. This session demystifies the core technologies behind today's generative vision AI, giving manufacturing leaders the technical foundation to deploy and scale visual AI solutions effectively.

Drawing on experience developing Gemini models at Google DeepMind and optimizing vision-language models for resource-constrained environments, we'll explore the architectural foundations of modern vision AI. We'll dissect vision transformers (ViTs), the architecture behind the image encoders of most modern vision-language models, and explain how they differ from traditional CNNs: instead of building up features through stacked local convolutions, a ViT splits an image into fixed-size patches and uses self-attention to relate every patch to every other patch. We'll also cover why this difference matters for manufacturing applications.
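To make the ViT-versus-CNN contrast concrete, here is a minimal NumPy sketch of a ViT's first steps: cutting an image into non-overlapping patches, flattening each patch into a token, and computing single-head self-attention weights between all patches. It assumes a 224x224 RGB input and 16x16 patches (the common ViT-B/16 setting); random matrices stand in for learned weights, and position embeddings, the class token, and the rest of the transformer are deliberately omitted.

```python
import numpy as np


def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping patch x patch tiles.

    Returns an array of shape (num_patches, patch * patch * C), one row per
    patch, ordered left-to-right, top-to-bottom.
    """
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    # (H, W, C) -> (H/p, p, W/p, p, C): split both spatial axes into a grid
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    # -> (H/p, W/p, p, p, C): group the pixels of each patch together
    grid = grid.transpose(0, 2, 1, 3, 4)
    # Flatten each patch into a single vector (one "token" per patch)
    return grid.reshape(-1, patch * patch * c)


def attention_weights(tokens: np.ndarray, d_model: int, rng: np.random.Generator) -> np.ndarray:
    """Single-head self-attention weights over patch tokens.

    Random matrices stand in for the learned projections of a trained ViT,
    so the values are meaningless; only the all-to-all structure matters here.
    """
    d_in = tokens.shape[1]
    w_embed = rng.standard_normal((d_in, d_model)) / np.sqrt(d_in)
    w_q = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    w_k = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    x = tokens @ w_embed                # linear patch embedding
    q, k = x @ w_q, x @ w_k             # queries and keys
    scores = q @ k.T / np.sqrt(d_model)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    # Row i: how strongly patch i attends to every patch; each row sums to 1
    return weights / weights.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))       # stand-in for a 224x224 RGB camera frame
tokens = patchify(image, 16)
print(tokens.shape)                     # (196, 768): 14x14 patches of 16*16*3 values
weights = attention_weights(tokens, d_model=64, rng=rng)
print(weights.shape)                    # (196, 196): every patch scored against every patch
```

The (196, 196) attention matrix is the key difference from a CNN: even in the first layer, a patch showing a scratch in one corner can draw on a patch in the opposite corner, whereas a convolution only sees a small local neighborhood.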

The session will compare two influential training approaches: SigLIP, which learns image-text associations efficiently by scoring each image-text pair independently with a sigmoid loss rather than normalizing over the whole batch with a softmax, and DINO, which uses self-distillation to learn visual patterns from images alone, without labels. Understanding these differences is crucial when selecting or customizing models for specific manufacturing use cases, from defect detection to process optimization.
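The difference between the two objectives is easiest to see in their loss functions. Below is a forward-pass-only NumPy sketch of each, following the formulations published in the SigLIP and DINO papers; the encoders, data augmentation, optimizer, DINO's exponential-moving-average teacher update, and its running center update are all omitted. The defaults `t=10`, `b=-10` mirror SigLIP's reported initialization, and `tau_s=0.1`, `tau_t=0.04` are typical DINO temperatures; treat these as illustrative assumptions rather than tuned settings.

```python
import numpy as np


def siglip_loss(img_emb: np.ndarray, txt_emb: np.ndarray, t: float = 10.0, b: float = -10.0) -> float:
    """Pairwise sigmoid loss over a batch of n (image, text) pairs.

    Row i of img_emb and row i of txt_emb are assumed to be a matched pair.
    """
    x = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    y = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = t * (x @ y.T) + b              # (n, n): one score per image-text pair
    labels = 2.0 * np.eye(len(x)) - 1.0     # +1 for matched pairs, -1 for all others
    # Each pair is an independent binary problem: -log(sigmoid(label * logit)),
    # computed stably as log(1 + exp(-label * logit)). No softmax over the batch.
    return float(np.logaddexp(0.0, -labels * logits).sum() / len(x))


def log_softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise log-softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def dino_loss(student_out: np.ndarray, teacher_out: np.ndarray, center: np.ndarray,
              tau_s: float = 0.1, tau_t: float = 0.04) -> float:
    """Cross-entropy between teacher and student outputs for two views of the same images.

    No labels are used: the teacher's centered, sharpened prediction is the target.
    """
    # Teacher: subtract the center, then sharpen with a low temperature.
    # In training this is treated as a constant (no gradient flows into it).
    p_teacher = np.exp(log_softmax((teacher_out - center) / tau_t))
    log_p_student = log_softmax(student_out / tau_s)
    return float(-(p_teacher * log_p_student).sum(axis=1).mean())


# SigLIP: 4 orthonormal toy embeddings, each paired with itself as its "caption".
emb = np.eye(4)
print(siglip_loss(emb, emb) < siglip_loss(emb, np.roll(emb, 1, axis=0)))  # matched vs shuffled

# DINO: a student that agrees with the teacher scores lower than one that disagrees.
teacher = np.array([[10.0, 0.0, 0.0]])
center = np.zeros(3)
agree = dino_loss(np.array([[10.0, 0.0, 0.0]]), teacher, center)
disagree = dino_loss(np.array([[0.0, 10.0, 0.0]]), teacher, center)
print(agree < disagree)
```

Note what each loss needs as input: SigLIP requires paired images and text, while DINO compares two outputs computed from images alone, which is why it can learn from unlabeled line-camera footage.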

We'll examine real-world applications including automated quality inspection, predictive maintenance through visual monitoring, and process optimization via computer vision. You'll learn how to evaluate model performance beyond accuracy metrics, weighing computational requirements, deployment constraints, and interpretability, all of which are essential in manufacturing environments.

A key focus will be practical deployment strategies, including techniques for fitting powerful vision-language models onto edge devices with limited memory. We'll discuss the trade-offs between model capability and resource constraints to help you decide between on-premise and cloud deployment.
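A first feasibility check for edge deployment is simple arithmetic: weight memory is roughly the parameter count times the bytes stored per parameter. The sketch below assumes that lower-precision storage (quantization) is one of the levers on the table; the 2-billion-parameter model, the 4 GB device, and the 20% memory reserve are hypothetical numbers chosen for illustration, and the estimate ignores activations, caches, and runtime overhead beyond that reserve.

```python
# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes).

    Excludes activations, caches, the runtime, and the operating system.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def fits_on_device(num_params: float, dtype: str, device_gb: float,
                   reserve_fraction: float = 0.2) -> bool:
    """Hypothetical rule of thumb: weights may use at most (1 - reserve_fraction) of device memory."""
    return weight_memory_gb(num_params, dtype) <= device_gb * (1.0 - reserve_fraction)


# A hypothetical 2-billion-parameter vision-language model on a 4 GB edge device.
for dtype in ("fp32", "bf16", "int8", "int4"):
    print(dtype, weight_memory_gb(2e9, dtype), "GB", fits_on_device(2e9, dtype, device_gb=4.0))
```

This prints 8.0, 4.0, 2.0, and 1.0 GB, with only the int8 and int4 variants fitting. Whether those lower-precision variants still detect defects reliably enough is exactly the capability-versus-resources trade-off that has to be measured, not assumed.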

The session concludes with a framework for rapidly prototyping vision AI applications, leveraging pre-trained models and fine-tuning approaches to accelerate time-to-value. Attendees will leave with actionable insights for implementing vision AI solutions that are both technically sound and operationally viable.

Whether you're evaluating vision AI vendors, planning internal development, or seeking to understand the technology behind the hype, this session provides the technical depth needed to make informed decisions about generative vision AI in manufacturing.