Talk details
From Product Images to Structured Data: VLMs at Marketplace Scale
GPU budgets do not have to scale with the number of images processed. At Mirakl, we’ve built a cloud-native inference stack for our Catalog Transformer that processes product images at scale and extracts structured facts for downstream use cases such as image ordering and background removal. Catalogs with thousands of products are preprocessed with Apache Spark, then served through vision-language models on KServe with a vLLM backend, optimized with fine-tuned LoRAs, and amortized in cost with caching. We will unpack the core building blocks we chose and the trade-offs we navigated in production, as a blueprint other teams can reuse. We will close with two operational pillars for scale: parallelizing and regulating traffic with event-driven queues, and the introduction of an AI gateway on our roadmap.
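To make the caching idea concrete, here is a minimal sketch of amortizing VLM inference cost by keying results on image content, prompt, and model version. All names (`CachedExtractor`, `fake_vlm`, the model-version string) are hypothetical illustrations, not Mirakl's implementation; a production setup would back the cache with a shared store rather than an in-process dict.

```python
import hashlib

def cache_key(image_bytes: bytes, prompt: str, model_version: str) -> str:
    # Key on image content + prompt + model/LoRA version, so that
    # redeploying a new adapter invalidates stale extractions.
    h = hashlib.sha256()
    h.update(image_bytes)
    h.update(prompt.encode("utf-8"))
    h.update(model_version.encode("utf-8"))
    return h.hexdigest()

class CachedExtractor:
    """Wraps an expensive VLM call with a cache (in-memory stand-in for a shared store)."""

    def __init__(self, extract_fn, model_version: str):
        self.extract_fn = extract_fn
        self.model_version = model_version
        self.cache: dict[str, dict] = {}
        self.calls = 0  # counts how often the underlying model actually runs

    def extract(self, image_bytes: bytes, prompt: str) -> dict:
        key = cache_key(image_bytes, prompt, self.model_version)
        if key not in self.cache:
            self.calls += 1  # cache miss: pay for one GPU inference
            self.cache[key] = self.extract_fn(image_bytes, prompt)
        return self.cache[key]

# Stub standing in for a real vLLM-backed extraction call.
def fake_vlm(image_bytes: bytes, prompt: str) -> dict:
    return {"background": "white", "main_object": "sneaker"}

extractor = CachedExtractor(fake_vlm, model_version="vlm-lora-v3")  # hypothetical version tag
img = b"...fake image bytes..."
facts1 = extractor.extract(img, "Describe the product.")
facts2 = extractor.extract(img, "Describe the product.")  # cache hit: no second model call
```

The same product image often recurs across catalog imports, which is why keying on content (rather than URL or product ID) is what makes the cost amortization effective.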
