Vertex AI and LLMs Drive Product Catalog Automation and Enhance Retail Data Quality
The Challenge
The digital-first retailer’s purchasing team struggled to organize product data from numerous vendors, many of whom supplied unstructured or incomplete information in varying formats. Catalog creation relied heavily on manual tagging, which was time-consuming, inconsistent, and error-prone. This slowed product onboarding, compromised data quality, and made real-time updates nearly impossible, creating inefficiencies across departments and limiting operational agility.
The Solution
To address this challenge, a GenAI-powered system was built on Google Cloud Platform to automate product catalog management. The solution ingests vendor PDFs through an API, converts them into images, and uses Gemini 1.5 models to extract key data such as product titles, attributes, and descriptions. The extracted data is validated against reference tables in BigQuery, formatted into structured CSV files, and warehoused for downstream use. The service is built with FastAPI, deployed on Cloud Run, and driven by prompt-engineered LLM calls; it handles concurrent extraction requests and includes logging and error handling. As a result, the purchasing team's manual workload is significantly reduced, and catalog data is more accurate and consistently structured, enabling faster and more reliable product listings.
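The validation and CSV-formatting step described above can be sketched as follows. This is a minimal illustration, not the project's implementation. It assumes the model returns one JSON object per product, and it uses an in-memory dictionary (`REFERENCE_VALUES`) as a hypothetical stand-in for the BigQuery reference tables. The field names (`title`, `category`, `color`, `description`) are likewise hypothetical.

```python
import csv
import io
import json
from dataclasses import dataclass, field

# Hypothetical stand-in for BigQuery reference tables: allowed values per attribute.
REFERENCE_VALUES = {
    "category": {"footwear", "outerwear", "accessories"},
    "color": {"black", "white", "navy", "red"},
}
# Hypothetical output schema for one catalog row.
REQUIRED_FIELDS = ("title", "category", "color", "description")


@dataclass
class ValidationResult:
    row: dict
    errors: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def validate_product(raw_json: str) -> ValidationResult:
    """Parse one model response (assumed to be a JSON object) and check it."""
    try:
        record = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return ValidationResult(row={}, errors=[f"invalid JSON: {exc.msg}"])
    if not isinstance(record, dict):
        return ValidationResult(row={}, errors=["expected a JSON object"])

    # Keep only schema fields, treating absent or null values as empty strings.
    row = {name: str(record.get(name) or "").strip() for name in REQUIRED_FIELDS}
    errors = [f"missing {name}" for name in REQUIRED_FIELDS if not row[name]]

    # Check controlled-vocabulary attributes against the reference values.
    for name, allowed in REFERENCE_VALUES.items():
        value = row[name].lower()
        if value and value not in allowed:
            errors.append(f"unknown {name}: {row[name]!r}")
        row[name] = value  # normalize casing for downstream joins
    return ValidationResult(row=row, errors=errors)


def to_csv(results) -> str:
    """Write only rows that passed validation to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REQUIRED_FIELDS)
    writer.writeheader()
    for result in results:
        if result.ok:
            writer.writerow(result.row)
    return buffer.getvalue()
```

Rows that fail validation are kept out of the CSV but retain their error list, so they could be routed to a human review queue rather than silently dropped. In a real deployment, the reference sets would be loaded from BigQuery rather than hard-coded.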
The Result
Reduced Manual Effort: Automated extraction and tagging substantially cut manual work in catalog creation workflows.
Improved Accuracy: Data validation and structured output ensured consistent and reliable product catalog entries.
Faster Onboarding: Real-time ingestion and processing accelerated product launches and vendor integration timelines.
Scalable Architecture: Built to handle concurrent extractions, enabling growth without adding operational strain.
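One common way to handle concurrent extractions without overwhelming a model endpoint is to bound the number of in-flight calls. The sketch below illustrates that pattern with `asyncio.Semaphore`. It is an assumption about how such concurrency might be managed, not a description of the deployed system. `extract_page` is a hypothetical stub that simulates a Gemini call with a short sleep, and a page whose ID ends in `bad` simulates a failed parse. Failures are logged and reported rather than aborting the whole batch.

```python
import asyncio
import logging

logger = logging.getLogger("catalog_extraction")


async def extract_page(page_id: str) -> dict:
    """Hypothetical stand-in for one Gemini extraction call on a page image."""
    await asyncio.sleep(0.01)  # simulate model/network latency
    if page_id.endswith("bad"):
        raise ValueError(f"could not parse {page_id}")
    return {"page": page_id, "title": f"Product from {page_id}"}


async def extract_all(page_ids, max_concurrency: int = 4):
    """Run extractions concurrently, never more than max_concurrency at once.

    Returns (successes, failed_page_ids, peak_in_flight).
    """
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def run_one(page_id):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)  # track the highest concurrency reached
            try:
                return await extract_page(page_id)
            except ValueError:
                # Log and continue so one bad page does not fail the batch.
                logger.exception("extraction failed for %s", page_id)
                return None
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(run_one(p) for p in page_ids))
    successes = [r for r in results if r is not None]
    failures = [p for p, r in zip(page_ids, results) if r is None]
    return successes, failures, peak
```

For example, `asyncio.run(extract_all(pages, max_concurrency=3))` processes every page but never has more than three extraction calls outstanding. On Cloud Run, a limit like this would typically be tuned alongside the service's per-instance concurrency setting and the model's quota.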