Vertex AI and LLMs Drive Product Catalog Automation and Enhance Retail Data Quality

The Challenge

The digital-first retailer’s purchasing team struggled to organize product data from numerous vendors, many of whom supplied unstructured or incomplete information in varying formats. Catalog creation relied heavily on manual tagging, which was time-consuming, inconsistent, and error-prone. This slowed product onboarding, compromised data quality, and made real-time updates nearly impossible, creating inefficiencies across departments and impeding operational agility.

The Solution

To address this challenge, a GenAI-powered system was built on Google Cloud Platform to automate product catalog management. The solution ingests vendor PDFs via an API, converts them into images, and uses Gemini 1.5 models to extract key data such as product titles, attributes, and descriptions. The extracted data is validated against reference tables in BigQuery, formatted into structured CSVs, and warehoused for downstream use. The architecture, built with FastAPI, Cloud Run, and prompt-engineered LLMs, supports concurrent extractions and includes logging and error handling. As a result, the manual load on the purchasing team is significantly reduced, and catalog data is more accurate and consistently structured, enabling faster and more reliable product listings.
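The validation and CSV-formatting steps described above can be illustrated with a minimal sketch. It is not the client's implementation: the in-memory set REFERENCE_CATEGORIES stands in for a BigQuery reference table, and the field names (title, category, description) and the functions validate_records and to_csv are hypothetical names chosen for illustration. The records are assumed to have already been extracted by the model as dictionaries.

```python
import csv
import io
from dataclasses import dataclass, field

# Assumption: stands in for a reference table that the real system keeps in BigQuery.
REFERENCE_CATEGORIES = {"aquatics", "yoga", "fitness"}

# Assumption: illustrative column layout, not the client's actual catalog schema.
CSV_COLUMNS = ["title", "category", "description"]


@dataclass
class ValidationResult:
    valid: list[dict] = field(default_factory=list)
    rejected: list[tuple[dict, str]] = field(default_factory=list)


def validate_records(records, allowed_categories):
    """Split extracted records into cleaned valid rows and rejected rows with a reason."""
    result = ValidationResult()
    for record in records:
        title = (record.get("title") or "").strip()
        category = (record.get("category") or "").strip().lower()
        if not title:
            result.rejected.append((record, "missing title"))
        elif category not in allowed_categories:
            result.rejected.append((record, f"unknown category: {category!r}"))
        else:
            result.valid.append({
                "title": title,
                "category": category,
                "description": (record.get("description") or "").strip(),
            })
    return result


def to_csv(rows):
    """Serialize validated rows into CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical extraction output, for demonstration only.
    extracted = [
        {"title": "Cork Yoga Block", "category": "Yoga", "description": "Firm support block."},
        {"title": "", "category": "fitness", "description": "Title missing on vendor sheet."},
        {"title": "Pool Thermometer", "category": "garden", "description": ""},
    ]
    result = validate_records(extracted, REFERENCE_CATEGORIES)
    print(to_csv(result.valid))
    for record, reason in result.rejected:
        print("rejected:", reason)
```

Keeping rejected records together with a reason, rather than silently dropping them, gives the purchasing team a short review queue in place of a full manual tagging pass.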

The Result

Reduced Manual Work: Automated extraction and tagging drastically cut the manual effort required in catalog creation workflows.

Improved Accuracy: Data validation and structured output ensured consistent and reliable product catalog entries.

Faster Onboarding: Real-time ingestion and processing accelerated product launches and shortened vendor integration timelines.

Scalable Architecture: Built to handle concurrent extractions, enabling growth without adding operational strain.

About the Client

The client is an internet retailer and software company focused on healthy living, operating popular e-commerce platforms for aquatics, yoga, and fitness. It also develops software solutions that support industries like agriculture with tools for operations and sales management. Headquartered in California, the company is committed to promoting wellness and leveraging technology to enhance customer experiences across sectors.

