Google Generative AI Evaluation Service

A service for evaluating the performance of generative AI models using metrics such as BLEU and ROUGE.

5 min read · Nov 28, 2023

The evaluation service lets you evaluate PaLM 2 (text-bison) foundation and tuned models. It computes a set of metrics against an evaluation dataset that you provide.

The process involves creating an evaluation dataset containing prompts and their ideal responses (ground truth pairs).
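As a rough illustration of what such a dataset can look like, here is a minimal sketch that writes prompt and ground truth pairs as JSON Lines. The field names `prompt` and `ground_truth`, the example texts, and the file name are assumptions for illustration, so check the service documentation for the exact schema it expects.

```python
import json

# Hypothetical prompt / ideal-response pairs for illustration only.
# The keys "prompt" and "ground_truth" are an assumed schema.
examples = [
    {"prompt": "Is this headline sarcastic? 'Area man thrilled to work weekend'",
     "ground_truth": "sarcastic"},
    {"prompt": "Is this headline sarcastic? 'City council approves new budget'",
     "ground_truth": "not sarcastic"},
]


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


def write_jsonl(records, path):
    """Write the records to a JSONL file on disk."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_jsonl(records) + "\n")
```

Calling `write_jsonl(examples, "eval_dataset.jsonl")` would produce a file with one pair per line, which can then be uploaded wherever the evaluation job reads its input from.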

Model evaluation is a post-tuning step. It assesses your model's quality by comparing the model's actual responses with the ideal ground truth responses.
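To make the idea of comparing a response with its ground truth concrete, here is a simplified sketch of a ROUGE-1 style score (unigram overlap F1). It is not the evaluation service's implementation: it assumes lowercase whitespace tokenization, and the function name `unigram_f1` is made up for this example.

```python
from collections import Counter


def unigram_f1(response: str, ground_truth: str) -> float:
    """ROUGE-1 style F1 based on overlapping unigram counts.

    Simplified sketch: tokens are lowercased and split on whitespace.
    """
    resp_counts = Counter(response.lower().split())
    ref_counts = Counter(ground_truth.lower().split())

    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((resp_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(resp_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

An identical response and ground truth score 1.0, and a response sharing no words with the ground truth scores 0.0. Production metrics usually add proper tokenization and stemming, so treat the numbers from this sketch as indicative only.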

In this article, we apply the evaluation service to the sarcasm text generator and classification model that we fine-tuned in my previous article.

Generative AI - How to Fine Tune LLMs

Vertex AI allows you to fine-tune PaLM models for text, chat, code, and embeddings intuitively and easily


Jump Directly to the Notebook and Code

All the code for this article is ready to use in a Google Colab notebook. If you have questions, don’t hesitate to contact me via LinkedIn.


Written by Sascha Heyer