Google Generative AI Evaluation Service

A service for evaluating the performance of generative AI models using metrics such as BLEU and ROUGE, among others.

Sascha Heyer
Nov 28, 2023

The evaluation service lets you evaluate both foundation and tuned PaLM 2 (text-bison) models. It computes a set of metrics against an evaluation dataset that you provide.

The process starts with creating an evaluation dataset of prompts paired with their ideal responses (ground truth pairs).
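As a minimal sketch, the ground truth pairs can be stored as JSON Lines, with one prompt and its ideal response per line. The field names `prompt` and `ground_truth`, the file name `eval_dataset.jsonl`, and the example pairs are assumptions for illustration only; check the Vertex AI documentation for the exact schema your model type expects.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical ground truth pairs: each prompt is paired with its ideal response.
# The field names "prompt" and "ground_truth" are assumed, not taken from the service docs.
examples = [
    {"prompt": "Make this sarcastic: I love waiting in long lines.",
     "ground_truth": "Oh yes, nothing beats spending my whole afternoon in a queue."},
    {"prompt": "Make this sarcastic: The meeting could have been an email.",
     "ground_truth": "What a thrill to spend an hour on what five lines could say."},
]

def write_jsonl(records, path):
    """Write one JSON object per line (JSON Lines format)."""
    with Path(path).open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Hypothetical output location in the system temp directory.
dataset_path = Path(tempfile.gettempdir()) / "eval_dataset.jsonl"
write_jsonl(examples, dataset_path)
loaded = read_jsonl(dataset_path)
print(len(loaded), loaded[0]["prompt"])
```

One record per line keeps the file easy to stream and append to as the evaluation dataset grows.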

Model evaluation is a post-tuning step: it assesses your model's quality by comparing the LLM's actual responses with the ideal ground truth responses.
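To make that comparison concrete, here is a minimal sketch of ROUGE-L, one of the metrics named in the subtitle, scoring a model response against its ground truth and averaging over a dataset. It is a simplified from-scratch illustration (lowercase whitespace tokens, no stemming, F1 of LCS-based precision and recall) and not the evaluation service's own implementation, whose tokenization and aggregation may differ.

```python
def tokenize(text):
    """Lowercase whitespace tokenization (a simplification; real ROUGE tooling may stem or strip punctuation)."""
    return text.lower().split()

def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b, via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]

def rouge_l_f1(response, ground_truth):
    """ROUGE-L F1: harmonic mean of LCS-based precision (vs. response) and recall (vs. ground truth)."""
    cand, ref = tokenize(response), tokenize(ground_truth)
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

def mean_rouge_l(pairs):
    """Average ROUGE-L F1 over (response, ground_truth) pairs from an evaluation dataset."""
    scores = [rouge_l_f1(resp, truth) for resp, truth in pairs]
    return sum(scores) / len(scores)

print(round(rouge_l_f1("the cat sat on the mat", "the cat is on the mat"), 3))  # → 0.833
```

The longest common subsequence here is "the cat on the mat" (5 of 6 tokens on each side), so precision and recall are both 5/6. Because ROUGE-L rewards in-order overlap without requiring contiguous matches, it tolerates small insertions in a response better than exact string matching does.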

We will apply the evaluation service to the sarcasm text generation and classification models we fine-tuned in my previous article.
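For a classification model, each response is a label rather than free text, so text-overlap metrics such as BLEU or ROUGE are usually a poor fit; label-based metrics such as accuracy and F1 are the common choice. The sketch below computes accuracy and macro-averaged F1 from scratch. The label names `sarcastic` and `not_sarcastic` and the example predictions are hypothetical, and the exact classification metrics the evaluation service reports may differ.

```python
from collections import Counter

def accuracy(predictions, ground_truth):
    """Fraction of predicted labels that exactly match the ground truth label."""
    return sum(p == t for p, t in zip(predictions, ground_truth)) / len(ground_truth)

def macro_f1(predictions, ground_truth):
    """Macro-averaged F1: compute F1 = 2TP / (2TP + FP + FN) per label, then take the unweighted mean."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    labels = sorted(set(ground_truth) | set(predictions))
    tp, fp, fn = Counter(), Counter(), Counter()
    for pred, truth in zip(predictions, ground_truth):
        if pred == truth:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label, but it was wrong
            fn[truth] += 1  # missed the true label
    f1_scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1_scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)

# Hypothetical ground truth labels and model predictions.
truth = ["sarcastic", "sarcastic", "not_sarcastic", "not_sarcastic"]
preds = ["sarcastic", "sarcastic", "sarcastic", "not_sarcastic"]
print(accuracy(preds, truth), round(macro_f1(preds, truth), 3))  # → 0.75 0.733
```

Macro averaging weights each label equally, which makes it more informative than accuracy when one class (for example, non-sarcastic text) dominates the evaluation dataset.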

Jump Directly to the Notebook and Code

All the code for this article is ready to use in a Google Colab notebook. If you have questions, don’t hesitate to contact me via LinkedIn.


Hi, I am Sascha, Senior Machine Learning Engineer at @DoiT. Support me by becoming a Medium member 🙏 bit.ly/sascha-support