Evaluate and Debug Your GenAI Model | Part 1 — Introduction to W&B
This tutorial was designed around the Weights & Biases (W&B) library.
Hello, I hope everything is going well 😊. Today is an exciting day for me 🎉: I am starting a new tutorial that covers how we can monitor our LLM or GenAI models during fine-tuning, serving, and evaluation 💡. These cases are crucial nowadays: AI is making everything easier to handle day by day, yet the evaluating and monitoring part is still extremely important for understanding your customer market, average cost, and so on. Many different kinds of apps can help in these situations; I prefer to continue with Weights & Biases (W&B), which I think is more effective than the others.
Let’s start our tutorial by looking at how we can log our model during training.
What is W&B?
Weights & Biases (W&B) is your new best friend in the world of machine learning. It’s like having a personal assistant who keeps track of all your experiments, visualizes your model’s performance, and even helps you collaborate with your team. Imagine having a superhero sidekick who makes sure your AI adventures are smooth and successful.
Key Features of W&B:
- Experiment Tracking: Never lose track of your experiments again. W&B logs and tracks everything, so you can compare runs and figure out what’s working (and what’s not).
- Visualization: See your model’s performance metrics, loss curves, and more in real time. It’s like having a crystal ball for your AI projects.
- Collaboration: Share your experiments with your team, get feedback, and work together more effectively. Teamwork makes the dream work, right?
- Version Control: Keep tabs on different versions of your models and datasets. No more “Wait, which version was this again?” moments.
- Integration: W&B plays nicely with popular machine learning frameworks like PyTorch and TensorFlow. It’s like the social butterfly of AI tools.
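Before we build anything real, here is the core pattern that nearly every W&B workflow boils down to. This is a minimal sketch with placeholder project and metric names, not code from this tutorial’s project:

import wandb

# Start a run and record its hyperparameters (placeholder values)
run = wandb.init(project="my-first-project", config={"lr": 1e-3})

for step in range(10):
    # Log any scalar you care about; W&B plots it live on the dashboard
    wandb.log({"loss": 1.0 / (step + 1)})

run.finish()  # mark the run as complete

Everything we do below is this init → log → finish loop with a real model in the middle.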
Why Use W&B?
Using W&B can supercharge your workflow. Here’s how:
- Efficiency: Automate the logging of your experiments and say goodbye to manual errors. It’s like having a robot butler for your AI projects.
- Insight: Get deep insights into your model’s performance with detailed visualizations and analytics. Knowledge is power!
- Collaboration: Work better with your team by sharing and discussing experiments. Two heads are better than one, after all.
- Reproducibility: Ensure your experiments are reproducible, making debugging and improvements a breeze. No more “It worked on my machine” excuses.
Okay, enough theory. Let’s dive in and have some fun!
Let’s Create a Basic MLP Model and Log it with W&B for Sprite Classification
Step 1: Define Our Libraries
First things first, let’s import the necessary libraries:
import math
from pathlib import Path
from types import SimpleNamespace
from tqdm.auto import tqdm
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Adam
from utilities import get_dataloaders
import wandb
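A quick note: utilities here is a helper module that ships with the course materials, not a package on PyPI. If you don’t have it, a rough stand-in for get_dataloaders could look like this (random tensors replace the real sprite data, so treat it only as a way to run the code end to end):

import torch
from torch.utils.data import DataLoader, TensorDataset

def get_dataloaders(data_dir, batch_size, slice_size, valid_pct):
    "Stand-in: random 3x16x16 'sprites' with 5 classes (data_dir is ignored)"
    images = torch.randn(slice_size, 3, 16, 16)
    labels = torch.randint(0, 5, (slice_size,))
    n_valid = int(slice_size * valid_pct)
    train_ds = TensorDataset(images[n_valid:], labels[n_valid:])
    valid_ds = TensorDataset(images[:n_valid], labels[:n_valid])
    return (DataLoader(train_ds, batch_size=batch_size, shuffle=True),
            DataLoader(valid_ds, batch_size=batch_size))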
Step 2: Define Constants and Create the get_model Function
Let’s set up our constants and build a simple model:
INPUT_SIZE = 3 * 16 * 16
OUTPUT_SIZE = 5
HIDDEN_SIZE = 256
NUM_WORKERS = 2
CLASSES = ["hero", "non-hero", "food", "spell", "side-facing"]
DATA_DIR = Path('./data/')
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
def get_model(dropout):
    "Simple MLP with Dropout"
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(INPUT_SIZE, HIDDEN_SIZE),
        nn.BatchNorm1d(HIDDEN_SIZE),
        nn.ReLU(),
        nn.Dropout(dropout),
        nn.Linear(HIDDEN_SIZE, OUTPUT_SIZE),
    ).to(DEVICE)
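As a quick sanity check (my addition, not part of the original notebook), you can push a dummy batch through the network and confirm the output has one logit per sprite class:

model = get_model(dropout=0.5)
dummy = torch.randn(4, 3, 16, 16).to(DEVICE)  # a fake batch of four 3x16x16 sprites
print(model(dummy).shape)  # expected: torch.Size([4, 5])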
Step 3: Define Hyperparameters
Let’s store our hyperparameters in a config object:
config = SimpleNamespace(
    epochs=2,
    batch_size=128,
    lr=1e-5,
    dropout=0.5,
    slice_size=10_000,
    valid_pct=0.2,
)
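Why SimpleNamespace instead of a plain dict? It gives us dot access, which keeps the training code tidy, and wandb.init accepts it directly as a config (as we do below). If you ever need a dict view, vars() gives you one:

print(config.lr)     # dot access instead of config["lr"]
print(vars(config))  # the same values as a plain dict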
Step 4: Define Train and Evaluate Methods
Now, let’s define our training and evaluation methods:
def train_model(config):
    "Train a model with a given config"
    wandb.init(
        project="dlai_intro",
        config=config,
    )

    # Get the data
    train_dl, valid_dl = get_dataloaders(DATA_DIR,
                                         config.batch_size,
                                         config.slice_size,
                                         config.valid_pct)
    n_steps_per_epoch = math.ceil(len(train_dl.dataset) / config.batch_size)

    # A simple MLP model
    model = get_model(config.dropout)

    # Make the loss and optimizer
    loss_func = nn.CrossEntropyLoss()
    optimizer = Adam(model.parameters(), lr=config.lr)

    example_ct = 0
    for epoch in tqdm(range(config.epochs), total=config.epochs):
        model.train()
        for step, (images, labels) in enumerate(train_dl):
            images, labels = images.to(DEVICE), labels.to(DEVICE)
            outputs = model(images)
            train_loss = loss_func(outputs, labels)
            optimizer.zero_grad()
            train_loss.backward()
            optimizer.step()

            example_ct += len(images)
            metrics = {
                "train/train_loss": train_loss,
                "train/epoch": epoch + 1,
                "train/example_ct": example_ct,
            }
            # Log training metrics to the W&B dashboard
            wandb.log(metrics)

        # Compute validation metrics after each epoch
        val_loss, accuracy = validate_model(model, valid_dl, loss_func)
        val_metrics = {
            "val/val_loss": val_loss,
            "val/val_accuracy": accuracy,
        }
        # Log validation metrics to the W&B dashboard
        wandb.log(val_metrics)

    # End the W&B run
    wandb.finish()
def validate_model(model, valid_dl, loss_func):
    "Compute the performance of the model on the validation dataset"
    model.eval()
    val_loss = 0.0
    correct = 0
    with torch.inference_mode():
        for i, (images, labels) in enumerate(valid_dl):
            images, labels = images.to(DEVICE), labels.to(DEVICE)

            # Forward pass
            outputs = model(images)
            val_loss += loss_func(outputs, labels) * labels.size(0)

            # Compute accuracy and accumulate
            _, predicted = torch.max(outputs.data, 1)
            correct += (predicted == labels).sum().item()
    return val_loss / len(valid_dl.dataset), correct / len(valid_dl.dataset)
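Scalars are not the only thing you can log. If you also want to eyeball some predictions, here is a minimal, hypothetical helper (log_predictions is my own name, not part of the course code) that pushes a small wandb.Table of images, predicted labels, and targets to the dashboard:

def log_predictions(model, valid_dl, n=8):
    "Log a small table of images, predictions, and targets (assumes an active run)"
    model.eval()
    images, labels = next(iter(valid_dl))
    images, labels = images.to(DEVICE), labels.to(DEVICE)
    with torch.inference_mode():
        preds = model(images).argmax(dim=1)
    table = wandb.Table(columns=["image", "pred", "target"])
    for img, pred, targ in zip(images[:n], preds[:n], labels[:n]):
        table.add_data(wandb.Image(img.cpu()), CLASSES[pred.item()], CLASSES[targ.item()])
    wandb.log({"predictions_table": table})

If you use it, call it inside train_model before wandb.finish(), so the table attaches to the active run.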
Step 5: Configure W&B and Bake Our Model
Before we shine, let’s configure W&B. You can view your logs without an account, or use your W&B API key to save them to your own account. When the login prompt appears, enter 1 to continue without signing in; otherwise, enter 2 and paste your W&B API key (you can find it at https://wandb.ai/authorize). I’ll continue with my API key.
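That two-option prompt comes from logging in with anonymous mode allowed; you can trigger it explicitly like this (in a notebook it also appears automatically on the first wandb.init call):

wandb.login(anonymous="allow")  # option 1: anonymous dashboard, option 2: paste your API key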
And… Time to bake our model!
train_model(config)
Our baked model is ready! You can see your logs right in the Jupyter notebook output. But that’s not all: you can also open the dashboard through the project link printed after “View project.”
We trained our model and saw our logs in our account. However, we often don’t get what we want from a single training run. Let’s launch a few more runs, changing the learning rate (and a couple of other hyperparameters) along the way.
config.lr = 1e-4
train_model(config)
config.lr = 1e-4
train_model(config)
config.dropout = 0.1
config.epochs = 1
train_model(config)
config.lr = 1e-3
train_model(config)
After running these commands, we have trained four more models under varying settings. When you open the project on W&B again, you can compare all the runs side by side.
That’s it! We have trained several models and compared them, so we can easily decide which hyperparameter configuration is the most suitable.
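By the way, since config is just a namespace, you can also script such sweeps instead of editing the fields by hand; a minimal sketch:

for lr in (1e-4, 1e-3):
    config.lr = lr
    train_model(config)  # each call opens and finishes its own W&B run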
Conclusion
In this blog post, we introduced W&B by training simple MLP models with different hyperparameters and logging their metrics to W&B.
This is just the beginning. I will continue with training diffusion models, evaluating them, tracing our LLMs, and fine-tuning them, always using W&B.
Your feedback is very valuable for helping me continue this tutorial series. Stay tuned. See you soon, bye-bye!