Web UI Guide¶
This guide covers creating and managing datasets through the Logfire web interface. For programmatic access, see the SDK Guide.
Navigating Evals¶
Click Evals in the sidebar to open the datasets list. From there:
- Click a dataset name to see its detail page (experiments, cases, schema).
- Click Edit to modify the dataset's name, description, and schemas.
- Click <> SDK to view code snippets for working with the dataset programmatically.
- Click + Add case to add a new test case.
- Select experiments and click Compare to compare multiple runs side by side.
Breadcrumbs at the top of each page show your current location (e.g., Evals / Datasets / my-dataset / Edit).
Creating a New Dataset¶
Click + New dataset in the top right and enter a name for your dataset. If you don't have any datasets yet, you can also type a name directly into the empty state and click Create.
Once created, you can edit the dataset to add a description and define schemas.
Schema generation from code
If you are using Python types (dataclasses, Pydantic models, etc.) for your schemas, it is easier to create the dataset via the SDK, which generates JSON schemas automatically from your type definitions. See Creating a Dataset with Typed Schemas in the SDK Guide.
SDK equivalent
```python
from dataclasses import dataclass

from logfire.experimental.api_client import LogfireAPIClient


@dataclass
class QuestionInput:
    question: str
    context: str | None = None


@dataclass
class AnswerOutput:
    answer: str
    confidence: float


with LogfireAPIClient(api_key='your-api-key') as client:
    dataset = client.create_dataset(
        name='qa-golden-set',
        description='Golden test cases for the Q&A system',
        input_type=QuestionInput,
        output_type=AnswerOutput,
    )
```
See the SDK Guide for full details on creating and managing datasets programmatically.
Editing a Dataset¶
From the dataset detail page, click Edit to modify the dataset's configuration. The edit form has two sections:
- General: Name and description.
- Schemas: Define JSON schemas for inputs, expected outputs, and metadata. Use the Generate schema toggle to have Pydantic AI create schemas from a natural language description of your data shape.
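As an illustration, an inputs schema matching the question-answering example used elsewhere in this guide might look like the following. This is a hand-written sketch of a standard JSON schema, not necessarily what the Generate schema toggle would produce:

```json
{
  "type": "object",
  "properties": {
    "question": {"type": "string"},
    "context": {"type": ["string", "null"]}
  },
  "required": ["question"]
}
```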
Managing Cases¶
From the dataset detail page, click the Cases tab to see all hosted cases for the dataset.
- Add a case: Click + Add case to open the case editor. Fill in name, inputs, expected output, and metadata. When the dataset has schemas defined, fields render as labeled inputs with type information; otherwise you edit raw JSON.
- Edit a case: Click the pencil icon on any case row to open the editor pre-populated with that case's data. Make your changes and save.
- Delete a case: Click the trash icon on any case row and confirm deletion.
SDK equivalent
```python
from pydantic_evals import Case

client.add_cases(
    'qa-golden-set',
    cases=[
        Case(
            name='capital-question',
            inputs=QuestionInput(question='What is the capital of France?'),
            expected_output=AnswerOutput(answer='Paris', confidence=0.99),
        ),
    ],
)
```
See Adding Cases in the SDK Guide for more options.
Adding Cases from Traces¶
You can create test cases directly from production data:
- Open Live View and find a trace or span that represents a good test case.
- Click the database icon (+) on the span details panel.
- Select an existing dataset or create a new one.
- The AI can automatically extract inputs and outputs from the span data; review and edit the extracted values before saving.
This preserves a link back to the source trace, so you always know where a test case came from.
SDK equivalent
You can use add_cases with plain dicts to programmatically create the same trace linkage:
```python
client.add_cases(
    'qa-golden-set',
    cases=[
        {
            'name': 'sky-color',
            'inputs': {'question': 'What color is the sky?'},
            'expected_output': {'answer': 'Blue'},
            'source_trace_id': 'trace-uuid-from-live-view',
            'source_span_id': 'span-uuid-from-live-view',
        },
    ],
)
```
See Adding Cases in the SDK Guide for more details.
Exporting a Dataset¶
From the dataset detail page, click Export to download the dataset in one of two formats:
- JSON: Raw JSON representation of all cases.
- pydantic-evals: A YAML format compatible with `pydantic_evals.Dataset.from_file()`.
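Once downloaded, the JSON export can be inspected with nothing but the standard library. The payload below is a hypothetical example based on the case fields shown in this guide (name, inputs, expected output); the exact field names and nesting of the real export are an assumption:

```python
import json

# Hypothetical export payload; the real structure of the JSON export may differ.
exported = '''
[
  {
    "name": "capital-question",
    "inputs": {"question": "What is the capital of France?"},
    "expected_output": {"answer": "Paris", "confidence": 0.99}
  }
]
'''

# Parse the export and print a one-line summary per case.
cases = json.loads(exported)
for case in cases:
    print(f"{case['name']}: {case['expected_output']['answer']}")
```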
Viewing Experiments¶
The Experiments tab on the dataset detail page shows all evaluation runs against this dataset. You can:
- Click any experiment row to see detailed results (inputs, outputs, scores, assertions).
- Select multiple experiments and click Compare to view them side by side.
- See pass rates, scores, labels, and metrics at a glance.
What's Next?¶
Once you have cases in a dataset, you can export them and run evaluations. See Running Evaluations to get started.