Mastering Pydantic v2: Practical Use Cases and a Migration Guide

In the modern Python ecosystem, it's impossible to discuss data validation and settings management without mentioning Pydantic. It has become the de facto standard, letting developers define data structures clearly with type hints and enforce them at runtime.
This article will serve as your practical guide to mastering Pydantic v2. We will explore its core use cases with hands-on examples, from API development with FastAPI to settings management. Most importantly, we provide a detailed v1-to-v2 migration checklist to help you upgrade your codebase with confidence.
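Before diving in, here is a minimal sketch of the core idea: declare a model with type hints, and Pydantic coerces and validates incoming data at runtime (the model and field names here are illustrative):

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    id: int
    name: str

# A string that looks like an integer is coerced automatically.
user = User.model_validate({"id": "1", "name": "Alice"})
print(user.id)  # 1 (an int, not a str)

# Invalid data raises a ValidationError at runtime.
try:
    User.model_validate({"id": "not-a-number", "name": "Bob"})
except ValidationError as e:
    print(e.errors()[0]["type"])  # int_parsing
```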
API request and response validation is one of Pydantic's most well-known and powerful use cases. It plays a core role, especially when used with the FastAPI framework.
```python
from fastapi import FastAPI
from pydantic import BaseModel, EmailStr, Field

# Create FastAPI application
app = FastAPI()

# --- Define Pydantic Models ---

# 1. Request Model: The "Entry Guard"
# This defines the rules for the data a client must send to create a user.
class UserIn(BaseModel):
    username: str = Field(..., min_length=3, description="Username must be at least 3 characters long.")
    email: EmailStr      # Pydantic automatically validates the email format.
    age: int | None = None  # An optional field that can be omitted.

# 2. Response Model: The "Exit Guard"
# This defines the rules for the data the server will send back to the client.
class UserOut(BaseModel):
    id: int
    username: str
    email: EmailStr

# --- Define API Endpoint ---
@app.post("/users/", response_model=UserOut, summary="Create User")
async def create_user(user: UserIn):
    """
    Creates a new user, saves it to the database,
    and returns the created user's information.
    """
    # 1. Request Validation (handled automatically)
    # Before this function runs, FastAPI validates the 'user' parameter
    # against the UserIn model. If the username is shorter than 3 characters
    # or the email format is invalid, FastAPI sends a 422 error back to the client.

    # This part simulates "saving to the database."
    # In a real application, you would save to a DB and get a unique ID.
    new_user_id = 1

    # 2. Response Serialization (handled by response_model)
    # FastAPI formats this return value to match the UserOut model.
    # The 'age' field from the input model (UserIn) is not in the response
    # model (UserOut), so it is excluded from the final JSON response.
    return {
        "id": new_user_id,
        "username": user.username,
        "email": user.email,
    }
```
Pydantic is also very useful for managing the configuration values an application needs to run (database addresses, API keys, secret values, etc.). This is done with an extension library called `pydantic-settings`, which can read values from environment variables and `.env` files and manage them as a single validated object.

```
# .env file

# Database connection string
APP_DB_URL="postgresql://user:password@localhost/mydatabase"

# External API key
APP_API_KEY="abc-123-def-456"
```
```python
# config.py file
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    """
    A class to manage the application's settings.
    """
    # 1. Define settings (with type hints)
    # Automatically finds and populates the value from APP_DB_URL
    # in the .env file or environment variables.
    db_url: str

    # Automatically finds and populates the value for APP_API_KEY.
    api_key: str

    # If the value is not found in environment variables or the .env file,
    # the default value below is used.
    log_level: str = "INFO"

    # 2. Define the behavior of the settings model
    # Use model_config to control the behavior of pydantic-settings.
    model_config = SettingsConfigDict(
        env_prefix="APP_",          # Look for environment variables prefixed with 'APP_'
        env_file=".env",            # Read a .env file and treat its contents as environment variables
        env_file_encoding="utf-8",  # Specify the .env file encoding
    )

# 3. Create and use the settings object
# The moment this line runs, Pydantic reads and validates the settings.
settings = Settings()

# Import this settings object elsewhere in your application.
print("Database URL:", settings.db_url)
print("API Key:", settings.api_key)
print("Log Level:", settings.log_level)
# print(settings.model_dump_json(indent=2))
```
Instead of handling data with plain `dict` and `list` types, you can use Pydantic's `BaseModel` to create data objects with a clear structure. Representing data structures with bare `dict` and `list` types leads to the following problems.
```python
# Handling data with only standard dicts and lists
def print_post_summary(post_data: dict):
    # Problem 1: It's easy to make typos in key names.
    # Typing 'auther' instead of 'author' isn't caught until runtime:
    # print(f"'{post_data['title']}' by {post_data['auther']['name']}")  # Raises KeyError!
    print(f"'{post_data['title']}' by {post_data['author']['name']}")

    # Problem 2: You have to guess the data structure.
    # It's hard to know whether 'author' contains a 'name', or whether
    # 'comments' is a list, until you inspect the actual data or run the code.

    # Problem 3: Data can be missing.
    if 'comments' in post_data and post_data['comments']:
        print(f" - There are a total of {len(post_data['comments'])} comments.")
    else:
        print(" - No comments.")

# Example data
post_1 = {
    "title": "My First Post",
    "content": "Hello world!",
    "author": {"name": "John Doe", "email": "john@example.com"},
    "comments": [
        {"author_name": "Alice", "text": "Great post!"},
        {"author_name": "Bob", "text": "Welcome to the blog."}
    ]
}

# Case where the 'comments' key is missing
post_2 = {
    "title": "Another Post",
    "content": "...",
    "author": {"name": "Jane Doe", "email": "jane@example.com"}
    # 'comments' key is missing
}

print_post_summary(post_1)
print_post_summary(post_2)
```
Now, let's create a "blueprint" for our data using Pydantic's `BaseModel`.
```python
from pydantic import BaseModel, EmailStr
from typing import List

# --- Define the data's 'blueprint' ---
class Author(BaseModel):
    name: str
    email: EmailStr  # Even validates the email format

class Comment(BaseModel):
    author_name: str
    text: str

class BlogPost(BaseModel):
    title: str
    content: str
    author: Author                # Use the Author model as a nested type
    comments: List[Comment] = []  # Defaults to an empty list if no comments

# --- The improved function ---
def print_post_summary_pydantic(post: BlogPost):
    # Advantage 1: Autocompletion and type safety
    # When you type 'post.', the IDE suggests title, content, author, etc.
    # The chance of a typo is almost zero: post.auther would be flagged.
    print(f"'{post.title}' by {post.author.name} ({post.author.email})")

    # Advantage 2: No need to worry about missing data
    # The BlogPost model guarantees that 'comments' is always a list.
    if post.comments:
        print(f" - There are a total of {len(post.comments)} comments.")
    else:
        print(" - No comments.")

# --- Creating and using the data ---
# Convert dictionary data into a Pydantic model (parsing and validation)
try:
    # post_1 is the same dict used above.
    blog_post_1 = BlogPost.model_validate(post_1)
    print_post_summary_pydantic(blog_post_1)
    print("-" * 20)

    # post_2 is missing 'comments', but the model's default value ([]) handles it.
    blog_post_2 = BlogPost.model_validate(post_2)
    print_post_summary_pydantic(blog_post_2)
except Exception as e:
    print("Data validation failed:", e)
```
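When validation of nested data fails, Pydantic reports exactly where the problem is. A minimal sketch with simplified models (plain `str` fields, so no extra packages are needed):

```python
from typing import List
from pydantic import BaseModel, ValidationError

class Author(BaseModel):
    name: str

class BlogPost(BaseModel):
    title: str
    author: Author
    comments: List[dict] = []

bad_post = {"title": "Broken post", "author": {}}  # author.name is missing

try:
    BlogPost.model_validate(bad_post)
    errors = []
except ValidationError as e:
    # Each error carries a 'loc' path pointing into the nested structure.
    errors = e.errors()
    print(errors[0]["loc"], errors[0]["type"])  # ('author', 'name') missing
```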
Pydantic is used to ensure data quality during the process of extracting, transforming, and loading (ETL) data from various sources like CSV files, databases, or external APIs.
Consider a situation where we need to process raw user data (e.g., from a CSV file or an external API) with the following problems:

- The email key is named `email_address` instead of `email`.
- A required field (`name`) is missing in some records.
- The `status` value is missing in some records.

```python
import datetime
from typing import List, Optional
from pydantic import BaseModel, EmailStr, Field, ValidationError, field_validator

# 1. Raw Data (Extract phase)
# Assume this is messy data from a CSV, DB, or API.
raw_user_data = [
    {'user_id': '1', 'name': 'John Doe', 'email_address': 'john.doe@example.com'},
    {'user_id': 2, 'name': 'jane doe', 'email_address': 'jane.doe@example.com', 'status': 'active'},
    {'user_id': '3', 'name': 'Peter Pan', 'email_address': 'invalid-email'},  # Invalid email
    {'user_id': 4, 'name': None, 'email_address': 'peter@example.com'},       # Missing name
]

# 2. Define Data Blueprint (Rules for the Transform phase)
# We will clean and validate data by passing it through this model.
class CleanUser(BaseModel):
    user_id: int  # Automatically converts the string '1' to the number 1
    name: str     # Name must be a string (None is not allowed)

    # Map the value from the key 'email_address' to the 'email' field (alias)
    email: EmailStr = Field(..., alias='email_address')

    # If 'status' is missing, use 'inactive' as the default value
    status: str = 'inactive'

    # The registration date is automatically set when the model is created.
    created_at: datetime.datetime = Field(default_factory=datetime.datetime.now)

    # A validator to further process a specific field
    @field_validator('name')
    @classmethod
    def clean_name(cls, v: str):
        # Capitalize each word of the name (e.g., 'jane doe' -> 'Jane Doe')
        return v.title()

# 3. Run the ETL Pipeline
def process_users(raw_data: List[dict]):
    clean_data = []
    invalid_data = []
    print("Starting data processing pipeline...")
    for i, record in enumerate(raw_data):
        try:
            # Validate and clean the data with the Pydantic model.
            clean_user = CleanUser.model_validate(record)
            clean_data.append(clean_user)
            print(f" - Record {i+1} processed successfully: {clean_user.name}")
        except ValidationError as e:
            # If validation fails, store it separately with error information.
            invalid_data.append({'original_record': record, 'error': e.errors()})
            print(f" - Record {i+1} failed to process!")
    print("...Data processing pipeline finished\n")
    return clean_data, invalid_data

# Run the pipeline and check the results
processed_users, failed_records = process_users(raw_user_data)

print("--- ✅ Successfully Processed Data ---")
for user in processed_users:
    print(user.model_dump_json(indent=2))

print("\n--- ❌ Failed Data and Reasons ---")
import json
print(json.dumps(failed_records, indent=2, default=str))
```
Data Cleansing

- The `CleanUser` model isn't just a data structure; it's the set of rules for cleansing data.
- `user_id: int` automatically converts the string `'1'` to the number `1`.
- `Field(alias='email_address')` maps the value to the model's `email` field even when the input key name is different (`email_address`).
- `status: str = 'inactive'` automatically fills in a default value if the original data is missing it, improving data consistency.
- `@field_validator` transforms data like `'jane doe'` into a consistent format like `'Jane Doe'`.

Maintaining Data Integrity

- The `try...except ValidationError` block is the core of the ETL pipeline.
- Data that passes the `CleanUser` model is added to the `clean_data` list.
- Data that fails validation is stored in `invalid_data` along with the reason for failure, so you can track and fix problematic data later.

To run the examples in this article, install:

```shell
pip install "fastapi[standard]" "uvicorn[standard]" "pydantic>=2" pydantic-settings
```
| Topic | v1 | v2 |
|---|---|---|
| Serialization | `model.dict()` | `model.model_dump()` (FastAPI Docs, Pydantic Docs) |
| Deserialization/Parsing | `parse_obj`, `from_orm` | `BaseModel.model_validate(...)` (+ `from_attributes=True`) (Pydantic Docs) |
| JSON Parsing | `parse_raw` | `model_validate_json` (Pydantic Docs) |
| Configuration Class | `class Config:` | `model_config = ConfigDict(...)` (Pydantic Docs) |
| Validation Decorators | `@validator`, `@root_validator` | `@field_validator`, `@model_validator` (Pydantic Docs, GitHub Discussion) |
| ORM Mode | `orm_mode = True` | `from_attributes = True` (in `ConfigDict`) (Pydantic Docs) |
| Computed Fields | Unofficial patterns | Official support with `@computed_field` (Pydantic Docs) |
| Arbitrary Type Validation | Limited | Validate/dump arbitrary types with `TypeAdapter` (Pydantic Docs) |
```python
user = UserIn.model_validate({"username": "alice", "email": "a@b.com", "age": 20})
payload = user.model_dump(exclude_none=True)
```

`model_dump()` recursively converts nested models to dicts; `dict(user)` does not do this recursively. (Pydantic Docs)

```python
from pydantic import BaseModel, field_validator, model_validator, ValidationInfo

class Signup(BaseModel):
    password: str
    password_repeat: str

    @field_validator("password_repeat", mode="after")
    @classmethod
    def passwords_match(cls, v, info: ValidationInfo):
        if v != info.data["password"]:
            raise ValueError("Passwords do not match")
        return v

    @model_validator(mode="after")
    def strong_password(self):
        if len(self.password) < 8:
            raise ValueError("Password too short")
        return self
```
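A quick check of how these validators behave at runtime (the same `Signup` model is redefined here so the snippet runs on its own; `info.data.get` is used for safety in case `password` itself failed validation):

```python
from pydantic import BaseModel, ValidationError, ValidationInfo, field_validator, model_validator

class Signup(BaseModel):
    password: str
    password_repeat: str

    @field_validator("password_repeat", mode="after")
    @classmethod
    def passwords_match(cls, v: str, info: ValidationInfo):
        # info.data holds the fields already validated before this one.
        if v != info.data.get("password"):
            raise ValueError("Passwords do not match")
        return v

    @model_validator(mode="after")
    def strong_password(self):
        if len(self.password) < 8:
            raise ValueError("Password too short")
        return self

# Matching, sufficiently long passwords validate fine.
ok = Signup(password="correct-horse", password_repeat="correct-horse")
print(ok.password_repeat)

# Mismatched passwords are rejected by the field validator.
try:
    Signup(password="correct-horse", password_repeat="wrong")
    msg = ""
except ValidationError as e:
    msg = e.errors()[0]["msg"]
    print(msg)  # Value error, Passwords do not match
```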
In v2, use `@field_validator` and `@model_validator`. Access other field values via `ValidationInfo.data`. (Pydantic Docs, Stack Overflow)

```python
from pydantic import BaseModel, Field, AliasChoices
from pydantic.config import ConfigDict

class Item(BaseModel):
    # Allow multiple input keys (validation_alias)
    user_id: int = Field(validation_alias=AliasChoices("user_id", "userId"))

    # Alias generator for the whole model (e.g., snake -> camel)
    model_config = ConfigDict(  # v2
        alias_generator=lambda s: ''.join([s.split('_')[0]] + [p.title() for p in s.split('_')[1:]]),
        populate_by_name=False,  # In v2.11+, validate_by_name/alias is recommended
    )
```
In v2, use `validation_alias` with `AliasChoices`/`AliasPath`. Alias generators are also supported. To use aliases in a FastAPI response, set `response_model_by_alias=True`. (Pydantic Docs) Note that `populate_by_name` is scheduled for a spec change in v3; using `validate_by_name`/`validate_by_alias` is recommended. (Pydantic Docs)

```python
from pydantic import BaseModel
from pydantic.config import ConfigDict

class UserDTO(BaseModel):
    model_config = ConfigDict(from_attributes=True)  # Replaces v1's orm_mode
    id: int
    username: str

# SQLAlchemy object -> DTO
dto = UserDTO.model_validate(sa_user)  # Instead of from_orm
```

In v2, `from_orm` is gone. Use the combination of `from_attributes=True` + `model_validate()`. Be careful when accessing lazily-loaded collections. (Pydantic Docs, Stack Overflow)

```python
from pydantic import BaseModel, computed_field

class Rectangle(BaseModel):
    width: int
    length: int

    @computed_field
    @property
    def area(self) -> int:
        return self.width * self.length
```
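A sketch of what serialization looks like with the `Rectangle` model above (redefined here so the snippet is self-contained):

```python
from pydantic import BaseModel, computed_field

class Rectangle(BaseModel):
    width: int
    length: int

    @computed_field
    @property
    def area(self) -> int:
        return self.width * self.length

rect = Rectangle(width=3, length=4)
# Computed fields appear in dumps and JSON output alongside stored fields.
print(rect.model_dump())  # {'width': 3, 'length': 4, 'area': 12}
```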
With `@computed_field`, you can include computed results in the serialized response. (Pydantic Docs)

```python
from pydantic import BaseModel

class Item(BaseModel):
    name: str
    price: float

    model_config = {
        "json_schema_extra": {
            "examples": [{"name": "Foo", "price": 35.4}]
        }
    }
```
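The extra keys are merged into the generated JSON Schema, which you can confirm with `model_json_schema()` (the `Item` model is redefined here for a self-contained run):

```python
from pydantic import BaseModel

class Item(BaseModel):
    name: str
    price: float

    model_config = {
        "json_schema_extra": {
            "examples": [{"name": "Foo", "price": 35.4}]
        }
    }

schema = Item.model_json_schema()
# json_schema_extra is merged into the generated JSON Schema.
print(schema["examples"])  # [{'name': 'Foo', 'price': 35.4}]
```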
In v2, declare schema examples via `model_config["json_schema_extra"]`. (FastAPI Docs)

```python
from typing import Annotated, Literal
from fastapi import FastAPI, Query, Form
from pydantic import BaseModel, Field

app = FastAPI()

class FilterParams(BaseModel):
    limit: int = Field(100, gt=0, le=100)
    order_by: Literal["created_at", "updated_at"] = "created_at"

@app.get("/items/")
def list_items(filter_query: Annotated[FilterParams, Query()]):
    return filter_query

class LoginForm(BaseModel):
    username: str
    password: str

@app.post("/login")
def login(form: Annotated[LoginForm, Form()]):
    return {"ok": True}
```
```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="APP_", validate_default=False)
    db_url: str = "sqlite:///app.db"

settings = Settings()  # Reflects the APP_DB_URL=... environment variable
```
In v2, `BaseSettings` lives in a separate package (`pydantic-settings`), and options like `env_prefix` and `validate_default` are configured via `SettingsConfigDict`. (Pydantic Docs) And for validating plain types such as `List[UserOut]` outside a model, build a `TypeAdapter` just once. (Pydantic Docs)
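A sketch of the `TypeAdapter` pattern (a minimal `UserOut` is redefined here so the snippet stands alone):

```python
from typing import List
from pydantic import BaseModel, TypeAdapter

class UserOut(BaseModel):
    id: int
    username: str

# Build the validator once and reuse it for lists, without a wrapper model.
adapter = TypeAdapter(List[UserOut])

users = adapter.validate_python([
    {"id": "1", "username": "alice"},  # '1' is coerced to int
    {"id": 2, "username": "bob"},
])
print(users[0].id, users[1].username)  # 1 bob
```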