Mastering Pydantic v2: Practical Use Cases and a Migration Guide

In the modern Python ecosystem, it's impossible to discuss data validation and settings management without mentioning Pydantic. It has become the de facto standard, letting developers define data structures clearly with type hints and enforce them at runtime.
This article will serve as your practical guide to mastering Pydantic v2. We will explore its core use cases with hands-on examples, from API development with FastAPI to settings management. Most importantly, we provide a detailed v1-to-v2 migration checklist to help you upgrade your codebase with confidence.
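Before diving in, here is a minimal sketch of the core idea: declare a model with type hints, and Pydantic coerces and validates incoming data at runtime (the model and field names here are illustrative):

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    id: int
    name: str

# A string that looks like an integer is coerced automatically.
user = User.model_validate({"id": "1", "name": "Alice"})
print(user.id)  # 1 (an int, not a str)

# Invalid data raises a ValidationError at runtime.
try:
    User.model_validate({"id": "not-a-number", "name": "Bob"})
except ValidationError as e:
    print(e.errors()[0]["type"])  # int_parsing
```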
API request and response validation is one of Pydantic's most well-known and powerful use cases. It plays a core role, especially when used with the FastAPI framework.
```python
from fastapi import FastAPI
from pydantic import BaseModel, EmailStr, Field

# Create FastAPI application
app = FastAPI()

# --- Define Pydantic Models ---

# 1. Request Model: The "Entry Guard"
# This defines the rules for the data a client must send to create a user.
class UserIn(BaseModel):
    username: str = Field(..., min_length=3, description="Username must be at least 3 characters long.")
    email: EmailStr      # Pydantic automatically validates the email format.
    age: int | None = None  # An optional field that can be omitted.

# 2. Response Model: The "Exit Guard"
# This defines the rules for the data the server will send back to the client.
class UserOut(BaseModel):
    id: int
    username: str
    email: EmailStr

# --- Define API Endpoint ---
@app.post("/users/", response_model=UserOut, summary="Create User")
async def create_user(user: UserIn):
    """
    Creates a new user, saves it to the database,
    and returns the created user's information.
    """
    # 1. Request Validation (handled automatically)
    # Before this function runs, FastAPI validates the 'user' parameter
    # against the UserIn model. If the username is shorter than 3 characters
    # or the email format is invalid, FastAPI sends a 422 error back to the client.

    # This part simulates "saving to the database."
    # In a real application, you would save to a DB and get a unique ID.
    new_user_id = 1

    # 2. Response Serialization (handled by response_model)
    # FastAPI formats this return value to match the UserOut model.
    # The 'age' field from the input model (UserIn) is not in the response
    # model (UserOut), so it is excluded from the final JSON response.
    return {
        "id": new_user_id,
        "username": user.username,
        "email": user.email,
    }
```
Pydantic is also very useful for managing the configuration values an application needs to run (database addresses, API keys, secret values, etc.). This is done with an extension library called `pydantic-settings`, which can read values from environment variables and `.env` files and manage them as a single validated object.

```
# .env file

# Database connection string
APP_DB_URL="postgresql://user:password@localhost/mydatabase"

# External API key
APP_API_KEY="abc-123-def-456"
```
```python
# config.py file
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    """
    A class to manage the application's settings.
    """
    # 1. Define settings (with type hints)
    # Automatically finds and populates the value from APP_DB_URL
    # in the .env file or environment variables.
    db_url: str

    # Automatically finds and populates the value for APP_API_KEY.
    api_key: str

    # If the value is not found in environment variables or the .env file,
    # the default value below is used.
    log_level: str = "INFO"

    # 2. Define the behavior of the settings model
    # Use model_config to control the behavior of pydantic-settings.
    model_config = SettingsConfigDict(
        env_prefix="APP_",          # Look for environment variables prefixed with 'APP_'
        env_file=".env",            # Read a .env file and treat its contents as environment variables
        env_file_encoding="utf-8",  # Specify the .env file encoding
    )

# 3. Create and use the settings object
# The moment this line runs, Pydantic reads and validates the settings.
settings = Settings()

# Import this settings object elsewhere in your application.
print("Database URL:", settings.db_url)
print("API Key:", settings.api_key)
print("Log Level:", settings.log_level)
# print(settings.model_dump_json(indent=2))
```
Instead of handling data with plain `dict` and `list` types, you can use Pydantic's `BaseModel` to create data objects with a clear structure. Representing data structures with bare `dict` and `list` types leads to the following problems.
```python
# Handling data with only standard dicts and lists
def print_post_summary(post_data: dict):
    # Problem 1: It's easy to make typos in key names.
    # Typing 'auther' instead of 'author' isn't caught until runtime:
    # print(f"'{post_data['title']}' by {post_data['auther']['name']}")  # Raises KeyError!
    print(f"'{post_data['title']}' by {post_data['author']['name']}")

    # Problem 2: You have to guess the data structure.
    # It's hard to know whether 'author' contains a 'name', or whether
    # 'comments' is a list, until you inspect the actual data or run the code.

    # Problem 3: Data can be missing.
    if 'comments' in post_data and post_data['comments']:
        print(f" - There are a total of {len(post_data['comments'])} comments.")
    else:
        print(" - No comments.")

# Example data
post_1 = {
    "title": "My First Post",
    "content": "Hello world!",
    "author": {"name": "John Doe", "email": "john@example.com"},
    "comments": [
        {"author_name": "Alice", "text": "Great post!"},
        {"author_name": "Bob", "text": "Welcome to the blog."}
    ]
}

# Case where the 'comments' key is missing
post_2 = {
    "title": "Another Post",
    "content": "...",
    "author": {"name": "Jane Doe", "email": "jane@example.com"}
    # 'comments' key is missing
}

print_post_summary(post_1)
print_post_summary(post_2)
```
Now, let's create a "blueprint" for our data using Pydantic's `BaseModel`.
```python
from pydantic import BaseModel, EmailStr
from typing import List

# --- Define the data's 'blueprint' ---
class Author(BaseModel):
    name: str
    email: EmailStr  # Even validates the email format

class Comment(BaseModel):
    author_name: str
    text: str

class BlogPost(BaseModel):
    title: str
    content: str
    author: Author                # Use the Author model as a nested type
    comments: List[Comment] = []  # Defaults to an empty list if no comments

# --- The improved function ---
def print_post_summary_pydantic(post: BlogPost):
    # Advantage 1: Autocompletion and type safety
    # When you type 'post.', the IDE suggests title, content, author, etc.
    # The chance of a typo is almost zero: post.auther would be flagged.
    print(f"'{post.title}' by {post.author.name} ({post.author.email})")

    # Advantage 2: No need to worry about missing data
    # The BlogPost model guarantees that 'comments' is always a list.
    if post.comments:
        print(f" - There are a total of {len(post.comments)} comments.")
    else:
        print(" - No comments.")

# --- Creating and using the data ---
# Convert dictionary data into a Pydantic model (parsing and validation)
try:
    # post_1 is the same dict used above.
    blog_post_1 = BlogPost.model_validate(post_1)
    print_post_summary_pydantic(blog_post_1)
    print("-" * 20)

    # post_2 is missing 'comments', but the model's default value ([]) handles it.
    blog_post_2 = BlogPost.model_validate(post_2)
    print_post_summary_pydantic(blog_post_2)
except Exception as e:
    print("Data validation failed:", e)
```
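When validation of nested data fails, Pydantic reports exactly where the problem is. A minimal sketch with simplified models (plain `str` fields, so no extra packages are needed):

```python
from typing import List
from pydantic import BaseModel, ValidationError

class Author(BaseModel):
    name: str

class BlogPost(BaseModel):
    title: str
    author: Author
    comments: List[dict] = []

bad_post = {"title": "Broken post", "author": {}}  # author.name is missing

try:
    BlogPost.model_validate(bad_post)
    errors = []
except ValidationError as e:
    # Each error carries a 'loc' path pointing into the nested structure.
    errors = e.errors()
    print(errors[0]["loc"], errors[0]["type"])  # ('author', 'name') missing
```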
Pydantic is used to ensure data quality during the process of extracting, transforming, and loading (ETL) data from various sources like CSV files, databases, or external APIs.
Consider a situation where we need to process raw user data (e.g., from a CSV file or an external API) with the following problems:

- The email key is named `email_address` instead of `email`.
- A required field (`name`) is missing in some records.
- The `status` value is missing in some records.

```python
import datetime
from typing import List, Optional
from pydantic import BaseModel, EmailStr, Field, ValidationError, field_validator

# 1. Raw Data (Extract phase)
# Assume this is messy data from a CSV, DB, or API.
raw_user_data = [
    {'user_id': '1', 'name': 'John Doe', 'email_address': 'john.doe@example.com'},
    {'user_id': 2, 'name': 'jane doe', 'email_address': 'jane.doe@example.com', 'status': 'active'},
    {'user_id': '3', 'name': 'Peter Pan', 'email_address': 'invalid-email'},  # Invalid email
    {'user_id': 4, 'name': None, 'email_address': 'peter@example.com'},       # Missing name
]

# 2. Define Data Blueprint (Rules for the Transform phase)
# We will clean and validate data by passing it through this model.
class CleanUser(BaseModel):
    user_id: int  # Automatically converts the string '1' to the number 1
    name: str     # Name must be a string (None is not allowed)

    # Map the value from the key 'email_address' to the 'email' field (alias)
    email: EmailStr = Field(..., alias='email_address')

    # If 'status' is missing, use 'inactive' as the default value
    status: str = 'inactive'

    # The registration date is automatically set when the model is created.
    created_at: datetime.datetime = Field(default_factory=datetime.datetime.now)

    # A validator to further process a specific field
    @field_validator('name')
    @classmethod
    def clean_name(cls, v: str):
        # Capitalize each word of the name (e.g., 'jane doe' -> 'Jane Doe')
        return v.title()

# 3. Run the ETL Pipeline
def process_users(raw_data: List[dict]):
    clean_data = []
    invalid_data = []
    print("Starting data processing pipeline...")
    for i, record in enumerate(raw_data):
        try:
            # Validate and clean the data with the Pydantic model.
            clean_user = CleanUser.model_validate(record)
            clean_data.append(clean_user)
            print(f" - Record {i+1} processed successfully: {clean_user.name}")
        except ValidationError as e:
            # If validation fails, store it separately with error information.
            invalid_data.append({'original_record': record, 'error': e.errors()})
            print(f" - Record {i+1} failed to process!")
    print("...Data processing pipeline finished\n")
    return clean_data, invalid_data

# Run the pipeline and check the results
processed_users, failed_records = process_users(raw_user_data)

print("--- ✅ Successfully Processed Data ---")
for user in processed_users:
    print(user.model_dump_json(indent=2))

print("\n--- ❌ Failed Data and Reasons ---")
import json
print(json.dumps(failed_records, indent=2, default=str))
```
Data Cleansing

- The `CleanUser` model isn't just a data structure; it's the set of rules for cleansing data.
- `user_id: int` automatically converts the string `'1'` to the number `1`.
- `Field(alias='email_address')` maps the value to the model's `email` field even when the input key name is different (`email_address`).
- `status: str = 'inactive'` automatically fills in a default value if the original data is missing it, improving data consistency.
- `@field_validator` transforms data like `'jane doe'` into a consistent format like `'Jane Doe'`.

Maintaining Data Integrity

- The `try...except ValidationError` block is the core of the ETL pipeline.
- Data that passes the `CleanUser` model is added to the `clean_data` list.
- Data that fails validation is stored in `invalid_data` along with the reason for failure, so you can track and fix problematic data later.

To run the examples in this article, install:

```shell
pip install "fastapi[standard]" "uvicorn[standard]" "pydantic>=2" pydantic-settings
```
| Topic | v1 | v2 |
|---|---|---|
| Serialization | `model.dict()` | `model.model_dump()` (FastAPI Docs, Pydantic Docs) |
| Deserialization/Parsing | `parse_obj`, `from_orm` | `BaseModel.model_validate(...)` (+ `from_attributes=True`) (Pydantic Docs) |
| JSON Parsing | `parse_raw` | `model_validate_json` (Pydantic Docs) |
| Configuration Class | `class Config:` | `model_config = ConfigDict(...)` (Pydantic Docs) |
| Validation Decorators | `@validator`, `@root_validator` | `@field_validator`, `@model_validator` (Pydantic Docs, GitHub Discussion) |
| ORM Mode | `orm_mode = True` | `from_attributes = True` (in `ConfigDict`) (Pydantic Docs) |
| Computed Fields | Unofficial patterns | Official support with `@computed_field` (Pydantic Docs) |
| Arbitrary Type Validation | Limited | Validate/dump arbitrary types with `TypeAdapter` (Pydantic Docs) |
```python
user = UserIn.model_validate({"username": "alice", "email": "a@b.com", "age": 20})
payload = user.model_dump(exclude_none=True)
```

`model_dump()` recursively converts nested models to dicts; `dict(user)` does not do this recursively. (Pydantic Docs)

```python
from pydantic import BaseModel, field_validator, model_validator, ValidationInfo

class Signup(BaseModel):
    password: str
    password_repeat: str

    @field_validator("password_repeat", mode="after")
    @classmethod
    def passwords_match(cls, v, info: ValidationInfo):
        if v != info.data["password"]:
            raise ValueError("Passwords do not match")
        return v

    @model_validator(mode="after")
    def strong_password(self):
        if len(self.password) < 8:
            raise ValueError("Password too short")
        return self
```
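A quick check of how these validators behave at runtime (the same `Signup` model is redefined here so the snippet runs on its own; `info.data.get` is used for safety in case `password` itself failed validation):

```python
from pydantic import BaseModel, ValidationError, ValidationInfo, field_validator, model_validator

class Signup(BaseModel):
    password: str
    password_repeat: str

    @field_validator("password_repeat", mode="after")
    @classmethod
    def passwords_match(cls, v: str, info: ValidationInfo):
        # info.data holds the fields already validated before this one.
        if v != info.data.get("password"):
            raise ValueError("Passwords do not match")
        return v

    @model_validator(mode="after")
    def strong_password(self):
        if len(self.password) < 8:
            raise ValueError("Password too short")
        return self

# Matching, sufficiently long passwords validate fine.
ok = Signup(password="correct-horse", password_repeat="correct-horse")
print(ok.password_repeat)

# Mismatched passwords are rejected by the field validator.
try:
    Signup(password="correct-horse", password_repeat="wrong")
    msg = ""
except ValidationError as e:
    msg = e.errors()[0]["msg"]
    print(msg)  # Value error, Passwords do not match
```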
In v2, use `@field_validator` and `@model_validator`. Access other field values via `ValidationInfo.data`. (Pydantic Docs, Stack Overflow)

```python
from pydantic import BaseModel, Field, AliasChoices
from pydantic.config import ConfigDict

class Item(BaseModel):
    # Allow multiple input keys (validation_alias)
    user_id: int = Field(validation_alias=AliasChoices("user_id", "userId"))

    # Alias generator for the whole model (e.g., snake -> camel)
    model_config = ConfigDict(  # v2
        alias_generator=lambda s: ''.join([s.split('_')[0]] + [p.title() for p in s.split('_')[1:]]),
        populate_by_name=False,  # In v2.11+, validate_by_name/alias is recommended
    )
```
In v2, use `validation_alias` with `AliasChoices`/`AliasPath`. Alias generators are also supported. To use aliases in a FastAPI response, set `response_model_by_alias=True`. (Pydantic Docs) Note that `populate_by_name` is scheduled for a spec change in v3; using `validate_by_name`/`validate_by_alias` is recommended. (Pydantic Docs)

```python
from pydantic import BaseModel
from pydantic.config import ConfigDict

class UserDTO(BaseModel):
    model_config = ConfigDict(from_attributes=True)  # Replaces v1's orm_mode
    id: int
    username: str

# SQLAlchemy object -> DTO
dto = UserDTO.model_validate(sa_user)  # Instead of from_orm
```

In v2, `from_orm` is gone. Use the combination of `from_attributes=True` + `model_validate()`. Be careful when accessing lazily-loaded collections. (Pydantic Docs, Stack Overflow)

```python
from pydantic import BaseModel, computed_field

class Rectangle(BaseModel):
    width: int
    length: int

    @computed_field
    @property
    def area(self) -> int:
        return self.width * self.length
```
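A sketch of what serialization looks like with the `Rectangle` model above (redefined here so the snippet is self-contained):

```python
from pydantic import BaseModel, computed_field

class Rectangle(BaseModel):
    width: int
    length: int

    @computed_field
    @property
    def area(self) -> int:
        return self.width * self.length

rect = Rectangle(width=3, length=4)
# Computed fields appear in dumps and JSON output alongside stored fields.
print(rect.model_dump())  # {'width': 3, 'length': 4, 'area': 12}
```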
With `@computed_field`, you can include computed results in the serialized response. (Pydantic Docs)

```python
from pydantic import BaseModel

class Item(BaseModel):
    name: str
    price: float

    model_config = {
        "json_schema_extra": {
            "examples": [{"name": "Foo", "price": 35.4}]
        }
    }
```
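The extra keys are merged into the generated JSON Schema, which you can confirm with `model_json_schema()` (the `Item` model is redefined here for a self-contained run):

```python
from pydantic import BaseModel

class Item(BaseModel):
    name: str
    price: float

    model_config = {
        "json_schema_extra": {
            "examples": [{"name": "Foo", "price": 35.4}]
        }
    }

schema = Item.model_json_schema()
# json_schema_extra is merged into the generated JSON Schema.
print(schema["examples"])  # [{'name': 'Foo', 'price': 35.4}]
```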
In v2, declare schema examples via `model_config["json_schema_extra"]`. (FastAPI Docs)

```python
from typing import Annotated, Literal
from fastapi import FastAPI, Query, Form
from pydantic import BaseModel, Field

app = FastAPI()

class FilterParams(BaseModel):
    limit: int = Field(100, gt=0, le=100)
    order_by: Literal["created_at", "updated_at"] = "created_at"

@app.get("/items/")
def list_items(filter_query: Annotated[FilterParams, Query()]):
    return filter_query

class LoginForm(BaseModel):
    username: str
    password: str

@app.post("/login")
def login(form: Annotated[LoginForm, Form()]):
    return {"ok": True}
```
```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="APP_", validate_default=False)
    db_url: str = "sqlite:///app.db"

settings = Settings()  # Reflects the APP_DB_URL=... environment variable
```
In v2, `BaseSettings` lives in a separate package (`pydantic-settings`), and options like `env_prefix` and `validate_default` are configured via `SettingsConfigDict`. (Pydantic Docs) And for validating plain types such as `List[UserOut]` outside a model, build a `TypeAdapter` just once. (Pydantic Docs)
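A sketch of the `TypeAdapter` pattern (a minimal `UserOut` is redefined here so the snippet stands alone):

```python
from typing import List
from pydantic import BaseModel, TypeAdapter

class UserOut(BaseModel):
    id: int
    username: str

# Build the validator once and reuse it for lists, without a wrapper model.
adapter = TypeAdapter(List[UserOut])

users = adapter.validate_python([
    {"id": "1", "username": "alice"},  # '1' is coerced to int
    {"id": 2, "username": "bob"},
])
print(users[0].id, users[1].username)  # 1 bob
```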