Python: Reading and Writing JSON Files
JSON (JavaScript Object Notation) is a lightweight, text-based data format used to store and exchange structured data, organized as key-value pairs and arrays. Despite its name, you don’t need to know JavaScript to learn or use JSON.
Whether you are working with REST APIs, managing configuration settings, or exchanging data between the frontend and backend, you will encounter JSON.
Python’s Built-in JSON Module
Python provides a built-in json module that allows you to work with JSON files easily. This module enables you to convert Python objects (such as dictionaries and lists) into JSON format, and parse JSON data back into Python objects.
The json module provides four commonly used functions:
- json.dump(): Converts a Python object to JSON and writes it directly to a file
- json.dumps(): Converts a Python object to a JSON-formatted string
- json.load(): Reads JSON data from a file and converts it into a Python object
- json.loads(): Parses a JSON-formatted string and converts it into a Python object
Note: The 's' in dumps() and loads() stands for string. Use these functions when working with JSON data in memory, rather than with actual files.
Before working with a JSON file, you need to import the json module at the top of your file using the code below:
import json
JSON vs Python Data Types
When you parse JSON in Python, each JSON value is automatically converted to its corresponding Python data type.
| JSON Type | Python Type |
| --- | --- |
| object | dict |
| array | list |
| string | str |
| number (int) | int |
| number (real) | float |
| true | True |
| false | False |
| null | None |
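You can see this mapping in action with a short round trip: parse a JSON string that uses every JSON type and inspect the resulting Python values (the sample string here is just an illustration):

```python
import json

# A JSON string exercising every JSON type
data = json.loads('{"a": {"b": [1, 2.5, "x", true, false, null]}}')

print(type(data))            # <class 'dict'>  (object -> dict)
print(type(data["a"]["b"]))  # <class 'list'>  (array -> list)
print(data["a"]["b"])        # [1, 2.5, 'x', True, False, None]
```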
Reading JSON Files in Python
There are different ways to read JSON files in Python, each suited to different data formats and use cases. Let’s explore these methods with hands-on examples to help you efficiently handle JSON data in your projects:
Reading JSON From a File Using json.load()
The json.load() method reads a JSON file and converts it into a Python object.
First, let’s create a data.json file using VS Code or any other text editor (Notepad on Windows, TextEdit on macOS, Nano/Vim on Linux) and paste in the following data:
{
    "name": "James",
    "age": 35,
    "is_employed": true,
    "skills": ["Python", "Data Analysis", "Machine Learning"],
    "education": {
        "degree": "Master's in Computer Science",
        "university": "New York Institute of Technology",
        "graduation_year": 2014
    }
}
Here is how my project structure looks:
project_folder/
├── app.py
└── data.json
Now let’s read it using json.load():
import json

# Open and read the JSON file
with open('data.json', 'r') as file:
    data = json.load(file)

# Print the data
print(data)
Output:
{'name': 'James', 'age': 35, 'is_employed': True, 'skills': ['Python', 'Data Analysis', 'Machine Learning'], 'education': {'degree': "Master's in Computer Science", 'university': 'New York Institute of Technology', 'graduation_year': 2014}}
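In practice it is worth guarding the read against a missing file or malformed JSON, both of which raise exceptions. Here is a minimal sketch; the helper name read_json and the filename are illustrative, not part of the json module:

```python
import json

def read_json(path):
    """Return the parsed contents of a JSON file, or None on failure."""
    try:
        with open(path, "r", encoding="utf-8") as file:
            return json.load(file)
    except FileNotFoundError:
        print(f"File not found: {path}")
    except json.JSONDecodeError as error:
        print(f"Invalid JSON in {path}: {error}")
    return None

print(read_json("missing.json"))  # prints "File not found: missing.json", then None
```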
Handling JSON Strings With json.loads()
Sometimes, you may need to parse a JSON string directly, such as when receiving data from a web API, rather than reading from a file. In these cases, you can use json.loads() to convert the JSON string into a Python object.
import json

# JSON string, as you might receive from an API response
json_string = '{"name": "James", "age": 35, "city": "Barcelona"}'

# Parse the JSON string into a Python dictionary
data = json.loads(json_string)
print(data)
Output:
{'name': 'James', 'age': 35, 'city': 'Barcelona'}
Writing JSON Files in Python
Writing data to JSON files is just as straightforward as reading them. Let’s explore the best ways to save Python data as JSON.
Writing JSON Data to a File Using json.dump()
The json.dump() function saves Python data to a file in JSON format.
import json

user_data = {
    "name": "Michael",
    "age": 50,
    "is_employed": True,
    "skills": ["singing", "dancing", "acting"]
}

# Writing JSON data to a file
with open("user_data.json", "w", encoding="utf-8") as file:
    json.dump(user_data, file)
After running the code above, you will see a user_data.json file created in your project directory containing the following compact data:
{"name": "Michael", "age": 50, "is_employed": true, "skills": ["singing", "dancing", "acting"]}
Making JSON Files Readable in Python
By default, json.dump() writes JSON data in a compact form, which can be hard to read. To make your JSON file more readable, you can use the indent parameter to add indentation and the sort_keys parameter to organize the keys alphabetically.
import json

user_data = {
    "name": "Michael",
    "age": 50,
    "is_employed": True,
    "skills": ["singing", "dancing", "acting"]
}

# Writing JSON data to a file
with open("user_data.json", "w", encoding="utf-8") as file:
    json.dump(user_data, file, indent=4, sort_keys=True)
After running the above code, the user_data.json file will be updated with neatly formatted data like this:
{
    "age": 50,
    "is_employed": true,
    "name": "Michael",
    "skills": [
        "singing",
        "dancing",
        "acting"
    ]
}
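A related parameter is ensure_ascii: by default, json.dump() and json.dumps() escape any non-ASCII characters, which keeps the output plain ASCII but hard to read. Setting ensure_ascii=False writes the characters as-is:

```python
import json

city = {"name": "Málaga"}

print(json.dumps(city))                      # {"name": "M\u00e1laga"}
print(json.dumps(city, ensure_ascii=False))  # {"name": "Málaga"}
```

When writing to a file with ensure_ascii=False, open the file with a suitable encoding such as encoding="utf-8".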
Converting Python Objects to JSON Strings With json.dumps()
Sometimes, you need a JSON-formatted string rather than writing the data to a file. For this purpose, you can use json.dumps().
import json

user_data = {
    "name": "Michael",
    "age": 50,
    "is_employed": True,
    "skills": ["singing", "dancing", "acting"]
}

# Convert the Python dictionary to a JSON string
json_string = json.dumps(user_data)
print(json_string)
Output:
{"name": "Michael", "age": 50, "is_employed": true, "skills": ["singing", "dancing", "acting"]}
You can use the indent parameter to add indentation.
import json

user_data = {
    "name": "Michael",
    "age": 50,
    "is_employed": True,
    "skills": ["singing", "dancing", "acting"]
}

# Convert the Python dictionary to a JSON string
json_string = json.dumps(user_data, indent=4)
print(json_string)
Output:
{
    "name": "Michael",
    "age": 50,
    "is_employed": true,
    "skills": [
        "singing",
        "dancing",
        "acting"
    ]
}
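Since dumps() and loads() are inverses, you can confirm nothing is lost in the conversion with a quick round trip:

```python
import json

user_data = {"name": "Michael", "age": 50, "is_employed": True}

# Serialize to a string, then parse the string back
restored = json.loads(json.dumps(user_data))

print(restored == user_data)  # True
```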
Using The separators Parameter for Compact JSON
Use the separators parameter when you want more compact JSON output by removing unnecessary whitespace.
By default, json.dumps() uses (", ", ": ") as its separators, meaning it adds a space after each comma and colon for readability.
When you specify:
json.dumps(data, separators=(",", ":"))
you remove those extra spaces, producing a minified JSON string.
Let’s see a working code example:
import json

user_data = {
    "name": "Michael",
    "age": 50,
    "is_employed": True,
    "skills": ["singing", "dancing", "acting"]
}

# Convert the Python dictionary to a JSON string
json_string = json.dumps(user_data, separators=(',', ':'))
print(json_string)
Output:
{"name":"Michael","age":50,"is_employed":true,"skills":["singing","dancing","acting"]}
Creating a Custom JSON Encoder
Many native Python types, such as datetime objects, Decimal values, sets, bytes, and instances of custom classes, cannot be serialized directly by Python’s built-in json module.
When you try to serialize these unsupported types with json.dumps(), Python raises a TypeError. To handle this and produce valid JSON output, you can create a custom JSON encoder. This allows you to define exactly how special or complex Python objects should be converted into JSON-serializable representations, giving you full control over the encoding process.
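For example, trying to serialize a datetime with no custom encoder fails immediately:

```python
import json
from datetime import datetime

try:
    json.dumps({"joined": datetime(2026, 2, 14)})
except TypeError as error:
    print(error)  # Object of type datetime is not JSON serializable
```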
You can create a custom JSON encoder by either providing a custom function to the default parameter of json.dumps() (or json.dump()) or by creating a subclass of json.JSONEncoder class and overriding its default() method.
Using the default parameter (function-based)
Use this approach for one-off cases, when you only need to handle a few specific types.
In this approach, you define a function that takes an object and either returns a JSON-serializable version of it or raises a TypeError if it cannot handle the object, then pass that function to the default parameter of json.dumps() (or json.dump()).
Example: Custom JSON encoder for Sets and Tuples
import json

def custom_encoder(obj):
    if isinstance(obj, (set, tuple)):
        return list(obj)
    raise TypeError(f"Object of type {obj.__class__.__name__} is not JSON serializable")

data = {
    'unique_id': {1, 2, 3},
    'coordinate': (35.67, 139.65)
}

json_data = json.dumps(data, default=custom_encoder, indent=4)
print(json_data)
Output:
{
"unique_id": [
1,
2,
3
],
"coordinate": [
35.67,
139.65
]
}
In this example, the custom encoder converts the set into a list, since standard JSON has no set type. The tuple branch is included for completeness: json.dumps() already serializes tuples as JSON arrays on its own, so default() is never actually called for them.
Subclassing json.JSONEncoder (Class-based)
You use this approach when you need a reusable, structured solution for handling multiple custom types.
In this approach, you create a custom class that inherits from json.JSONEncoder, override the default() method to define how unsupported objects should be converted into a JSON-serialized format, and then pass this encoder class to the cls parameter of json.dumps() (or json.dump()).
First, let’s look at a simple example of serializing a datetime object to JSON:
Example: Serializing datetime objects using a custom JSON encoder
import json
from datetime import datetime

# Custom JSON encoder to handle datetime objects
class CustomJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()  # Convert datetime to ISO format string
        return super().default(obj)

# Example data containing a datetime object
data = {
    "name": "James",
    "age": 35,
    "joined": datetime.now()
}

with open('registration_data.json', 'w') as file:
    json.dump(data, file, cls=CustomJSONEncoder, indent=4)
After running this code, you will see a registration_data.json file created in your project directory containing the following data:
{
    "name": "James",
    "age": 35,
    "joined": "2026-02-14T20:26:49.438815"
}
Now let’s see how to serialize complex Python objects like datetime, Decimal, and even custom classes into JSON by using a custom JSONEncoder.
Example: Serializing complex Python objects with a custom JSON encoder
import json
from datetime import datetime
from decimal import Decimal

# Custom class
class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age

# Custom JSON encoder handling multiple types
class CustomJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()  # Convert datetime to ISO format string
        if isinstance(obj, Decimal):
            return str(obj)  # Convert Decimal to string to preserve precision
        if isinstance(obj, User):
            return {
                'name': obj.name,
                'age': obj.age
            }  # Convert User object to a dictionary
        return super().default(obj)

# Sample data containing multiple types
data = {
    "User": User("James", 35),
    "created_at": datetime.now(),
    "account_balance": Decimal("1000.567")
}

# Serialize the data to JSON using the custom encoder
json_data = json.dumps(data, cls=CustomJSONEncoder, indent=4)
print(json_data)
Output:
{
    "User": {
        "name": "James",
        "age": 35
    },
    "created_at": "2026-02-15T19:23:39.183500",
    "account_balance": "1000.567"
}
Creating a Custom JSON Decoder
You can also customize how JSON is converted back into Python objects. Python’s built-in json module allows us to define a custom decoder by using the object_hook parameter in json.loads().
import json

# Custom class
class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __repr__(self):
        return f"User(name={self.name}, age={self.age})"

# Custom JSON decoder
def custom_decoder(obj):
    if "name" in obj and "age" in obj:
        return User(obj["name"], obj["age"])
    return obj

# Sample JSON data
json_data = '''{
    "User": {
        "name": "James",
        "age": 35
    }
}'''

# Deserialize JSON into a User object
data = json.loads(json_data, object_hook=custom_decoder)
print(data)
Output:
{'User': User(name=James, age=35)}
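The same hook can rebuild richer types. As a sketch (the "joined" key is just an assumed field name), here is how an ISO date string like the one produced by the earlier datetime encoder can be turned back into a datetime object:

```python
import json
from datetime import datetime

def datetime_decoder(obj):
    # Convert an ISO-formatted "joined" value back into a datetime
    if "joined" in obj:
        obj["joined"] = datetime.fromisoformat(obj["joined"])
    return obj

json_data = '{"name": "James", "joined": "2026-02-14T20:26:49"}'
data = json.loads(json_data, object_hook=datetime_decoder)

print(data["joined"].year)  # 2026
```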
Handling Large JSON Files
Loading large JSON files entirely into memory can lead to performance issues. Here are some strategies for handling big files efficiently.
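One simple strategy, when you control the file format, is JSON Lines (one JSON object per line), which the standard library handles with no extra dependencies because each line can be parsed independently. A sketch, using a hypothetical users.jsonl file:

```python
import json

# Write three records in JSON Lines format (one object per line)
with open("users.jsonl", "w", encoding="utf-8") as file:
    for user in [{"id": 1, "name": "James"},
                 {"id": 2, "name": "Bruce"},
                 {"id": 3, "name": "Diana"}]:
        file.write(json.dumps(user) + "\n")

# Read it back one record at a time -- only one line is in memory at once
with open("users.jsonl", "r", encoding="utf-8") as file:
    for line in file:
        record = json.loads(line)
        print(record["name"])
```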
Streaming JSON Data
For very large JSON files, consider using the ijson library, which allows you to parse JSON incrementally.
But first install ijson using the command below:
pip install ijson
Let’s create a large_file.json file in your project directory using VS Code or any other text editor and paste this JSON data:
{
    "users": [
        {
            "id": 1,
            "name": "James",
            "email": "james@example.com",
            "address": {
                "street": "Fifth Avenue",
                "city": "New York",
                "state": "NY",
                "zip": "10022"
            }
        },
        {
            "id": 2,
            "name": "Bruce",
            "email": "bruce@email.com",
            "address": {
                "street": "Sunset Boulevard",
                "city": "Los Angeles",
                "state": "CA",
                "zip": "90026"
            }
        }
    ]
}
Here is a quick overview of the structure of the JSON data above. The root element contains a key called "users", which is an array. Each element of this array represents a user object with several properties:
- id: A unique identifier for the user.
- name: The name of the user.
- email: The email address of the user.
- address: An object containing the user’s address details, including:
  - street: The street name.
  - city: The city name.
  - state: The state abbreviation.
  - zip: The ZIP code.
Now let’s stream and process each user item from large_file.json using ijson.
import ijson

# Stream the large JSON file
with open('large_file.json', 'r') as file:
    users = ijson.items(file, 'users.item')  # Stream each user object in the 'users' array
    for user in users:
        # Process each user object as it is read
        print(f"Name: {user['name']}")
        print(f"Email: {user['email']}")

        # Access and print the address details
        address = user['address']
        print(f"Street: {address['street']}")
        print(f"City: {address['city']}")
        print(f"State: {address['state']}")
        print(f"Zip: {address['zip']}")
Output:
Name: James
Email: james@example.com
Street: Fifth Avenue
City: New York
State: NY
Zip: 10022
Name: Bruce
Email: bruce@email.com
Street: Sunset Boulevard
City: Los Angeles
State: CA
Zip: 90026
In the code above, ijson.items(file, 'users.item') streams one complete user dictionary at a time, including the nested address object, making it memory-efficient even for JSON files with thousands of users.
JSON Schema Validation
JSON schema allows us to validate the structure of JSON data, ensuring it adheres to a predefined format. It is essentially a blueprint for the data, specifying what properties are required, what data types are allowed, and any other constraints on the data.
To perform JSON schema validation, you can use the jsonschema library, which provides an easy way to validate your JSON objects against a schema.
First, install jsonschema by running the following command in your terminal:
pip install jsonschema
Example:
Let’s assume we want to validate this JSON structure.
Sample JSON Data:
Copy and paste this JSON data to replace the contents of the large_file.json in your project directory.
{
    "users": [
        {
            "id": 1,
            "name": "James",
            "email": "james@example.com"
        },
        {
            "id": 2,
            "name": "Bruce",
            "email": "bruce@email.com"
        }
    ]
}
JSON Schema:
You will define the expected structure for this data, including the properties and their types.
{
    "type": "object",
    "properties": {
        "users": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "id": {"type": "integer"},
                    "name": {"type": "string"},
                    "email": {"type": "string", "format": "email"}
                },
                "required": ["id", "name", "email"]
            }
        }
    },
    "required": ["users"]
}
Validation Code:
import json
from jsonschema import validate, exceptions

# Load the JSON data from a file
with open('large_file.json', 'r') as file:
    json_data = json.load(file)

schema = {
    "type": "object",
    "properties": {
        "users": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "id": {"type": "integer"},
                    "name": {"type": "string"},
                    "email": {"type": "string", "format": "email"}
                },
                "required": ["id", "name", "email"]
            }
        }
    },
    "required": ["users"]
}

# Validate the JSON data against the schema
try:
    validate(instance=json_data, schema=schema)
    print("JSON data is valid.")
except exceptions.ValidationError as e:
    print("JSON data is invalid:", e.message)
Output:
JSON data is valid.
Note that by default jsonschema treats "format": "email" as an annotation only and does not enforce it; to actually check formats, pass format_checker=jsonschema.FormatChecker() to validate().
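To see validation fail, you can pass data that breaks a constraint, for example a user whose id is a string instead of an integer (reusing the same schema, with the format annotation omitted for brevity):

```python
from jsonschema import validate, exceptions

schema = {
    "type": "object",
    "properties": {
        "users": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "id": {"type": "integer"},
                    "name": {"type": "string"},
                    "email": {"type": "string"}
                },
                "required": ["id", "name", "email"]
            }
        }
    },
    "required": ["users"]
}

# "id" should be an integer, so this record is invalid
bad_data = {"users": [{"id": "one", "name": "James", "email": "james@example.com"}]}

try:
    validate(instance=bad_data, schema=schema)
    print("JSON data is valid.")
except exceptions.ValidationError as e:
    print("JSON data is invalid:", e.message)  # 'one' is not of type 'integer'
```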