Python String encode() Method
The encode() method converts a string into bytes using a specified encoding. It is particularly useful for storing strings in binary format or transmitting data over a network.
Syntax
string.encode(encoding="utf-8", errors="strict")
Parameters
encoding (Optional): The encoding to use. Default is "utf-8". Other common encodings are "ascii", "latin-1", and "utf-16".
errors (Optional): Specifies how to handle encoding errors.
'strict' (Default): Raises a UnicodeEncodeError on failure.
'ignore': Ignores characters that cannot be encoded.
'replace': Replaces unencodable characters with a question mark (?).
'xmlcharrefreplace': Replaces unencodable characters with the appropriate XML character reference.
'backslashreplace': Replaces unencodable characters with a backslash escape sequence.
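The last two handlers in the list are not demonstrated elsewhere on this page, so here is a minimal sketch of what they produce. It reuses the "café" string from the examples; the variable names are illustrative only.

```python
text = "café"

# Each handler deals with the unencodable "é" (code point 233, hex e9) differently.
xml_bytes = text.encode("ascii", errors="xmlcharrefreplace")
backslash_bytes = text.encode("ascii", errors="backslashreplace")

print(xml_bytes)        # b'caf&#233;'
print(backslash_bytes)  # b'caf\\xe9'
```

Both results contain only ASCII bytes. The doubled backslash in the second output is just how Python displays a single literal backslash inside a bytes literal.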
Return Value
Returns an encoded version of the string as a bytes object.
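As a quick illustration of the return value, the sketch below checks the type of the result and compares the byte lengths produced by two encodings. The variable names are illustrative, and it assumes "é" is stored as the single precomposed character U+00E9.

```python
text = "café"

utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

print(type(utf8_bytes))   # <class 'bytes'>
print(len(text))          # 4 characters in the string
print(len(utf8_bytes))    # 5 bytes: "é" takes two bytes in UTF-8
print(len(latin1_bytes))  # 4 bytes: "é" takes one byte in Latin-1
```

The same string can therefore produce different byte sequences, and different lengths, depending on the encoding chosen.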
Examples
Default Encoding (UTF-8)
text = "café"
encoded_text = text.encode()
print(encoded_text) # Output: b'caf\xc3\xa9'
Notice the b prefix in the output, which signifies a bytes literal.
You can convert the bytes back to a string using the decode() method.
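A minimal round-trip sketch, assuming the same encoding is used in both directions (decoding with a different encoding may raise an error or produce different text):

```python
text = "café"

encoded_text = text.encode("utf-8")          # str -> bytes
decoded_text = encoded_text.decode("utf-8")  # bytes -> str

print(decoded_text)          # café
print(decoded_text == text)  # True
```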
Specifying Encoding
text = "café"
encoded_text = text.encode("ascii") # Raises UnicodeEncodeError
print(encoded_text)
The above code raises a UnicodeEncodeError because "é" is not an ASCII character, so the print() call is never reached.
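If the exception should not stop the program, it can be caught. The sketch below is one possible approach rather than a recommended pattern: it reports the offending character and then falls back to UTF-8, a fallback chosen purely for illustration.

```python
text = "café"

try:
    encoded_text = text.encode("ascii")
except UnicodeEncodeError as exc:
    # exc.start and exc.end mark the slice of the string that could not be encoded.
    bad_chars = text[exc.start:exc.end]
    print(f"Cannot encode {bad_chars!r} with {exc.encoding}")
    # Hypothetical fallback for this example: use UTF-8 instead.
    encoded_text = text.encode("utf-8")

print(encoded_text)  # b'caf\xc3\xa9'
```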
Handling Errors with ignore and replace
Using ignore
text = "café"
encoded_text = text.encode("ascii", errors="ignore")
print(encoded_text) # Output: b'caf'
The errors="ignore" parameter tells Python to skip any characters that cannot be encoded using the specified encoding (in this case, ASCII). As a result, the unencodable character "é" is simply removed from the output.
Using replace
text = "café"
encoded_text = text.encode("ascii", errors="replace")
print(encoded_text) # Output: b'caf?'
The errors="replace" parameter tells Python to substitute a question mark for any character that cannot be encoded using the specified encoding (in this case, ASCII). Therefore, the unencodable character "é" is replaced with "?", resulting in the output b'caf?'.