Extract a Number from a String in Python
In this guide, I will walk you through different methods to extract different types of numbers from a string in Python. We will extract integers, floats, negative numbers, and scientific notation numbers from a string.
We will discuss the following approaches:
- Using regular expressions
- Using list comprehension, split() and isdigit()
- Using isdigit() method and generator expression
- Using isdigit() and filter() along with lambda function
- Using for loop and isdigit() method
# Using regular expressions
You can use the re.findall() function of the re module to extract numbers from a string in Python.
Here’s an example:
import re

def extract_numbers(input_string):
    # Define regular expression pattern for matching numbers
    pattern = r'\d+'
    # Use re.findall() to find all occurrences of the pattern in the input string
    numbers = re.findall(pattern, input_string)
    return numbers

print(extract_numbers("I have 4 iPhone 15 Pro Max"))  # Output: ['4', '15']
In this example, the regular expression \d+ is used, where \d matches any digit (0-9) and + means one or more occurrences. The re.findall() function returns a list of all matches in the input string.
If your string contains a decimal number, as in "The price is $12.34", the output will be ['12', '34']: the pattern above does not recognize floats.
Note: The extract_numbers() function returns a list of strings representing the extracted numbers. If you intend to perform mathematical calculations with these numbers, convert them to integers or floats first.
Here’s an updated example:
import re

def extract_numbers(input_string):
    # Define regular expression pattern for matching numbers
    pattern = r'\d+'
    # Use re.findall() to find all occurrences of the pattern in the input string
    numbers = re.findall(pattern, input_string)
    return numbers

result = extract_numbers("I have 4 iPhone 15 Pro Max")
print(result)  # Output: ['4', '15']

# Converting the extracted numbers to integers
int_numbers = [int(num) for num in result]
print(int_numbers)  # Output: [4, 15]
If you need to convert the extracted numbers to floats instead, use the built-in float() function.
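As a minimal sketch of that conversion, the strings returned by re.findall() can be passed through float() in the same way as int(). The variable names below are illustrative only.

```python
import re

# Strings as returned by re.findall() with the r'\d+' pattern
result = re.findall(r'\d+', "I have 4 iPhone 15 Pro Max")

# float() is a built-in function that parses a numeric string
float_numbers = [float(num) for num in result]
print(float_numbers)  # [4.0, 15.0]
```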
Handling float (decimal point numbers)
If you need to extract float (decimal point numbers) from a string, you must handle them explicitly.
Here’s an example:
import re

def extract_numbers(input_string):
    # Define a regex pattern that matches both integers and floats
    pattern = r'\d+(?:\.\d+)?'
    # Use re.findall() to find all non-overlapping occurrences of the pattern
    numbers = re.findall(pattern, input_string)
    return numbers

# Output: ['5', '123.45']
print(extract_numbers("The price of 5 pizzas is $123.45"))
In this example, the regular expression pattern r'\d+(?:\.\d+)?' is used to match one or more digits (\d+), optionally followed by a period and one or more digits ((?:\.\d+)?). This allows the pattern to match both integers and floats. The re.findall() function is then used to find all non-overlapping occurrences of this pattern in the input string, and the resulting numbers are returned as a list.
Here’s a complete breakdown of the regular expression \d+(?:\.\d+)?:
- \d+: Matches one or more digits. \d is a shorthand character class representing any digit from 0 to 9. The + quantifier ensures that there is at least one digit, but it can match more if they are consecutive.
- (?: ... )?: A non-capturing group, denoted by (?: ...). The ? at the end makes the entire group optional, so the pattern inside the group may appear zero or one time. This is used here to make the decimal part of a float optional.
- \.\d+: Inside the non-capturing group, this part matches a literal dot (\.) followed by one or more digits (\d+). It represents the decimal part of a floating point number.
So, the entire expression \d+(?:\.\d+)? can be interpreted as follows:
- \d+: Matches one or more digits.
- (?:\.\d+)?: Optionally matches a dot followed by one or more digits (the decimal part of a floating point number).
This regular expression is designed to match both integers and floats in a given string.
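The breakdown above can also be written into the pattern itself. The sketch below uses the re.VERBOSE flag, which lets whitespace and comments appear inside a pattern; the name NUMBER is an illustrative choice, not part of the original code.

```python
import re

# re.VERBOSE ignores whitespace in the pattern and treats # as a comment
NUMBER = re.compile(r"""
    \d+          # one or more digits (integer part)
    (?:          # start of a non-capturing group
        \.\d+    # a literal dot followed by one or more digits
    )?           # the whole group is optional
""", re.VERBOSE)

print(NUMBER.findall("The price of 5 pizzas is $123.45"))  # ['5', '123.45']
```

The compiled pattern behaves the same as r'\d+(?:\.\d+)?'; only the layout differs.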
Important:
The above code may not give you the desired output if your input string contains numeric values with multiple dots, such as version numbers or IP addresses.
For example, if your input string is "Current version of iOS is 16.7.5", the output would be ['16.7', '5']. Handling strings with multiple dots is covered later in this guide, in the section on version numbers and IP addresses.
If you want to extract only integers and floats instead, see the section "Extracting only integers and floats".
Handling negative numbers
To handle negative numbers, you can extend the check by allowing a negative sign before the number.
Here’s an example:
import re

def extract_numbers(input_string):
    # Define a regex pattern to match integers, floats, or negative numbers
    pattern = r'-?\d+(?:\.\d+)?'
    # Use the re.findall() to find all occurrences of the pattern
    numbers = re.findall(pattern, input_string)
    return numbers

print(extract_numbers("It's -50°C in Alberta, Canada"))  # Output: ['-50']
In this updated code, the regular expression -?\d+(?:\.\d+)? has been extended to include an optional negative sign before the number.
Here’s the breakdown of the -? part:
- -: Matches the literal hyphen or minus sign.
- ?: Makes the preceding element (in this case, the hyphen or minus sign) optional. It matches zero or one occurrence of the preceding element.
So, the combination -? means “match zero or one minus sign (hyphen)”. This allows the regular expression to match both positive and negative numbers: if there is a minus sign, it is matched; if there is none, the pattern still matches.
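One side effect worth knowing: because -? accepts any hyphen directly before digits, a range such as "10-20" yields a negative number. The sketch below shows this, along with one possible refinement that is an assumption and not part of the original guide: a negative lookbehind that refuses to start a match immediately after a digit.

```python
import re

text = "See pages 10-20; the low was -5 degrees"

# Pattern from this section: the hyphen in "10-20" is read as a minus sign
print(re.findall(r'-?\d+(?:\.\d+)?', text))  # ['10', '-20', '-5']

# Assumed refinement: (?<!\d) rejects a match that starts right after a digit,
# so the hyphen in "10-20" is not consumed as a sign
print(re.findall(r'(?<!\d)-?\d+(?:\.\d+)?', text))  # ['10', '20', '-5']
```

Whether a hyphen is a minus sign or a separator depends on your data, so treat the lookbehind as a starting point.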
Handling scientific notation numbers
To handle numbers in scientific notation, you can extend the pattern in the extract_numbers() function with an optional exponent part.
Here’s an example:
import re

def extract_numbers(input_string):
    # Define a regex pattern to match integers, floats, negative numbers,
    # and scientific notation numbers
    pattern = r'-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?'
    # Use the re.findall() to find all occurrences of the pattern
    numbers = re.findall(pattern, input_string)
    return numbers

print(extract_numbers("The speed of light is 2.998e8"))  # Output: ['2.998e8']
The updated pattern r'-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?' includes the (?:[eE][-+]?\d+)? part, which allows for matching scientific notation numbers. This part of the pattern looks for an optional exponent part.
Here’s a complete breakdown of the pattern (?:[eE][-+]?\d+)?:
- (?: …): This is a non-capturing group. It groups the enclosed pattern without capturing the matched result. It is used here to group the entire exponent part as a single unit.
- [eE]: This part matches the letter e or E. In scientific notation, the letter e or E is used to represent the exponent.
- [-+]?: This part matches an optional sign for the exponent part. It uses [-+] to match either a hyphen (-) or a plus (+) sign, and the ? indicates that this part is optional.
- \d+: This part matches one or more digits. It represents the actual numeric value of the exponent.
Putting it all together, (?:[eE][-+]?\d+)? matches an optional exponent part in the scientific notation.
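Strings matched this way can be passed straight to float(), which understands the same e/E exponent syntax. A small sketch, with a made-up sentence as input:

```python
import re

pattern = r'-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?'
sentence = "Light travels at 2.998e8 m/s; an electron weighs 9.109E-31 kg"

matches = re.findall(pattern, sentence)
print(matches)  # ['2.998e8', '9.109E-31']

# float() parses the exponent notation directly
values = [float(m) for m in matches]
```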
Handling version numbers, IP addresses (Numbers with multiple periods)
If you want to extract numbers with multiple periods such as version numbers, or IP addresses from a string in Python, you can further extend the code to handle multiple periods in a number.
Here is an example:
import re

def extract_numbers(input_string):
    pattern = r'-?\d+(?:\.\d+)*(?:[eE][-+]?\d+)?'
    # Use the re.findall() to find all occurrences of the pattern
    numbers = re.findall(pattern, input_string)
    return numbers

print(extract_numbers("Current version of ios on iPhone 15 Pro is 16.7.5"))
# Output: ['15', '16.7.5']
print(extract_numbers("My IP address is 192.158.1.38"))
# Output: ['192.158.1.38']
print(extract_numbers("The price of 5 pizzas is $123.45"))
# Output: ['5', '123.45']
print(extract_numbers("It's -50°C in Alberta, Canada"))
# Output: ['-50']
print(extract_numbers("The speed of light is 2.998e8"))
# Output: ['2.998e8']
In this example, the regular expression pattern r'-?\d+(?:\.\d+)*(?:[eE][-+]?\d+)?' is modified to handle multiple occurrences of the decimal part in a number. The group (?:\.\d+) matches a dot followed by one or more digits. The quantifier * after the non-capturing group means “zero or more occurrences” of the group, so (?:\.\d+)* allows for zero or more repetitions of a dot followed by one or more digits.
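An extracted version string is still text, and comparing versions as text can mislead. The sketch below uses a hypothetical helper, version_key(), to turn a dotted string into a tuple of integers so that versions compare numerically.

```python
def version_key(version):
    """Hypothetical helper: '16.7.5' -> (16, 7, 5)."""
    return tuple(int(part) for part in version.split('.'))

print(version_key("16.7.5"))                           # (16, 7, 5)
print("16.10.0" > "16.7.5")                            # False (string comparison)
print(version_key("16.10.0") > version_key("16.7.5"))  # True (numeric comparison)
```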
Extracting only numbers with multiple dots (version numbers or IP addresses)
If you want to extract only dotted numbers, such as version numbers or IP addresses, and skip plain integers, you can rewrite the code as follows:
import re

def extract_numbers(input_string):
    pattern = r'\d+(?:\.\d+)+'
    # Use the re.findall() to find all occurrences of the pattern
    numbers = re.findall(pattern, input_string)
    return numbers

print(extract_numbers("Current version of ios on iPhone 15 Pro is 16.7.5"))
# Output: ['16.7.5']
print(extract_numbers("My IP address is 192.158.1.38"))
# Output: ['192.158.1.38']
print(extract_numbers("The price of 5 pizzas is $123.45"))
# Output: ['123.45']
The quantifier + after the non-capturing group (?:\.\d+) means “one or more occurrences” of the group. So the pattern (?:\.\d+)+ requires at least one dot followed by one or more digits, and allows any number of further repetitions. Plain integers are therefore skipped, while ordinary floats such as 123.45 still match, because a single dot is enough.
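If plain floats should be excluded as well, one option is to require at least two dot groups with the {2,} quantifier. The sketch below does that and then uses the standard-library ipaddress module to tell IPv4 addresses apart from other dotted values. The names DOTTED and is_ipv4 and the sample sentence are illustrative assumptions.

```python
import ipaddress
import re

# {2,} requires at least two ".digits" groups, so 123.45 is skipped
DOTTED = re.compile(r'\d+(?:\.\d+){2,}')

def is_ipv4(candidate):
    """Return True if candidate parses as an IPv4 address."""
    try:
        ipaddress.IPv4Address(candidate)
        return True
    except ValueError:
        return False

text = "Server 192.158.1.38 runs build 16.7.5 and costs $123.45"
candidates = DOTTED.findall(text)
print(candidates)                             # ['192.158.1.38', '16.7.5']
print([c for c in candidates if is_ipv4(c)])  # ['192.158.1.38']
```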
Extracting only integers and floats
To extract only integers and floats from a string in Python, we first extract integers, floats, and numbers with multiple dots. Then we check whether each extracted value is a valid number (int or float) by attempting to convert it with the float() function inside a try-except block. If the conversion succeeds, the numeric string is a valid integer or float.
Here’s an example:
import re

def extract_numbers(input_string):
    pattern = r'\d+(?:\.\d+)*'
    # Use the re.findall() to find all occurrences of the pattern
    numbers = re.findall(pattern, input_string)
    # Use try-except block to handle errors during conversion to float
    final_numbers = []
    for x in numbers:
        try:
            float(x)
            final_numbers.append(x)
        except ValueError:
            pass  # Ignore strings that are not valid floats, e.g. '16.7.5'
    return final_numbers

print(extract_numbers(
    "iPhone 15 Pro costs $999.75 and it comes with ios version 16.7.5"))
# Output: ['15', '999.75']
In this example, the try-except block will catch any ValueError that occurs during the attempt to convert the numeric string to a float. If there is an error, it will be ignored, and the loop will continue to the next iteration.
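The same try-except idea can also return real numeric values instead of strings. The sketch below tries int() first and then float(); to_number() is a hypothetical helper introduced only for this illustration.

```python
import re

def to_number(text):
    """Hypothetical helper: return an int or float, or None if text is neither."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue
    return None

candidates = re.findall(r'\d+(?:\.\d+)*', "iPhone 15 Pro costs $999.75 and runs iOS 16.7.5")
# Compare against None explicitly so that a legitimate 0 is not dropped
numbers = [n for n in map(to_number, candidates) if n is not None]
print(numbers)  # [15, 999.75]
```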
# Using list comprehension, split() and isdigit()
You can use the split() method to split the string into a list of substrings and then use the isdigit() method to check if each substring is a numeric value.
Here’s an example:
def extract_numbers(input_string):
    # Split the string into a list of substrings
    substrings = input_string.split()
    # Use list comprehension to filter and extract numbers
    numbers = [int(x) for x in substrings if x.isdigit()]
    return numbers

print(extract_numbers("I have 5 iPhone 15 Pro Max"))
# Output: [5, 15]
In this example, the list comprehension creates a new list (numbers) by iterating through each substring in substrings and adding the integer value of the substring to the list if it consists only of digits.
Limitations:
Note that this approach assumes that the numbers in the string are integers and that they are separated by spaces. If your input has a different format, you might need to modify the code accordingly. For example, if the input string is "noobmaster69pro iPhone 15 Pro", the output is [15].
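A related case is a number followed by punctuation, such as "5," or "15.", which isdigit() rejects. One possible adjustment, sketched below as an assumption and not part of the original approach, strips punctuation from both ends of each token first. Note that this also strips a leading minus or currency sign, so "-5" would come out as 5.

```python
import string

def extract_numbers(input_string):
    # Strip leading/trailing punctuation from each token before testing it
    tokens = (token.strip(string.punctuation) for token in input_string.split())
    return [int(token) for token in tokens if token.isdigit()]

print(extract_numbers("I bought 5, then 15."))           # [5, 15]
print(extract_numbers("noobmaster69pro iPhone 15 Pro"))  # [15]
```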
# Using isdigit() method and generator expression
You can use the isdigit() method along with a generator expression to extract a number from a string in Python.
Here’s an example:
def extract_numbers(input_string):
    numbers = ''.join(char for char in input_string if char.isdigit())
    return numbers

print(extract_numbers("The best smartphone is iPhone 15 Pro Max"))
# Output: 15
In this example, the generator expression (char for char in input_string if char.isdigit()) iterates over each character in the input string. The isdigit() method is used to check if a character is a digit. The generator expression produces a sequence of characters that are digits from the input string. The join() method is then used to concatenate these digit characters into a single string. The resulting string contains only the digits extracted from the input string.
Limitations:
- The code concatenates all the digits into one string, so the boundaries between separate numbers are lost. For example, if the input string is "abc123def456", the output would be "123456".
- The above code only extracts digits, which means it ignores minus signs and decimal points. Handling negative numbers and decimals with this approach is considerably harder than with regular expressions.
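If the separate numbers need to stay separate, one option that still relies on isdigit() is itertools.groupby(), which groups consecutive characters sharing the same key. This is a sketch of an alternative, not the method described above.

```python
from itertools import groupby

def extract_numbers(input_string):
    # groupby() yields consecutive runs of characters with the same key;
    # keep only the runs whose key (str.isdigit) is True
    return [''.join(run)
            for is_digit, run in groupby(input_string, key=str.isdigit)
            if is_digit]

print(extract_numbers("abc123def456"))  # ['123', '456']
```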
# Using isdigit() and filter() along with lambda function
You can also use the isdigit() method and the filter() function along with a lambda function to extract numbers from a string in Python.
Here’s an example:
def extract_numbers(input_string):
    filtered_digits = filter(lambda char: char.isdigit(), input_string)
    result = ''.join(filtered_digits)
    return result

print(extract_numbers("The best smartphone is iPhone 15 Pro Max"))
# Output: 15
In this example, we use the filter() function along with a lambda function to create an iterator of characters that are digits. The lambda function lambda char: char.isdigit() checks whether each character is a digit or not.
Then, we use the join() method to join the filtered digits into a string containing only digits.
Limitations:
The code doesn’t handle whitespace between digits. If the input string contains spaces between digits or contains multiple numbers, the output will be a concatenated string without any separation. For example, if the input string is "I have 5 iPhone 15 Pro Max", the output will be "515".
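A further caveat applies to every isdigit()-based approach: isdigit() also returns True for some characters that int() cannot parse, such as the superscript ². The sketch below compares it with str.isdecimal(), which is stricter. The sample text is made up for illustration.

```python
text = "Area: 5 m²"

with_isdigit = ''.join(filter(str.isdigit, text))
with_isdecimal = ''.join(filter(str.isdecimal, text))

print(with_isdigit)    # 5²
print(with_isdecimal)  # 5

try:
    int(with_isdigit)
except ValueError:
    print("int() rejects the superscript digit")
```

If the result will be passed to int(), filtering with isdecimal() is the safer of the two.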
# Using for loop and isdigit() method
The isdigit() method in Python checks if all characters in a string are digits (0-9). You can also use it to extract numbers from a string by iterating through the characters of the string and building a new string consisting of only digits.
Here’s an example:
def extract_numbers(input_string):
    result = ""
    for char in input_string:
        if char.isdigit():
            result += char
    return result

print(extract_numbers("The best smartphone is iPhone 15 Pro Max"))
# Output: 15
In this example, the extract_numbers() function takes an input string and iterates through each character. If the character is a digit, it is added to the result string. Finally, the function returns the string containing only the extracted digits.
Limitations:
The code doesn’t handle whitespace between digits. If the input string contains spaces between digits or contains multiple numbers, the output will be a concatenated string without any separation. For example, if the input string is "I have 5 iPhone 15 Pro Max", the output will be "515".
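The loop can be adapted to keep numbers apart by closing off the current run of digits whenever a non-digit character appears. This is a sketch of one way to do it; the variable names are illustrative.

```python
def extract_numbers(input_string):
    numbers = []
    current = ""
    for char in input_string:
        if char.isdigit():
            current += char          # still inside a run of digits
        elif current:
            numbers.append(current)  # a run of digits just ended
            current = ""
    if current:                      # the string ended on a digit
        numbers.append(current)
    return numbers

print(extract_numbers("I have 5 iPhone 15 Pro Max"))  # ['5', '15']
```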
Conclusion
In conclusion, we have provided a comprehensive overview of various methods to extract different types of numbers from a string in Python.
Regular expressions make it easier to capture different number formats and ensure your code can handle a variety of cases.
We have used regular expressions to extract integers, floats, negative numbers, scientific notation numbers, and numbers with multiple dots like version numbers or IP addresses.