Sort a string in Python

By James L.

A string in Python is a sequence of immutable characters, each with a specific position or index. This allows for operations like indexing, slicing, and sorting. However, strings are immutable, meaning their characters cannot be altered directly once created. Therefore, sorting strings efficiently requires creating a new sorted string or converting the string to a mutable data structure before sorting.

Python offers two primary approaches for sorting strings: using the sorted() function and the sort() method. While both functions achieve the same result, there are subtle differences in their implementation and performance.

Using the sorted() function

The sorted() function returns a new sorted list from an iterable (like a string or a list). It creates a new sorted list without modifying the original list.

Applying the sorted() function to the string returns a new list with the characters sorted. This sorted list can be joined back into a string using the join() method with an empty string '' as a separator.

Here’s an example:

my_str = "car"

sorted_str = ''.join(sorted(my_str))

print(sorted_str)

# Output: acr

In this example, sorted(my_str) returns a new sorted list ['a', 'c', 'r'], and we use the ''.join() to concatenate the sorted list of characters back into a string.

Using the sort() method of lists

The sort() method is a built-in list method that modifies the list in-place, sorting its elements in ascending order. However, since strings are immutable, you cannot use the sort() method directly on a string. Instead, you need to convert the string into a list first, sort the list, and then join the sorted characters back into a new string.

Here’s an example:

my_str = "car"

# Convert the string to a list
char_list = list(my_str)

# Sort the list
char_list.sort()

# Join the sorted characters back into a string
sorted_str = ''.join(char_list)

print(sorted_str)
# Output: acr

Here’s how the code works:

  • We convert the string my_str to a list using list(my_str), which creates a list of characters: ['c', 'a', 'r'].
  • We sort the list in-place using the sort() method: ['a', 'c', 'r'].
  • We join the sorted characters back into a string using the join() method with an empty string '' as the separator: ''.join(['a', 'c', 'r']) results in "acr".

Case-insensitive sorting

In Python, when you sort a string using the built-in sorted() function or the sort() method, the sorting is case-sensitive by default. This means that the uppercase letters take precedence over lowercase letters. For example, the letter ‘B’ would come before ‘a’.

However, in many situations, you may want to sort the strings in a case-insensitive manner, where the case of the letters (uppercase or lowercase) is ignored, and only the alphabetical order matters.

To achieve this, you can use the sorted() function or the sort() method along with the custom key function that converts the string to lowercase before sorting. This way, the sorting will be based on the lowercase representation of the strings, effectively making it case-insensitive.

There are three different approaches to converting strings to lowercase:

Using lowercase():

This method converts strings to lowercase before sorting, but may not handle all Unicode characters and locale-specific case representations.

Here’s an example:

my_str = "McDonald"

sorted_str = ''.join(sorted(my_str, key=lambda x: x.lower()))

print(sorted_str)

# Output: acDdlMno

Similarly, for the sort() method, use the following code:

my_str = "McDonald"

# Convert the string to a list
char_list = list(my_str)

# Sort the list
char_list.sort(key=lambda x: x.lower())

# Join the sorted characters back into a string
sorted_str = ''.join(char_list)

print(sorted_str)
# Output: acDdlMno

Using casefold():

This method is more robust than lower() for case-insensitive sorting because it provides better support for Unicode characters and internationalization.

Here’s an example:

my_str = "ßtefan"

sorted_str = ''.join(sorted(my_str, key=lambda x: x.casefold()))

print(sorted_str)

# Output: aefnßt

In this example, I have included a German name starting with the letter “ß”.

When using the lower() on the string my_str, the letter “ß” is retained, resulting in “ßtefan”.

However, when using lower() on the string my_str, the “ß” character is correctly case-folded to its lowercase equivalent “ss”, resulting in “sstefan”, which ensures proper sorting according to linguistic rules.

Similarly, for the sort() method, you can use the following code:

my_str = "ßtefan"

# Convert the string to a list
char_list = list(my_str)

# Sort the list
char_list.sort(key=lambda x: x.casefold())

# Join the sorted characters back into a string
sorted_str = ''.join(char_list)

print(sorted_str)
# Output: aefnßt

Using locale.strxfrm:

This method uses the current locale for locale-aware case-insensitive sorting, making it suitable for culturally-specific string sorting.

Here’s an example:

import locale

# Set the desired locale
locale.setlocale(locale.LC_ALL, 'es_ES.UTF-8')

my_str = "águila"

sorted_str = ''.join(sorted(my_str, key=lambda x: locale.strxfrm(x)))

print(sorted_str)
# Output: aágilu

In this example, the string is sorted based on the rules of the Spanish locale (es_ES.UTF-8).

Let’s take a look at another example:

import locale

# Set the desired locale
locale.setlocale(locale.LC_ALL, 'fr_FR.UTF-8')

my_str = "étude"

sorted_str = ''.join(sorted(my_str, key=lambda x: locale.strxfrm(x)))

print(sorted_str)
# Output: deétu

In this example, the string is sorted based on the rules of the French locale (fr_FR.UTF-8).

Similarly, for the sort() method, you can use the following code:

import locale

# Set the desired locale
locale.setlocale(locale.LC_ALL, 'es_ES.UTF-8')

my_str = "águila"

# Convert the string to a list
char_list = list(my_str)

# Sort the list
char_list.sort(key=lambda x: locale.strxfrm(x))

# Join the sorted characters back into a string
sorted_str = ''.join(char_list)

print(sorted_str)
# Output: aágilu

These examples demonstrate how locale.strxfrm() can be used to perform sorting operations that respect the linguistic conventions of a specific locale, ensuring that strings are sorted or compared correctly according to the rules of that locale.

Performance considerations

While both sort() and sorted() achieve the same result, they have a performance difference.

The performance difference may be negligible for small strings, but it becomes more significant as the size of the input string increases.

Using the sort() method is faster than using the sorted() function for sorting a string in Python.

The reason is that when using the sorted(), the string is first converted to a list internally within the function, and then a new sorted list is created, which adds the overhead of creating a new list. This process requires additional memory and computational resources.

On the other hand, when using the sort() method, the string is explicitly converted to a list using the list() method, and then the sort() method is applied to the same list, which operates in-place. This means that no new list is created, and the sorting is done by modifying the existing list. Since no new list is created, there is no overhead of creating a new data structure, making the process more efficient.

Therefore, if you need to sort a string in Python, using the sort() method is generally faster and more memory-efficient than using the sorted() function, especially for larger strings.

Conclusion

Python provides two main ways to sort strings: sorted() and sort().

The sorted() function creates a new sorted list from the original string. It’s straightforward but uses more memory, especially for large strings, since it creates a new list.

The sort() method modifies the original string directly, sorting the characters in-place. It requires an extra step to convert the string to a list first but avoids creating a new data structure, making it faster and more memory-efficient for large strings.

In general, use sort() for better performance when sorting large strings. Use sorted() when memory usage isn’t a concern or when you need a simple approach.

You may also need to consider case-insensitive sorting for strings. lower() converts to lowercase but has some limitations with Unicode. casefold() handles Unicode better. locale.strxfrm() uses your system’s language rules for case-insensitive sorting.

Choose the right sorting method based on your performance needs, input size, and whether you need to follow language-specific sorting rules.