Chapter 3 The First Python Program

3.2 Strings

String literals can be written in three different forms. The first is a sequence of characters enclosed in a pair of single quotes.
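A minimal sketch of the single-quote form, assuming a variable named s (the name and the value 'abc' are chosen to match the discussion that follows):

```python
s = 'abc'        # a string literal written with single quotes
print(s)         # abc
print(type(s))   # <class 'str'> in Python 3
```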

In the example above, there are three text characters in the string s, namely ‘a’, ‘b’ and ‘c’. The type of the string s is str, which is the Python representation of the string type.

The second way to write a string literal is to replace single quotes with double quotes; this is convenient if the string itself contains single quotation marks.
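As an illustration, the following sketch writes a string that contains a single quotation mark by using double quotes (the variable name s is illustrative):

```python
s = "it's"   # double quotes allow a single quote inside the string
print(s)     # it's
```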

If a string contains both single and double quotes, a third form of string literal can be used. It is specified by putting three double quotes on the left hand side and three on the right hand side of the string. This form of string literal also allows a string to span multiple lines:
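A small sketch of the triple-quote form, assuming an illustrative two-line string that contains both kinds of quotation marks; repr is used here to show the escaped format that the interactive interpreter displays:

```python
s = """He said "it's fine"
and left."""
print(s)         # printed on two lines
print(repr(s))   # the line break appears as \n in the escaped format
```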

Line breaks are represented by a special character in the string, which can be written with the escaped form ‘\n’. By default, Python displays the value of a string object in its escaped format, which is demonstrated by the example above. Escaped characters can also be used in a string literal directly.
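For instance, a sketch with an illustrative string in which the new-line character is written in its escaped form inside a single-quoted literal:

```python
s = 'abc\ndef'   # \n stands for one new-line character
print(s)         # prints abc and def on separate lines
print(len(s))    # 7
```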

In the example above, the two-character sequence ‘\n’ is used in the literal explicitly to represent a new-line character. Note that explicit (i.e. not escaped) line breaks are not allowed in string literals with single or double quotes. Another special character that has an escaped form is the tab character, which can be written as ‘\t’.
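A brief sketch with two tab characters; here they are typed in their escaped form for readability, and repr shows how Python displays them (the variable name s is illustrative):

```python
s = 'a\tb\tc'    # two tab characters separate the three letters
print(repr(s))   # 'a\tb\tc'
print(len(s))    # 5
```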

In the example above, the string s contains two tab characters, which are shown in escaped form by Python. Note that the quotation marks around strings serve only as indicators of a string literal; they are not a part of the string itself. Hence '' or "" specifies an empty string, a string that does not contain any characters.

String operators. Similar to integers and floating point numbers, string objects also support a set of operators. A commonly used string operator is the concatenation operator +:
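A minimal sketch of concatenation, assuming the illustrative identifiers s1, s2 and s3:

```python
s1 = 'abc'
s2 = 'def'
s3 = s1 + s2   # a new string object; s1 and s2 are unchanged
print(s3)      # abcdef
```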

Fig. 3.1 illustrates the changes that happen to the memory when the lines of code above are executed. When the string literals ‘abc’ and ‘def’ are evaluated, two corresponding string objects are constructed in the memory. They are associated with their respective identifiers via the assignment statements. When the concatenation operator is applied, a new string object is constructed, taking the concatenated value of the two operands.

One thing to note is that the same operator + behaves differently when applied to strings as compared to numbers. This fact is an example of polymorphism, which will be discussed later.

Fig. 3.1 Example memory structure for string operations


The * operator can also be applied to strings. It takes a string operand and an integer operand, resulting in a string by repeating the string operand a number of times, as specified by the integer operand.
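As a short illustration (variable names are illustrative), the operands can appear in either order:

```python
s = 'ab'
r1 = s * 3    # string operand first
r2 = 3 * s    # integer operand first gives the same result
print(r1)     # ababab
print(r2)     # ababab
```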

One important operator of strings is the getitem operator, which takes a string operand and an integer operand, resulting in the character in the string at the index specified by the integer. The getitem operator does not apply to numbers, such as integers and floating point numbers, but applies to other sequential objects, which will be introduced later. It takes the form of a pair of square brackets, written immediately after the string operand and enclosing the integer operand, which specifies the character index.

Note that string indices start from 0 rather than 1. Hence s[i] stands for the (i + 1)th character in s when i ≥ 0. This is also true for other sequential types in Python, and sequential types in many other programming languages.
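A minimal sketch of the getitem operator with non-negative indices (the names first, second and third are illustrative):

```python
s = 'abc'
first = s[0]    # index 0 is the first character
second = s[1]
third = s[2]
print(first, second, third)   # a b c
```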

Negative numbers can also be used to specify indices in Python. Starting from −1, negative indices specify character indices from the right of the string.
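The same three-character string can be indexed from the right, as in this sketch (variable names are illustrative):

```python
s = 'abc'
last = s[-1]           # the rightmost character
second_last = s[-2]
third_last = s[-3]     # same character as s[0]
print(last, second_last, third_last)   # c b a
```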

String indices must be within the valid range—trying to get a character that does not exist in the string will result in an error.

>>> s='abc'
>>> s[3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
>>> s[-4]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range

A sub string can be extracted from a string by using the getslice operation, which is similar to the getitem operation in that it uses square brackets. It is expressed by replacing the character index in a getitem expression with a slice, which consists of a start index and an end index, separated by a colon.

The way in which getslice is performed is illustrated by Fig. 3.2, where the locations at which a slice is taken are worth noting. Both the start index and the end index specify positions before characters. Similar to getitem, character indices start from 0. For example, ‘s[0 : 2]’ takes a slice out of s, starting before the first character (index = 0) and ending before the third character (index = 2), resulting in a string consisting of the first and second characters. Unlike getitem, getslice will not result in an index error; if the end index is out of range, the slice ends with the last character in the string.
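The behaviour described above can be sketched as follows, assuming an illustrative six-character string:

```python
s = 'abcdef'
head = s[0:2]      # characters at indices 0 and 1
middle = s[2:5]    # characters at indices 2, 3 and 4
tail = s[3:100]    # end index out of range: the slice stops at the last character
print(head, middle, tail)   # ab cde def
```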

Fig. 3.2 Illustration of getslice


If the end index is smaller than or equal to the start index, getslice results in an empty string.
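A quick sketch of both cases, using an illustrative string and the default left-to-right direction:

```python
s = 'abcdef'
same = s[3:3]       # end index equal to the start index
reversed_ = s[4:2]  # end index smaller than the start index
print(repr(same), repr(reversed_))   # '' ''
```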

One or both indices in a getslice operation can be unspecified. The default value for an unspecified start index is 0, while the default value for an unspecified end index is the end of string. For example,
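A sketch of the three possibilities, with illustrative variable names:

```python
s = 'abcdef'
no_start = s[:2]    # start defaults to 0
no_end = s[2:]      # end defaults to the end of the string
neither = s[:]      # both defaults: a slice covering the whole string
print(no_start, no_end, neither)   # ab cdef abcdef
```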

To enhance the flexibility of slicing, a slice can be further specified by adding a third parameter, step, which is joined to the start and end indices by using an additional colon (:), and indicates an interval at which characters are taken from the string. For example,
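A sketch with an illustrative eight-character string and steps of 2 and 3:

```python
s = 'abcdefgh'
every_second = s[0:8:2]   # indices 0, 2, 4, 6
every_third = s[0:8:3]    # indices 0, 3, 6
from_one = s[1:6:2]       # indices 1, 3, 5
print(every_second, every_third, from_one)   # aceg adg bdf
```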

When the step is 2, every second character is taken from the string; when the step is 3, every third character is taken. The start and end indices specify the same locations with or without the step parameter. The default value of the step parameter is 1, which results in a continuous sub string. When the step is larger than 1, a discontinuous sub string is extracted.

The step parameter can also be negative, in which case the slice is taken from right to left, and hence the start index must be larger than the end index for the result to be non-empty. Unlike left-to-right slicing, when slicing from right to left, indices indicate locations after the corresponding characters. As in left-to-right slicing, the absolute value of the step parameter specifies the interval at which characters are taken. For example,
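A sketch of right-to-left slicing on an illustrative six-character string:

```python
s = 'abcdef'
back = s[4:1:-1]        # indices 4, 3, 2; index 1 is excluded
back_by_two = s[5:0:-2] # indices 5, 3, 1
whole = s[::-1]         # both indices omitted: the whole string reversed
print(back, back_by_two, whole)   # edc fdb fedcba
```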

An important function that is associated with strings is len, which takes a single string argument and returns the number of characters in the string.
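A few illustrative calls to len, ending with the empty string:

```python
n1 = len('abc')   # three characters
n2 = len('a b')   # the space counts as a character
n3 = len('')      # the empty string
print(n1, n2, n3)   # 3 3 0
```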

As shown in the last example above, the length of an empty string is 0.

Conversion between strings and other types. Similar to the int and float functions for type conversion into integers and floating point numbers, the str function can be used to convert floating point numbers and integers into strings.
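A minimal sketch of str applied to an integer and a floating point number (the values are illustrative):

```python
a = str(123)         # integer to string
b = str(1.5)         # floating point number to string
c = 'abc' + str(4)   # the result can be concatenated with other strings
print(a, b, c)       # 123 1.5 abc4
```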

In the opposite direction, the int and float functions can turn strings that represent integers and floating point numbers to integer and floating point objects, respectively.

>>> s='123'
>>> int(s)
123
>>> s='1.23'
>>> float(s)
1.23
>>> int(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '1.23'
>>> s='abc'
>>> float(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: abc

The conversion from strings to integers and floating point numbers is strict: when the string does not correspond to the target type, an error occurs. Two special strings that can be converted to floating point numbers are ‘inf’ and ‘-inf’, which represent infinity and negative infinity, respectively. These two special floating point numbers do not have literals, and must be converted from the strings ‘inf’ and ‘-inf’.
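A short sketch of the two special values, with illustrative variable names; the comparisons only show that they lie beyond any ordinary number:

```python
pos = float('inf')    # positive infinity
neg = float('-inf')   # negative infinity
print(pos > 10 ** 100)    # True
print(neg < -10 ** 100)   # True
print(pos == -neg)        # True
```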

When a floating point number is converted into a string, Python automatically rounds it to a certain number of digits after the decimal point, so that the output is more readable.

If a specific number of digits after the decimal point is required in the output string, the round function introduced before can be used to round the floating point number before it is converted into a string. Alternatively, a string formatting expression can be used. String formatting expressions are a powerful tool for specifying the format of numbers in a string. As a first example, consider the formatting of integers:
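A sketch of integer patterns with and without a width, using illustrative values:

```python
plain = '%d' % 123      # no width: no padding
right = '%5d' % 123     # width 5: two spaces padded on the left
left = '%-5d' % 123     # width -5: two spaces padded on the right
wide = '%2d' % 12345    # more digits than the width: nothing is truncated
print(repr(plain), repr(right), repr(left), repr(wide))
# '123' '  123' '123  ' '12345'
```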

A string formatting expression consists of two parts, separated by a % symbol (e.g. '%5d' % 123). The part on the left is a pattern string that contains a pattern %xd (e.g. '%5d'), where % indicates the start of a pattern, and the letter d indicates that the value to be formatted is an integer. On the right hand side of the % symbol is the argument to fill the pattern, in this case the integer to be formatted (e.g. 123). The x in the pattern is optional; it indicates the minimum width of the formatted output: if it is positive, spaces are padded on the left when the integer contains fewer digits than x, and if it is negative, spaces are padded on the right. If the integer contains more digits than the size specified by x, no space is padded, and the integer is not truncated either.

In addition to patterns, the pattern string can consist of other characters. Characters that are not a part of a pattern will remain unchanged when patterns are replaced with arguments during string formatting.

To format a floating point number, the pattern in a pattern string is %x.yf, where x specifies the total size of the string, in the same way as in integer formatting, y specifies the number of digits after the decimal point, and f marks a floating point pattern. x can be omitted, in which case no padding will be added.
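A sketch of floating point patterns, using an illustrative value:

```python
x = 3.14159
short = '%.2f' % x      # two digits after the decimal point, no padding
padded = '%8.2f' % x    # total width 8, padded on the left
left = '%-8.2f' % x     # total width 8, padded on the right
zeros = '%.3f' % 2.0    # trailing zeros fill the requested digits
print(repr(short), repr(padded), repr(left), repr(zeros))
# '3.14' '    3.14' '3.14    ' '2.000'
```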

More than one pattern can be defined in the pattern string, in which case a comma-separated list of arguments must be given on the right within a pair of parentheses. The patterns will be filled by the arguments in their input order.
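A sketch with several patterns mixed with ordinary characters; the strings and values are illustrative:

```python
sum_text = '%d + %d = %d' % (1, 2, 3)           # three integer patterns
cost_text = '%d items cost %.2f' % (3, 4.5)     # an integer and a float pattern
print(sum_text)    # 1 + 2 = 3
print(cost_text)   # 3 items cost 4.50
```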

If the number of patterns does not match the number of arguments, an error will be given:

>>> '%d %d %f' % (1, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: not enough arguments for format string

String Comparison. The comparison operators also work on strings. To see if two strings are equal you simply write a boolean expression using the equality operator.
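A minimal sketch of equality tests on strings (variable names are illustrative):

```python
word = 'banana'
same = word == 'banana'       # identical characters in the same order
different = word == 'Banana'  # case matters
not_equal = word != 'apple'
print(same, different, not_equal)   # True False True
```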

Other comparison operations are useful for putting words in lexicographical order. This is similar to the alphabetical order you would use with a dictionary, except that all the uppercase letters come before all the lowercase letters.
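A sketch of ordering comparisons, using illustrative words:

```python
before = 'apple' < 'banana'        # a comes before b
upper_first = 'Zebra' < 'apple'    # uppercase letters come before lowercase ones
after = 'pear' > 'peach'           # the first difference is r versus c
print(before, upper_first, after)  # True True True
```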

It is probably clear to you that the word apple would be less than (come before) the word banana. After all, a is before b in the alphabet. What happens to the words apple and Apple? Are they the same?

It turns out that uppercase and lowercase letters are considered to be different from one another. The way the computer knows they are different is that each character is assigned a unique integer value. “A” is 65, “B” is 66, and “5” is 53. The way you can find out the so-called ordinal value for a given character is to use a character function called ord.
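A sketch of ord on a few characters, including lowercase a and the space character:

```python
print(ord('A'))   # 65
print(ord('B'))   # 66
print(ord('5'))   # 53
print(ord('a'))   # 97
print(ord(' '))   # 32

a_is_greater = ord('a') > ord('A')   # lowercase a has the larger ordinal value
```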

When you compare characters or strings to one another, Python converts the characters into their equivalent ordinal values and compares the integers from left to right. As you can see from the example above, “a” is greater than “A” so “apple” is greater than “Apple”.

Humans commonly ignore capitalization when comparing two words. However, computers do not. A common way to address this issue is to convert strings to a standard format, such as all lowercase, before performing the comparison.
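One way to sketch this idea uses the built-in string method lower, which returns an all-lowercase copy of a string; the variable names are illustrative:

```python
w1 = 'apple'
w2 = 'Apple'
raw_equal = w1 == w2                  # capitalization makes them different
raw_greater = w1 > w2                 # 'a' has a larger ordinal value than 'A'
same_word = w1.lower() == w2.lower()  # compare after converting to lowercase
print(raw_equal, raw_greater, same_word)   # False True True
```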

There is also a similar function called chr that converts integers into their character equivalent.
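A sketch of chr, the inverse of ord, including the character whose ordinal value is 32:

```python
print(chr(65))                # A
print(chr(97))                # a
print(repr(chr(32)))          # ' '  (the space character)
print(chr(ord('A') + 1))      # B
```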

One thing to note in the last two examples is the fact that the space character has an ordinal value (32). Even though you don’t see it, it is an actual character. It is called a nonprinting character.


© Copyright 2024 GS Ng.
