Chapter 3 The First Python Program

3.2 Strings

String literals can be written in three different forms. The first is a sequence of characters wrapped in a pair of single quotes.
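A minimal sketch of the single-quoted form, assuming a variable named s and the literal 'abc' for illustration (the printed type is as shown by Python 3):

```python
# A string literal written between a pair of single quotes.
s = 'abc'
print(s)        # abc
print(type(s))  # <class 'str'> in Python 3
```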

In the example above, there are three text characters in the string s, namely ‘a’, ‘b’ and ‘c’. The type of the string s is str, which is the Python representation of the string type.

The second way to write a string literal is to replace single quotes with double quotes; this is convenient if the string itself contains single quotation marks.
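As a small illustration (the sentence used here is only an example), a double-quoted literal can hold a single quotation mark without any special treatment:

```python
# Double quotes delimit the literal, so the single quote inside is ordinary text.
s = "It's a string"
print(s)  # It's a string
```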

If the string itself contains both single and double quotes, a third form of string literal can be used. It is specified by putting three double quotes on the left hand side and three on the right hand side of the string. This form of string literal also allows a string to span multiple lines:
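A possible example of the triple-quoted form, with text made up for illustration. The repr function is used here as an assumption to mimic what the interactive prompt displays for a string value:

```python
# Three double quotes on each side: both kinds of quote and a line break are allowed.
s = """She said, "it's fine",
and left."""
print(repr(s))  # the line break appears in its escaped form
print(s)        # the text appears on two lines
```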

Line breaks are represented by a special character in the string, which can be written in the escaped form ‘\n’. By default, Python displays the value of a string object in its escaped format, which is demonstrated by the example above. Escaped characters can also be used in a string literal directly.
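A short sketch of an escaped new-line written directly in a literal (the string content is illustrative):

```python
# \n inside the quotes is stored as a single new-line character.
s = 'abc\ndef'
print(s)        # abc and def appear on separate lines
print(repr(s))  # 'abc\ndef'
```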

In the example above, the two-character sequence ‘\n’ is used in the literal explicitly to represent a new-line character. Note that literal (i.e. not escaped) line breaks are not allowed in string literals with single or double quotes. Another special character that has an escaped form is the tab character, which can be written as ‘\t’.
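The tab character can be sketched in the same way. Here the tabs are written in escaped form, which is an illustrative choice; repr stands in for the display at the interactive prompt:

```python
# Two tab characters separate the three letters.
s = 'a\tb\tc'
print(repr(s))  # 'a\tb\tc'
print(s)        # the letters appear separated by tab stops
```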

In the example above, the string s contains two tab characters, which are shown in escaped form by Python. Note that the quotation marks around strings serve only as indicators of a string literal; they are not a part of the string itself. Hence '' or "" specifies an empty string, a string that does not contain any character.

String operators. Similar to integers and floating point numbers, string objects also support a set of operators. A commonly used string operator is the concatenation operator +:
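A minimal sketch of concatenation, assuming the identifiers s1, s2 and s3 (hypothetical names) and the literals 'abc' and 'def':

```python
s1 = 'abc'
s2 = 'def'
s3 = s1 + s2  # builds a new string object; s1 and s2 are unchanged
print(s3)     # abcdef
print(s1)     # abc
```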

Fig. 3.1 illustrates the changes that happen to the memory when the lines of code above are executed. When the string literals ‘abc’ and ‘def’ are evaluated, two corresponding string objects are constructed in the memory. They are associated with their respective identifiers via the assignment statements. When the concatenation operator is applied, a new string object is constructed, taking the concatenated value of the two operands.

One thing to note is that the same operator + behaves differently when applied to strings as compared to numbers. This fact is an example of polymorphism, which will be discussed later.

Fig. 3.1 Example memory structure for string operations

The * operator can also be applied to strings. It takes a string operand and an integer operand, resulting in a string by repeating the string operand a number of times, as specified by the integer operand.
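For instance, with an illustrative string 'abc':

```python
s = 'abc'
print(s * 3)  # abcabcabc
print(2 * s)  # abcabc (the operands may appear in either order)
print(s * 0)  # an empty string, so only an empty line is printed
```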

One important operator of strings is the getitem operator, which takes a string operand and an integer operand, resulting in the character in the string at the index specified by the integer. The getitem operator does not apply to numbers, such as integers and floating point numbers, but applies to other sequential objects, which will be introduced later. It takes the form of a pair of square brackets, written immediately after the string operand and enclosing the integer operand, which specifies the character index.
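A small sketch of getitem on an illustrative three-character string:

```python
s = 'abc'
print(s[0])  # a
print(s[1])  # b
print(s[2])  # c
```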

Note that string indices start from 0 rather than 1. Hence s[i] stands for the (i + 1)th character in s when i ≥ 0. This is also true for other sequential types in Python, and for sequential types in many other programming languages.

Negative numbers can also be used to specify indices in Python. Starting from −1, negative indices specify character indices from the right of the string.
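Negative indices can be illustrated with the same example string:

```python
s = 'abc'
print(s[-1])  # c (the last character)
print(s[-2])  # b
print(s[-3])  # a (the same character as s[0])
```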

String indices must be within the valid range—trying to get a character that does not exist in the string will result in an error.

>>> s='abc'
>>> s[3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
>>> s[-4]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range

A substring can be extracted from a string by using the getslice operation, which is similar to the getitem operation in that it uses square brackets. It is expressed by replacing the character index in a getitem expression with a slice, which consists of a start and an end index, separated by a colon.

The way in which getslice is performed is illustrated by Fig. 3.2, where the locations at which a slice is taken are worth noting. Both the start index and the end index specify positions before characters. Similar to getitem, character indices start from 0. For example, ‘s[0 : 2]’ takes a slice out of s, starting from the first character (index = 0) and ending before the third character (index = 2), resulting in a string consisting of the first and second characters. Unlike getitem, getslice will not result in an index error; if the end index is out of range, the slice will end with the last character in the string.

Fig. 3.2 Illustration of getslice

By making the end index smaller than or equal to the start index, an empty string results from getslice.
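The slicing behaviour described so far can be sketched as follows, using an illustrative six-character string:

```python
s = 'abcdef'
print(s[0:2])        # ab  (indices 0 and 1; index 2 is excluded)
print(s[2:5])        # cde
print(s[3:100])      # def (an out-of-range end index is not an error)
print(s[4:2] == '')  # True (end index smaller than start index)
print(s[3:3] == '')  # True (end index equal to start index)
```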

One or both indices in a getslice operation can be unspecified. The default value for an unspecified start index is 0, while the default value for an unspecified end index is the end of the string. For example,
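A brief sketch of the default indices, again with an illustrative string:

```python
s = 'abcdef'
print(s[:3])  # abc    (start defaults to 0)
print(s[3:])  # def    (end defaults to the end of the string)
print(s[:])   # abcdef (both defaults: a copy of the whole string)
```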

To enhance the flexibility of slicing, a slice can be further specified by adding a third parameter, step, which is joined to the start and end indices by using an additional colon (:), and indicates an interval at which characters are taken from the string. For example,
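One way to picture the step parameter, assuming an eight-character example string:

```python
s = 'abcdefgh'
print(s[0:8:2])  # aceg (indices 0, 2, 4, 6)
print(s[0:8:3])  # adg  (indices 0, 3, 6)
print(s[1:6:2])  # bdf  (indices 1, 3, 5)
print(s[::2])    # aceg (start and end left at their defaults)
```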

When the step is 2, every second character is taken from the string; when the step is 3, every third character is taken. The start and end indices specify the same locations with or without the step parameter. The default value of the step parameter is 1, which results in a continuous substring. When the step is larger than 1, a discontinuous substring is extracted.

The step parameter can also be negative, in which case the slice is taken from right to left, and hence the start index must be larger than the end index. Unlike left-to-right slicing, when slicing from right to left, indices indicate locations after the corresponding characters. As in left-to-right slicing, the absolute value of the step parameter specifies the interval at which characters are taken. For example,
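A sketch of right-to-left slicing on the same kind of illustrative string:

```python
s = 'abcdefgh'
print(s[6:2:-1])  # gfed (indices 6, 5, 4, 3; index 2 is excluded)
print(s[6:2:-2])  # ge   (indices 6, 4)
print(s[::-1])    # hgfedcba (the whole string reversed)
```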

An important function that is associated with strings is len, which takes a single string argument and returns the number of characters in the string.
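A few illustrative calls to len:

```python
print(len('abc'))     # 3
print(len('a b\tc'))  # 5 (the space and the tab each count as one character)
print(len(''))        # 0
```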

As shown in the last example above, the length of an empty string is 0.

Conversion between strings and other types. Similar to the int and float functions for type conversion into integers and floating point numbers, the str function can be used to convert floating point numbers and integers into strings.
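For example (the numbers are chosen only for illustration):

```python
print(str(123))        # 123
print(str(1.5))        # 1.5
print(str(123) + '4')  # 1234 (the result is a string, so + concatenates)
```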

In the opposite direction, the int and float functions can turn strings that represent integers and floating point numbers to integer and floating point objects, respectively.

>>> s='123'
>>> int(s)
123
>>> s='1.23'
>>> float(s)
1.23
>>> int(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '1.23'
>>> s='abc'
>>> float(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: 'abc'

The conversion from strings to integers and floating point numbers is strict: when the string does not correspond to the target type, an error occurs. Two special strings that can be converted to floating point numbers are ‘inf’ and ‘-inf’, which represent infinity and negative infinity, respectively. These two special floating point numbers do not have literals, and must be converted from the strings ‘inf’ and ‘-inf’.
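A short sketch of the two special values, with pos and neg as hypothetical variable names:

```python
pos = float('inf')
neg = float('-inf')
print(pos)          # inf
print(neg)          # -inf
print(pos > 1e308)  # True (infinity exceeds any finite float)
```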

When a floating point number is converted into a string, Python automatically rounds it to a limited number of digits, so that the output is more readable.

If a specific number of digits after the decimal point is required in the output string, the round function introduced before can be used to round the floating point number before it is converted into a string. Alternatively, a string formatting expression can be used. String formatting expressions are a powerful tool for specifying the format of numbers in a string. As a first example, consider the formatting of integers:

A string formatting expression consists of two parts, separated by a % symbol (e.g. '%5d' % 123). The part on the left is a pattern string that contains a pattern %xd (e.g. '%5d'), where % indicates the start of a pattern, and the letter d indicates that the formatted value is an integer. On the right hand side of the % symbol is the argument to fill the pattern, in this case an integer to be formatted in the string (e.g. 123). The x in the pattern is optional; it indicates the minimum length of the formatted string: if it is positive, spaces are padded on the left when the integer contains fewer digits than x, and if it is negative, spaces are padded on the right. If the integer contains more digits than the size specified by x, no space will be padded, and the integer will not be truncated either.
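The padding rules can be sketched as follows; repr is used here only so that the padding spaces are visible inside the quotes:

```python
print('%d' % 123)          # 123
print(repr('%5d' % 123))   # '  123' (two spaces padded on the left)
print(repr('%-5d' % 123))  # '123  ' (two spaces padded on the right)
print('%2d' % 12345)       # 12345   (wider than 2: no padding, no truncation)
```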

In addition to patterns, the pattern string can consist of other characters. Characters that are not a part of a pattern will remain unchanged when patterns are replaced with arguments during string formatting.
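For instance, with surrounding text made up for illustration:

```python
# Only the %d pattern is replaced; the other characters are kept as they are.
message = 'The answer is %d.' % 42
print(message)  # The answer is 42.
```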

To format a floating point number, the pattern in a pattern string is %x.yf, where x specifies the total size of the string, in the same way as in integer formatting, y specifies the number of digits after the decimal point, and f marks a floating point pattern. x can be omitted, in which case no padding will be added.
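A sketch of floating point patterns with illustrative values; repr again makes the padding visible:

```python
print('%.2f' % 3.14159)         # 3.14
print(repr('%8.3f' % 3.14159))  # '   3.142' (total size 8, three digits after the point)
print(repr('%-8.1f' % 2.5))     # '2.5     ' (negative size pads on the right)
```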

More than one pattern can be defined in the pattern string, in which case a comma-separated list of arguments must be given on the right within a pair of parentheses. The patterns will be filled by the arguments in their input order.
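Two illustrative pattern strings with several patterns each:

```python
print('%d + %d = %d' % (1, 2, 1 + 2))     # 1 + 2 = 3
print('%d items cost %.2f' % (3, 9.5))    # 3 items cost 9.50
```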

If the number of patterns does not match the number of arguments, an error will be given:

>>> '%d %d %f' % (1, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: not enough arguments for format string

String Comparison. The comparison operators also work on strings. To see if two strings are equal you simply write a boolean expression using the equality operator.
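A minimal sketch with illustrative strings:

```python
print('abc' == 'abc')  # True
print('abc' == 'abd')  # False
print('abc' != 'abd')  # True
```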

Other comparison operations are useful for putting words in lexicographical order. This is similar to the alphabetical order you would use with a dictionary, except that all the uppercase letters come before all the lowercase letters.
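Some illustrative comparisons (the words are arbitrary examples):

```python
print('apple' < 'banana')  # True (a comes before b)
print('apple' < 'apply')   # True (the first difference is e versus y)
print('Zebra' < 'apple')   # True (uppercase letters come before lowercase ones)
```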

It is probably clear to you that the word apple would be less than (come before) the word banana. After all, a is before b in the alphabet. What happens to the words apple and Apple? Are they the same?

It turns out that uppercase and lowercase letters are considered to be different from one another. The way the computer knows they are different is that each character is assigned a unique integer value. “A” is 65, “B” is 66, and “5” is 53. The way you can find out the so-called ordinal value for a given character is to use a character function called ord.
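The ordinal values mentioned above can be checked with a short sketch:

```python
print(ord('A'))  # 65
print(ord('a'))  # 97
print(ord('5'))  # 53
print('apple' > 'Apple')  # True, because 97 is greater than 65
```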

When you compare characters or strings to one another, Python converts the characters into their equivalent ordinal values and compares the integers from left to right. As you can see from the example above, “a” is greater than “A” so “apple” is greater than “Apple”.

Humans commonly ignore capitalization when comparing two words. However, computers do not. A common way to address this issue is to convert strings to a standard format, such as all lowercase, before performing the comparison.
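One possible way to do this is with the built-in string method lower, which is an assumption here since string methods have not been introduced at this point; a and b are hypothetical names:

```python
a = 'Apple'
b = 'apple'
print(a == b)                  # False
print(a.lower() == b.lower())  # True (both become 'apple' before comparing)
```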

There is also a similar function called chr that converts integers into their character equivalent.
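A short sketch of chr, together with the ordinal value of the space character; repr is used so that the space is visible inside quotes:

```python
print(chr(65))        # A
print(chr(97))        # a
print(ord(' '))       # 32
print(repr(chr(32)))  # ' '
```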

One thing to note in the last two examples is the fact that the space character has an ordinal value (32). Even though you don’t see it, it is an actual character. It is called a nonprinting character.

Check your understanding

3.1Q-1: What is printed by the following statements?

s = "python"
t = "rocks"
print(s + t)

python rocks
Concatenation does not automatically add a space.
python
The expression s+t is evaluated first, then the resulting string is printed.
pythonrocks
Yes, the two strings are glued end to end.
Error, you cannot add two strings together.
The + operator has different meanings depending on the operands, in this case, two strings.

3.1Q-2: What is printed by the following statements?

s = "python"
excl = "!"
print(s+excl*3)

python!!!
Yes, repetition has precedence over concatenation.
python!python!python!
Repetition is done first.
pythonpythonpython!
The repetition operator is working on the excl variable.
Error, you cannot perform concatenation and repetition at the same time.
The + and * operators are defined for strings as well as numbers.

3.1Q-3: What is printed by the following statements?

s = "python rocks"
print(s[3])

t
Index locations do not start with 1, they start with 0.
h
Yes, index locations start with 0.
c
s[-3] would return c, counting from right to left.
Error, you cannot use the [ ] operator with a string.
[ ] is the index operator, and it works with strings.

3.1Q-4: What is printed by the following statements?

s = "python rocks"
print(s[2] + s[-5])

tr
Yes, indexing operator has precedence over concatenation.
ps
p is at location 0, not 2.
nn
n is at location 5, not 2.
Error, you cannot use the [ ] operator with the + operator.
[ ] operator returns a string that can be concatenated with another string.

3.1Q-5: What is printed by the following statements?

s = "python rocks"
print(s[3:8])

python
That would be s[0:6].
rocks
That would be s[7:].
hon r
Yes, start with the character at index 3 and go up to but not include the character at index 8.
Error, you cannot have two numbers inside the [ ].
This is called slicing, not indexing. It requires a start and an end.

3.1Q-6: What is printed by the following statements?

s = "python rocks"
print(s[7:11] * 3)

rockrockrock
Yes, rock starts at 7 and goes through 10. Repeat it 3 times.
rock rock rock
Repetition does not add a space.
rocksrocksrocks
Slicing will not include the character at index 11. Just up to it (10 in this case).
Error, you cannot use repetition with slicing.
The slice will happen first, then the repetition. So it is ok.

3.1Q-7: What is printed by the following statements?

s = "python rocks"
print(len(s))

11
The blank counts as a character.
12
Yes, there are 12 characters in the string.

3.1Q-8: What is printed by the following statements?

s = "python rocks"
print(s[len(s)-5])

o
Take a look at the index calculation again, len(s)-5.
r
Yes, len(s) is 12 and 12-5 is 7. Use 7 as index and remember to start counting with 0.
s
s is at index 11.
Error, len(s) is 12 and there is no index 12.
You subtract 5 before using the index operator so it will work.

3.1Q-9: Evaluate the following comparison:

"Dog" < "Doghouse"

True
Both match up to the g but Dog is shorter than Doghouse so it comes first in the dictionary.
False
Strings are compared character by character.

3.1Q-10: Evaluate the following comparison:

"dog" < "Dog"

True
d is greater than D according to the ord function (100 versus 68).
False
Yes, upper case is less than lower case according to the ordinal values of the characters.
They are the same word
Python is case sensitive meaning that upper case and lower case characters are different.

3.1Q-11: Evaluate the following comparison:

"dog" < "Doghouse"

True
d is greater than D.
False
The length does not matter. Lower case d is greater than upper case D.
