Chapter 3 The First Python Program

3.3 String Methods

String methods can be classified into two basic categories. The first category returns information about a string, and the second category formats a string. Because string objects are immutable, string formatting methods return a copy of the string object, rather than modifying the object in place.
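As a minimal sketch of the last point (the variable names are illustrative, and upper is one of the formatting methods described later in this section), a formatting method returns a new string and leaves the original untouched:

```python
# Illustrative names; any string behaves the same way.
original = 'hello'
shouted = original.upper()   # returns a new string object

print(original)  # hello  (the original is unchanged)
print(shouted)   # HELLO
```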

A set of commonly used string information methods includes:

isupper(), which takes no arguments and returns True if there is at least one cased character in the string and all cased characters in the string are in upper case, and False otherwise;
islower(), which takes no arguments and returns True if there is at least one cased character in the string and all cased characters in the string are in lower case, and False otherwise;
isalpha(), which takes no arguments, and returns True if the string is non-empty and all the characters in the string are alphabetic, and False otherwise;
isdigit(), which takes no arguments, and returns True if the string is non-empty and all the characters in the string are digits, and False otherwise;

isspace(), which takes no arguments, and returns True if the string is non-empty and all the characters in the string are whitespace characters (e.g. spaces, tabs and new lines), and False otherwise.

```
>>> s1 = 'abcdef'
>>> s2 = 'A1'
>>> s3 = 'ABC'
>>> s4 = '__name__'
>>> s5 = 'hello!'
>>> s6 = 'name@institute'
>>> s7 = '12345'
>>> s8 = '123 456'
>>> s9 = ' '
>>> s10 = ''
>>> s1.isupper()
False
>>> s2.isupper()
True
>>> s3.isupper()
True
>>> s8.isupper()
False
>>> s9.isupper()
False
>>> s10.isupper()
False
>>> s1.islower()
True
>>> s4.islower()
True
>>> s5.islower()
True
>>> s1.isalpha()
True
>>> s2.isalpha()
False
>>> s7.isalpha()
False
>>> s8.isalpha()
False
>>> s10.isalpha()
False
>>> s1.isdigit()
False
>>> s4.isdigit()
False
>>> s6.isdigit()
False
>>> s7.isdigit()
True
>>> s8.isdigit()
False
>>> s9.isdigit()
False
>>> s8.isspace()
False
>>> s9.isspace()
True
>>> s10.isspace()
False
```

A set of string information methods that search for a substring includes:

rindex(s), which takes a string argument s and returns the index of the last occurrence of the argument in the string. If the argument contains more than one character, the index of its first character is returned. If the argument is not a substring of the string, a ValueError is raised. rindex is the counterpart of index for string objects, returning the first occurrence of the substring from the right, rather than from the left.
find(s), which takes a string argument s and returns the index of the first occurrence of the argument in the string. If the argument contains more than one character, the index of its first character is returned. If the argument is not a substring of the string, find returns -1. find can be treated as an alternative to index for string objects; the difference is the response when the argument is not a substring: find returns -1, while index raises a ValueError.
rfind(s), which takes a string argument s and returns the index of the last occurrence of the argument in the string. If the argument contains more than one character, the index of its first character is returned. If the argument is not a substring of the string, rfind returns -1. rfind is the counterpart of find, returning the first occurrence of the argument from the right, rather than from the left.
```
>>> s = 'abcdefdefabc'
>>> s.index('abc')
0
>>> s.find('def')
3
>>> s.rfind('de')
6
>>> s.rindex('a')
9
>>> s.index('abd')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found
>>> s.rfind('abd')
-1
```

The index, rindex, find and rfind methods can take two optional arguments that restrict the search to a slice of the string: the start index and the end index of the slice, respectively. The indices are interpreted in the same way as in slicing operations, so the end index is exclusive and negative values count from the end of the string. If only the start index is specified, the slice ends at the end of the string. The returned index is always relative to the whole string, not to the slice.

```
>>> s = 'abcabcabcdefdefabc'
>>> s.index('abc', 3) # search s[3:]
3
>>> s.find('abc', 5, -2) # search s[5:-2]
6
```
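One use of the optional start index is to collect every occurrence of a substring by restarting the search just past the previous match. The loop below is a sketch under that idea; the names text, target and positions are illustrative.

```python
text = 'abcabcabcdefdefabc'
target = 'abc'

positions = []
start = text.find(target)          # first occurrence, or -1 if there is none
while start != -1:
    positions.append(start)
    # Resume one character after the previous match.
    start = text.find(target, start + 1)

print(positions)  # [0, 3, 6, 15]
```

find is used rather than index here because its -1 return value gives the loop a simple stopping condition, with no error to handle.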

A set of convenient methods for checking the beginning and end of a string includes:

startswith(s), which takes a string argument s and returns True if the string starts with the argument, and False otherwise;

endswith(s), which takes a string argument s and returns True if the string ends with the argument, and False otherwise.

```
>>> s = 'abcdefghi'
>>> s.startswith('a')
True
>>> s.startswith('abc')
True
>>> s.endswith('d')
False
>>> s.endswith('hi')
True
```

The methods startswith and endswith can also take a tuple of strings as the argument, in which case the return value is True if the string starts with (or, for endswith, ends with) any string in the tuple, and False otherwise.

```
>>> s = 'abcdefghi'
>>> s.startswith(('abc', 'def'))
True
>>> s.startswith(('a', 'b', 'c'))
True
>>> s.endswith(('def', 'abc'))
False
>>> s.endswith(('ghi', 'hi', 'i'))
True
```
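The tuple form is convenient when several endings are acceptable. As a small sketch, with made-up file names and suffixes, the following keeps only the names that end with one of two image extensions:

```python
# Hypothetical data for illustration.
filenames = ['report.txt', 'photo.png', 'notes.md', 'logo.jpg']
image_suffixes = ('.png', '.jpg')

# endswith returns True if the name ends with any suffix in the tuple.
images = [name for name in filenames if name.endswith(image_suffixes)]

print(images)  # ['photo.png', 'logo.jpg']
```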

A set of string formatting methods that modify the cases of cased characters includes:

upper(), which takes no arguments and returns a copy of the string object with all cased characters converted to the upper case;
lower(), which takes no arguments and returns a copy of the string object with all cased characters converted to the lower case;
swapcase(), which takes no arguments and returns a copy of the string object with all upper-case characters converted into the lower case, and vice versa.
```
>>> s = 'Abc!'
>>> s.upper()
'ABC!'
>>> s.lower()
'abc!'
>>> s.swapcase()
'aBC!'
```
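One common use of these methods is comparing text without regard to case. A brief sketch, with an illustrative variable name:

```python
answer = 'Yes'   # illustrative input

print(answer == 'yes')          # False: the comparison is case-sensitive
print(answer.lower() == 'yes')  # True: compare a lower-case copy instead
print(answer)                   # Yes  (the original string is unchanged)
```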

A set of string formatting methods that add or remove whitespace at the ends of a string includes:

ljust(x), which takes an input argument x that specifies a length, and returns a copy of the string left justified in a string of the specified length. Spaces are padded on the right if the length is larger than the length of the string, while no truncation is performed if the length is smaller than the length of the string.
rjust(x), which takes an input argument x that specifies a length, and returns a copy of the string right justified in a string of the specified length. Spaces are padded on the left if the length is larger than the length of the string, while no truncation is performed if the length is smaller than the length of the string.
lstrip(), which takes no arguments and returns a copy of the string with whitespace on the left stripped.
rstrip(), which takes no arguments and returns a copy of the string with whitespace on the right stripped.
strip(), which takes no arguments and returns a copy of the string with whitespace on both sides stripped.

```
>>> s = 'abc'
>>> s.ljust(5)
'abc  '
>>> s.rjust(6)
'   abc'
>>> s.ljust(2)
'abc'
>>> s = ' abc\n\t'
>>> s
' abc\n\t'
>>> s.lstrip()
'abc\n\t'
>>> s.rstrip()
' abc'
>>> s.strip()
'abc'
```

The methods above can be generalized to insert or strip arbitrary characters on the ends of a string. In particular, ljust and rjust can take an additional argument that specifies a padding character, while lstrip, rstrip and strip can take an additional argument that specifies a set of characters to be stripped.

```
>>> s = '123'
>>> s.ljust(5, '0') # pad '0'
'12300'
>>> s.rjust(6, '-') # pad '-'
'---123'
>>> s = 'aabcdceft'
>>> s.lstrip('a') # strip 'a'
'bcdceft'
>>> s.lstrip('bac') # the set {'b', 'a', 'c'}
'dceft'
>>> s.rstrip('ag') # the set {'a', 'g'}
'aabcdceft'
>>> s.strip('gabf') # {'g', 'a', 'b', 'f'}
'cdceft'
```
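Stripping and justification are often used together to clean up raw text and line it up in columns. The sketch below uses made-up data and arbitrarily chosen column widths (8 and 3):

```python
# Hypothetical raw data: names carry stray whitespace.
rows = [('  apple ', 3), ('kiwi\n', 12), ('\tbanana', 7)]

lines = []
for name, count in rows:
    clean = name.strip()                       # remove whitespace on both sides
    # Pad the name with '.' to width 8, and right-justify the count to width 3.
    lines.append(clean.ljust(8, '.') + str(count).rjust(3))

for line in lines:
    print(line)
# apple...  3
# kiwi.... 12
# banana..  7
```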

Python provides a method, replace, that performs substring replacement directly. It takes two string arguments, specifying the substring to be replaced and the replacement string, respectively, and returns a copy of the string after replacement.

replace accepts an optional third argument that specifies a count, in which case only the first occurrences of the substring, up to the count, are replaced.
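A short sketch of replace, with and without the count argument (the example string is illustrative):

```python
s = 'abcabcabc'

print(s.replace('abc', 'x'))     # xxx        (every occurrence replaced)
print(s.replace('abc', 'x', 2))  # xxabc      (only the first two occurrences)
print(s.replace('zzz', 'x'))     # abcabcabc  (no occurrence, so nothing changes)
print(s)                         # abcabcabc  (the original string is unchanged)
```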

A final string formatting method is format, which is bound to a pattern string and takes arguments that fill pattern fields in the string. A pattern field is formed by a pair of curly brackets, and contains either a number specifying the index of the corresponding argument, or a keyword to be filled by a keyword argument. For example,

```
>>> '{0} + {1} = {2}'.format(1, 2, 1+2)
'1 + 2 = 3'
>>> 'Hello, {0}'.format('Python')
'Hello, Python'
>>> '{2}-{0}-{1}'.format('abc', 'def', 'ghi')
'ghi-abc-def'
>>> '{x}-{y}-{0}'.format('ghi', x='abc', y='def')
'abc-def-ghi'
```

In the example above, the last method call contains two keyword arguments, x and y, which fill the keyword fields {x} and {y}, respectively. Note that, as in other function calls, keyword arguments must be placed after non-keyword arguments.

If the arguments are filled in sequential order, the indices in the pattern fields can be omitted.

```
>>> '{} + {} = {}'.format(1, 2, 3)
'1 + 2 = 3'
```

Formatting specifications can be given to each pattern field by using ‘:<pattern>’, where <pattern> follows the pattern syntax in string formatting expressions. For example,

```
>>> '{0:d} + {1:.2f} = {2}'.format(1, 2.0, 1+2.0)
'1 + 2.00 = 3.0'
>>> s = '{0:d} + {1:.2f} = {x:s}'
>>> s.format(1, 2, x=str(1+2))
'1 + 2.00 = 3'
```

In the example above, ‘:d’, ‘:.2f’ and ‘:s’ are used to specify the format of an integer, a floating point number with 2 digits after the decimal point and a string, respectively. Formatting specifications can be used for keyword and non-keyword arguments.
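A formatting specification can also carry a minimum field width, which helps when values need to line up. The sketch below uses illustrative values and widths; by default, numbers are right-aligned within the field and strings are left-aligned.

```python
# Widths 5, 8 and 6 are arbitrary choices for illustration.
row = '{0:5d}|{1:8.3f}|{2:6s}|'.format(42, 3.14159, 'ab')

print(repr(row))  # '   42|   3.142|ab    |'
```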

If there are too few arguments, an IndexError is raised. If there are too many arguments, the first ones are used to fill the pattern string and the rest are ignored.

```
>>> s = '{} + {} = {}'
>>> s.format(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: tuple index out of range
>>> s.format(1, 2, 3, 4)
'1 + 2 = 3'
```

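There is also a built-in function, format(v, s), which takes a value v and a formatting specification s (the part that follows the colon in a pattern field) as its input arguments, and returns the formatted string. The result is the same as that of calling ('{:' + s + '}').format(v).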

```
>>> format(3.1, '.2f')
'3.10'
```
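As a quick check of how the built-in function relates to the string method, the sketch below formats the same value both ways (the variable names are illustrative):

```python
value = 3.1

via_function = format(value, '.2f')    # built-in function with a bare specification
via_method = '{:.2f}'.format(value)    # the same specification inside a pattern field

print(via_function)                # 3.10
print(via_function == via_method)  # True
print(repr(format(42, '5d')))      # '   42'
```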

The methods and functions above are frequently used with sequential types, and there are many more. The Python documentation is a useful reference to consult for a predefined function before writing a custom one.

Strings are Immutable. One final thing that makes strings different from some other Python collection types is that you are not allowed to modify the individual characters in the collection. It is tempting to use the [] operator on the left side of an assignment, with the intention of changing a character in a string. For example, in the following code, what happens when the first letter of greeting is changed?
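A minimal sketch of such code, assuming greeting holds 'Hello, world!' (a value inferred from the expected output, not given in the text); the try/except wrapper is added only so that the sketch runs to completion:

```python
greeting = 'Hello, world!'   # assumed value, not given in the text

try:
    greeting[0] = 'J'        # try to overwrite the first character in place
except TypeError as error:
    # Without the try/except, the program would stop here with this error.
    print('TypeError:', error)

print(greeting)              # Hello, world!  (unchanged)
```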

Instead of producing the output Jello, world!, this code produces the runtime error TypeError: 'str' object does not support item assignment.

Strings are immutable, which means you cannot change an existing string. The best you can do is create a new string that is a variation on the original.

The solution here is to concatenate a new first letter onto a slice of greeting. This operation has no effect on the original string.
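A sketch of that approach, again assuming greeting holds 'Hello, world!'; the name new_greeting is illustrative:

```python
greeting = 'Hello, world!'           # assumed value, not given in the text

# Build a new string from a new first letter and a slice of the old string.
new_greeting = 'J' + greeting[1:]

print(new_greeting)  # Jello, world!
print(greeting)      # Hello, world!  (the original is unchanged)
```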

Check your understanding

3.2Q-1: What is printed by the following statements?

```
s = "python rocks"
print(s[1] * s.index("n"))
```

A. yyyyy
Yes, s[1] is y and the index of n is 5, so 5 y characters. It is important to realize that the index method has precedence over the repetition operator. Repetition is done last.
B. 55555
Close. 5 is not repeated, it is the number of times to repeat.
C. n
No. This expression uses the index of n as a repeat count; it does not print n itself.
D. Error, you cannot combine all those things together.
This is fine, the repetition operator used the result of indexing and the index method.

3.2Q-2: What is printed by the following statements:

```
s = "Ball"
s[0] = "C"
print(s)
```

A. Ball
Assignment is not allowed with strings.
B. Call
Assignment is not allowed with strings.
C. Error
Yes, strings are immutable.
