17. String type#

Author: Tue Nguyen

17.1. Outline#

  • What is a string?

  • Create a string

  • Index & slice a string

  • Iterate through a string

  • Conversion to strings

  • Mutability

  • Copy a string

  • Order of characters

  • Operations on a string

  • Formatted strings

17.2. What is a string?#

  • A string is an immutable sequence of Unicode characters

  • Python does NOT distinguish between characters and strings

    • There is only the string type; there is no character type

    • A single character is a string of length 1

    • Examples of strings: "Hello", "A", "Good morning"
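
As a small illustration of the point above, the sketch below takes one character out of a string and checks its type and length. The variable names are only for illustration.

```python
s = "Hello"
first = s[0]            # taking one character gives back another string

print(type(first))      # <class 'str'>
print(len(first))       # 1
print(first == "H")     # True
```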

17.3. Create a string#

  • We can create a string using ' or "

  • There’s no difference between using ' and using "

  • When we want to preserve new-line characters, we can use triple quotes

Ex 1: using single quotes

s = 'Hello'
print(s)
print(type(s))
Hello
<class 'str'>

Ex 2: using double quotes

s = "Hello"
print(s)
print(type(s))
Hello
<class 'str'>

Ex 3: single vs. double quotes

# Use single quotes when your text contains double quotes
text = 'He said: "I am handsome"'
print(text)
He said: "I am handsome"
# Use double quotes when your text contains single quotes
text = "I have a bachelor's degree"
print(text)
I have a bachelor's degree

Ex 4: Python is case-sensitive

"Hello" == "HELLO"
False

Ex 5: triple quotes

# Use triple double quotes when you want a multi-line text
text = """This is the first line
This is the second line"""

print(text)
This is the first line
This is the second line
# Triple quotes preserve all kinds of whitespace
text = """   Line 1
    Line 2
Line 3"""

print(text)
   Line 1
    Line 2
Line 3
# You can also use triple single quotes
text = '''Line 1
Line 2'''

print(text)
Line 1
Line 2
# Triple quotes are very convenient when you write SQL queries
q = """
select
    name,
    address,
    email
    from employee
where 
    (salary >= 50000)
    and (country = 'UK')
"""
print(q)
select
    name,
    address,
    email
    from employee
where 
    (salary >= 50000)
    and (country = 'UK')

Ex 6: all quotes are the same

print("hello" == 'hello')
print("hello" == '''hello''')
print("hello" == """hello""")
True
True
True

17.4. Index a string#

  • Same as lists and tuples

  • Note that each character in the string is like an element of a tuple

# Init
s = "Hello world"
s
'Hello world'
# First character
s[0]
'H'
# Last character
s[-1]
'd'
# Fifth character
s[4]
'o'

17.5. Slice a string#

  • Same as lists and tuples

  • However, there is one exception

    • For a list, a slice containing a single element is a list and it is different from that element

    • For a string, a slice containing a single character is the same as that character

# For a list
# A slice containing a single element != that element
x = [1, 2, 3]
print(x[0])
print(x[:1])
print(x[0] == x[:1])
1
[1]
False
# For a string
# A slice containing a single character == that character
s = "Hello"
print(s[0])
print(s[:1])
print(s[0] == s[:1])
H
H
True
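
Since slicing works the same as for lists and tuples, the usual start:stop:step forms apply to strings too. A minimal sketch (the variable names are only for illustration):

```python
s = "Hello world"

first_word = s[:5]      # characters at positions 0 to 4
last_word = s[-5:]      # the last five characters
every_other = s[::2]    # every second character, starting at position 0
reversed_s = s[::-1]    # the whole string backwards

print(first_word)       # Hello
print(last_word)        # world
print(every_other)      # Hlowrd
print(reversed_s)       # dlrow olleH
```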

17.6. Iterate through a string#

  • Same as lists and tuples

# Init
s = "Hello"
# Iterate through each character and print it out
for c in s:
    print(c)
H
e
l
l
o
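
If we also need the position of each character, enumerate() works on a string just as it does on a list or tuple. A short sketch:

```python
s = "Hello"

# enumerate() yields (index, character) pairs
for i, c in enumerate(s):
    print(i, c)

# Collect the pairs in a list to inspect them
pairs = list(enumerate(s))
print(pairs)
```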

17.7. Conversion to strings#

We use str() to convert an object to a string

# Bool to str
str(True)
'True'
# Int to str
str(100)
'100'
# Float to str
str(123.45)
'123.45'
# List to str
str([1, 2, 3])
'[1, 2, 3]'
# Tuple to string
str((1, 2, 3))
'(1, 2, 3)'
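
A common place where str() is needed is concatenation: Python does not convert a number to a string automatically when we use +. The sketch below shows the error and the fix (the variable names are only for illustration).

```python
age = 30

# Concatenating a str and an int raises a TypeError
try:
    message = "Age: " + age
except TypeError as err:
    print("TypeError:", err)

# Convert the number first, then concatenate
message = "Age: " + str(age)
print(message)          # Age: 30
```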

17.8. Mutability#

  • A string is immutable, meaning that we cannot change it

  • Try s[0] = "A" to see the error (Python raises a TypeError)
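
The sketch below shows the error, and one possible way to get a "modified" string: build a new one from pieces of the old one. The original string stays untouched.

```python
s = "Hello"

# Item assignment is not allowed on a string
try:
    s[0] = "J"
except TypeError as err:
    print("TypeError:", err)

# Build a new string instead of changing the old one
t = "J" + s[1:]
print(t)                # Jello
print(s)                # Hello (unchanged)
```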

17.9. Copy a string#

17.9.1. Shallow copying#

  • As with tuples, there is no practical way to make a shallow copy of a string

  • t = s[:] results in aliasing (in CPython, a full slice of a string returns the same object)

  • To make a copy of a string s, simply use t = s

  • Although s and t point to the same object, we do not have to worry that changing s might change t, because it is impossible to change a string or its characters once it is created
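
A minimal sketch of why aliasing is harmless here: any "change" to s actually rebinds the name s to a new string, so t still refers to the old one.

```python
s = "Hello"
t = s                   # t and s refer to the same string object
print(t is s)           # True

# This does not modify the original string.
# It creates a new string and rebinds the name s to it.
s = s + " world"

print(s)                # Hello world
print(t)                # Hello
```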

17.9.2. Deep copying#

  • Deep copying a string is unnecessary, for the same reason

17.10. Order of characters#

  • Each character is encoded by one integer (its Unicode code point)

  • So comparing characters is in fact comparing their underlying integer codes

  • This might cause some confusion for beginners

    • Letters A-Z are encoded with integers from 65-90

    • Letters a-z are encoded with integers from 97-122

    • Thus A < B but a > B

    • Therefore, when we want a case-insensitive comparison, we first need to convert both strings to the same case (lower or upper)

# Check the underlying code using ord()
print(ord("A")) # 65
print(ord("Z")) # 90
print(ord("a")) # 97
print(ord("z")) # 122
65
90
97
122
# Compare strings
print("BOB" < "BILL") # False because O > I
print("MARK" < "MARY") # True because K < Y
print("Happy" < "beautiful") # True because H < b (not B)
False
True
True
# You can use chr() to get the Unicode character for a given integer
print(chr(65)) # Letter A
print(chr(80)) # Letter P
print(chr(2000)) # A non-Latin character (from the N'Ko script)
A
P
ߐ
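
The advice about converting to the same case before comparing can be sketched as follows. The names list is made-up sample data, and passing key=str.lower to sorted() is one possible way to sort without regard to case.

```python
a = "Happy"
b = "beautiful"

# Raw comparison uses the integer codes: H is 72, b is 98
raw = a < b
# After lowercasing both sides, h comes after b
same_case = a.lower() < b.lower()
print(raw)              # True
print(same_case)        # False

# The same idea applies to sorting
names = ["bob", "alice", "Carol"]
by_code = sorted(names)
ignoring_case = sorted(names, key=str.lower)
print(by_code)          # ['Carol', 'alice', 'bob']
print(ignoring_case)    # ['alice', 'bob', 'Carol']
```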

17.11. Operations on a string#

  • We have all regular operations for an immutable sequence like for a tuple

  • In addition, there are special operations that apply only to strings

17.11.1. Regular operations#

I will not repeat them all; here are just some examples

# Init
s = "Hello"
s
'Hello'
# Count number of characters
len(s)
5
# Check if 'h' in s (remember Python is case-sensitive)
'h' in s
False
# Check if "H" in s
"H" in s
True
# Count number of occurrences of "l"
s.count("l")
2
# Concat 2 strings
"Hello " + "world"
'Hello world'
# Sort a string with sorted()
# Note that sorted() takes an iterable and returns a list,
# no matter whether the input is a list, tuple, or string
# Thus, you always get a list back
print(sorted("World"))
print(sorted("world"))
['W', 'd', 'l', 'o', 'r']
['d', 'l', 'o', 'r', 'w']
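
Because sorted() always returns a list, we need one more step to get a string back. A minimal sketch, using "".join() (covered later in this chapter) to glue the characters together:

```python
s = "world"

letters = sorted(s)             # a list of single-character strings
sorted_s = "".join(letters)     # join them back into one string

print(letters)          # ['d', 'l', 'o', 'r', 'w']
print(sorted_s)         # dlorw

# Repetition with * also works, as it does for tuples
repeated = "ab" * 3
print(repeated)         # ababab
```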

17.11.2. Special string operations#

We can divide special string operations into the following groups

  • Group 1: case checking

  • Group 2: case transformations

  • Group 3: substring checking

  • Group 4: white space handling

  • Group 5: string padding

  • Group 6: string splitting & joining

  • Group 7: searching and replacing


17.11.2.1. Group 1: case checking#

  • First we need to distinguish between cased and non-cased characters

  • Cased characters

    • They are letters in a language such as English, Vietnamese, or Italian

    • They have uppercase and lowercase forms. Ex: o-O, a-A, ê-Ê

  • Non-cased characters

    • They are the rest

    • And they have only one form. Ex: 0, 1, ., ?, -

a) Check if ALL cased characters in a string are in lowercase

  • Returns True if there is at least one cased character and all cased characters are in lowercase

  • Returns False otherwise

  • You can read the documentation using ?str.islower (in IPython or Jupyter) or help(str.islower) (in any Python interpreter)

# Read the documentation (IPython/Jupyter syntax)
?str.islower
"hello".islower() # True
True
"hello!".islower() # True
True
"Hello".islower() # False because of H
False
"012!@".islower() # False because there is no cased-character
False

b) Check if ALL cased characters in a string are in uppercase

"HELLO".isupper() # True
True
"HELLO!@".isupper() # True
True
"HELLo".isupper() # False because of o
False
"012!@".isupper() # False because there is no cased character
False

c) Check if the first character of each word is uppercase and the rest are lowercase

"Mr. John".istitle() # True
True
"MR. John".istitle() # False
False

d) Check if all characters are letters

"Hello".isalpha() # True
True
"Chào".isalpha() # True (all are Vietnamese letters)
True
"Hello world".isalpha() # False because of the space
False
"No1".isalpha() # False because of 1
False

e) Check if all characters are digits (0-9, plus other Unicode digit characters)

"1234".isdigit() # True
True
"012-345".isdigit() # False because of the hyphen
False
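
One caveat worth knowing: .isdigit() looks at characters only, so it rejects signs and decimal points, and it accepts some Unicode digits (such as superscripts) that int() cannot convert. The sketch below shows this, together with a hypothetical helper to_int() (not part of Python) that simply tries the conversion and falls back to a default.

```python
print("123".isdigit())      # True
print("-5".isdigit())       # False because of the minus sign
print("3.14".isdigit())     # False because of the dot
print("²".isdigit())        # True (superscript two counts as a digit)
print("²".isdecimal())      # False (but it is not a decimal digit)


def to_int(text, default=None):
    """Hypothetical helper: convert text to int, or return default."""
    try:
        return int(text)
    except ValueError:
        return default


print(to_int("42"))         # 42
print(to_int("-5"))         # -5
print(to_int("²"))          # None
```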

f) Check if all characters are letters and/or digits (alpha numeric)

"Number1".isalnum() # True
True
"Number 1".isalnum() # False because of the space
False

g) Check if all characters are white spaces

Note that whitespace characters include the space ( ), tab (\t), and newline (\n)

"    ".isspace() # True
True
" \t\n".isspace() # True
True
text = """


"""
text.isspace() # True
True

h) More interesting examples

# Init a string
s = "The word WTO is short for World Trade Organization"
# Ex 1: count the number of capital letters
num_caps = 0
for c in s:
    if c.isupper():
        num_caps += 1
num_caps
7
# Ex 2: count the number of spaces
num_spaces = 0
for c in s:
    if c.isspace():
        num_spaces += 1
num_spaces
8
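
The two loops above can also be written more compactly. This is only an alternative sketch: since True counts as 1 and False as 0, sum() over a generator expression gives the same counts.

```python
s = "The word WTO is short for World Trade Organization"

# Each test yields True (1) or False (0), and sum() adds them up
num_caps = sum(c.isupper() for c in s)
num_spaces = sum(c.isspace() for c in s)

print(num_caps)         # 7
print(num_spaces)       # 8
```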

17.11.2.2. Group 2: case transformations#

# Init
s = "Python is fUn."
s
'Python is fUn.'
# Make uppercase
s.upper()
'PYTHON IS FUN.'
# Make lowercase
s.lower()
'python is fun.'
# Make a title 
s.title()
'Python Is Fun.'
# Make a sentence (capitalize the first character and lowercase the rest)
s.capitalize()
'Python is fun.'
# Swap case (lower -> upper, upper -> lower)
s.swapcase()
'pYTHON IS FuN.'
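
Two details of these methods can surprise beginners, so here is a short sketch. .title() treats every run of letters as a word, so an apostrophe starts a new "word". And .capitalize() lowercases everything after the first character.

```python
s = "they're bill's friends"

titled = s.title()
print(titled)           # They'Re Bill'S Friends

capitalized = "hello WORLD".capitalize()
print(capitalized)      # Hello world
```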

17.11.2.3. Group 3: substring checking#

# Init
s = "Python is fUn."
s
'Python is fUn.'

a) Contains

# Check if "Python" is in s
"Python" in s
True
# Check if "python" is in s
"python" in s
False

b) Starts with

# Check if s starts with "Py"
s.startswith("Py")
True
# Check if s starts with "py"
s.startswith("py")
False

c) Ends with

# Check if s ends with "."
s.endswith(".")
True
# Check if s ends with "!"
s.endswith("!")
False

17.11.2.4. Group 4: white space handling#

# Init a string
s = "\t   Hello\t"
s
'\t   Hello\t'
# Remove leading white spaces (left strip)
s.lstrip()
'Hello\t'
# Remove trailing white spaces (right strip)
s.rstrip()
'\t   Hello'
# Remove whitespace at both ends (strip)
s.strip()
'Hello'
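
The strip methods also accept an optional argument listing the characters to remove instead of whitespace. A minimal sketch (the sample strings are made up):

```python
# Whitespace inside the string is never touched
inner = "  a b  ".strip()
print(repr(inner))      # 'a b'

# Pass the characters to strip as a string
stars = "***Hello***".strip("*")
print(stars)            # Hello

# The argument is a set of characters, not a prefix or suffix
mixed = "--==Hello==--".strip("-=")
print(mixed)            # Hello
```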

17.11.2.5. Group 5: string padding#

  • Padding a string s is adding a fill character to the left or the right of s

  • Normally we want s to have a fixed length

  • If s has not reached that fixed length, we fill in some character until it does

  • Ex: we want '01', '02', …, '11', '12' but the original data are '1', '2', …, '11', '12'

# Init a string
s = "A"
s
'A'
# Read the documentation for rjust (IPython/Jupyter syntax)
?str.rjust
# Pad 0's to the LEFT of s (right just) to get a string of length 3
s.rjust(3, "0")
'00A'
# Pad 0's to the RIGHT of s (left just) to get a string of length 3
s.ljust(3, "0")
'A00'
# Since filling leading zeros is a common action
# Python string has a method .zfill()
# Ex: fill leading zeros to make a string of length 3
s.zfill(3)
'00A'
# Print out 'month_01', 'month_02', ..., 'month_12'
for i in range(1, 13):
    month = str(i).zfill(2)
    print("month_" + month)
month_01
month_02
month_03
month_04
month_05
month_06
month_07
month_08
month_09
month_10
month_11
month_12
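
As an alternative sketch, the same padding can be done with a format specification inside an f-string (Python >= 3.6), where 02d means "an integer, at least 2 characters wide, padded with zeros". The example also shows .center() and how .zfill() handles a sign.

```python
# Zero-pad the month number inside the f-string
labels = [f"month_{i:02d}" for i in range(1, 13)]
print(labels[0])        # month_01
print(labels[-1])       # month_12

# Pad on both sides
centered = "A".center(5, "*")
print(centered)         # **A**

# zfill keeps a leading sign in front of the zeros
signed = "-5".zfill(4)
print(signed)           # -005
```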

17.11.2.6. Group 6: string splitting & joining#

a) Split a string based on a given token

# Split a string by white spaces
"How are you?\nI am fine".split()
['How', 'are', 'you?', 'I', 'am', 'fine']
# Split a string by hyphens
"2022-01-01".split("-")
['2022', '01', '01']
# Split a string by a token
"John Doe||john@example.com||UK".split("||")
['John Doe', 'john@example.com', 'UK']

b) Split a string into partitions

# Suppose s is "Math:90"
# We want to split s into 3 parts based on the separator `:`
"Math:90".partition(":")
('Math', ':', '90')
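
A few related cases, sketched below: .partition() always returns 3 parts (even when the separator is missing), .rpartition() splits at the last separator, and .split() accepts a maximum number of splits.

```python
# Unpack the three parts directly
subject, sep, score = "Math:90".partition(":")
print(subject, score)   # Math 90

# Separator not found: the last two parts are empty strings
missing = "Math".partition(":")
print(missing)          # ('Math', '', '')

# Split at the LAST separator
from_right = "a:b:c".rpartition(":")
print(from_right)       # ('a:b', ':', 'c')

# Split at most once
first_split = "a:b:c".split(":", 1)
print(first_split)      # ['a', 'b:c']
```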

c) Join a list of strings

# Init a list of fruits
fruits = ["Apple", "Banana", "Guava", "Watermelon"]
# Join fruits using ,
",".join(fruits)
'Apple,Banana,Guava,Watermelon'
# Join fruits using a space
" ".join(fruits)
'Apple Banana Guava Watermelon'
# Join fruits using --
"--".join(fruits)
'Apple--Banana--Guava--Watermelon'
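
Note that .join() only accepts strings. The sketch below shows the error we get with a list of numbers, and one possible fix: convert each element with str() first.

```python
numbers = [1, 2, 3]

# Joining non-string elements raises a TypeError
try:
    ",".join(numbers)
except TypeError as err:
    print("TypeError:", err)

# Convert each element to a string first
joined = ",".join(str(n) for n in numbers)
print(joined)           # 1,2,3

# split() reverses join() (the parts come back as strings)
parts = joined.split(",")
print(parts)            # ['1', '2', '3']
```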

17.11.2.7. Group 7: searching and replacing#

a) Find a token in a string

  • Since a string is a sequence, we can use .index()

  • However, .index() will raise an error (ValueError) if the token is not in the string

  • A safer way is to use .find()

  • It works the same way as .index() does, but if the token is not in the string, it returns -1 instead of raising an error

  • Unlike .index(), .find() is not available for lists and tuples

  • To find from the right, use .rfind()

# Init a string
s = "He is a good man from a good family"
# Find 'good' from the left
s.find("good")
8
# Find 'good' from the right
s.rfind("good")
24
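
The difference between .find() and .index() for a missing token can be sketched as follows. The last part uses the optional start argument of .find() to collect every position of a token; it is one possible approach, not the only one.

```python
s = "He is a good man from a good family"

# Missing token: find() returns -1, index() raises ValueError
not_found = s.find("bad")
print(not_found)        # -1
try:
    s.index("bad")
except ValueError as err:
    print("ValueError:", err)

# Find ALL positions by restarting the search after each hit
positions = []
start = s.find("good")
while start != -1:
    positions.append(start)
    start = s.find("good", start + 1)

print(positions)        # [8, 24]
```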

b) Replace a pattern in a string

# Replace 'good' by 'BAD'
s.replace("good", "BAD")
'He is a BAD man from a BAD family'
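
Two things to keep in mind, sketched below: .replace() returns a new string (the original is unchanged because strings are immutable), and an optional third argument limits the number of replacements.

```python
s = "He is a good man from a good family"

# Replace only the first occurrence
once = s.replace("good", "BAD", 1)
print(once)             # He is a BAD man from a good family

# The original string is unchanged
print(s)                # He is a good man from a good family
```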

17.12. Formatted strings#

We have already learned about this

# Init variables
name = "Jack"
city = "New York"
# Use .format()
# This works with all currently supported versions of Python
"{} is from {}".format(name, city)
'Jack is from New York'
# Use string interpolation
# This works with Python >= 3.6 only
f"{name} is from {city}"
'Jack is from New York'
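
Both styles also accept a format specification after a colon. A few common cases are sketched below; the values of price and ratio are made-up sample data.

```python
name = "Jack"
price = 1234.5
ratio = 0.256

two_decimals = f"{price:.2f}"       # fixed number of decimals
with_commas = f"{price:,.2f}"       # thousands separator
right_aligned = f"{name:>10}|"      # right-align in 10 characters
percent = f"{ratio:.1%}"            # show as a percentage

print(two_decimals)     # 1234.50
print(with_commas)      # 1,234.50
print(right_aligned)    #       Jack|
print(percent)          # 25.6%

# The same specifications work with .format()
print("{:.2f}".format(price))       # 1234.50
```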

For more advanced string formatting, check https://pyformat.info

17.13. Summary#

What is a string?

  • An immutable sequence of Unicode characters

  • Python does NOT distinguish between characters and strings

    • There is only the string type; there’s no character type

    • A single character is a string of length 1

Create a string

  • Use ', ", ''', or """ (they are all the same)

  • Triple quotes preserve new-line characters (thus, good for multi-line strings)

Index & slicing

  • Same as lists and tuples

  • One exception

    • Slice of length 1 in the list type is a list and is different from the element it contains

    • Slice of length 1 in str type is the same as the single character contained in that slice

Iterate through a string

  • Same as lists and tuples

Conversion to strings

  • Use str() to convert an object to a string

Copying

  • There’s no shallow copying

  • Deep copying is not necessary because both the string and its elements are immutable

  • Thus, to make a copy of s, just use an assignment t = s

Order of characters

  • Each character is encoded by an integer

  • Comparing characters is in fact comparing their integer codes

  • Use ord() to get the integer code of a character

  • Use chr() to get the character that an integer represents

  • Note that A < Z but a > Z

Operations on strings

  • Regular operations: same as tuples

  • Special string operations

    • Group 1: case checking

      • Ex: s.islower(), s.isupper(), s.isalnum()

    • Group 2: case transformations

      • Ex: s.upper(), s.lower()

    • Group 3: substring checking

      • Ex: "A" in s, s.startswith("A"), s.endswith("A")

    • Group 4: white space handling

      • Ex: s.lstrip(), s.rstrip(), s.strip()

    • Group 5: string padding

      • Ex: s.ljust(2, "0"), s.rjust(2, "0"), s.zfill(2)

    • Group 6: string splitting & joining

      • Ex: s.split(), s.partition(":"), ",".join(a_list)

    • Group 7: searching and replacing

      • Ex: s.find("A"), s.rfind("A"), s.replace("old", "new")

  • Other notes

    • Cased characters: letters in a language. Ex: o-O, a-A, ê-Ê

    • Non-cased characters: the rest. Ex: 0, 1, ., ?, -

    • For advanced string processing, check the re module

Formatted strings

  • .format() can be used in all currently supported versions of Python

  • f"" can be used for Python >= 3.6 only

17.14. Practice#

To be updated