17. String type#
Author: Tue Nguyen
17.1. Outline#
What is a string?
Create a string
Index & slice a string
Iterate through a string
Conversion to strings
Mutability
Copy a string
Order of characters
Operations on a string
Formatted strings
17.2. What is a string?#
A string is an immutable sequence of Unicode characters
Python does NOT distinguish between characters and strings
There is only the string type, not character type
A single character is a string of length 1
Examples of strings: "Hello", "A", "Good morning"
17.3. Create a string#
We can create a string using ' or "
There is no difference between using ' and using "
When we want to preserve new-line characters, we can use triple quotes
Ex 1: using single quotes
s = 'Hello'
print(s)
print(type(s))
Hello
<class 'str'>
Ex 2: using double quotes
s = "Hello"
print(s)
print(type(s))
Hello
<class 'str'>
Ex 3: single vs. double quotes
# Use single quotes when your text contains double quotes
text = 'He said: "I am handsome"'
print(text)
He said: "I am handsome"
# Use double quotes when your text contains single quotes
text = "I have a bachelor's degree"
print(text)
I have a bachelor's degree
Ex 4: Python is case-sensitive
"Hello" == "HELLO"
False
Ex 5: triple quotes
# Use triple double quotes when you want a multi-line text
text = """This is the first line
This is the second line"""
print(text)
This is the first line
This is the second line
# Triple quotes preserve all kinds of white spaces
text = """ Line 1
Line 2
Line 3"""
print(text)
Line 1
Line 2
Line 3
# You can also use triple single quotes
text = '''Line 1
Line 2'''
print(text)
Line 1
Line 2
# Triple quotes are very convenient when you write SQL queries
q = """
select
name,
address,
email
from employee
where
(salary >= 50000)
and (country = 'UK')
"""
print(q)
select
name,
address,
email
from employee
where
(salary >= 50000)
and (country = 'UK')
Ex 6: all quotes are the same
print("hello" == 'hello')
print("hello" == '''hello''')
print("hello" == """hello""")
True
True
True
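As a small illustration of the points above, the sketch below checks that a triple-quoted string is equal to a regular string containing an explicit \n, and shows a backslash escape as an alternative to switching quote styles. The variable names are made up for this example.

```python
# A triple-quoted string stores the line break as a "\n" character
multi = """Line 1
Line 2"""
single = "Line 1\nLine 2"
print(multi == single)  # True

# A backslash escapes a quote inside a string delimited by the same quote
escaped = 'I have a bachelor\'s degree'
print(escaped == "I have a bachelor's degree")  # True
```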
17.4. Index a string#
Same as lists and tuples
Note that each character in the string is like an element of a tuple
# Init
s = "Hello world"
s
'Hello world'
# First character
s[0]
'H'
# Last character
s[-1]
'd'
# Fifth character
s[4]
'o'
17.5. Slice a string#
Same as lists and tuples
However, there is one exception
For a list, a slice containing a single element is a list and it is different from that element
For a string, a slice containing a single character is the same as that character
# For a list
# A slice containing a single element != that element
x = [1, 2, 3]
print(x[0])
print(x[:1])
print(x[0] == x[:1])
1
[1]
False
# For a string
# A slice containing a single character == that character
s = "Hello"
print(s[0])
print(s[:1])
print(s[0] == s[:1])
H
H
True
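Since slicing works the same way as for lists and tuples, the usual start:stop:step forms apply. The following is a minimal sketch using the "Hello world" string from the indexing section; the variable names are only for illustration.

```python
s = "Hello world"

first_word = s[:5]     # characters at positions 0..4
last_word = s[6:]      # from position 6 to the end
last_five = s[-5:]     # the last five characters
every_other = s[::2]   # every second character
reversed_s = s[::-1]   # a step of -1 walks the string backwards

print(first_word)   # Hello
print(last_word)    # world
print(last_five)    # world
print(every_other)  # Hlowrd
print(reversed_s)   # dlrow olleH
```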
17.6. Iterate through a string#
Same as lists and tuples
# Init
s = "Hello"
# Iterate through each character and print it out
for c in s:
    print(c)
H
e
l
l
o
17.7. Conversion to strings#
We use str() to cast an object to a string
# Bool to str
str(True)
'True'
# Int to str
str(100)
'100'
# Float to str
str(123.45)
'123.45'
# List to str
str([1, 2, 3])
'[1, 2, 3]'
# Tuple to string
str((1, 2, 3))
'(1, 2, 3)'
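One common place where str() is needed is concatenation, because + does not mix strings and numbers. The sketch below illustrates this; the variables age, message, and number are hypothetical.

```python
age = 30

# "Age: " + age would raise a TypeError, so convert the number first
message = "Age: " + str(age)
print(message)  # Age: 30

# Going the other way: int() converts a string of digits back to a number
number = int("100") + 1
print(number)  # 101
```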
17.8. Mutability#
A string is immutable, meaning that we cannot change it
Try s[0] = "A" to see the error (a TypeError)
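The following sketch shows what happens when we try the assignment, and one possible way to get a modified string: build a new one from slices. The names s, t, and message are only for illustration.

```python
s = "Hello"

try:
    s[0] = "A"  # item assignment is not allowed on a string
except TypeError as err:
    message = str(err)
    print("TypeError:", message)

# Build a new string instead of changing the old one
t = "A" + s[1:]
print(t)  # Aello
print(s)  # Hello (the original is untouched)
```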
17.9. Copy a string#
17.9.1. Shallow copying#
As with tuples, there is no way to make a shallow copy of a string
t = s[:] does not create a new object; it results in aliasing (in CPython, t and s refer to the same object)
To make a copy of a string s, simply use t = s
Although s and t point to the same object, we do not have to worry that changing s might change t, because it is impossible to change a string or its characters once it is created
17.9.2. Deep copying#
Deep copying a string is unnecessary
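A short sketch of why copying is a non-issue for strings: an operation that looks like a change to s actually binds the name s to a brand-new string, so any other name keeps the old value. The names t, u, and v are hypothetical.

```python
import copy

s = "Hello"
t = s                  # plain assignment: another name for the same string
u = s[:]               # a full slice gives an equal string
v = copy.deepcopy(s)   # allowed, but gains nothing for an immutable string

print(t == s, u == s, v == s)  # True True True

# "Changing" s creates a new string and rebinds the name s
s = s + " world"
print(s)  # Hello world
print(t)  # Hello
```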
17.10. Order of characters#
Each character is encoded by one integer (its Unicode code point)
So comparing characters is in fact comparing their underlying integer codes
This might cause some confusion for beginners
Letters A-Z are encoded with integers from 65 to 90
Letters a-z are encoded with integers from 97 to 122
Thus A < B but a > B
Therefore, when we want to compare strings regardless of case, we first need to convert them to the same case (lower or upper)
# Check the underlying code using ord()
print(ord("A")) # 65
print(ord("Z")) # 90
print(ord("a")) # 97
print(ord("z")) # 122
65
90
97
122
# Compare strings
print("BOB" < "BILL") # False because O > I
print("MARK" < "MARY") # True because K < Y
print("Happy" < "beautiful") # True because H < b (not B)
False
True
True
# You can use chr() to get the Unicode character for a given integer
print(chr(65)) # Letter A
print(chr(80)) # Letter P
print(chr(2000)) # Some weird character
A
P
ߐ
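To make the advice about converting to the same case concrete, here is a minimal sketch of a case-insensitive comparison and sort. The list names is made-up sample data.

```python
names = ["bob", "Alice", "carol", "Dave"]

# Default sort compares integer codes, so all uppercase letters come first
default_order = sorted(names)
print(default_order)  # ['Alice', 'Dave', 'bob', 'carol']

# Sort by the lowercase version of each name instead
ignore_case = sorted(names, key=str.lower)
print(ignore_case)  # ['Alice', 'bob', 'carol', 'Dave']

# Compare two strings after converting both to lowercase
print("Happy".lower() < "beautiful".lower())  # False because h > b
```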
17.11. Operations on a string#
All regular operations for an immutable sequence, such as a tuple, also work for a string
In addition, there are special operations for strings
17.11.1. Regular operations#
I will not repeat them all; here are just a few examples
# Init
s = "Hello"
s
'Hello'
# Count number of characters
len(s)
5
# Check if 'h' in s (remember Python is case-sensitive)
'h' in s
False
# Check if "H" in s
"H" in s
True
# Count number of occurrences of "l"
s.count("l")
2
# Concat 2 strings
"Hello " + "world"
'Hello world'
# Sort a string with sorted()
# Note that sorted() takes an iterable and returns a list
# whether the input is a list, tuple, or string
# Thus, you always get a list back
print(sorted("World"))
print(sorted("world"))
['W', 'd', 'l', 'o', 'r']
['d', 'l', 'o', 'r', 'w']
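Because sorted() returns a list, one possible way to get a sorted string back is to join the characters with an empty separator (.join() is covered under Group 6 of the special string operations). A minimal sketch:

```python
s = "World"

sorted_chars = sorted(s)          # a list of single-character strings
sorted_s = "".join(sorted_chars)  # glue the characters back together
print(sorted_s)  # Wdlor

# Ignore case while sorting by using the lowercase form as the sort key
sorted_ignore_case = "".join(sorted(s, key=str.lower))
print(sorted_ignore_case)  # dlorW
```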
17.11.2. Special string operations#
We can divide special string operations into the following groups
Group 1: case checking
Group 2: case transformations
Group 3: substring checking
Group 4: white space handling
Group 5: string padding
Group 6: string splitting & joining
Group 7: searching and replacing
Further reading
For the full reference, visit https://docs.python.org/3/library/stdtypes.html#string-methods
For advanced string processing, check the re module: https://docs.python.org/3/library/re.html
17.11.2.1. Group 1: case checking#
First we need to distinguish between cased and non-cased characters
Cased characters
They are letters in a language such as English, Vietnamese, or Italian
They have uppercase and lowercase forms. Ex: o-O, a-A, ê-Ê
Non-cased characters
They are the rest
They have only one form. Ex: 0, 1, ., ?, -
a) Check if ALL cased characters in a string are in lowercase
Returns True if there is at least one cased character and all cased characters are in lowercase
Returns False otherwise
In Jupyter/IPython, you can read the documentation using ?str.islower (in plain Python, use help(str.islower))
# Read the documentation
?str.islower
"hello".islower() # True
True
"hello!".islower() # True
True
"Hello".islower() # False because of H
False
"012!@".islower() # False because there is no cased-character
False
b) Check if ALL cased characters in a string are in uppercase
"HELLO".isupper() # True
True
"HELLO!@".isupper() # True
True
"HELLo".isupper() # False because of o
False
"012!@".isupper() # False because there is no cased character
False
c) Check if the first character of each word is in uppercase and the rest are in lowercase
"Mr. John".istitle() # True
True
"MR. John".istitle() # False
False
d) Check if all characters are letters
"Hello".isalpha() # True
True
"Chào".isalpha() # True (all are Vietnamese letters)
True
"Hello world".isalpha() # False because of the space
False
"No1".isalpha() # False because of 1
False
e) Check if all characters are digits (0-9; other Unicode digit characters such as ² also count)
"1234".isdigit() # True
True
"012-345".isdigit() # False because of the hyphen
False
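A few edge cases of .isdigit() that are easy to get wrong, shown as a small sketch. It also uses .isdecimal(), a stricter built-in string method not covered above; the variable names are hypothetical.

```python
plain = "1234".isdigit()
negative = "-5".isdigit()
decimal_point = "3.14".isdigit()
superscript = "²".isdigit()
superscript_strict = "²".isdecimal()

print(plain)               # True
print(negative)            # False because of the minus sign
print(decimal_point)       # False because of the dot
print(superscript)         # True: superscript two counts as a digit
print(superscript_strict)  # False: isdecimal() rejects superscript digits
```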
f) Check if all characters are letters and/or digits (alpha numeric)
"Number1".isalnum() # True
True
"Number 1".isalnum() # False because of the space
False
g) Check if all characters are white spaces
Note that white spaces include the space character, tab (\t), and the new-line character (\n)
" ".isspace() # True
True
" \t\n".isspace() # True
True
text = """
"""
text.isspace() # True
True
h) More interesting examples
# Init a string
s = "The word WTO is short for World Trade Organization"
# Ex 1: count the number of capital letters
num_caps = 0
for c in s:
    if c.isupper():
        num_caps += 1
num_caps
7
# Ex 2: count the number of spaces
num_spaces = 0
for c in s:
    if c.isspace():
        num_spaces += 1
num_spaces
8
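The two counting loops above can also be written more compactly. This sketch relies on the fact that True counts as 1 and False as 0 when summed; it is one possible alternative, not a replacement for the explicit loops.

```python
s = "The word WTO is short for World Trade Organization"

# Each check yields True or False; sum() adds them up as 1s and 0s
num_caps = sum(c.isupper() for c in s)
num_spaces = sum(c.isspace() for c in s)

print(num_caps)    # 7
print(num_spaces)  # 8
```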
17.11.2.2. Group 2: case transformations#
# Init
s = "Python is fUn."
s
'Python is fUn.'
# Make uppercase
s.upper()
'PYTHON IS FUN.'
# Make lowercase
s.lower()
'python is fun.'
# Make a title
s.title()
'Python Is Fun.'
# Make a sentence (capitalize the first character and lowercase the rest)
s.capitalize()
'Python is fun.'
# Swap case (lower -> upper, upper -> lower)
s.swapcase()
'pYTHON IS FuN.'
17.11.2.3. Group 3: substring checking#
# Init
s = "Python is fUn."
s
'Python is fUn.'
a) Contains
# Check if "Python" is in s
"Python" in s
True
# Check if "python" is in s
"python" in s
False
b) Starts with
# Check if s starts with "Py"
s.startswith("Py")
True
# Check if s starts with "py"
s.startswith("py")
False
c) Ends with
# Check if s ends with "."
s.endswith(".")
True
# Check if s ends with "!"
s.endswith("!")
False
17.11.2.4. Group 4: white space handling#
# Init a string
s = "\t Hello\t"
s
'\t Hello\t'
# Remove leading white spaces (left strip)
s.lstrip()
'Hello\t'
# Remove trailing white spaces (right strip)
s.rstrip()
'\t Hello'
# Remove white spaces at both ends (strip)
s.strip()
'Hello'
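The strip methods also accept an optional argument listing the characters to remove, and they only ever touch the ends of the string. A minimal sketch with made-up sample values:

```python
raw_email = "  alice@example.com \n"
clean_email = raw_email.strip()   # no argument: strip white spaces
print(clean_email)  # alice@example.com

starred = "***Hello***".strip("*")  # strip a specific character
print(starred)  # Hello

amount = "100$".rstrip("$")         # strip from the right only
print(amount)  # 100

inner = "  a  b  ".strip()          # white spaces in the middle are kept
print(repr(inner))  # 'a  b'
```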
17.11.2.5. Group 5: string padding#
Padding a string s means adding a fill character to the left or the right of s
Normally we want s to have a fixed length
If s has not reached that fixed length, we fill in some character to achieve that goal
Ex: we want '01', '02', …, '11', '12' but the original data are '1', '2', …, '11', '12'
# Init a string
s = "A"
s
'A'
# Read the documentation for rjust (Jupyter/IPython syntax)
?str.rjust
# Pad 0's to the LEFT of s (right-justify) to get a string of length 3
s.rjust(3, "0")
'00A'
# Pad 0's to the RIGHT of s (left-justify) to get a string of length 3
s.ljust(3, "0")
'A00'
# Since filling leading zeros is a common action
# Python string has a method .zfill()
# Ex: fill leading zeros to make a string of length 3
s.zfill(3)
'00A'
# Print out 'month_01', 'month_02', ..., 'month_12'
for i in range(1, 13):
    month = str(i).zfill(2)
    print("month_" + month)
month_01
month_02
month_03
month_04
month_05
month_06
month_07
month_08
month_09
month_10
month_11
month_12
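The same month labels can also be produced with a format specification inside an f-string (formatted strings are covered in a later section). This is a sketch of an alternative, not the only way to do it; the list name months is hypothetical.

```python
# 02d means: format as an integer, at least 2 characters wide, padded with zeros
months = [f"month_{i:02d}" for i in range(1, 13)]
print(months[0])   # month_01
print(months[-1])  # month_12

# zfill() keeps a leading sign in front of the zeros
signed = "-5".zfill(4)
print(signed)  # -005
```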
17.11.2.6. Group 6: string splitting & joining#
a) Split a string based on a given token
# Split a string by white spaces
"How are you?\nI am fine".split()
['How', 'are', 'you?', 'I', 'am', 'fine']
# Split a string by hyphens
"2022-01-01".split("-")
['2022', '01', '01']
# Split a string by a token
"John Doe||john@example.com||UK".split("||")
['John Doe', 'john@example.com', 'UK']
b) Split a string into partitions
# Suppose s is "Math:90"
# We want to split s into 3 parts based on the separator `:`
"Math:90".partition(":")
('Math', ':', '90')
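Two details worth knowing about .partition() and .split(), shown as a small sketch: .partition() always returns three parts even when the separator is missing, and .split() takes an optional maximum number of splits. The variable names are only for illustration.

```python
# Unpack the three parts directly
subject, sep, score = "Math:90".partition(":")
print(subject, score)  # Math 90

# If the separator is not found, the last two parts are empty strings
missing = "Math".partition(":")
print(missing)  # ('Math', '', '')

# Split at most once, starting from the left
limited = "a=b=c".split("=", 1)
print(limited)  # ['a', 'b=c']

# rpartition() splits at the LAST occurrence of the separator
from_right = "a=b=c".rpartition("=")
print(from_right)  # ('a=b', '=', 'c')
```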
c) Join a list of strings
# Init a list of fruits
fruits = ["Apple", "Banana", "Guava", "Watermelon"]
# Join fruits using ,
",".join(fruits)
'Apple,Banana,Guava,Watermelon'
# Join fruits using a space
" ".join(fruits)
'Apple Banana Guava Watermelon'
# Join fruits using --
"--".join(fruits)
'Apple--Banana--Guava--Watermelon'
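Note that .join() only accepts strings, so other items must be converted first. The sketch below shows this, and also that .split() and .join() with the same separator undo each other. The list scores is made-up sample data.

```python
scores = [90, 85, 77]

# ",".join(scores) would raise a TypeError because the items are not strings
joined = ",".join(str(x) for x in scores)
print(joined)  # 90,85,77

# Splitting and joining with the same separator gives the original string back
date = "2022-01-01"
parts = date.split("-")
rebuilt = "-".join(parts)
print(rebuilt == date)  # True
```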
17.11.2.7. Group 7: searching and replacing#
a) Find a token in a string
Since a string is a sequence, we can use .index()
However, .index() will raise an error (a ValueError) if the token is not in the string
A safer way is to use .find()
It works the same way as .index() does, but if the token is not in the string, it will return -1 instead of raising an error
Unlike .index(), .find() is not available for lists and tuples
To find from the right, use .rfind()
# Init a string
s = "He is a good man from a good family"
# Find 'good' from the left
s.find("good")
8
# Find 'good' from the right
s.rfind("good")
24
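A minimal sketch of how .find() and .index() differ when the token is missing. Since -1 is also a valid index (the last character), it is safer to test the result explicitly before using it. The names pos and index_failed are hypothetical.

```python
s = "He is a good man from a good family"

# find() reports a missing token with -1
pos = s.find("bad")
print(pos)  # -1

# index() reports a missing token by raising ValueError
index_failed = False
try:
    s.index("bad")
except ValueError:
    index_failed = True
print(index_failed)  # True

# Check the result before using it as a position
if pos == -1:
    print("not found")
else:
    print(s[pos:])
```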
b) Replace a pattern in a string
# Replace 'good' by 'BAD'
s.replace("good", "BAD")
'He is a BAD man from a BAD family'
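.replace() returns a new string and leaves the original unchanged; it also takes an optional third argument that limits how many occurrences are replaced. A small sketch, with first_only and untouched as illustrative names:

```python
s = "He is a good man from a good family"

# Replace only the first occurrence
first_only = s.replace("good", "BAD", 1)
print(first_only)  # He is a BAD man from a good family

# The original string is unchanged because strings are immutable
print(s)  # He is a good man from a good family

# If the pattern is not found, the result equals the original
untouched = s.replace("evil", "BAD")
print(untouched == s)  # True
```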
17.12. Formatted strings#
We have already learned about this
# Init variables
name = "Jack"
city = "New York"
# Use .format()
# This works with every version of Python
"{} is from {}".format(name, city)
'Jack is from New York'
# Use string interpolation
# This works with Python >= 3.6 only
f"{name} is from {city}"
'Jack is from New York'
For more advanced string formatting, check https://pyformat.info
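As a taste of what formatting can do beyond plain substitution, here is a short sketch of a few common format specifications. The same specifications work in both f-strings and .format(); the variables price and count are made up for the example.

```python
name = "Jack"
price = 1234.5
count = 7

two_decimals = f"{price:.2f}"          # fixed number of decimal places
with_commas = f"{price:,.2f}"          # thousands separator
zero_padded = f"{count:03d}"           # pad an integer with leading zeros
right_aligned = f"{name:>8}|"          # right-align in 8 characters
left_aligned = "{:<8}|".format(name)   # the same idea with .format()

print(two_decimals)   # 1234.50
print(with_commas)    # 1,234.50
print(zero_padded)    # 007
print(right_aligned)  #     Jack|
print(left_aligned)   # Jack    |
```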
17.13. Summary#
What is a string?
An immutable sequence of Unicode characters
Python does NOT distinguish between characters and strings
There is only the string type, there’s no character type
A single character is a string of length 1
Create a string
Use ', ", ''', or """ (they are all the same)
Triple quotes preserve new-line characters (thus, good for multi-line strings)
Index & slicing
Same as lists and tuples
One exception
A slice of length 1 of a list is a list and is different from the element it contains
A slice of length 1 of a str is the same as the single character contained in that slice
Iterate through a string
Same as lists and tuples
Conversion to strings
Use str() to convert an object to a string
Copying
There’s no shallow copying
Deep copying is not necessary because both the string and its elements are immutable
Thus, to make a copy of s, just use an assignment t = s
Order of characters
Each character is encoded by an integer
Comparing characters is in fact comparing their integer codes
Use ord() to get the integer code of a character
Use chr() to get the character that an integer represents
Note that A < Z but a > Z
Operations on strings
Regular operations: same as tuples
Special string operations
Group 1: case checking
Ex: s.islower(), s.isupper(), s.isalnum()
Group 2: case transformations
Ex: s.upper(), s.lower()
Group 3: substring checking
Ex: "A" in s, s.startswith("A"), s.endswith("A")
Group 4: white space handling
Ex: s.lstrip(), s.rstrip(), s.strip()
Group 5: string padding
Ex: s.ljust(2, "0"), s.rjust(2, "0"), s.zfill(2)
Group 6: string splitting & joining
Ex: s.split(), s.partition(":"), ",".join(a_list)
Group 7: searching and replacing
Ex: s.find("A"), s.rfind("A"), s.replace("old", "new")
Other notes
Cased characters: letters in a language. Ex: o-O, a-A, ê-Ê
Non-cased characters: the rest. Ex: 0, 1, ., ?, -
For advanced string processing, check the re module
Formatted strings
.format() can be used with all versions of Python
f"" can be used with Python >= 3.6 only
17.14. Practice#
To be updated