# String type
Author: Tue Nguyen

## Outline
- What is a string?
- Create a string
- Index & slice a string
- Iterate through a string
- Conversion to strings
- Mutability
- Copy a string
- Order of characters
- Operations on a string
- Formatted strings

## What is a string?
- A string is an **immutable** sequence of Unicode characters
- Python does NOT distinguish between characters and strings
    - There is only the string type, not character type
    - A single character is a string of length 1
    - Examples of strings: `"Hello"`, `"A"`, `"Good morning"`
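
As a quick illustration of the points above, the following minimal sketch checks that a single character has the same type as a longer string. The variable names `ch` and `word` are chosen here only for illustration.

```python
# A single character and a longer text are both of type str
ch = "A"
word = "Hello"

print(type(ch) is type(word))  # both are str
print(len(ch))                 # a "character" is a string of length 1
print(type(word[0]))           # indexing a string gives back a str
```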

## Create a string
- We can create a string using `'` or `"`
- There's no difference between using `'` and using `"`
- When we want to preserve new-line characters, we can use triple quotes

**Ex 1: using single quotes**

In [1]:
s = 'Hello'
print(s)
print(type(s))

Hello
<class 'str'>


**Ex 2: using double quotes**

In [2]:
s = "Hello"
print(s)
print(type(s))

Hello
<class 'str'>


**Ex 3: single vs. double quotes**

In [12]:
# Use single quotes when your text contains double quotes
text = 'He said: "I am handsome"'
print(text)

He said: "I am handsome"


In [13]:
# Use double quotes when your text contains single quotes
text = "I have a bachelor's degree"
print(text)

I have a bachelor's degree
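
When a text contains both kinds of quotes, one option is to escape the quote that matches the delimiter with a backslash. This is a small sketch that goes slightly beyond the two examples above; the variable names are illustrative.

```python
# Escape the single quote inside a single-quoted string
text = 'He said: "I\'m fine"'
print(text)

# The same text written with double quotes as the delimiter
same_text = "He said: \"I'm fine\""
print(text == same_text)
```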


**Ex 4: Python is case-sensitive**

In [4]:
"Hello" == "HELLO"

False

**Ex 5: triple quotes**

In [8]:
# Use triple double quotes when you want a multi-line text
text = """This is the first line
This is the second line"""

print(text)

This is the first line
This is the second line


In [10]:
# Triple quotes preserve all kinds of white space
text = """   Line 1
    Line 2
Line 3"""

print(text)

   Line 1
    Line 2
Line 3


In [11]:
# You can also use triple single quotes
text = '''Line 1
Line 2'''

print(text)

Line 1
Line 2


In [16]:
# Triple quotes are very convenient when you write SQL queries
q = """
select
    name,
    address,
    email
    from employee
where 
    (salary >= 50000)
    and (country = 'UK')
"""
print(q)


select
    name,
    address,
    email
    from employee
where 
    (salary >= 50000)
    and (country = 'UK')



**Ex 6: all quotes are the same**

In [17]:
print("hello" == 'hello')
print("hello" == '''hello''')
print("hello" == """hello""")

True
True
True


## Index a string
- Same as lists and tuples
- Note that each character in the string is like an element of a tuple

In [18]:
# Init
s = "Hello world"
s

'Hello world'

In [19]:
# First character
s[0]

'H'

In [20]:
# Last character
s[-1]

'd'

In [21]:
# Fifth character
s[4]

'o'
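
The indexing rules can be checked with a short sketch. It assumes the same string as above and shows that a negative index counts from the end, and that an index outside the valid range raises `IndexError`.

```python
s = "Hello world"

# A negative index counts from the end of the string
print(s[-1] == s[len(s) - 1])

# An index outside the valid range raises IndexError
try:
    s[100]
except IndexError as err:
    message = str(err)
    print("IndexError:", message)
```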

## Slice a string
- Same as lists and tuples
- However, there is one exception
    - For a list, a slice containing a single element is a list and it is different from that element
    - For a string, a slice containing a single character is the same as that character

In [22]:
# For a list
# A slice containing a single element != that element
x = [1, 2, 3]
print(x[0])
print(x[:1])
print(x[0] == x[:1])

1
[1]
False


In [23]:
# For a string
# A slice containing a single character == that character
s = "Hello"
print(s[0])
print(s[:1])
print(s[0] == s[:1])

H
H
True
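
Since slicing works the same way as for lists and tuples, the usual `start:stop:step` forms apply. The sketch below uses illustrative variable names and the string `"Hello world"` from the indexing section.

```python
s = "Hello world"

first_word = s[:5]     # characters at positions 0..4
second_word = s[6:]    # from position 6 to the end
every_other = s[::2]   # every second character
reversed_s = s[::-1]   # a step of -1 walks backwards

print(first_word)
print(second_word)
print(every_other)
print(reversed_s)
```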


## Iterate through a string
- Same as lists and tuples

In [24]:
# Init
s = "Hello"

In [25]:
# Iterate through each character and print it out
for c in s:
    print(c)

H
e
l
l
o
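
If we also need the position of each character, one common option is the built-in `enumerate()`, which works on strings just as it does on lists and tuples. A minimal sketch:

```python
s = "Hello"

# enumerate() yields (index, character) pairs
for i, c in enumerate(s):
    print(i, c)

# The same pairs collected into a list
pairs = list(enumerate(s))
```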


## Conversion to strings
We use `str()` to cast an object to a string

In [171]:
# Bool to str
str(True)

'True'

In [170]:
# Int to str
str(100)

'100'

In [174]:
# Float to str
str(123.45)

'123.45'

In [180]:
# List to str
str([1, 2, 3])

'[1, 2, 3]'

In [177]:
# Tuple to string
str((1, 2, 3))

'(1, 2, 3)'
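
A typical reason to call `str()` is to concatenate a number with a text, since `+` does not mix strings and numbers. The sketch below uses illustrative names and also shows the reverse conversion with `int()`.

```python
age = 30

# "Age: " + age would raise a TypeError, so convert the number first
message = "Age: " + str(age)
print(message)

# Going the other way: convert a numeric string back to an integer
number = int("100") + 1
print(number)
```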

## Mutability
- A string is **immutable**, meaning that we cannot change it
- Try `s[0] = "A"` to see the error
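
The error mentioned above can be observed safely inside a `try` block. The sketch also shows the usual workaround, which is to build a new string instead of modifying the old one.

```python
s = "Hello"

# Item assignment is not supported for strings
try:
    s[0] = "A"
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# To "change" a string, build a new one from pieces of the old one
t = "A" + s[1:]
print(t)
print(s)  # the original string is untouched
```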

## Copy a string

### Shallow copying
- As with tuples, there is no way to make a shallow copy of a string
- `t = s[:]` results in aliasing (`t` and `s` refer to the same object)
- To make a copy of a string `s`, simply use `t = s`
- Although `s` and `t` point to the same object, we do not have to worry that changing `s` might change `t`, because it is impossible to change a string or its elements once it has been created
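
The aliasing described above can be checked with the `is` operator, which tests whether two names refer to the same object. Note that the result of `u is s` for a full slice is a CPython implementation detail, so treat that line as an observation rather than a guarantee.

```python
s = "Hello"

# Plain assignment creates an alias, not a new object
t = s
print(t is s)

# A full slice is always equal to the original string
u = s[:]
print(u == s)

# In CPython the full slice is typically the very same object
print(u is s)
```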

### Deep copying
- Deep copying a string is unnecessary

## Order of characters
- Each character is encoded by one integer
- So comparing characters is in fact comparing their underlying integer codes
- This might cause some confusion for beginners
    - Letters `A-Z` are encoded with integers from `65-90`
    - Letters `a-z` are encoded with integers from `97-122`
    - Thus `A < B` but `a > B`
    - Therefore, to compare strings regardless of case, we first need to convert them to the same case (lower or upper)
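
The effect of letter case on ordering can be seen in a short sketch. The list `names` is made up for illustration, and `key=str.lower` is one common way to sort without regard to case.

```python
names = ["bob", "Alice", "carol", "Dave"]

# Default sorting puts all uppercase initials before lowercase ones
default_order = sorted(names)
print(default_order)

# Converting to the same case first gives the expected alphabetical order
ignore_case_order = sorted(names, key=str.lower)
print(ignore_case_order)

# The same idea for a single comparison
print("apple" < "Banana")                  # compares 'a' (97) with 'B' (66)
print("apple".lower() < "Banana".lower())  # compares 'a' with 'b'
```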

In [52]:
# Check the underlying code using ord()
print(ord("A")) # 65
print(ord("Z")) # 90
print(ord("a")) # 97
print(ord("z")) # 122

65
90
97
122


In [53]:
# Compare strings
print("BOB" < "BILL") # False because O > I
print("MARK" < "MARY") # True because K < Y
print("Happy" < "beautiful") # True because H < b (not B)

False
True
True


In [65]:
# You can use chr() to get the Unicode character for a given integer
print(chr(65)) # Letter A
print(chr(80)) # Letter P
print(chr(2000)) # Some weird character

A
P
ߐ


## Operations on a string
- All regular operations for an immutable sequence (such as a tuple) are available
- In addition, there are special operations that apply only to strings

### Regular operations
I will not repeat all of them; here are just a few examples

In [32]:
# Init
s = "Hello"
s

'Hello'

In [33]:
# Count number of characters
len(s)

5

In [35]:
# Check if 'h' in s (remember Python is case-sensitive)
'h' in s

False

In [34]:
# Check if "H" in s
"H" in s

True

In [37]:
# Count number of occurrences of "l"
s.count("l")

2

In [36]:
# Concat 2 strings
"Hello " + "world"

'Hello world'

In [68]:
# Sort a string with sorted()
# Note that sorted() takes an iterable and returns a list
# no matter whether the input is a list, tuple, or string
# Thus, you always get a list back
print(sorted("World"))
print(sorted("world"))

['W', 'd', 'l', 'o', 'r']
['d', 'l', 'o', 'r', 'w']
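
Because `sorted()` returns a list, we need one more step if we want a string back. One simple way, sketched here, is to join the sorted characters with an empty separator.

```python
# sorted() returns a list of characters
letters = sorted("world")

# Join the characters back into a single string
sorted_word = "".join(letters)
print(sorted_word)
```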


### Special string operations
We can divide special string operations into the following groups
- Group 1: case checking
- Group 2: case transformations
- Group 3: substring checking
- Group 4: white space handling
- Group 5: string padding
- Group 6: string splitting & joining
- Group 7: searching and replacing

Further readings
- For the full reference, visit https://docs.python.org/3/library/stdtypes.html#string-methods
- For advanced string processing, check `re` module https://docs.python.org/3/library/re.html

#### Group 1: case checking
- First we need to distinguish between **cased** and **non-cased** characters
- Cased characters 
    - They are letters in a language such as English, Vietnamese, or Italian
    - They have uppercase and lowercase forms. Ex: `o-O`, `a-A`, `ê-Ê`
- Non-cased characters
    - They are the rest
    - And they have only one form. Ex: `0`, `1`, `.`, `?`, `-`

**a) Check if ALL cased characters in a string are in lowercase**

- Returns `True` if there is at least one cased character and all cased characters are in lowercase
- Returns `False` otherwise
- You can read the documentation using `?str.islower`

In [86]:
# Read the documentation
?str.islower

Signature: str.islower(self, /)
Docstring:
Return True if the string is a lowercase string, False otherwise.

A string is lowercase if all cased characters in the string are lowercase and
there is at least one cased character in the string.
Type:      method_descriptor


In [77]:
"hello".islower() # True

True

In [80]:
"hello!".islower() # True

True

In [79]:
"Hello".islower() # False because of H

False

In [83]:
"012!@".islower() # False because there is no cased-character

False

**b) Check if ALL cased characters in a string are in uppercase**

In [87]:
"HELLO".isupper() # True

True

In [88]:
"HELLO!@".isupper() # True

True

In [89]:
"HELLo".isupper() # False because of o

False

In [90]:
"012!@".isupper() # False because there is no cased character

False

**c) Check if the first character of each word is in uppercase and the rest is lowercase**

In [95]:
"Mr. John".istitle() # True

True

In [98]:
"MR. John".istitle() # False

False

**d) Check if all characters are letters**

In [100]:
"Hello".isalpha() # True

True

In [103]:
"Chào".isalpha() # True (all are Vietnamese letters)

True

In [101]:
"Hello world".isalpha() # False because of the space

False

In [102]:
"No1".isalpha() # False because of 1

False

**e) Check if all characters are digits `0-9`**

In [104]:
"1234".isdigit()

True

In [106]:
"012-345".isdigit() # False because of the hyphen

False

**f) Check if all characters are letters and/or digits (alpha numeric)**

In [108]:
"Number1".isalnum() # True

True

In [109]:
"Number 1".isalnum() # False because of the space

False

**g) Check if all characters are white spaces**

Note that white spaces include space (` `), tab (`\t`), and new line character (`\n`)

In [137]:
"    ".isspace() # True

True

In [138]:
" \t\n".isspace() # True

True

In [140]:
text = """


"""
text.isspace() # True

True

**h) More interesting examples**

In [117]:
# Init a string
s = "The word WTO is short for World Trade Organization"

In [118]:
# Ex 1: count the number of capital letters
num_caps = 0
for c in s:
    if c.isupper():
        num_caps += 1
num_caps

7

In [119]:
# Ex 2: count the number of spaces
num_spaces = 0
for c in s:
    if c.isspace():
        num_spaces += 1
num_spaces

8
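
The two counting loops above can also be written more compactly. The sketch below is an alternative formulation, not a replacement for the loops: it feeds a generator expression to the built-in `sum()`.

```python
s = "The word WTO is short for World Trade Organization"

# Add 1 for every character that passes the check
num_caps = sum(1 for c in s if c.isupper())
num_spaces = sum(1 for c in s if c.isspace())

print(num_caps)
print(num_spaces)
```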

#### Group 2: case transformations

In [123]:
# Init
s = "Python is fUn."
s

'Python is fUn.'

In [124]:
# Make uppercase
s.upper()

'PYTHON IS FUN.'

In [125]:
# Make lowercase
s.lower()

'python is fun.'

In [126]:
# Make a title 
s.title()

'Python Is Fun.'

In [127]:
# Make a sentence (uppercase the first character, lowercase the rest)
s.capitalize()

'Python is fun.'

In [128]:
# Swap case (lower -> upper, upper -> lower)
s.swapcase()

'pYTHON IS FuN.'

#### Group 3: substring checking

In [129]:
# Init
s = "Python is fUn."
s

'Python is fUn.'

**a) Contains**

In [130]:
# Check if "Python" is in s
"Python" in s

True

In [131]:
# Check if "python" is in s
"python" in s

False

**b) Starts with**

In [132]:
# Check if s starts with "Py"
s.startswith("Py")

True

In [133]:
# Check if s starts with "py"
s.startswith("py")

False

**c) Ends with**

In [134]:
# Check if s ends with "."
s.endswith(".")

True

In [135]:
# Check if s ends with "!"
s.endswith("!")

False
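
Two small variations on the checks above may be useful in practice. `startswith()` and `endswith()` also accept a tuple of candidates, and converting to lowercase first gives a case-insensitive check. The variable names are illustrative.

```python
s = "Python is fUn."

# Pass a tuple to test several possible endings at once
ends_with_punct = s.endswith((".", "!", "?"))
print(ends_with_punct)

# Convert to lowercase first for a case-insensitive check
starts_with_py = s.lower().startswith("py")
print(starts_with_py)
```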

#### Group 4: white space handling

In [149]:
# Init a string
s = "\t   Hello\t"
s

'\t   Hello\t'

In [150]:
# Remove leading white spaces (left strip)
s.lstrip()

'Hello\t'

In [151]:
# Remove trailing white spaces (right strip)
s.rstrip()

'\t   Hello'

In [152]:
# Remove white spaces at both ends (strip)
s.strip()

'Hello'
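
The strip methods also take an optional argument listing the characters to remove, which goes slightly beyond the white space examples above. The sketch below assumes made-up input strings. Note that the argument is treated as a set of characters, not as a prefix or suffix.

```python
raw = "***Hello***"

# Remove a specific character from both ends
cleaned = raw.strip("*")
print(cleaned)

# The argument is a set of characters: any leading or trailing x or y is removed
mixed = "xxHelloxy".strip("xy")
print(mixed)

# Strings are immutable, so the original is unchanged
print(raw)
```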

#### Group 5: string padding
- Padding a string `s` is adding a fill character to the left or the right of `s`
- Normally we want `s` to have a fixed length
- If `s` has not reached that fixed length, we want to fill in some character to achieve that goal
- Ex: we want `'01'`, `'02'`, ..., `'11'`, `'12'` but the original data are `'1'`, `'2'`, ..., `'11'`, `'12'` 

In [156]:
# Init a string
s = "A"
s

'A'

In [161]:
# Read the documentation for right-justification
?str.rjust

Signature: str.rjust(self, width, fillchar=' ', /)
Docstring:
Return a right-justified string of length width.

Padding is done using the specified fill character (default is a space).
Type:      method_descriptor


In [166]:
# Pad 0's to the LEFT of s (right-justify) to get a string of length 3
s.rjust(3, "0")

'00A'

In [167]:
# Pad 0's to the RIGHT of s (left-justify) to get a string of length 3
s.ljust(3, "0")

'A00'

In [169]:
# Since filling leading zeros is a common action
# Python string has a method .zfill()
# Ex: fill leading zeros to make a string of length 3
s.zfill(3)

'00A'

In [181]:
# Print out 'month_01', 'month_02', ..., 'month_12'
for i in range(1, 13):
    month = str(i).zfill(2)
    print("month_" + month)

month_01
month_02
month_03
month_04
month_05
month_06
month_07
month_08
month_09
month_10
month_11
month_12
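
As an alternative to `.zfill()`, the same labels can be produced with a format specification inside an f-string (Python 3.6 or later). This is a sketch of that option; the list name `labels` is illustrative.

```python
# 02d means: an integer, at least 2 characters wide, padded with zeros
labels = [f"month_{i:02d}" for i in range(1, 13)]
print(labels[0], labels[-1])

# For strings, use a fill character plus an alignment sign
right_justified = f"{'A':0>3}"  # pad with 0 on the left, like rjust(3, "0")
left_justified = f"{'A':0<3}"   # pad with 0 on the right, like ljust(3, "0")
print(right_justified, left_justified)
```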


#### Group 6: string splitting & joining

**a) Split a string based on a given token**

In [183]:
# Split a string by white spaces
"How are you?\nI am fine".split()

['How', 'are', 'you?', 'I', 'am', 'fine']

In [184]:
# Split a string by hyphens
"2022-01-01".split("-")

['2022', '01', '01']

In [185]:
# Split a string by a token
"John Doe||john@example.com||UK".split("||")

['John Doe', 'john@example.com', 'UK']

**b) Split a string into partitions**

In [188]:
# Suppose s is "Math:90"
# We want to split s into 3 parts based on the separator `:`
"Math:90".partition(":")

('Math', ':', '90')
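
A few edge cases help show how `.partition()` differs from `.split()`. The sketch below uses made-up inputs: `.partition()` always returns three parts and splits at the first separator only, while `.rpartition()` splits at the last one.

```python
# Separator not found: the whole string comes first, followed by two empty strings
no_sep = "Math".partition(":")
print(no_sep)

# Several separators: partition splits at the first, rpartition at the last
first = "a:b:c".partition(":")
last = "a:b:c".rpartition(":")
print(first)
print(last)

# split() with maxsplit=1 gives a similar result, but as a list without the separator
limited = "a:b:c".split(":", 1)
print(limited)
```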

**c) Join a list of strings**

In [190]:
# Init a list of fruits
fruits = ["Apple", "Banana", "Guava", "Watermelon"]

In [191]:
# Join fruits using ,
",".join(fruits)

'Apple,Banana,Guava,Watermelon'

In [196]:
# Join fruits using a space
" ".join(fruits)

'Apple Banana Guava Watermelon'

In [195]:
# Join fruits using --
"--".join(fruits)

'Apple--Banana--Guava--Watermelon'
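
Note that `.join()` expects every element to be a string. The sketch below uses a made-up list of numbers to show the error and one common fix, which is to convert each element with `str()` first.

```python
numbers = [1, 2, 3]

# Joining non-string elements raises a TypeError
try:
    ",".join(numbers)
except TypeError:
    join_failed = True
    print("Cannot join integers directly")

# Convert each element to a string first
joined = ",".join(str(n) for n in numbers)
print(joined)

# split() reverses the operation (the parts are still strings)
parts = joined.split(",")
print(parts)
```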

#### Group 7: searching and replacing

**a) Find a token in a string**
- Since a string is a sequence, we can use `.index()`
- However, `.index()` will raise an error if the token is not in the string
- A safer way is to use `.find()`
- It works the same way as `.index()` does, but if the token is not in the string, it returns `-1` instead of raising an error
- Unlike `.index()`, `.find()` is not available for lists and tuples
- To find from the right, use `.rfind()`

In [222]:
# Init a string
s = "He is a good man from a good family"

In [223]:
# Find 'good' from the left
s.find("good")

8

In [224]:
# Find 'good' from the right
s.rfind("good")

24
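
The difference between `.find()` and `.index()` only shows up when the token is missing. The sketch below searches for `"bad"`, a token chosen because it does not occur in `s`.

```python
s = "He is a good man from a good family"

# find() returns -1 when the token is missing
pos = s.find("bad")
print(pos)

# index() raises a ValueError in the same situation
try:
    s.index("bad")
except ValueError:
    index_failed = True
    print("index() raised ValueError")

# Always test the result of find() before using it as an index,
# because -1 is itself a valid index (the last character)
if pos == -1:
    print("Token not found")
```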

**b) Replace a pattern in a string**

In [227]:
# Replace 'good' by 'BAD'
s.replace("good", "BAD")

'He is a BAD man from a BAD family'
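
Two details about `.replace()` are worth checking: it returns a new string and leaves the original untouched, and it accepts an optional third argument that limits the number of replacements. A minimal sketch:

```python
s = "He is a good man from a good family"

# Replace only the first occurrence
first_only = s.replace("good", "BAD", 1)
print(first_only)

# The original string is unchanged because strings are immutable
print(s)
```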

## Formatted strings
We have already learned about this

In [200]:
# Init variables
name = "Jack"
city = "New York"

In [204]:
# Use .format()
# This works with every version of Python 3
"{} is from {}".format(name, city)

'Jack is from New York'

In [205]:
# Use string interpolation
# This works with Python >= 3.6 only
f"{name} is from {city}"

'Jack is from New York'
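
Both styles also accept a format specification after a colon, which controls how a value is displayed. The sketch below uses made-up values (`price`, `ratio`) to show a few common cases.

```python
name = "Jack"
city = "New York"
price = 1234.5
ratio = 0.256

# Named placeholders with .format()
sentence = "{name} is from {city}".format(name=name, city=city)
print(sentence)

# Two decimal places, with and without a thousands separator
two_decimals = f"{price:.2f}"
with_commas = f"{price:,.2f}"
print(two_decimals, with_commas)

# Display a fraction as a percentage with one decimal place
percent = f"{ratio:.1%}"
print(percent)
```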

For more advanced string formatting, check https://pyformat.info

## Summary
**What is a string?**
- An **immutable** sequence of Unicode characters
- Python does NOT distinguish between characters and strings
    - There is only the string type, there's no character type
    - A single character is a string of length 1

**Create a string**
- Use `'`, `"`, `'''`, or `"""` (they are all the same)
- Triple quotes preserve new-line characters (thus, good for multi-line strings)

**Index & slicing**
- Same as lists and tuples
- One exception
    - Slice of length 1 in the `list` type is a list and is different from the element it contains
    - Slice of length 1 in `str` type is the same as the single character contained in that slice

**Iterate through a string**
- Same as lists and tuples

**Conversion to strings**
- Use `str()` to convert an object to a string

**Copying**
- There's no shallow copying
- Deep copying is not necessary because both the string and its elements are immutable
- Thus, to make a copy of `s`, just use an assignment `t = s`

**Order of characters**
- Each character is encoded by an integer
- Comparing characters is in fact comparing their integer codes
- Use `ord()` to get the integer code of a character
- Use `chr()` to get the character that an integer represents
- Note that `A < Z` but `a > Z`

**Operations on strings**
- Regular operations: same as tuples
- Special string operations
    - Group 1: case checking 
        - Ex: `s.islower()`, `s.isupper()`, `s.isalnum()`
    - Group 2: case transformations 
        - Ex: `s.upper()`, `s.lower()`
    - Group 3: substring checking 
        - Ex: `"A" in s`, `s.startswith("A")`, `s.endswith("A")`
    - Group 4: white space handling
        - Ex: `s.lstrip()`, `s.rstrip()`, `s.strip()`
    - Group 5: string padding
        - Ex: `s.ljust(2, "0")`, `s.rjust(2, "0")`, `s.zfill(2)`
    - Group 6: string splitting & joining
        - Ex: `s.split()`, `s.partition(":")`, `",".join(a_list)`
    - Group 7: searching and replacing
        - Ex: `s.find("A")`, `s.rfind("A")`, `s.replace("old", "new")`
- Other notes
    - Cased characters: letters in a language. Ex: `o-O`, `a-A`, `ê-Ê`
    - Non-cased characters: the rest. Ex: `0`, `1`, `.`, `?`, `-`
    - For advanced string processing, check `re` module

**Formatted strings**
- `.format()` can be used for all versions of Python 3
- `f""` can be used for Python `>= 3.6` only

## Practice
To be updated