Python Programming

Lecture 5 Strings

5.1 Strings

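  • A string is a sequence of characters; its elements are characters. The empty string is '' (nothing between the quotes), not ' ', which contains one space. You can access the characters one at a time with the bracket operator. Indices start at 0, so fruit[1] is the second character. A slice s[i:j] gives the characters from index i up to, but not including, index j: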

  • 
    >>> fruit = 'banana'
    >>> letter = fruit[1]
    >>> print(letter)
    a
    >>> len(fruit)
    6
    
    
    >>> fruit = 'banana'
    >>> fruit[1:3]
    'an'
    >>> fruit[3:]
    'ana'
    
  • The in operator tests whether one string appears inside another:

  • 
    >>> print('a' in 'banana')
    True
    >>> print('seed' in 'banana')
    False
    
  • Iteration: a for loop visits the characters one at a time:

  • 
    fruit = 'banana'
    for char in fruit:
        print(char)
    
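  • Strings are immutable: you cannot change a character in place, but you can build a new string from pieces of the old one.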

  • 
    >>> greeting = 'Hello, world!'
    >>> greeting[0] = 'J'
    TypeError: 'str' object does not support item assignment
    
    
    >>> greeting = 'Hello, world!'
    >>> new_greeting = 'J' + greeting[1:]
    >>> print(new_greeting)
    Jello, world!
    
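  • Comparison operators compare strings character by character, using each character's Unicode code point. For lowercase words this gives alphabetical order, and a prefix such as 'ba' comes before the longer word 'banana'. Note that all uppercase letters come before all lowercase letters.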

  • 
    >>> print('apple'>'banana')
    False
    >>> print('ba' > 'banana')
    False
    >>> a_list = ["orange", "apple", "banana"]
    >>> sorted(a_list)
    ['apple', 'banana', 'orange']
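The operations above can be combined in small functions. Here is a minimal sketch; the function names count_char and replace_at are hypothetical and chosen only for illustration. A for loop counts how often a character occurs. Because strings are immutable, "replacing" a character means building a new string from slices.

```python
def count_char(text, target):
    """Count how many times the character target appears in text."""
    count = 0
    for char in text:            # visit the characters one at a time
        if char == target:
            count = count + 1
    return count


def replace_at(text, index, new_char):
    """Return a new string with the character at index replaced.

    text itself is not modified, because strings are immutable.
    """
    return text[:index] + new_char + text[index + 1:]


greeting = 'Hello, world!'
print(count_char('banana', 'a'))         # 3
print(replace_at(greeting, 0, 'J'))      # Jello, world!
print(greeting)                          # Hello, world! (unchanged)
print('nan' in 'banana')                 # True
```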
    

Methods of Strings

  • .title(), .lower(), .upper()

  • String methods do not change the original string (strings are immutable); they return a new string.

  • 
    >>> name = "ada lovelace"
    >>> print(name.title())
    Ada Lovelace
    
    >>> print(name)
    ada lovelace
    
    
    >>> name = "Ada Lovelace"
    >>> print(name.upper())
    ADA LOVELACE
    
    >>> print(name.lower())
    ada lovelace
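Each method returns a new string, so the result can be stored or compared while the original stays unchanged. The sketch below is illustrative only; the helper names full_name and same_word are hypothetical.

```python
def full_name(first, last):
    """Combine two names and return them in title case."""
    name = first + ' ' + last
    return name.title()


def same_word(a, b):
    """Compare two words without regard to upper/lower case."""
    return a.lower() == b.lower()


first = 'ada'
print(full_name(first, 'LOVELACE'))    # Ada Lovelace
print(first)                           # ada (unchanged)
print(same_word('Python', 'PYTHON'))   # True
```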
    
Stripping Whitespace
  • To people, 'python' and 'python ' look pretty much the same, but to a program they are two different strings. To ensure that no whitespace exists at the right end of a string, use the rstrip() method.

>>> favorite_language = 'python '
>>> favorite_language
'python '
>>> favorite_language.rstrip()
'python'
>>> favorite_language
'python '
  • However, rstrip() returns a new string; the value stored in favorite_language still ends with a space. To remove the whitespace permanently, store the stripped value back into the variable:

>>> favorite_language = 'python '
>>> favorite_language = favorite_language.rstrip()
>>> favorite_language
'python'
  • You can also strip whitespace from the left side of a string using the lstrip() method or strip whitespace from both sides at once using strip():

>>> favorite_language = ' python '
>>> favorite_language.rstrip()
' python'
>>> favorite_language.lstrip()
'python '
>>> favorite_language.strip()
'python'
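A common use of strip() is to clean up text typed by a user before comparing it. This minimal sketch also assumes the comparison should ignore letter case. The function name is_correct is hypothetical.

```python
def is_correct(answer, expected):
    """Return True if answer matches expected, ignoring surrounding
    whitespace (spaces, tabs, newlines) and letter case."""
    return answer.strip().lower() == expected.strip().lower()


print(is_correct('  Python \n', 'python'))   # True
print(is_correct('pyth on', 'python'))       # False: inner spaces are kept
```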
Parsing strings
  • .find() returns the index of the first occurrence of a substring, or -1 if it does not occur. An optional second argument gives the index where the search starts:


>>> word = 'banana'
>>> index = word.find('a')
>>> print(index)
1
>>> word.find('na')
2
>>> word.find('na', 3)
4

>>> data = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
>>> atpos = data.find('@')
>>> print(atpos)
21
>>> sppos = data.find(' ',atpos)
>>> print(sppos)
31
>>> host = data[atpos+1 : sppos]
>>> print(host)
uct.ac.za
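The host-extraction steps above can be wrapped in a function. This version also handles two cases the interactive session does not show. If find() returns -1 because there is no '@', the function returns None. If no space follows the address, the slice runs to the end of the line. The name get_host is hypothetical.

```python
def get_host(line):
    """Return the host part of the first e-mail address in line,
    or None if the line contains no '@'."""
    atpos = line.find('@')
    if atpos == -1:                  # find() returns -1 when nothing is found
        return None
    sppos = line.find(' ', atpos)    # first space after the '@'
    if sppos == -1:                  # the address is at the very end
        sppos = len(line)
    return line[atpos + 1:sppos]


data = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
print(get_host(data))                          # uct.ac.za
print(get_host('contact: bob@example.com'))    # example.com
print(get_host('no address here'))             # None
```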
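  • .startswith() returns True or False depending on whether the string begins with the given prefix. The test is case-sensitive, so calling .lower() first makes it case-insensitive: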


>>> line = 'Have a nice day'
>>> line.startswith('h') 
False

>>> line.lower()
'have a nice day'

>>> line.lower().startswith('h')
True
  • .split() breaks a sentence into words (splitting on whitespace) and returns them as a list:


>>> s = 'pining for the fjords'
>>> t = s.split()
>>> print(t)
['pining', 'for', 'the', 'fjords']
>>> print(t[2])
the
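.startswith() and .split() are often used together. For example, you can pick out the sender from a mail-log line like the data string above. In this sketch (get_sender is a hypothetical name), only lines beginning with 'From ' are accepted, and the second word is returned.

```python
def get_sender(line):
    """Return the second word of a line that starts with 'From ',
    otherwise None."""
    if not line.startswith('From '):   # case-sensitive prefix test
        return None
    words = line.split()               # break the line into words
    if len(words) < 2:                 # nothing after 'From'
        return None
    return words[1]


data = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
print(get_sender(data))            # stephen.marquard@uct.ac.za
print(get_sender('Subject: hi'))   # None
```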
Format method
  • .format() builds a string by replacing each {} placeholder with a value, such as data stored in a variable.

  • 
    >>> number = 42
    >>> print('I have spotted {} camels.'.format(number))
    I have spotted 42 camels.
    
  • What if we do not use .format()?

  • 
    >>> number = 42
    >>> print('I have spotted number camels.') # prints the word "number", not 42
    I have spotted number camels.
    >>> print('I have spotted ' + str(number) + ' camels.') # works, but clumsy
    I have spotted 42 camels.
    
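  • The placeholders are filled from the arguments of .format() in order. There must be at least as many arguments as {} placeholders (too few raises an IndexError), and a plain {} accepts a value of any type. Numbers inside the braces select arguments by position, starting at 0: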

  • 
    >>> print('In {} years I have spotted {} {}.'.format(3, 0.1, 'camels'))
    In 3 years I have spotted 0.1 camels.
    
    >>> print('In {0} years I have spotted {1} {2}.'.format(3, 0.1, 'camels'))
    In 3 years I have spotted 0.1 camels.
    
    >>> print('In {1} years I have spotted {0} {2}.'.format(3, 0.1, 'camels'))
    In 0.1 years I have spotted 3 camels.
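You can check quickly how .format() matches arguments to placeholders. Extra arguments are simply ignored, while too few raise an IndexError; the exact error message differs between Python versions. The helper name try_format is hypothetical.

```python
def try_format(template, *values):
    """Fill template with values; return an error description instead
    of raising when there are too few values."""
    try:
        return template.format(*values)
    except IndexError as err:
        return 'IndexError: ' + str(err)


print(try_format('{} camels', 42, 'extra'))     # 42 camels
print(try_format('{1} {0}', 'spotted', 'I'))    # I spotted
print(try_format('{} and {}', 42))              # an IndexError description
```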
    

String: Summary

  • The elements of a string are characters. Empty string ''

  • Features: ordered, immutable, and characters may repeat

  • Indexing and slicing work the same way as for lists.

  • The in operator returns True or False depending on whether a string contains a given substring. Comparison operators put strings in alphabetical order.

  • .upper(), .lower(), .title()

  • .rstrip(), .lstrip(), .strip()

  • .find(), .startswith(), .split()

  • .format()

5.2 Idiom Chain Game (成语接龙)

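  • Rule: an idiom can follow another when its first character has the same pinyin as the other idiom's last character.
  • 1. Load the idiom dictionary.
  • 2. Given an idiom, find all idioms that can follow it.
  • 3. Starting from a given idiom, keep chaining until no idiom can follow.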

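# 1. Load the idiom dictionary
# Assumes each non-blank line looks like: <idiom> 拼音:<pinyin> 释义:<definition>
filename = 'idiom_dictionary.txt'
with open(filename, encoding="utf-8") as file_object:
    lines = file_object.readlines()  # a list with one string per line

d_game = {}  # idiom -> list of pinyin syllables
for line in lines:
    if line != "\n":                             # skip blank lines
        endpoint = line.find("拼音")             # start of the pinyin label
        idiom = line[:endpoint].strip()          # text before the label is the idiom
        pinyin_start = line.find(":", endpoint)  # colon after the pinyin label
        pinyin_end = line.find("释义")           # start of the definition label
        each = line[pinyin_start+1: pinyin_end]
        pinyin_list = each.split()               # e.g. ['yī', 'mǎ', 'dāng', 'xiān']
        d_game[idiom] = pinyin_list
print(len(d_game))  # number of idioms loaded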
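Step 1 depends on a particular line layout in idiom_dictionary.txt, and the file itself is not shown. Judging from the code, each entry seems to look roughly like `<idiom> 拼音:<pinyin> 释义:<definition>`. The sketch below applies the same slicing to one made-up line in that assumed format, so you can inspect the intermediate values. The function name parse_line and the sample line are hypothetical.

```python
def parse_line(line):
    """Split one dictionary line into (idiom, list of pinyin syllables).

    Assumes the layout '<idiom> 拼音:<pinyin> 释义:<definition>'.
    """
    endpoint = line.find("拼音")               # where the pinyin label starts
    idiom = line[:endpoint].strip()            # everything before it is the idiom
    pinyin_start = line.find(":", endpoint)    # colon after the pinyin label
    pinyin_end = line.find("释义")             # where the definition label starts
    pinyin_list = line[pinyin_start + 1:pinyin_end].split()
    return idiom, pinyin_list


# A hypothetical sample line with a placeholder definition
sample = "一马当先 拼音:yī mǎ dāng xiān 释义:(definition)\n"
print(parse_line(sample))   # ('一马当先', ['yī', 'mǎ', 'dāng', 'xiān'])
```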

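# 2. Given an idiom, find all idioms that can follow it
idiom = input("Enter the first idiom\n")
char_4th = d_game[idiom][-1]   # pinyin of the idiom's last (4th) character
for x, y in d_game.items():    # x: an idiom, y: its pinyin list
    if char_4th == y[0]:       # x starts with the same syllable
        print(x)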
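Step 2 is easier to test as a function that returns the matches instead of printing them. The sketch below uses a tiny hypothetical dictionary in place of idiom_dictionary.txt. The name followers is also hypothetical. Tone marks are compared exactly, as in the step above.

```python
def followers(d_game, idiom):
    """Return every idiom whose first syllable equals the last syllable
    of idiom (pinyin compared exactly, including tone marks)."""
    last = d_game[idiom][-1]
    result = []
    for x, y in d_game.items():
        if y and y[0] == last:      # y and ...: skip empty pinyin lists
            result.append(x)
    return result


d_game = {  # a tiny hypothetical dictionary, for testing only
    '一马当先': ['yī', 'mǎ', 'dāng', 'xiān'],
    '先发制人': ['xiān', 'fā', 'zhì', 'rén'],
    '先见之明': ['xiān', 'jiàn', 'zhī', 'míng'],
    '人山人海': ['rén', 'shān', 'rén', 'hǎi'],
}
print(followers(d_game, '一马当先'))   # ['先发制人', '先见之明']
```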

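# 3. Starting from a given idiom, keep chaining until no idiom can follow
idiom = input("Enter the first idiom\n")
enter = ""
while enter != "q":                 # type q to stop
    char_4th = d_game[idiom][-1]
    for x, y in d_game.items():
        if char_4th == y[0]:
            idiom = x               # take the first match as the next idiom
            print(idiom)
            break
    enter = input("continue?")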
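  • Sometimes no idiom can follow the current one, yet the program does not stop: idiom never changes and the loop keeps asking "continue?". How can we fix this?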

idiom = input("Enter the first idiom\n")
enter = ""
exist = True       # becomes False when no idiom can follow
while enter != "q" and exist:
    char_4th = d_game[idiom][-1]
    for x, y in d_game.items():
        if char_4th == y[0]:
            idiom = x
            print(idiom)
            break
    else:          # runs only if the for loop ended without break
        exist = False
    if exist:
        enter = input("continue?")
    else:
        print("Sorry, there are no more idioms.")
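For testing without typing at the "continue?" prompt, the loop can be wrapped in a function that returns the whole chain. This sketch makes two assumptions that are not in the original. It stops after max_steps idioms, and it skips idioms already used, so a chain cannot cycle forever. The name chain and the sample data are hypothetical.

```python
def chain(d_game, start, max_steps=10):
    """Follow the idiom chain from start; return the list of idioms used."""
    result = [start]
    idiom = start
    while len(result) < max_steps:
        last = d_game[idiom][-1]              # pinyin of the last character
        for x, y in d_game.items():
            if y and y[0] == last and x not in result:
                idiom = x
                result.append(x)
                break
        else:                                 # no break: nothing can follow
            break
    return result


d_game = {  # hypothetical sample data
    '一马当先': ['yī', 'mǎ', 'dāng', 'xiān'],
    '先发制人': ['xiān', 'fā', 'zhì', 'rén'],
    '人山人海': ['rén', 'shān', 'rén', 'hǎi'],
    '海阔天空': ['hǎi', 'kuò', 'tiān', 'kōng'],
}
print(chain(d_game, '一马当先'))
# ['一马当先', '先发制人', '人山人海', '海阔天空']
```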
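  • Further features you could add:
  • basic definitions, homophone matching (ignoring tones), human vs. computer play, random word selection, fuzzy matching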
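Random word selection could mean picking the next idiom at random from all valid ones, instead of always taking the first match. Here is a minimal sketch using random.choice. The function name random_next and the sample data are hypothetical, and the function returns None when nothing can follow.

```python
import random


def random_next(d_game, idiom):
    """Return a randomly chosen idiom that can follow idiom, or None."""
    last = d_game[idiom][-1]            # pinyin of the last character
    candidates = []
    for x, y in d_game.items():
        if y and y[0] == last:
            candidates.append(x)
    if not candidates:
        return None
    return random.choice(candidates)


d_game = {  # hypothetical sample data
    '一马当先': ['yī', 'mǎ', 'dāng', 'xiān'],
    '先发制人': ['xiān', 'fā', 'zhì', 'rén'],
    '先见之明': ['xiān', 'jiàn', 'zhī', 'míng'],
    '人山人海': ['rén', 'shān', 'rén', 'hǎi'],
}
print(random_next(d_game, '一马当先'))   # 先发制人 or 先见之明
print(random_next(d_game, '人山人海'))   # None: nothing starts with hǎi
```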

# Basic definition feature
# Modifies step 1: store each idiom's definition instead of its pinyin
d_ex = {}  # idiom -> definition text
for line in lines:
    if line != "\n":
        endpoint = line.find("拼音")
        idiom = line[:endpoint].strip()
        pinyin_end = line.find("释义")
        explanation = line[pinyin_end:].strip()  # from the 释义 label to the end
        d_ex[idiom] = explanation
words = input("Enter the idiom to look up\n")
print(d_ex[words])
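d_ex[words] raises a KeyError when the input is not in the dictionary. The sketch below is a safer lookup. It also does a very simple form of fuzzy matching: when there is no exact match, it lists the idioms that contain the input, using the in operator from 5.1. Real fuzzy matching would be more tolerant than this. The name lookup and the sample data are hypothetical.

```python
def lookup(d_ex, words):
    """Return a list of (idiom, definition) pairs.

    An exact match returns just that idiom; otherwise every idiom that
    contains words as a substring is returned (possibly none).
    """
    if words in d_ex:                     # exact match: no KeyError possible
        return [(words, d_ex[words])]
    matches = []
    for idiom, explanation in d_ex.items():
        if words in idiom:                # substring test with the in operator
            matches.append((idiom, explanation))
    return matches


d_ex = {  # hypothetical sample data with placeholder definitions
    '一马当先': '释义:(definition 1)',
    '人山人海': '释义:(definition 2)',
    '海阔天空': '释义:(definition 3)',
}
print(lookup(d_ex, '人山人海'))
print(lookup(d_ex, '海'))    # both idioms that contain 海
print(lookup(d_ex, '龙'))    # []
```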

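# Homophone matching (ignore tones)
# Modifies step 1: remove tone marks so that e.g. 'mǎ' and 'má' both become 'ma'
import unicodedata
d_game = {}
for line in lines:
    if line != "\n":
        endpoint = line.find("拼音")
        idiom = line[:endpoint].strip()
        pinyin_start = line.find(":", endpoint)
        pinyin_end = line.find("释义")
        each = line[pinyin_start+1: pinyin_end]
        # NFKD splits 'ǎ' into 'a' + a combining tone mark;
        # encoding to ASCII with 'ignore' then drops the tone mark
        each = unicodedata.normalize('NFKD', each).encode('ascii', 'ignore').decode()
        pinyin_list = each.split()
        d_game[idiom] = pinyin_list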
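To see what the normalization line does, you can test it on its own. The function name strip_tones is hypothetical. One side effect is worth knowing: 'ü' also loses its dots, so syllables such as 'lǜ' and 'lù' both become 'lu'.

```python
import unicodedata


def strip_tones(pinyin):
    """Remove tone marks from pinyin, e.g. 'mǎ' -> 'ma'.

    NFKD splits an accented letter into a base letter plus combining
    marks; encoding to ASCII with 'ignore' drops those marks.
    """
    return unicodedata.normalize('NFKD', pinyin).encode('ascii', 'ignore').decode()


print(strip_tones('yī mǎ dāng xiān'))            # yi ma dang xian
print(strip_tones('mǎ') == strip_tones('má'))    # True: homophones now match
```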

Summary

  • Strings
  • Reading: Python for Everybody
    • Strings Chapter 6