String Basics
January 2, 2018
Many machine learning problems involve text or categorical features, and these often need to be cleaned up prior to analysis. Python's built-in string methods can assist with much of this task.
Removing Unnecessary Whitespace
lstrip removes leading whitespace.
start = ' a '
start_lstrip = start.lstrip()
print('Result: {result}.'.format(result=start_lstrip))
Result: a .
rstrip removes trailing whitespace.
start_rstrip = start.rstrip()
print('Result: {result}.'.format(result=start_rstrip))
Result:  a.
strip removes both leading and trailing whitespace.
start_strip = start.strip()
print('Result: {result}.'.format(result=start_strip))
Result: a.
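All three methods also accept an optional argument naming the characters to remove, which can help when values carry stray punctuation as well as spaces. The following is a minimal sketch; the sample value raw is invented for illustration.

```python
# Hypothetical sample value, not taken from the examples above.
raw = '--  price: 42  --'

# With no argument, strip() only removes whitespace, and there is none
# at either end of this string, so the result is unchanged.
print(repr(raw.strip()))

# Passing a string of characters strips any of those characters
# from both ends, stopping at the first character not in the set.
cleaned = raw.strip('- ')
print(repr(cleaned))
```

Note that the argument is treated as a set of characters rather than as a prefix or suffix, so the order of the characters in it does not matter.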
Case Manipulation
upper returns a new, fully upper-cased version of the original string.
my_string = 'i love python'
print(my_string.upper())
I LOVE PYTHON
lower returns a new, fully lower-cased version of the original string.
print(my_string.lower())
i love python
title converts the first character of each word to upper case, and all other letters to lower case.
print(my_string.title())
I Love Python
capitalize converts the first letter of the string to upper case, and all other letters to lower case.
print(my_string.capitalize())
I love python
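Case methods are often combined with strip when cleaning categorical features, so that values differing only in case or padding are treated as the same category. A small sketch, assuming a hypothetical list of raw labels:

```python
# Hypothetical raw category labels, invented for illustration.
labels = [' Red', 'red ', 'RED', 'Blue']

# Strip whitespace, then lower-case, so equivalent labels compare equal.
normalised = [label.strip().lower() for label in labels]
print(normalised)

# Collapse the cleaned labels to the distinct categories.
categories = sorted(set(normalised))
print(categories)
```

Whether lower case is the right target depends on the data; the point is only to pick one convention and apply it consistently.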
Searching and Replacing
The find function returns the index of the first occurrence of the substring that we are searching for, if it is found.
my_string = 'The quick quick brown fox jumps over the lazy dog.'
my_string.find('quick')
4
find returns -1 if the substring cannot be found.
my_string.find('cat')
-1
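Because find reports only the first match, locating later matches means passing a start position as the optional second argument. The sketch below also shows why the result should be compared with -1 explicitly rather than used directly as a condition.

```python
my_string = 'The quick quick brown fox jumps over the lazy dog.'

first = my_string.find('quick')
# Resume the search just past the first match to locate the next one.
second = my_string.find('quick', first + 1)
print(first, second)

# Index 0 is a valid match but is falsy, and -1 is truthy,
# so always compare the result against -1.
starts_at = my_string.find('The')
if starts_at != -1:
    print('found at index', starts_at)

# When only a yes/no answer is needed, the in operator is simpler.
has_cat = 'cat' in my_string
print(has_cat)
```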
The replace function works as you would expect: it returns a new string in which every instance of the first input parameter has been replaced with the second input parameter.
new_string = my_string.replace('dog', 'cat')
print(new_string)
The quick quick brown fox jumps over the lazy cat.
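replace also takes an optional third argument that limits how many instances are replaced, counting from the left. A brief sketch using the same sentence, which also confirms that the original string is left untouched:

```python
my_string = 'The quick quick brown fox jumps over the lazy dog.'

# Replace only the first instance of 'quick'.
once = my_string.replace('quick', 'slow', 1)
print(once)

# With no count, every instance is replaced.
every = my_string.replace('quick', 'slow')
print(every)

# Strings are immutable, so my_string itself has not changed.
print(my_string)
```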
Splitting a String
If you need to convert a delimited string to a list, the split function has you covered. Below, we demonstrate how to convert a comma-delimited string to a list.
delimitered_string = 'a,b,c'
split = delimitered_string.split(',')
print(split)
['a', 'b', 'c']
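Two variations on split come up often when wrangling text: calling it with no argument, which splits on runs of whitespace, and passing a maximum number of splits. The sample values below are invented for illustration.

```python
# Hypothetical sample values, not taken from the examples above.
messy = '  a  b   c '

# With no argument, split() treats any run of whitespace as one
# delimiter and ignores leading and trailing whitespace.
tokens = messy.split()
print(tokens)

# With an explicit ' ' delimiter, adjacent spaces yield empty strings.
naive = messy.split(' ')
print(naive)

# The second argument caps the number of splits, which keeps
# any later delimiters inside the final item.
record = 'key=value=with=equals'
key, value = record.split('=', 1)
print(key, value)
```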
Joining Items in a List to Form a String
To perform the reverse of split, you can use the join function.
joined = ','.join(split)
print(joined)
a,b,c
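join expects every item to be a string and raises a TypeError otherwise, so lists of numbers need converting first. A minimal sketch, with the list numbers invented for illustration:

```python
# Hypothetical list of non-string items.
numbers = [1, 2, 3]

# Convert each item to a string before joining.
joined_numbers = ','.join(str(n) for n in numbers)
print(joined_numbers)

# Joining the raw integers fails, because join only accepts strings.
try:
    ','.join(numbers)
    raised = False
except TypeError:
    raised = True
print(raised)
```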