String Basics

January 2, 2018

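Many machine learning datasets contain text or categorical features, and these often need cleaning before analysis. Python's built-in str type provides a number of methods that help with this task. Because strings are immutable, each method returns a new object and leaves the original string unchanged.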


Removing Unnecessary Whitespace

lstrip removes leading whitespace.

start = '  a  '
start_lstrip = start.lstrip()
print('Result: {result}.'.format(result=start_lstrip))
Result: a  .


rstrip removes trailing whitespace.

start_rstrip = start.rstrip()
print('Result: {result}.'.format(result=start_rstrip))
Result:   a.


strip removes both leading and trailing whitespace.

start_strip = start.strip()
print('Result: {result}.'.format(result=start_strip))
Result: a.
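
All three methods also accept an optional argument naming the characters to remove instead of whitespace. The sketch below applies this to a list of raw category labels; the `raw_labels` list is a hypothetical example, not data from this post.

```python
# Hypothetical raw category labels with stray whitespace and punctuation.
raw_labels = ['  red ', 'green\n', '--blue--']

# With no argument, strip removes leading and trailing whitespace,
# including newlines and tabs.
cleaned = [label.strip() for label in raw_labels]
print(cleaned)  # ['red', 'green', '--blue--']

# Passing a string of characters strips any of those characters instead.
dashes_removed = raw_labels[2].strip('-')
print(dashes_removed)  # blue
```

Note that the argument is treated as a set of characters, not as a prefix or suffix to match exactly.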


Case Manipulation

upper returns a new, fully capitalised version of the original string.

my_string = 'i love python'
print(my_string.upper())
I LOVE PYTHON


lower returns a new, fully lower-cased version of the original string.

print(my_string.lower())
i love python


title converts the first character of each word to upper case, and all other letters to lower case.

print(my_string.title())
I Love Python


capitalize converts the first letter of the string to upper case, and all other letters to lower case.

print(my_string.capitalize())
I love python
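
One common use of these methods is collapsing category values that differ only by case. A minimal sketch, assuming a hypothetical `labels` list:

```python
# Hypothetical category values that differ only by case.
labels = ['Yes', 'YES', 'yes', 'No']

# Lower-casing every value collapses the case variants into one category.
normalised = [label.lower() for label in labels]
print(normalised)  # ['yes', 'yes', 'yes', 'no']

# The distinct categories that remain after normalisation.
distinct = sorted(set(normalised))
print(distinct)  # ['no', 'yes']

# Strings are immutable, so the original values are unchanged.
print(labels[0])  # Yes
```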


Searching and Replacing

The find method returns the index of the first occurrence of the substring that we are searching for, if it is found.

my_string = 'The quick quick brown fox jumps over the lazy dog.'
my_string.find('quick')
4


find returns -1 if the substring cannot be found.

my_string.find('cat')
-1
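
find also takes an optional start index, which makes it possible to locate later occurrences. The sketch below reuses the sentence from above; when only a yes/no answer is needed, the in operator is usually simpler than comparing against -1.

```python
my_string = 'The quick quick brown fox jumps over the lazy dog.'

first = my_string.find('quick')
# Start the next search just past the first match.
second = my_string.find('quick', first + 1)
print(first, second)  # 4 10

# The in operator tests for membership without returning an index.
has_fox = 'fox' in my_string
has_cat = 'cat' in my_string
print(has_fox, has_cat)  # True False
```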


The replace method searches for every occurrence of its first argument and replaces each one with its second argument, returning the result as a new string.

new_string = my_string.replace('dog', 'cat')
print(new_string)
The quick quick brown fox jumps over the lazy cat.
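
replace accepts an optional third argument that limits how many occurrences are replaced, counting from the left. A short sketch using the same sentence, with 'slow' as an arbitrary replacement chosen for illustration:

```python
my_string = 'The quick quick brown fox jumps over the lazy dog.'

# Every occurrence is replaced by default.
all_replaced = my_string.replace('quick', 'slow')
print(all_replaced)  # The slow slow brown fox jumps over the lazy dog.

# A third argument caps the number of replacements.
first_only = my_string.replace('quick', 'slow', 1)
print(first_only)  # The slow quick brown fox jumps over the lazy dog.
```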


Splitting a String

To convert a delimited string to a list, use the split method. Below, we convert a comma-delimited string to a list.

delimitered_string = 'a,b,c'
split = delimitered_string.split(',')
print(split)
['a', 'b', 'c']
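
split behaves differently depending on its arguments, which matters when the input is messy. The sketch below uses made-up input strings to show three cases: no argument, an explicit delimiter with an empty field, and a limit on the number of splits.

```python
# With no argument, split breaks on runs of whitespace and drops empty pieces.
words = '  a  b   c '.split()
print(words)  # ['a', 'b', 'c']

# With an explicit delimiter, consecutive delimiters yield empty strings.
fields = 'a,,c'.split(',')
print(fields)  # ['a', '', 'c']

# A second argument limits the number of splits; the rest stays in the last item.
key_value = 'name=a=b'.split('=', 1)
print(key_value)  # ['name', 'a=b']
```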


Joining Items in a List to Form a String

To reverse a split, use the join method. It is called on the delimiter string and takes the list of strings to join.

joined = ','.join(split)
print(joined)
a,b,c
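
join only accepts strings and raises a TypeError if the list contains anything else, so other values need converting first. A minimal sketch, assuming a hypothetical list of integers:

```python
numbers = [1, 2, 3]

# Convert each item to a string before joining.
joined_numbers = ','.join(str(n) for n in numbers)
print(joined_numbers)  # 1,2,3

# Splitting on the same delimiter recovers strings, not the original integers.
round_trip = joined_numbers.split(',')
print(round_trip)  # ['1', '2', '3']
```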