Overcoming Script Thinking

Before beginning, the Python class example came from


Don’t get me wrong; Bash and Bourne shell scripts are good. They have their place, and they are responsible for making a lot of things happen. But a few weeks ago I wrote a data conversion program in Python, one that converts the output of a student administration system into the input of a student health system, and I noticed something. I wrote it with C language habits, yet the program took on the attributes of a shell script: pretty much straight-line from top to bottom, with only a usage function and a “main” function.

Overcoming the shell script mentality is a matter of asking whether the investment in file reader classes (C++ or Python) or extensive functions (C/C++) is worth it. That is, you must decide whether other programs could or would need all that extra code in the future. Then you must decide whether time is a factor, and whether or not to make the investment in longer-lasting code.

Given that the program had to read one format and write out another, there were many different ways to set position variables on the input and output so that the information would be read and written correctly. In C++, an iterator class could have been written with two instantiations of the class, one for input and the other for output. In C, a bunch of functions, perhaps using either heap or module-scope memory, could keep track of input and output as well. What made this slightly more difficult is that some information in the output had to be combined from two or more fields in the input, and some output information had to be deliberately left blank.
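As a rough illustration of that last requirement, here is a minimal Python sketch of copying one field, combining two others, and leaving one output field blank. The field names, positions, and widths are invented for the example; they are not taken from the real interface document.

```python
# Hypothetical input layout: (name, start, end), 1-based and inclusive.
INPUT_FIELDS = [
    ("student_id", 1, 8),
    ("last_name", 9, 18),
    ("first_name", 19, 28),
]


def split_record(line):
    """Slice one fixed-width input line into a dict of stripped values."""
    return {name: line[start - 1:end].strip()
            for name, start, end in INPUT_FIELDS}


def build_output(record):
    """Build one fixed-width output line (widths are assumptions)."""
    full_name = record["first_name"] + " " + record["last_name"]
    parts = [
        record["student_id"].ljust(12),  # copied straight across
        full_name.ljust(25),             # combined from two input fields
        " " * 10,                        # deliberately left blank
    ]
    return "".join(parts)


sample = "00012345Smith     Jane      "
output_line = build_output(split_record(sample))
print(repr(output_line))
```

Keeping the layout in a table like INPUT_FIELDS means a change to the interface touches data rather than a long run of index arithmetic.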

In Python, there is strong encouragement to look beyond traditional C/C++ alternatives. In other words, don’t parse the input by hand-tracking field starting positions and lengths, the way this code does:

# These are useful helper variables that will be used to pull
# apart the line to make sense of what’s a field and what’s
# not, like commas (,) inside double quotes (").

start_idx = 0       # offset of the next field within input_line
substr_len = 0

line_len = len(input_line)

# Wahoo! This is a fixed position file. We can just load up the
# output by extracting substrings without parsing. This is all
# based on the SNAP interface document.
# This is just straightforward, boring, field extraction, no
# loops, nothing clever, nothing fancy. We just have plain old
# field extraction.

substr_len = 12
end_idx = start_idx + substr_len
output_string += input_line[start_idx:end_idx]      # refid
start_idx += substr_len

Instead, allow iterators and generators to take up the burden, and in the process the program can run faster. (Python is not a compiled language; it’s interpreted, and by its nature can run slower than a program in a compiled language like C or C++.)

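#!/usr/bin/env python3
"""
This is iterator_test.py, and is an example of picking fields out of a document.
"""


class FileReader(object):
    """Iterate over a fixed-width file, returning the chosen fields per line."""

    def __init__(self, f, *fields):
        self.f = f
        self.fields = fields    # (start, end) pairs, 1-based and inclusive

    def __next__(self):
        # next() raises StopIteration at end of file, which ends the for loop.
        line = next(self.f).rstrip()
        pieces = []

        for t in self.fields:
            start = t[0] - 1    # convert the 1-based start to a 0-based index
            end = t[1]
            pieces.append(line[start:end])

        return pieces

    def __iter__(self):
        return self


with open("data.txt") as f:
    fr = FileReader(f, (1, 8), (9, 12))

    for line_chunks in fr:
        print(line_chunks)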
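The same idea can also be sketched with a generator function instead of a class. This is only one possible variation on the FileReader example above; the function name read_fields and the sample data are made up for illustration, and io.StringIO stands in for the real data.txt so the sketch runs without a file.

```python
import io


def read_fields(lines, *fields):
    """Yield a list of substrings for each line.

    Each field is a (start, end) pair, 1-based and inclusive,
    the same convention the FileReader class uses.
    """
    for line in lines:
        line = line.rstrip()
        yield [line[start - 1:end] for start, end in fields]


# Stand-in for open("data.txt"); the contents are invented sample data.
sample = io.StringIO("ABCDEFGH1234\nIJKLMNOP5678\n")

rows = list(read_fields(sample, (1, 8), (9, 12)))
for row in rows:
    print(row)
```

Because the generator keeps its position between yields, there is no need to write __iter__ and __next__ by hand, and it accepts any iterable of lines, not just an open file.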

So, I’ve come to the conclusion that code like this is worth writing. If you’re going to learn a language (in my case, Python), then learn the best aspects of writing in that language.

More examples of good Python syntax may be found in Tarek Ziadé’s book Expert Python Programming.


