Creating a Script
Overview
Teaching: 15 min
Exercises: 10 min
Questions
How can I create a Python script to run outside of Jupyter?
How can I use command-line arguments?
Objectives
Use argparse
Transform Jupyter notebook code into a script
During this workshop we’ve used Jupyter Notebooks to try out code. Notebooks are really good for doodling, sketching, testing, etc. We can try multiple approaches to a problem and develop one that we really like.
Notebooks are not as good for production usage. Once we have a solution that we like, it is more useful to run all the code at once instead of pressing Shift + Enter on every cell in a notebook.
Python Scripts
To move our code into a production mode, we can create a Python script file. A Python script file is a text file with Python code in it. You can create a script file with any plain text editor, such as Notepad++, Sublime Text, Atom, or Notepad. However, you can’t use a document editor like Microsoft Word or Google Docs. These editors save the text in a format that Python can’t interpret.
Creating a Python Script
Anaconda also has a text editor, so we will use that to turn some of our previous file survey code into a script.
To open a text file in Anaconda, click the + button in the left sidebar and click on the Text File tile in the main tab.
A text editing interface will open.
We will immediately rename this file by going to File > Save File As… and giving it the name filesurvey.py.
Python Extensions
A Python script is a text file. It can use anything for an extension as long as the contents are valid Python code. We typically use .py to tell other people that the file contains Python code.
A Python script is code that is read and executed from the top of the script to the bottom. If you are in a Jupyter Notebook and the first cell contains
os.path.join('video_dir', 'project_name')
and the second cell contains
import os
You can execute the second cell and then the first cell and everything will work. If you have a Python script, and the code reads
os.path.join('video_dir', 'project_name')
import os
When you try to run this script, it will report a NameError, because os has not been imported yet when the first line runs.
This is a good reason to put all import statements at the top of a script.
This ensures that their functions have been loaded when they are used in the script.
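As a rough illustration of the ordering problem, the sketch below runs the same two lines from a string and catches the resulting error. The names script_source and message are made up for this example and are not part of filesurvey.py.

```python
# The same two lines as the script above, held in a string so we can run them safely.
script_source = """
os.path.join('video_dir', 'project_name')
import os
"""

try:
    # Run the lines top to bottom in an empty namespace, as a fresh script would.
    exec(script_source, {})
except NameError as err:
    # Python reaches os.path.join before `import os`, so the name is unknown.
    message = str(err)
    print("Script failed:", message)
```

Swapping the two lines inside the string so that the import comes first lets the code run without an error.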
Import Statements for filesurvey.py
Open a text editor tab in Jupyter. At the top of the file, add the import statements that you will need to run the file survey loop that we built in the previous lesson.
Solution
We need the following modules.
import os
import glob
import csv
from pymediainfo import MediaInfo
With all the necessary import statements at the top of the file, we can copy and paste in the other portions of the loop.
Go cell-by-cell to copy code from your previous notebook and paste it into your text file.
A simplified version of the code is provided below in case you run into any problems. Change any paths so that they reflect your computer’s folders.
# Folder to survey; change this to match your computer
video_dir = '/Users/username/Desktop/pyforav'
# Find every file ending in mov inside the subfolders of video_dir
mov_list = glob.glob(os.path.join(video_dir, '**', '*mov'))
all_file_data = []
for item in mov_list:
    media_info = MediaInfo.parse(item)
    for track in media_info.tracks:
        # The General track holds file-level metadata
        if track.track_type == "General":
            general_data = [
                track.file_name,
                track.file_extension,
                track.format,
                track.file_size,
                track.duration]
            all_file_data.append(general_data)
# Write a header row, then one row per file
with open('/Users/pyforav/Desktop/script_output.csv', 'w') as f:
    md_csv = csv.writer(f)
    md_csv.writerow([
        'filename',
        'extension',
        'format',
        'size',
        'duration'
    ])
    md_csv.writerows(sorted(all_file_data))
After saving this file, we can run it from a terminal. For this, we’ll use Anaconda’s built-in terminal since it has Python installed. You could also use your computer’s default terminal, like Terminal on macOS or Command Prompt on Windows, if you have Python configured for those.
To open a terminal in Anaconda, click the + button in the left sidebar and click on the Terminal tile in the main tab.
The syntax for running a Python script is python path/to/script.py (or python.exe path/to/script.py in Windows Command Prompt).
python filesurvey.py
After running this command, a new CSV named script_output.csv is saved to the desktop.
Increasing Script Flexibility
By creating a script we automated some of the effort of running our Python code.
However, the current script can only gather data from the pyforav folder and save a CSV to script_output.csv.
To make this more flexible, we can accept arguments from the command line.
If you have not used the command line much in the past, commands work a lot like functions in Python.
- The first thing in a command is the tool’s name, e.g. conda
- After that, there might be subtools, e.g. conda install
- After that, there might be arguments that will be used by the tool, e.g. conda install -c conda-forge ffmpeg
If the above example were a line of Python, it might look like this: conda.install(c="conda-forge", package="ffmpeg").
Command-Line Syntax
On the command line, every tool, subtool, argument, and flag is separated by a space.
Strings typically do not have to be surrounded by quotation marks ("), unless they contain spaces.
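To see how a command breaks into pieces, the sketch below uses shlex.split from the standard library, which follows the rules of Unix-style shells (Windows Command Prompt treats quoting somewhat differently). The folder name "my videos" and the output name survey.csv are hypothetical, chosen to show one case where quotation marks are needed.

```python
import shlex

# A command as it might be typed in a terminal. The quoted path contains a space.
command = 'python filesurvey.py -d "Desktop/my videos" -e mov -o survey.csv'

# Split the command on spaces, keeping the quoted path together as one piece.
tokens = shlex.split(command)

for position, token in enumerate(tokens):
    print(position, token)
```

Without the quotation marks, the path would be split into two separate pieces at the space.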
To make it easier to survey other folders and save the output to new files, we will add another module to our filesurvey.py script: argparse.
First, we have to import the module at the beginning of our script.
import argparse
Because argparse is more abstract than other components we have used so far, we will include working code immediately.
Paste this into the script immediately after the import statements.
# Describe the tool; this text appears in the help dialog
parser = argparse.ArgumentParser(
    description="survey a directory for AV files and report on technical metadata")
# Each argument has a short flag, a long flag, and a help message
parser.add_argument("-d", "--directory",
                    required=True,
                    help="Path to a directory of AV files")
parser.add_argument("-e", "--extension",
                    required=True,
                    help="Extension of AV file to survey")
parser.add_argument("-o", "--output",
                    required=True,
                    help="Path to save the metadata as a CSV")
# Read the values typed on the command line
args = parser.parse_args()
print(args.directory, args.extension, args.output)
Save the script.
One of the immediate benefits of using argparse is that it creates help dialogs for our command-line tool.
Try:
python filesurvey.py -h
According to the help dialog, the script is able to accept a location of our choosing to survey, the type of file to survey, and a filename of our choosing to save the result of the survey.
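If you would like to experiment with argparse without switching to a terminal, one option is to pass parse_args a list of strings in place of the real command line. The sketch below assumes the same three arguments as the script; the values in the list (Desktop/pyforav, mov, survey.csv) are placeholders.

```python
import argparse

parser = argparse.ArgumentParser(
    description="survey a directory for AV files and report on technical metadata")
parser.add_argument("-d", "--directory", required=True,
                    help="Path to a directory of AV files")
parser.add_argument("-e", "--extension", required=True,
                    help="Extension of AV file to survey")
parser.add_argument("-o", "--output", required=True,
                    help="Path to save the metadata as a CSV")

# Passing a list stands in for what would be typed after the script name.
args = parser.parse_args(["-d", "Desktop/pyforav", "-e", "mov", "-o", "survey.csv"])
print(args.directory, args.extension, args.output)

# format_help() returns the same text that the -h flag prints.
help_text = parser.format_help()
print(help_text)
```

Each long flag becomes an attribute on args, so --directory is read as args.directory.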
Using the new arguments
What happens when you run the following command?
python filesurvey.py -d Desktop/pyforav/mkv -e mkv -o Desktop/mkv_files.csv
Why didn’t it create mkv_files.csv? How would you change the script to make that happen?
Solution
Using argparse allows the script to accept the arguments, but to use them in our script we need to reference them in the right places.
The print statement at the bottom of the argparse code shows how to reference the data. Our next task is to put those references in the right places.
For the directory to survey, we need to replace the hard-coded path that is assigned to video_dir with args.directory. From:
video_dir = '/Users/username/Desktop/pyforav'
To:
video_dir = args.directory
Similarly, for the extension, we need to replace the portion of the glob command that hard-codes mov with args.extension. From:
mov_list = glob.glob(os.path.join(video_dir, '**', '*mov'))
To:
mov_list = glob.glob(os.path.join(video_dir, '**', '*' + args.extension))
Finally, we do the same for the script output with args.output. From:
with open('/Users/pyforav/Desktop/script_output.csv', 'w') as f:
To:
with open(args.output, 'w') as f:
Try running the script again.
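One detail worth knowing about the glob pattern: '**' only searches through every level of subfolders when glob.glob is called with recursive=True. Without it, '**' behaves like '*' and matches exactly one folder level. The sketch below compares the two on a temporary folder tree. The helper find_files is hypothetical, and it makes two assumptions that differ from the script: it accepts the extension with or without a leading dot, and it matches on "." plus the extension.

```python
import glob
import os
import tempfile


def find_files(directory, extension, recursive=True):
    """Return sorted paths under directory that end in .extension (hypothetical helper)."""
    extension = extension.lstrip(".")  # accept "mkv" or ".mkv"
    pattern = os.path.join(directory, "**", "*." + extension)
    return sorted(glob.glob(pattern, recursive=recursive))


with tempfile.TemporaryDirectory() as tmp:
    # Build a small tree: one file at the top, one a level down, one two levels down.
    os.makedirs(os.path.join(tmp, "a", "b"))
    sample_files = [
        "top.mkv",
        os.path.join("a", "one.mkv"),
        os.path.join("a", "b", "two.mkv"),
        os.path.join("a", "notes.txt"),
    ]
    for rel_path in sample_files:
        open(os.path.join(tmp, rel_path), "w").close()

    # recursive=True: '**' matches zero or more folder levels.
    all_levels = [os.path.relpath(p, tmp) for p in find_files(tmp, "mkv")]
    # recursive=False: '**' acts like '*', so only one folder level is searched.
    one_level = [os.path.relpath(p, tmp) for p in find_files(tmp, ".mkv", recursive=False)]

print(all_levels)
print(one_level)
```

Whether to search every subfolder depends on how your collection is organized, so this is offered as an option and not as a required change to the script.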
Trapping errors/exceptions and anticipating diverse collections
As we move our code from the niceties of the sample workshop files to the broader pool of material you might see in real life, we also need to look out for exceptions.
What happens when we survey non-AV files?
What happens if we run the script we just created on non-AV files?
python filesurvey.py -d Desktop -e txt -o nonav_survey.csv
Solution
An error is printed to the terminal, and no CSV is created.
This is where we can use the programming concept of try and except.
First we try one piece of code, and if an error results, we try a different piece of code that should work.
For example, if we survey files that aren’t time-based, they won’t have a track.duration reported by pymediainfo.
When we run our previous script,
- our code errors out
- the for loop stops
- the script doesn’t finish
Instead, we’ll ask the code to try to collect all of the attributes. When the script comes across files that are missing the duration, it will
- try the first code chunk
- fail because of an exception (track.duration doesn’t exist)
- run the second code chunk, which doesn’t ask for duration and records a None value instead.
for item in mov_list:
    media_info = MediaInfo.parse(item)
    for track in media_info.tracks:
        if track.track_type == "General":
            try:
                # First attempt: collect every attribute, including duration
                general_data = [
                    track.file_name,
                    track.file_extension,
                    track.format,
                    track.file_size,
                    track.duration]
            except Exception:
                # Fallback: skip duration and record None in its place
                general_data = [
                    track.file_name,
                    track.file_extension,
                    track.format,
                    track.file_size,
                    None]
            all_file_data.append(general_data)
By making this effort, we ensure two things:
- that our code will continue to function despite the strange files we throw at it,
- that our MediaInfo data will be consistent from file to file and ready for export.
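The same idea can be tried without any media files. The sketch below uses simple stand-in objects (types.SimpleNamespace) in place of real pymediainfo tracks, so it only illustrates the try and except pattern and makes no claim about how pymediainfo itself behaves. The helper collect_row and the sample values are hypothetical. It catches the specific AttributeError so that unrelated problems are not hidden.

```python
from types import SimpleNamespace

FIELDS = ["file_name", "file_extension", "format", "file_size", "duration"]


def collect_row(track):
    """Build one CSV row, recording None for any attribute the track lacks."""
    row = []
    for field in FIELDS:
        try:
            row.append(getattr(track, field))
        except AttributeError:
            # The stand-in object has no such attribute; keep the column, leave it empty.
            row.append(None)
    return row


# Stand-ins for a video file and a text file; the text file has no duration.
video = SimpleNamespace(file_name="clip", file_extension="mov",
                        format="MPEG-4", file_size=1024, duration=5000)
text = SimpleNamespace(file_name="notes", file_extension="txt",
                       format=None, file_size=12)

rows = [collect_row(video), collect_row(text)]
for row in rows:
    print(row)
```

Every row ends up with the same five columns, which keeps the CSV consistent from file to file.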
Key Points
Once you’ve created code you want to use repeatedly, a script is more useful than a Jupyter Notebook
argparse is the Python module for accepting arguments from the command line
try and except blocks can be used to run code that may not work in every situation