This lesson is being piloted (Beta version)

Creating a Script

Overview

Teaching: 15 min
Exercises: 10 min
Questions
  • How can I create a Python script to run outside of Jupyter?

  • How can I use command-line arguments?

Objectives
  • Use argparse

  • Transform Jupyter notebook code into a script

During this workshop we’ve used Jupyter Notebooks to try out code. Notebooks are really good for doodling, sketching, testing, etc. We can try multiple approaches to a problem and develop one that we really like.

Notebooks are not as good for production usage. Once we have a solution that we like, it would be more useful to run all the code at once instead of hitting Shift + Enter on every cell in a notebook.

Python Scripts

To move our code into a production mode, we can create a Python script file. A Python script file is a text file with Python code in it. You can create a script file with any plain text editor, such as Notepad++, Sublime Text, Atom, or Notepad. However, you can’t use a document editor like Microsoft Word or Google Docs. These editors save the text in a format that Python can’t interpret.

Creating a Python Script

Anaconda also has a text editor, so we will use that to turn some of our previous file survey code into a script.

To open a text file in Anaconda, click the + button in the left sidebar and click on the Text File tile in the main tab. A text editing interface will open. We will immediately rename this file by going to File > Save File As… and giving it the name filesurvey.py.

Python Extensions

A Python script is a text file. It can use any extension as long as the contents are valid Python code. We typically use .py to tell other people that the file contains Python code.

A Python script is code that is read and executed from the top of the script to the bottom. If you are in a Jupyter Notebook and the first cell contains

os.path.join('video_dir', 'project_name')

and second cell contains

import os

You can execute the second cell and then the first cell and everything will work. If you have a Python script, and the code reads

os.path.join('video_dir', 'project_name')
import os

When you try to run this script, it will stop at the first line and report a NameError, because os has not been imported yet when that line runs.

This is a good reason to put all import statements at the top of a script. This ensures that their functions have been loaded when they are used in the script.
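To see the top-to-bottom rule in action without creating a file, the sketch below runs the two-line script from above with Python’s built-in exec function and catches the error it raises. The variable names script_text and outcome are chosen only for this illustration.

```python
# The same two lines as the script above, in the same (wrong) order.
script_text = "os.path.join('video_dir', 'project_name')\nimport os\n"

try:
    # Run the lines top to bottom in a fresh, empty namespace,
    # much as Python would run them from a script file.
    exec(script_text, {})
    outcome = "ran without error"
except NameError as err:
    # The first line fails before the import on the second line is reached.
    outcome = "NameError: " + str(err)

print(outcome)
```

Swapping the two lines inside script_text so that the import comes first lets the code run without the error.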

Import Statements for filesurvey.py

Open a text editor tab in Jupyter. At the top of the file, add the import statements that you will need to run the file survey loop that we built in the previous lesson.

Solution

We need the following modules.

import os
import glob
from pymediainfo import MediaInfo
import csv

With all the necessary import statements at the top of the file, we can copy and paste in the other portions of the loop. Go cell by cell, copying code from your previous notebook and pasting it into your text file.

A simplified version of the code is provided below in case you run into any problems. Change any paths so that they reflect your computer’s folders.

video_dir = '/Users/username/Desktop/pyforav'

mov_list = glob.glob(os.path.join(video_dir, '**', '*mov'))

all_file_data = []

for item in mov_list:
    media_info = MediaInfo.parse(item)
    for track in media_info.tracks:
        if track.track_type == "General":
            general_data = [
                track.file_name,
                track.file_extension,
                track.format,
                track.file_size,
                track.duration]
    all_file_data.append(general_data)


with open('/Users/pyforav/Desktop/script_output.csv', 'w') as f:
    md_csv = csv.writer(f)
    md_csv.writerow([
        'filename',
        'extension',
        'format',
        'size',
        'duration'
    ])
    md_csv.writerows(sorted(all_file_data))

After saving this file, we can run it using a terminal. For this, we’ll use Anaconda’s built-in terminal since it has Python installed. You could also use your computer’s default terminal, like Terminal on macOS or Command Prompt on Windows, if you have Python configured for it.

To open a terminal in Anaconda, click the + button in the left sidebar and click on the Terminal tile in the main tab.

The syntax for running a Python script is python path/to/script.py (or python.exe path/to/script.py in Windows Command Prompt).

python filesurvey.py

After running this command, a new CSV named script_output.csv is saved to the desktop.

Increasing Script Flexibility

By creating a script we automated some of the effort of running our Python code. But, the current script can only gather data from the pyforav folder and save a CSV to script_output.csv. To make this more flexible, we can accept arguments from the command line.

If you have not used the command line much in the past, commands work a lot like functions in Python. Take this command as an example:

conda install -c conda-forge ffmpeg

If this command were a line of Python, it might look like this: conda.install(c="conda-forge", package="ffmpeg").

Command-Line Syntax

On the command line, every tool, subtool, argument, and flag is separated by a space.

Strings typically do not have to be surrounded by quotation marks ("), unless they contain spaces.
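As a rough illustration of how a command line is broken into pieces, the sketch below uses shlex.split from Python’s standard library. It follows the rules of Unix-style shells; Windows Command handles quoting somewhat differently. The script name myscript.py and the folder name my videos are made up for this example, and nothing is actually run.

```python
import shlex

commands = [
    'conda install -c conda-forge ffmpeg',
    # Hypothetical script and folder; the folder name contains a space,
    # so it needs quotation marks to stay in one piece.
    'python myscript.py "my videos"',
]

# Split each command into the separate pieces a Unix-style shell would pass on.
split_commands = [shlex.split(command) for command in commands]

for parts in split_commands:
    print(parts)
```

Without the quotation marks, my and videos would arrive as two separate arguments.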

To make it easier to survey other folders and save the output to new files, we will add another module to our filesurvey.py script, argparse.

First, we have to import the module at the beginning of our script.

import argparse

Because argparse is more abstract than other components we have used so far, we will start from working code. Paste this into the script directly after the import statements.

# Describe the tool; this text appears in the help dialog.
parser = argparse.ArgumentParser(
    description="Survey a directory for AV files and report on technical metadata")
# Each argument has a short flag, a long flag, and its own help text.
parser.add_argument("-d", "--directory",
                    required=True,
                    help="Path to a directory of AV files")
parser.add_argument("-e", "--extension",
                    required=True,
                    help="Extension of AV files to survey")
parser.add_argument("-o", "--output",
                    required=True,
                    help="Path to save the metadata as a CSV")
# Read the arguments that were typed on the command line.
args = parser.parse_args()

print(args.directory, args.extension, args.output)

Save the script.

One of the immediate benefits of using argparse is that it creates help dialogs for our command-line tool. Try:

python filesurvey.py -h

According to the help dialog, the script is able to accept a location of our choosing to survey, the type of file to survey, and a filename of our choosing to save the result of the survey.
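While experimenting, it can help to see what argparse produces without switching to the terminal. parse_args normally reads the arguments typed on the command line, but it also accepts a list of strings. The sketch below is a standalone example rather than part of filesurvey.py, and the paths in the list are placeholders.

```python
import argparse

parser = argparse.ArgumentParser(description="Standalone argparse example")
parser.add_argument("-d", "--directory", required=True,
                    help="Path to a directory of AV files")
parser.add_argument("-e", "--extension", required=True,
                    help="Extension of AV files to survey")
parser.add_argument("-o", "--output", required=True,
                    help="Path to save the metadata as a CSV")

# Pass the arguments as a list instead of typing them in a terminal.
# These paths are placeholders for illustration.
example_args = parser.parse_args(
    ["-d", "Desktop/pyforav/mkv", "-e", "mkv", "-o", "Desktop/mkv_files.csv"])

print(example_args.directory)
print(example_args.extension)
print(example_args.output)
```

Each long option name becomes an attribute on the result, so the value given after --directory (or -d) is read as example_args.directory.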

Using the new arguments

What happens when you run the following command?

python filesurvey.py -d Desktop/pyforav/mkv -e mkv -o Desktop/mkv_files.csv

Why didn’t it create mkv_files.csv? How would you change the script to make that happen?

Solution

Using argparse allows the script to accept the arguments, but to use them in our script we need to reference them in the right places.

The print statement at the bottom of the argparse code shows how to reference the data. Our next task is to put those references in the right places.

For the directory to survey, we need to replace the hard-coded path that is assigned to video_dir with args.directory. From:

video_dir = '/Users/username/Desktop/pyforav'

To:

video_dir = args.directory

Similarly for the extension, we need to replace the portion of the glob command that hard codes mov with args.extension. From:

mov_list = glob.glob(os.path.join(video_dir, '**', '*mov'))

To:

mov_list = glob.glob(os.path.join(video_dir, '**', '*' + args.extension))

Finally, we do the same for the script output with args.output. From:

with open('/Users/pyforav/Desktop/script_output.csv', 'w') as f:

To:

with open(args.output, 'w') as f:

Try running the script again.
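Before pointing the script at a real collection, the file-matching step can be checked on a throwaway folder. The sketch below wraps the glob pattern from the solution in a function named find_files (a name chosen for this example) and runs it on empty temporary files. One detail worth noticing, assuming glob.glob is called as in this lesson without recursive=True: the '**' part stands for exactly one folder level, so files sitting directly in the surveyed folder are not matched.

```python
import glob
import os
import tempfile


def find_files(directory, extension):
    """Return paths one folder below directory whose names end in extension."""
    pattern = os.path.join(directory, '**', '*' + extension)
    return sorted(glob.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    # Build a small test layout: one file at the top level, three in a subfolder.
    subfolder = os.path.join(tmp, 'batch1')
    os.makedirs(subfolder)
    open(os.path.join(tmp, 'top.mkv'), 'w').close()
    for name in ('a.mkv', 'b.mkv', 'notes.txt'):
        open(os.path.join(subfolder, name), 'w').close()

    found = [os.path.basename(path) for path in find_files(tmp, 'mkv')]

print(found)
```

Only the two .mkv files inside the subfolder are listed; top.mkv and notes.txt are left out.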

Trapping errors/exceptions and anticipating diverse collections

As we move our code from the niceties of the sample workshop files to the broader pool of material you might see in real life, we also need to look out for exceptions.

What happens when we survey non-AV files?

What happens if we run the script we just created on non-AV files?

python filesurvey.py -d Desktop -e txt -o nonav_survey.csv

Solution

An error is printed to the terminal, and no CSV is created.

This is where we can use the programming concept of try and except. First we try one piece of code, and if an error results, we try a different piece of code that should work. For example, if we survey files that aren’t time-based, they won’t have a track.duration reported by pymediainfo. When we run our previous script,

  1. our code errors out
  2. the for loop stops
  3. the script doesn’t finish

Instead, we’ll ask the code to try to collect all of the attributes. When the script comes across a file that is missing the duration, it will

  1. try the first code chunk
  2. fail because of an exception (track.duration doesn’t exist)
  3. run the second code chunk, which doesn’t ask for the duration and records None in its place.

for item in mov_list:
    media_info = MediaInfo.parse(item)
    for track in media_info.tracks:
        if track.track_type == "General":
            try:
                general_data = [
                    track.file_name,
                    track.file_extension,
                    track.format,
                    track.file_size,
                    track.duration]
            except Exception:  # catch ordinary errors, but still allow Ctrl+C to stop the script
                general_data = [
                    track.file_name,
                    track.file_extension,
                    track.format,
                    track.file_size,
                    None]
    all_file_data.append(general_data)

By making this effort, we ensure two things:

  1. that our code will continue to function despite the strange files we throw at it,
  2. that our MediaInfo data will be consistent from file to file and ready for export.
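The same pattern works anywhere a value might be missing or malformed. As a small, self-contained sketch, the hypothetical helper below converts a duration to seconds, assuming the value arrives in milliseconds (an assumption made for this example), and falls back to None when the value cannot be converted. It names the specific exceptions it expects, so unrelated problems are still reported.

```python
def duration_in_seconds(raw_duration):
    """Convert a duration in milliseconds to seconds, or return None."""
    try:
        return float(raw_duration) / 1000
    except (TypeError, ValueError):
        # TypeError: the value is None; ValueError: the text is not a number.
        return None


# Made-up sample values: a number, numeric text, a missing value, and bad text.
examples = [90000, "1500", None, "unknown"]
converted = [duration_in_seconds(value) for value in examples]

print(converted)
```

Every input produces either a number or None, so the resulting column stays consistent from row to row.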

Key Points

  • Once you’ve created code you want to use repeatedly, a script is more useful than a Jupyter Notebook

  • argparse is the Python module for accepting arguments from the command line

  • try and except blocks can be used to run code that may not work in every situation