This lesson is being piloted (Beta version)

Python for AV

Python is often called the “Swiss Army knife” of programming languages: it is easy to use, simple, efficient, and capable of handling nearly any task. In this half-day workshop, participants will learn how Python can automate and streamline media digitization workflows.

Designed for archivists working with media collections of any size, this workshop is practical and skills-oriented, combining a thorough grounding in Python basics with an introduction to tools for manipulating and analyzing media files. Specific modules, libraries, and tools will include: the os, glob, re, subprocess, shutil, csv, and json modules; FFmpeg; pymediainfo/MediaConch; pandas; matplotlib; and seaborn.
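
As a rough taste of how a few of these modules fit together, the sketch below uses glob and os to find files, builds an FFmpeg command as a list, and hands it to subprocess. It is an illustration rather than the workshop's actual code: the function names, the .mov-to-.mp4 choice, and the libx264/aac encoder settings are assumptions made for the example.

```python
import glob
import os
import shutil
import subprocess


def build_ffmpeg_command(source_path, output_dir):
    """Return an ffmpeg argument list that transcodes source_path to an MP4 in output_dir."""
    base_name = os.path.splitext(os.path.basename(source_path))[0]
    output_path = os.path.join(output_dir, base_name + ".mp4")
    # -i names the input file; -c:v and -c:a choose the video and audio encoders.
    # The encoder choices here are assumptions for illustration only.
    return ["ffmpeg", "-i", source_path, "-c:v", "libx264", "-c:a", "aac", output_path]


def transcode_folder(source_dir, output_dir):
    """Transcode every .mov file in source_dir and return the commands that were built."""
    commands = []
    for source_path in sorted(glob.glob(os.path.join(source_dir, "*.mov"))):
        command = build_ffmpeg_command(source_path, output_dir)
        commands.append(command)
        # Only call ffmpeg if it is installed and on the PATH.
        if shutil.which("ffmpeg"):
            subprocess.run(command, check=True)
    return commands
```

Passing the command as a list, rather than one long string, lets subprocess handle file names that contain spaces without extra quoting.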


Participants must bring their own laptops to the workshop. Laptops should have at least 4 GB of RAM, 10 GB of available drive space, and Windows 8+ or macOS 10.10+.

Familiarity with the following programming concepts is helpful:

  • Variables
  • for loops
  • if/then statements
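
For anyone wanting a quick refresher, the three concepts above can be seen together in a few lines of Python. This is only a minimal sketch; the file names are hypothetical and chosen for illustration.

```python
# A variable holds a value, here a list of hypothetical file names.
file_names = ["interview.mov", "notes.txt", "concert.wav"]
media_extensions = (".mov", ".wav")

media_files = []
# A for loop repeats the indented block once per item in the list.
for name in file_names:
    # An if statement runs its block only when the condition is true.
    if name.endswith(media_extensions):
        media_files.append(name)

print(media_files)  # prints ['interview.mov', 'concert.wav']
```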

Familiarity with the following AV file concepts is recommended:

  • container formats
  • codecs
  • transcoding
  • technical metadata


        Setup
          • Download files required for the lesson
00:00   1. What is Python?
          • What is Python?
          • What is the command line?
          • Why should I use it?
00:30   2. Running and Quitting
          • How can I run Python code?
00:45   3. Transcoding One File
          • How can I use Python to transcode a file?
01:15   4. Generating a List of Files
          • How can I use Python to find different types of files?
01:45   5. 1st Break
01:55   6. Moving and Transcoding Files
          • How can I use Python to move files from one place to another?
          • How can I use Python to run ffmpeg?
02:15   7. Lossless Compression and Codec Decisions
          • How can I use Python to losslessly compress files?
          • How can I use Python to make decisions based upon codec?
02:35   8. 2nd Break
02:45   9. Finding Out About Files
          • How can I use Python and pymediainfo to learn more about media files?
03:15   10. Finding Out About AV Files
          • How can I use Python to collect more technical metadata?
          • How can I use Python to create CSVs?
03:35   11. Checksums
          • How can I use Python to generate checksums?
          • How can I use Python to validate bags?
03:35   12. Creating a Script
          • How can I create a Python script to run outside of Jupyter?
          • How can I use command-line arguments?
04:00   13. Data Analysis and Visualization (Bonus)
          • How can I use Python/pandas to analyze data gathered from/about my media collection?
          • How can I use data visualization tools like matplotlib or seaborn to present my analysis in a clear and digestible way?
04:30   Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.