How to use the Python Data Processing Pipeline

Python Data Processing Pipeline Tutorial

Introduction
Hello and welcome to the tutorial for the Python Data Processing Pipeline (Datapipeline API). Here, you are going to learn how to quickly develop complex workflows to process your data online and offline. The point of this tutorial is to show you step by step how to use the Datapipeline API, introduce its core components, and illustrate the core idea. This will enable you to develop your own solutions and to understand existing pipelines written for our Ephys Software, which is in part built using the Datapipeline API.
The goal is that, by the end of this tutorial, you understand how things work rather than having to memorize them!

What is a Data Processing Pipeline?

Processing time series data usually involves applying several operations in series. Those operations often modify the stream of data, and the result or output of one operation serves as the input for the following operation. The idea of a Data Processing Pipeline is to provide a collection of simple and reusable building blocks, each performing a single operation (cf. Fig. 2).
The promise of such a collection of building blocks is that you can quickly develop and prototype complex workflows by focusing mainly on the data and the operations instead of on writing code. Thus, the blocks can be seen as black boxes attached to each other.
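
The idea of chaining blocks can be illustrated with a minimal sketch in plain Python. Note that this is not the Datapipeline API: the names remove_offset, rectify, threshold and run_pipeline are hypothetical and chosen only for this illustration. Each block is a function that takes a list of samples and returns a new list, and the output of one block is passed on as the input of the next.

```python
from typing import Callable, List

# In this sketch a "block" is any function mapping a list of samples to a new list.
# (Hypothetical convention for illustration; not the Datapipeline API.)
Block = Callable[[List[float]], List[float]]


def remove_offset(samples: List[float]) -> List[float]:
    """Subtract the mean so the signal is centred on zero."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def rectify(samples: List[float]) -> List[float]:
    """Replace every sample by its absolute value."""
    return [abs(s) for s in samples]


def threshold(level: float) -> Block:
    """Build a block that keeps only samples at or above `level`."""
    def block(samples: List[float]) -> List[float]:
        return [s for s in samples if s >= level]
    return block


def run_pipeline(samples: List[float], blocks: List[Block]) -> List[float]:
    """Feed the output of each block into the next one, in order."""
    for block in blocks:
        samples = block(samples)
    return samples


pipeline = [remove_offset, rectify, threshold(1.5)]
result = run_pipeline([1.0, 2.0, 3.0, 6.0], pipeline)
print(result)  # prints [2.0, 3.0]
```

Because every block has the same shape (samples in, samples out), blocks can be reordered, removed or swapped without touching the others, which is what makes the black-box view useful.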

Before we begin...

In order to follow the tutorial and build your own pipelines, you should have some knowledge of how to develop and run Python code on your own. A very basic understanding of Object-Oriented Programming is also recommended (see https://en.wikipedia.org/wiki/Object-oriented_programming and https://docs.python.org/3/tutorial/classes.html).
Furthermore, it is helpful to be familiar with basic concepts of electrophysiology: what a neuron is, and how action potentials are generated and propagated. Last but not least, a bit of knowledge of some statistical concepts is not going to hurt either.
Where needed, I will provide useful links and explanations and cite papers.

Where to find the code?

As of April 28th, 2021, all the code, data, and papers needed to complete this tutorial can be found on Twinkle at:
/home/colliculus/behaviourBoxes/software/ratCageProgramsV2/DataProcessingPipelineCode
And in Freiburg at: Q:\HNO-T-NeuroBio-EP-Analyse\EphysSoftware

Ephys Programs

There are several Ephys programs available:

1. Ephys_Search.py
This is the go-to program for making sure you have hit the correct stereotaxic location.
2. ABL_Thresholding.py
This is the program to determine the correct ABL values for your given stereotaxic location.
3. ILD_Tuning.py and ITD_Tuning.py
These two programs allow you to sweep through a range of ILD/ITD values to determine the ILD and ITD sensitivity.
4. GRF2018_Acute.py
This program reads in stimulation parameters provided as CSV files. Examples can be found in the "stim_parameter_files" folder, which is a subfolder of the aforementioned DataProcessingPipelineCode folder on Twinkle and of the aforementioned EphysSoftware folder in Freiburg.
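
To give a rough idea of what reading such a parameter file can look like, here is a minimal sketch using Python's standard csv module. It does not show how GRF2018_Acute.py actually parses its input. The column names abl_db, ild_db and itd_us, the values, and the function read_stim_parameters are all hypothetical; please check the files in "stim_parameter_files" for the real layout.

```python
import csv
import io

# Hypothetical file content for illustration only; the real files in
# "stim_parameter_files" may use different column names, units and values.
EXAMPLE_CSV = """abl_db,ild_db,itd_us
50,0,0
50,5,-100
60,-5,100
"""


def read_stim_parameters(handle):
    """Return one dict per CSV row, with every value converted to float."""
    reader = csv.DictReader(handle)
    return [{name: float(value) for name, value in row.items()} for row in reader]


# io.StringIO stands in for an opened file so that the example is self-contained.
stimuli = read_stim_parameters(io.StringIO(EXAMPLE_CSV))
for stimulus in stimuli:
    print(stimulus)
```

To read a file from disk instead, the same function can be given a file handle opened with open(path, newline=""), where path points to one of the CSV files.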

Multiprocessing Versions of the Ephys Programs

Python programs often run on a single core of the CPU. But given that modern CPUs have four or more cores, it is desirable to make use of them to improve the run-time performance of the Python code. In Python, spreading the work over several processes so that more than one CPU core is used is referred to as multiprocessing.
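
As a minimal sketch of the idea, the following example uses the Pool class from Python's standard multiprocessing module to distribute independent chunks of a signal over two worker processes. This is not how the Ephys programs are implemented; the functions chunk_rms, split_into_chunks and parallel_rms and the example signal are hypothetical and only serve as an illustration.

```python
import math
from multiprocessing import Pool


def chunk_rms(chunk):
    """Root mean square of one chunk of samples (the work done per process)."""
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def split_into_chunks(samples, size):
    """Cut a list into consecutive pieces of at most `size` samples."""
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def parallel_rms(samples, size, workers=2):
    """Compute the RMS of every chunk, spreading the chunks over worker processes."""
    with Pool(processes=workers) as pool:
        # pool.map returns the results in the same order as the input chunks.
        return pool.map(chunk_rms, split_into_chunks(samples, size))


if __name__ == "__main__":
    # This guard keeps worker processes from re-running the lines below
    # when they import this file, which matters on Windows in particular.
    signal = [3.0, 4.0, 3.0, 4.0, 0.0, 0.0, 6.0, 8.0]
    print(parallel_rms(signal, size=2))
```

Save the example to a file and run it from a console with python followed by the file name. Splitting the data into independent chunks is what allows the work to be distributed: each chunk can be processed without knowing anything about the others.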

The multiprocessing versions of the Ephys Programs are named:
Ephys_Search_mp.py
ABL_Thresholding_mp.py
ILD_Tuning_mp.py
ITD_Tuning_mp.py
GRF2018_Acute_mp.py

Unlike the regular Ephys programs, the multiprocessing versions cannot be run from within Spyder and must instead be run from the console/command line.
To run them, open a console, e.g. the Windows PowerShell. The menu entry is usually located in the Start menu as part of your Anaconda installation.
In the PowerShell window, navigate to the location of the Ephys Programs and run the Python programs using:
python <name of program>
For example, to run the Ephys_Search program, type:
python Ephys_Search_mp.py
and then hit the enter key.