1. Python language
1.1 Introduction
Python is a multi-platform, free and open-source programming language, first released in 1991. The current version, Python 3, is one of the most widely used programming languages today. Python is an interpreted language (as opposed to compiled languages such as C or Fortran), which means you do not need to compile the code before executing it. This lets you run Python code interactively (modify it and immediately run it again) through the Python interpreter, a command-line tool that executes the code you type on the fly. Python is a high-level, general-purpose language and supports object-oriented programming.
An overview of the Python scientific ecosystem is given in section 1.1 of the SciPy Lecture Notes.
1.2 Installation
Python comes in various flavors and can be installed in different ways. However, the easiest way to get Python on your system is to install a scientific distribution such as the Anaconda distribution, which provides a manager for a full set of libraries and software to perform data analysis with Python.
Alternatively, if you want more control over the installed packages and a leaner, better optimized setup, you can install the Miniforge distribution.
1.3 Usage
After installing the Python 3 version of Anaconda, you can run the Anaconda Navigator, which provides the tools to use and set up your Python installation.
Among these tools, JupyterLab is the core application you will use to actually perform your data analysis. It is a browser-based user interface for managing Jupyter Notebooks. A Jupyter Notebook is an interactive programming interface that runs in the browser, with a Python interpreter running under the hood.
After opening JupyterLab, you can open a new Notebook from the main Launcher window (or through File > New > Notebook) and start typing Python code right away. In the empty cell that appears, you can try to type:
print("Hello World!")
or simply
7 * 6
and press the Run button ► (the small triangle just above the cell) or just hit Shift+Enter. Congratulations, you have just run your first Python code! See the next section for more information on Python programming.
To get an idea of what can actually be done in a Jupyter Notebook, have a look at the Bokeh and Holoviews galleries, which provide some interactive examples.
You can go through the JupyterLab User Guide and the Jupyter Notebook documentation to get familiar with the user interface.
1.4 Actual programming
Printing Hello World! has been pretty exciting, but with Python you can do much more!
You do not need to be a programmer at all in order to use Python for data analysis. Still, knowing the basics of the language will help you immensely in doing it properly.
Section 1.2 of the SciPy Lecture Notes, dedicated to Python, provides a very good introduction to the language itself, and initially you should at least be familiar with sections 1.2.1, 1.2.2 and 1.2.3. It is still useful to go through the rest, especially if you are interested in writing some code or automating some operations. In the latter case, the Spyder development environment, included in Anaconda, could come in handy.
Depending on your inclination, these same basic aspects are covered in more detail in chapters 1 to 5 of the excellent official Python tutorial. If you are planning to develop code, consider going through chapters 6 to 10 as well.
1.5 Modules and packages
This section covers a few key concepts that are strictly related to programming, but that need to be clear in order to use Python for data analysis.
When launching a new Notebook (i.e. a new Python interpreter) you are provided by default with just a very basic set of functions, such as print() seen above or abs():
abs(-7)
which returns the absolute value of a number.
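As a small illustration of what "available by default" means, the sketch below calls abs() without any import and then checks which names belong to the built-in set. The standard-library builtins module is imported here only to list those names; it is not needed to use them.

```python
import builtins  # imported only to inspect the built-in names

# abs() and print() work in a fresh interpreter without any import.
result = abs(-7)
print(result)  # 7

# Names that are available by default, versus one that is not.
builtin_names = dir(builtins)
print("abs" in builtin_names)  # True
print("cos" in builtin_names)  # False: cos() must be imported from a module
```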
To do anything more, you have to load (or, more precisely, import) additional packages into the active session. Many of these packages are already present by default, others can be installed. For example, to calculate the cosine of π, you would do:
import math
math.cos(math.pi)
This piece of code already illustrates two important aspects in Python:
Importing. Here, math is a module, which is included by default in the Python distribution, but needs to be ‘activated’ with the import math statement.
Object-oriented programming. The math module contains several objects, like functions (referred to here as methods, such as cos()) and variables (referred to here as attributes, such as pi). These objects are accessed through the dot (.) notation. So math.cos() and math.sin() give the cosine and sine functions, respectively, while math.pi returns the π constant.
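The dot notation described above can be explored directly. The following is a minimal sketch using only the math module from the text; the variable names value and public_names are chosen here for illustration.

```python
import math

# Call a function of the module through the dot notation.
value = math.cos(math.pi)
print(value)  # -1.0

# dir() lists the names a module exposes; skip the private ones.
public_names = [name for name in dir(math) if not name.startswith("_")]
print("cos" in public_names, "pi" in public_names)  # True True

# A function can be called, a constant cannot.
print(callable(math.cos), callable(math.pi))  # True False
```

In a Notebook, typing math. and pressing Tab usually offers the same list of names interactively.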
To illustrate this further, let’s try a variant of the import statement:
from math import cos, pi
This line of code is pretty self-explanatory. In this way, cos() and pi have been made directly available and one can just write:
cos(pi)
with the same result as before.
You can inspect the type of cos and pi objects with the type() function. For example:
type(pi)
will return float, indicating pi is a floating point number.
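The same check can be applied to cos as well. This is a short sketch assuming the standard CPython interpreter, where functions of the math module are reported as built-in functions.

```python
from math import cos, pi

pi_type = type(pi)
cos_type = type(cos)

print(pi_type)   # <class 'float'>
print(cos_type)  # <class 'builtin_function_or_method'> on CPython

# isinstance() is the usual way to test a type in code.
print(isinstance(pi, float))  # True
```

Note that print(type(pi)) shows <class 'float'>, whereas a Notebook cell ending with type(pi) displays just float.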
To summarize, here math is a module, which contains several methods (i.e. functions, such as cos()) and attributes (i.e. variables, such as pi).
Similarly, other types of objects in Python can have their own methods and attributes. As an example, the object mydata, which we assume has been properly constructed, could possess the mydata.temperature attribute (which would probably be a floating point number representing the temperature at which the data was acquired) or the mydata.normalize() method (which, for example, could rescale the mydata values so that the integral under the curve is equal to one).
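To make the mydata example concrete, here is a minimal sketch of how such an object could be built. The MyData class, its constructor arguments and the integral() helper are all hypothetical and exist only for this illustration; real data-analysis libraries define their own, richer objects.

```python
class MyData:
    """Hypothetical container for a measured curve (illustration only)."""

    def __init__(self, x, y, temperature):
        self.x = list(x)
        self.y = list(y)
        self.temperature = temperature  # attribute: a plain float

    def integral(self):
        """Area under the curve, estimated with the trapezoidal rule."""
        return sum(
            0.5 * (self.y[i] + self.y[i + 1]) * (self.x[i + 1] - self.x[i])
            for i in range(len(self.x) - 1)
        )

    def normalize(self):
        """Rescale y in place so that the area under the curve equals one."""
        area = self.integral()
        self.y = [value / area for value in self.y]


# Build the object, read an attribute, call a method.
mydata = MyData(x=[0.0, 1.0, 2.0], y=[0.0, 4.0, 0.0], temperature=293.15)
print(mydata.temperature)  # 293.15
mydata.normalize()
print(mydata.integral())   # 1.0
```

The attribute is read without parentheses (mydata.temperature), while the method is called with them (mydata.normalize()), exactly as with math.pi and math.cos().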
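A collection of modules is called a package. As another example, take the convolve function contained in the signal module of the scipy package. Any of the following forms gives access to it (the arguments of convolve() are omitted here for brevity; with older SciPy versions the first form may require import scipy.signal instead of import scipy):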
import scipy
scipy.signal.convolve()
from scipy import signal
signal.convolve()
from scipy.signal import convolve
convolve()
from scipy.signal import convolve as conv
conv()
In the last example, convolve has been imported with the shorthand conv. This is a useful and widely used practice, especially when you need to use the same object several times.
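The four import forms can be compared side by side with actual arguments. To keep the sketch runnable without any extra installation, it uses the standard-library urllib package and its parse module as a stand-in for scipy and signal; this substitution is an assumption made here, and the same pattern applies to scipy.signal.convolve.

```python
# Form 1: import the module through its full dotted path.
import urllib.parse

# Form 2: import the module from the package.
from urllib import parse

# Form 3: import the function directly.
from urllib.parse import quote

# Form 4: import the function under a shorthand name.
from urllib.parse import quote as q

text = "hello world"
results = [
    urllib.parse.quote(text),
    parse.quote(text),
    quote(text),
    q(text),
]
print(results[0])  # hello%20world

# All four names point to the very same function object.
print(urllib.parse.quote is parse.quote is quote is q)  # True
```

Which form to prefer is mostly a matter of readability: the longer forms make the origin of a function explicit, while the shorter ones save typing.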
The same concept of importing applies to Python scripts: simple text files, which you may have written yourself, typically with a ‘.py’ extension and containing custom definitions of functions or other objects you want to reuse.
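As a sketch of how this works, the example below writes a tiny script to a temporary folder and then imports it. The file name mytools.py and the function fahrenheit_to_celsius() are hypothetical, chosen only for this illustration. In everyday use you would simply save the script next to your Notebook, in which case the sys.path line is usually not needed.

```python
import sys
import tempfile
from pathlib import Path

# Create a hypothetical script, mytools.py, in a temporary folder.
workdir = Path(tempfile.mkdtemp())
(workdir / "mytools.py").write_text(
    "def fahrenheit_to_celsius(t):\n"
    "    return (t - 32.0) * 5.0 / 9.0\n"
)

# Tell Python where to look for the script.
sys.path.insert(0, str(workdir))

# The script is now imported like any other module.
import mytools
from mytools import fahrenheit_to_celsius as f2c

boiling = mytools.fahrenheit_to_celsius(212.0)
freezing = f2c(32.0)
print(boiling)   # 100.0
print(freezing)  # 0.0
```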
For more insight into scripts and modules, see section 1.2.5 of the SciPy Lecture Notes and chapter 6 of the Python tutorial.