As I have said before, one of Python's biggest advantages is its huge community and the number of resources that support it, including a large set of libraries. One of these libraries is NumPy (NUMerical PYthon).
It is one of the main libraries for scientific work with Python. It provides a powerful data structure, the N-dimensional array, which covers vectors, matrices and higher-dimensional arrays.
As a short example, we can see how to create a 1-dimensional array and a 2-dimensional array:
import numpy as np
a = np.array([1, 2, 3])
...
b = np.array([(1, 2, 3), (4, 5, 6)])
...
You can find the code example here.
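To make the difference between the two structures more visible, here is a minimal sketch that inspects both arrays. The names a and b are the ones used in the snippet above; ndim and shape are standard array attributes.

```python
import numpy as np

# 1-dimensional array built from a flat list
a = np.array([1, 2, 3])
# 2-dimensional array built from a sequence of equal-length rows
b = np.array([(1, 2, 3), (4, 5, 6)])

print(a.ndim, a.shape)  # 1 (3,)
print(b.ndim, b.shape)  # 2 (2, 3)
```

The shape tells us how many elements there are along each dimension: 3 elements for a, and 2 rows by 3 columns for b.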
But why should we use NumPy structures instead of Python structures?
There are a couple of main reasons:
- NumPy arrays consume less memory than Python lists.
- Operations on NumPy arrays run faster than the equivalent operations on Python lists.
You do not need to take my word for it, so let’s play a little bit with the code and run some informal benchmarks.
Let’s start with the memory claim:
import sys
import numpy as np
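# Build a real list (range() on its own is a lazy sequence, not a list)
s = list(range(1000))
# Rough estimate: size of one int object times the number of elements
# (this ignores the list's own array of pointers)
print(sys.getsizeof(5) * len(s))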
...
d = np.arange(1000)
print(d.size * d.itemsize)
You can find the code example here.
On a typical 64-bit setup, this gives us the following result, in bytes:
Python list:
28000
NumPy array:
8000
As we can see, there is a big difference in memory consumption.
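The estimate above only multiplies the size of one integer by the number of elements. As a slightly more complete sketch, we can also count the list object itself and let NumPy report the size of its data buffer through nbytes. I am assuming CPython here, and the exact numbers depend on the platform and the Python version, so I am not including an expected output.

```python
import sys
import numpy as np

N = 1000

py_list = list(range(N))
np_array = np.arange(N)

# A list stores pointers to separate int objects, so count both parts:
# the list container and every int object it refers to.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# nbytes is the size of the array's data buffer (size * itemsize).
array_bytes = np_array.nbytes

print("Python list:", list_bytes)
print("NumPy array:", array_bytes)
```

This is still an approximation (for example, CPython shares small integer objects between containers), but it shows where the difference comes from: the list keeps a pointer plus a full Python object per element, while the array keeps only the raw values.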
Now, let’s do the same for the execution time. Again, we are going to write a small code snippet and execute an informal benchmark:
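import time
import numpy as np
SIZE = 1_000_000
# Use real lists (range() on its own is a lazy sequence, not a list)
L1 = list(range(SIZE))
L2 = list(range(SIZE))
A1 = np.arange(SIZE)
A2 = np.arange(SIZE)
# Element-wise addition with Python lists
start = time.perf_counter()
result = [x + y for x, y in zip(L1, L2)]
print((time.perf_counter() - start) * 1000)  # elapsed milliseconds
...
# Element-wise addition with NumPy arrays
start = time.perf_counter()
result = A1 + A2
print((time.perf_counter() - start) * 1000)  # elapsed milliseconds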
You can find the code example here.
This gives us the following result, in milliseconds (exact timings vary by machine and from run to run):
Python list:
316.49184226989746
NumPy array:
65.60492515563965
Again, as we can see, the execution time for the NumPy structures is much shorter.
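A single measurement like the one above is quite noisy. As a slightly more robust sketch, the standard library's timeit module can repeat each operation several times. The function names, the smaller SIZE and the repetition counts below are my own arbitrary choices, and I am not including expected timings because they depend on the machine.

```python
import timeit
import numpy as np

SIZE = 100_000  # smaller than above so the repeated runs stay quick

l1 = list(range(SIZE))
l2 = list(range(SIZE))
a1 = np.arange(SIZE)
a2 = np.arange(SIZE)

def add_lists():
    # Element-wise addition with plain Python lists
    return [x + y for x, y in zip(l1, l2)]

def add_arrays():
    # Element-wise addition with NumPy arrays
    return a1 + a2

# Run each function 20 times per measurement and keep the best of 3 measurements
list_time = min(timeit.repeat(add_lists, number=20, repeat=3))
array_time = min(timeit.repeat(add_arrays, number=20, repeat=3))

print(f"Python list: {list_time * 1000:.2f} ms for 20 runs")
print(f"NumPy array: {array_time * 1000:.2f} ms for 20 runs")
```

Taking the minimum of several repeats reduces the influence of whatever else the machine was doing at the time.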
In addition to the speed and memory improvements, it is worth pointing out the difference in syntax between Python and NumPy when writing the addition operation:
- Python: [x + y for x, y in zip(L1, L2)]
- NumPy: A1 + A2
As we can see, the difference is quite big. The second case, even if you know nothing about Python or NumPy, is very easy to understand.
Quick review of the NumPy API
- Creating arrays
  - import numpy as np – Imports NumPy under its conventional alias.
  - np.array() – Creates an array from a sequence such as a list or a tuple.
  - np.ones((3, 4)) – Creates a 3x4 array with a one in every position.
  - np.zeros((3, 4)) – Creates a 3x4 array with a zero in every position.
  - np.random.random((3, 4)) – Creates a 3x4 array with a random value between 0 and 1 in every position.
  - np.empty((3, 4)) – Creates a 3x4 array without initializing its values (the contents are arbitrary, not necessarily zero).
  - np.full((3, 4), 8) – Creates a 3x4 array with the specified value in every position.
  - np.arange(0, 30, 5) – Creates an array of evenly spaced values (from 0 up to, but not including, 30, in steps of 5).
  - np.linspace(0, 2, 5) – Creates an array of evenly spaced values (5 elements from 0 to 2, both included).
  - np.eye(4, 4) – Creates a 4x4 identity matrix.
  - np.identity(4) – Creates a 4x4 identity matrix.
- Inspecting and operating on arrays
  - a.ndim – Number of dimensions.
  - a.dtype – Data type of the elements.
  - a.size – Total number of elements.
  - a.shape – Shape of the array (number of elements along each dimension).
  - a.reshape(3, 2) – Returns the array with a new shape (the total number of elements must stay the same).
  - a[3, 2] – Selects a single element (row 3, column 2, counting from 0).
  - a[0:, 2] – Extracts the value in column 2 from every row (same as a[:, 2]).
  - a.min(), a.max() and a.sum() – Minimum, maximum and sum of all the elements.
  - np.sqrt(a) – Element-wise square root.
  - np.std(a) – Standard deviation of all the elements.
  - a + b, a - b, a * b and a / b – Element-wise operations between arrays.
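To see a few of these calls in action, here is a short sketch. The variable names are just illustrative, and the values in the comments are what each call returns.

```python
import numpy as np

# Creation
ones = np.ones((3, 4))            # 3x4 array of 1.0
steps = np.arange(0, 30, 5)       # 0, 5, 10, 15, 20, 25 (30 is excluded)
points = np.linspace(0, 2, 5)     # 0.0, 0.5, 1.0, 1.5, 2.0 (2 is included)
identity = np.eye(4)              # 4x4 identity matrix

# Inspection
a = np.arange(6).reshape(3, 2)    # rows: [0, 1], [2, 3], [4, 5]
print(a.ndim, a.shape, a.size)    # 2 (3, 2) 6
print(a[2, 1])                    # 5
print(a[:, 1])                    # [1 3 5]
print(a.min(), a.max(), a.sum())  # 0 5 15

# Element-wise arithmetic
b = np.full((3, 2), 2)
print((a * b).tolist())           # [[0, 2], [4, 6], [8, 10]]
```

Note that a * b multiplies element by element; it is not the matrix product.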
And that is all. This has been just a quick, very quick, review of the NumPy library. I recommend you play around with it a bit more; we will use it more in the future.