Learn Python Series (#11) - NumPy Part 1
What Will I Learn?
- You will learn how to import NumPy,
- what an ndarray object is, and why it is so useful, powerful and fast,
- how to generate number sequences,
- about a few other useful NumPy methods,
- about NumPy's "basic" Array Attributes.
Requirements
- A working modern computer running macOS, Windows or Ubuntu
- An installed Python 3(.6) distribution, such as (for example) the Anaconda Distribution
- The ambition to learn Python programming
- An installed version of NumPy in your Python (virtual) environment. In case you are using Anaconda, NumPy is installed by default. If it's not installed, just do so via pip install numpy from the command line.
Difficulty
Intermediate
Curriculum (of the Learn Python Series):
- Learn Python Series - Intro
- Learn Python Series (#2) - Handling Strings Part 1
- Learn Python Series (#3) - Handling Strings Part 2
- Learn Python Series (#4) - Round-Up #1
- Learn Python Series (#5) - Handling Lists Part 1
- Learn Python Series (#6) - Handling Lists Part 2
- Learn Python Series (#7) - Handling Dictionaries
- Learn Python Series (#8) - Handling Tuples
- Learn Python Series (#9) - Using Import
- Learn Python Series (#10) - Matplotlib Part 1
Learn Python Series (#11) - NumPy Part 1
In my view, NumPy has to be part of this Learn Python Series. NumPy is a package for numerical computation: it supports multi-dimensional arrays and matrices, and provides high-level mathematical functions to operate on those arrays. NumPy allows for fast numerical computing in Python, whereas the "standard" Python bytecode interpreter wasn't originally designed for numerical computing. Using NumPy, well-written Python code running mathematical algorithms on lots of data isn't slow at all!
Because NumPy serves as a fundamental package for scientific computing in Python, on top of which multiple other scientific packages are built, it is mostly used by data scientists with in-depth scientific backgrounds. Presumably for that reason, not many easy-to-get-into NumPy tutorials exist. However, I argue that NumPy can also serve as a default toolkit for non-scientific Python programmers, even beginners. This NumPy tutorial sub-series hopes to onboard Python programmers from any background or level.
NumPy's Core: The ndarray object
The ndarray object is the core of NumPy: an n-dimensional array holding elements of the same ("homogeneous") data type, on which various math operations can be performed efficiently. This differs from standard Python lists because NumPy arrays have a fixed size, hold elements of one data type, and operate element-wise by default, so no for loop per element is needed.
For example, let's assume two lists a and b of equal length, both holding integers, from which we want to create a new list c in which the corresponding elements of a and b are multiplied:
# Standard Python way
a = [1,2,3,4]
b = [5,6,7,8]
c = []
for i in range(len(a)):
    c.append(a[i] * b[i])
print(type(c), c)
# <class 'list'> [5, 12, 21, 32]
<class 'list'> [5, 12, 21, 32]
# NumPy way
import numpy as np
a = np.array(a)
b = np.array(b)
c = a*b
print(type(c), c)
# <class 'numpy.ndarray'> [ 5 12 21 32]
<class 'numpy.ndarray'> [ 5 12 21 32]
Explanation:
In the 'Standard Python way' example, first the list [1,2,3,4] was assigned to variable a and [5,6,7,8] to variable b, after which an empty list was assigned to variable c, to initialize c. Next, a for loop was needed in which the elements of a and b were fetched by index number i and multiplied, and each multiplication result was appended to the (initially empty) list c.
In the 'NumPy way' example, first the NumPy package was imported as np in order to use it, and then a and b were both "converted" from a list to a 1-dimensional NumPy array. Because element-by-element operations are the default in NumPy, no for loop was needed to let c hold the multiplication results. This is called vectorization: explicit looping is absent and the mathematical operations (in this case a simple multiplication) are performed "under the NumPy hood".
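To make the difference between list arithmetic and array arithmetic a bit more tangible, here is a small sketch. The variable names (nums, arr, repeated, doubled, squared_plus_one) are made up for this illustration and are not used elsewhere in this tutorial. Note that * on a plain list repeats the list, while * on an ndarray multiplies every element.

```python
import numpy as np

nums = [1, 2, 3, 4]        # a plain Python list (illustrative name)
arr = np.array(nums)       # the same values as a 1-dimensional ndarray

# On a plain list, * repeats the list
repeated = nums * 2
print(repeated)
# [1, 2, 3, 4, 1, 2, 3, 4]

# On an ndarray, * works element-wise
doubled = arr * 2
print(doubled)
# [2 4 6 8]

# Other arithmetic operators are element-wise as well, no loop needed
squared_plus_one = arr ** 2 + 1
print(squared_plus_one)
# [ 2  5 10 17]
```

So whenever you catch yourself writing a for loop over the elements of an array just to do arithmetic, there is a good chance a single vectorized expression can do the same job.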
The creation of number sequences
NumPy has multiple built-in methods to create sequences of values, which we can then further manipulate.
- There's for example the NumPy function arange() that, unlike range() in standard Python, returns evenly-spaced arrays (not range objects or lists) and also accepts a float as step.
Usage: numpy.arange([start, ]stop, [step, ]dtype=None)
Examples:
# Using 1 argument: as stop
arr_1 = np.arange(10)
# array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Using 2 arguments: start & stop
arr_2 = np.arange(5, 10)
# array([5, 6, 7, 8, 9])
# Using 3 arguments: start, stop,
# and a step-incrementor, which can be a float!
arr_3 = np.arange(0, 5, 0.8)
# array([0. , 0.8, 1.6, 2.4, 3.2, 4. , 4.8])
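As a rough side-by-side of range() and arange(), the sketch below may help. The names r, arr and floats are only illustrative. It shows that both exclude the stop value, that arange() hands back an ndarray, and that only arange() accepts a float step.

```python
import numpy as np

r = range(0, 10, 2)          # built-in: a lazy range object
arr = np.arange(0, 10, 2)    # NumPy: an ndarray

print(type(r).__name__, type(arr).__name__)
# range ndarray
print(list(r))
# [0, 2, 4, 6, 8]
print(arr)
# [0 2 4 6 8]

# range() only accepts integers, so a float step raises a TypeError
try:
    range(0, 5, 0.8)
except TypeError as err:
    print("range() failed:", err)

# arange() accepts the float step; the stop value 5 is still excluded
floats = np.arange(0, 5, 0.8)
print(floats.size)
# 7
```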
- Another NumPy sequence creator is the linspace() function. Instead of specifying the step (like arange() expects), linspace() expects, via its num keyword argument (kwarg), the number of elements you want to create. By default it includes the endpoint (the stop argument) in the created array.
Usage: numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
Examples:
# Create 5 evenly-spaced array elements (with step == 2),
# by not including the stop value (= endpoint) 10
arr_4 = np.linspace(0, 10, num=5, endpoint=False, dtype=int)
# array([0, 2, 4, 6, 8])
# ... or create 6 evenly-spaced array elements (with step == 2),
# by including the stop value (= 10) as last array element
arr_5 = np.linspace(0, 10, num=6, dtype=int)
# array([ 0, 2, 4, 6, 8, 10])
# If the `num` argument is unset, by default 50 array elements
# will be created
arr_6 = np.linspace(0, 1, endpoint=False)
# array([0. , 0.02, 0.04, 0.06, 0.08, 0.1, 0.12, 0.14, 0.16, 0.18,
# 0.2, 0.22, 0.24, 0.26, 0.28, 0.3, 0.32, 0.34, 0.36, 0.38,
# 0.4, 0.42, 0.44, 0.46, 0.48, 0.5, 0.52, 0.54, 0.56, 0.58,
# 0.6, 0.62, 0.64, 0.66, 0.68, 0.7, 0.72, 0.74, 0.76, 0.78,
# 0.8, 0.82, 0.84, 0.86, 0.88, 0.9, 0.92, 0.94, 0.96, 0.98])
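The retstep keyword from the usage line above is not used in these examples, so here is a minimal sketch of what it does: when retstep=True, linspace() returns both the array and the step size it calculated. The names values, step, values_open and step_open are just placeholders for this illustration.

```python
import numpy as np

# Endpoint included (the default): 6 elements from 0 up to and including 10
values, step = np.linspace(0, 10, num=6, retstep=True)
print(values)
# [ 0.  2.  4.  6.  8. 10.]
print(step)
# 2.0

# Endpoint excluded: 5 elements, same step, 10 itself is left out
values_open, step_open = np.linspace(0, 10, num=5, endpoint=False, retstep=True)
print(values_open)
# [0. 2. 4. 6. 8.]
print(step_open)
# 2.0
```

This can be a handy sanity check when you are not sure whether the endpoint setting gives you the spacing you had in mind.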
- A third function (out of many more) to create a sequence of numbers is random.random(), provided an "array size" is set. If omitted, only one value of type float is returned, else a NumPy array of floats, all in the range from 0.0 (included) up to 1.0 (excluded).
Usage: numpy.random.random(size=None)
Examples:
# Create a single random float
x = np.random.random()
# 0.6022344994122718
# Create a 1-dimensional array with 5 elements
# PS: note the comma after the 5
arr_7 = np.random.random((5,))
# array([0.45631267, 0.08919399, 0.76948001, 0.14375291, 0.02052383])
# Create a 2-dimensional array with 6 elements (size == 3*2 == 6)
arr_8 = np.random.random((3,2))
# array([[0.0379596 , 0.89298785],
# [0.03927935, 0.96021587],
# [0.38208804, 0.21292953]])
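Because the values are random, your output will differ from the numbers printed above. If you want repeatable results while experimenting, one option (not used elsewhere in this tutorial, so treat it as an aside) is to seed NumPy's random generator with np.random.seed() first. The seed value 42 and the names first and second are arbitrary choices for this sketch.

```python
import numpy as np

np.random.seed(42)                  # arbitrary seed, any integer works
first = np.random.random((3, 2))

np.random.seed(42)                  # same seed again ...
second = np.random.random((3, 2))   # ... gives the same "random" values

print(first.shape)
# (3, 2)
print(np.array_equal(first, second))
# True

# All values lie in the range from 0.0 (included) up to 1.0 (excluded)
print(((first >= 0.0) & (first < 1.0)).all())
# True
```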
NumPy's "Basic" Array Attributes
If you had to think twice to wrap your head around the creation of the last 2-dimensional example (arr_8), then don't worry. Because NumPy handles N-dimensional arrays, I wanted to briefly touch upon the concept of 2-dimensional arrays already in this Part 1 of the NumPy sub-series. But understanding multi-dimensional arrays, let alone being able to write elegant code using them, can be tough. It's tough for me as well to explain what the attributes (properties, characteristics) of multi-dimensional arrays are. But let me try nonetheless...
PS: for the early next parts of the NumPy sub-series I'll try only to use 1-dimensional arrays. So even if you don't really understand the following attribute explanation, you can probably still follow along the other NumPy topics I will be covering.
- In NumPy terminology, dimensions are also called axes,
- the number of axes (= dimensions) a NumPy array has, is called its rank, for example a 3-dimensional array has a rank of 3,
- by defining the shape of an array, you define the array's dimensions, where the size of each dimension is also called the axis length,
- and the length of the shape tuple is again the array rank.
Examples:
# Let's create a 2-dimensional array, <= so rank 2
# which holds 3 (three) 1-dimensional arrays <= axis_1 has length 3
# each holding 4 (four) integer elements <= axis_2 has length 4
arr_9 = np.arange(1, 13).reshape((3, 4))  # <= (3, 4) is a tuple, the shape tuple
print(arr_9)
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
arr_9.ndim
# 2 <= indeed, this array has 2 dimensions
2
arr_9.shape
# (3, 4) <= that's 2 numbers, so rank 2, a 2-dimensional array
(3, 4)
arr_9.size
# 12 <= the array size is 3*4 == 12 elements in total
12
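The same attributes work for any number of dimensions. As a small sketch, assuming a made-up 3-dimensional array named arr_3d, you can check that the length of the shape tuple equals the number of dimensions, and that the size is the product of the axis lengths:

```python
import numpy as np

# 24 integers, arranged as 2 blocks of 3 rows with 4 elements each
arr_3d = np.arange(24).reshape((2, 3, 4))

print(arr_3d.ndim)
# 3
print(arr_3d.shape)
# (2, 3, 4)
print(arr_3d.size)
# 24

# The length of the shape tuple is the rank (number of dimensions)
print(len(arr_3d.shape) == arr_3d.ndim)
# True

# The size is the product of all axis lengths
print(arr_3d.size == 2 * 3 * 4)
# True
```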
Test questions
Let's define a NumPy array holding a 3D-coordinate, like so:
coord = np.array([1,2,3])
Do you now know the attribute values of this 3D-coordinate?
- what's the rank, the number of dimensions, of this array?
- what's the shape of the array? Or in other words, what are the lengths of the axes?
- what's the array size?
Let's find out, together!
coord.ndim
# 1 <= even though the array is holding a 3D-coordinate, the array itself is of rank 1,
# it just has one dimension!
1
coord.shape
# (3,) <= there's just 1 dimension, 1 axis, with length 3
(3,)
coord.size
# 3 <= in total there are 3 elements stored in the array
3
What's covered in the next tutorials?
Now that we know the NumPy library exists, that it's used for numerical Python computing, that it uses ndarray objects, which allow for vectorization, how we generate value sequences, and what the attributes / properties of N-dimensional arrays are... in the next NumPy tutorial part we can cover some of the NumPy operations and explore some "universal functions" (well-known mathematical functions you are, or maybe were in school / university, already familiar with)!
However, in the next Learn Python Series episode we'll first be focusing on some more built-in modules, to handle files (in general), and CSV and JSON more specifically, as well as using the popular external Requests: HTTP for Humans library, to fetch data from the web. Also, we'll go over using BeautifulSoup to parse HTML files.
If we combine our Python knowledge regarding strings, lists, dictionaries, tuples, Matplotlib, NumPy, CSV, JSON, fetching web data via Requests, parsing HTML via BeautifulSoup, and reading from and saving to (our own) files, we can do lots of very useful things already! Stay tuned for the following episodes of the Learn Python Series!
Thank you for your time!
Posted on Utopian.io - Rewarding Open Source Contributors
So we can write tutorials like this and it gets approved by utopian ?
Is there something wrong with my Learn Python Series? What would you like to see improved on this NumPy intro episode?
No No, don't take it wrong. Your post is perfectly fine. I was just asking, if writing tutorials like this gets approved on any open source.
As I see, it just needs personal touch. That's all that matters ?
And I can see, your series is getting pretty good upvotes, so I will try to find out something.
As long as its about an open source topic it will be approved. However, you will also need to follow the rules of Utopian and this includes not making tutorials that are on basic concepts (like looping, variables etc).
Thank you for the contribution. It has been approved.
I love NumPy, the vectorisation you talked about has saved the performance of many of my projects at university...
thanks for sharing python series.
Thanks for your every educational blog @scipio
very informative post......
Numpy is great. I don't use python much anymore but I still use it for Numpy, Pandas, Scikitlearn and Tensorflow.