NumPy: Zero to Hero
GitHub: https://github.com/hasan-firat-data-and-business-analyst/Data-Science-Python/blob/main/NumPy.py
What is NumPy?
NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
Data Structures


The main data structure in NumPy is the ndarray, which is a shorthand name for N-dimensional array. When working with NumPy, data in an ndarray is simply referred to as an array. It is a fixed-sized array in memory that contains data of the same type, such as integers or floating point values.
Data Types in Python
By default, Python has these built-in data types:
- strings — used to represent text data, the text is given under quote marks. e.g. “ABCD”
- integer — used to represent integer numbers. e.g. -1, -2, -3
- float — used to represent real numbers. e.g. 1.2, 42.42
- boolean — used to represent True or False.
- complex — used to represent complex numbers. e.g. 1.0 + 2.0j, 1.5 + 2.5j
Data Types in NumPy
NumPy has some extra data types, and refers to data types with one character, like i for integers, u for unsigned integers, etc.
Below is a list of all data types in NumPy and the characters used to represent them.
- i — integer
- b — boolean
- u — unsigned integer
- f — float
- c — complex float
- m — timedelta
- M — datetime
- O — object
- S — string
- U — unicode string
- V — fixed chunk of memory for other type ( void )
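As a rough illustration of the character codes listed above, the sketch below reads the kind code from a few small arrays. The variable names are only illustrative.

```python
import numpy as np

# Build small arrays and read the one-character kind code of each dtype.
ints = np.array([1, 2, 3])
floats = np.array([1.5, 2.5])
bools = np.array([True, False])
texts = np.array(["a", "bc"])
dates = np.array(["2024-01-01"], dtype="M8[D]")  # M = datetime

kinds = [a.dtype.kind for a in (ints, floats, bools, texts, dates)]
print(kinds)
```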
The most common way to work with numbers in NumPy is through ndarray objects. They are similar to Python lists, but can have any number of dimensions. Also, ndarray supports fast math operations.
Since it can have any number of dimensions, one can use ndarrays to represent any of these structures: scalars, vectors, matrices, or tensors. (More information is given in the “Array’s Dimension” section.)
Installing NumPy
Using the command line
pip install numpy
Using the panel
import numpy as np
Note: the “as np” part defines an alias, and “as” is the keyword that introduces the alias. You can use any valid name after the keyword, such as “import numpy as datascience” or “import numpy as material_data_science”, although np is the usual convention.
Visualising the form of an array

Creating the arrays
The array object in NumPy is called ndarray.
We can create a NumPy ndarray object by using the array() function.
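A minimal sketch of creating an array with array() and checking its type. The variable name array and the values are only illustrative.

```python
import numpy as np

# Create an ndarray from a Python list.
array = np.array([1, 2, 3, 4])

print(array)        # the values of the array
print(type(array))  # the class of the object
```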

type(): “type()” is a built-in Python function that returns the type of an object. In the above example, we can see that array (the variable name) has the type “numpy.ndarray”.
The second way to create the arrays

What is the difference between the first and the second example?
The resulting objects are the same kind of array; the difference is only in how they are displayed. In the first example we used the “print()” function, such as print(array), and we got “[1 2 3 4]” as a result. In the second example we did not use “print()”, so the interactive session echoed the representation of the object and we got “array([11, 22, 33, 44, 55])”. The same applies to “type()” and “print(type())”. If you swap the functions in both examples, you can see the difference for yourself.
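The two display forms can be reproduced in a script with str() and repr(). This is a small sketch; the names first and second are only illustrative.

```python
import numpy as np

first = np.array([1, 2, 3, 4])
second = np.array([11, 22, 33, 44, 55])

printed = str(first)    # the text that print(first) writes
echoed = repr(second)   # the text an interactive session echoes back

print(printed)
print(echoed)
```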
Array’s Dimension
The dimension of an array is its level of nesting depth. Nested arrays are arrays that have other arrays as their elements.
0 — D Arrays or Scalar
A 0-D array, or scalar, is a single element of an array. Each value in an array is a 0-D array.

1 — D Arrays or Uni-Dimensional
An array that has 0-D arrays as its elements is called a 1-D array.

2 — D Arrays
An array that has 1-D arrays as its elements is called a 2-D array. 2-D arrays are common in matrix calculations.

Do not forget! Each inner array must have the same number of elements. In the above example, each inner array has 4 elements.
3 — D Arrays
An array that has 2-D arrays as its elements is called a 3-D array.
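The four levels described in the sections above can be sketched as follows, using the ndim attribute to count the dimensions. The values are made up for illustration.

```python
import numpy as np

zero_d = np.array(42)                    # a single value
one_d = np.array([1, 2, 3, 4, 5])        # a row of values
two_d = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8]])         # rows of equal length
three_d = np.array([[[1, 2], [3, 4]],
                    [[5, 6], [7, 8]]])   # a stack of 2-D arrays

# ndim reports how many dimensions each array has.
dims = [a.ndim for a in (zero_d, one_d, two_d, three_d)]
print(dims)
```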

The second example

As we can see, arrays do not have to contain only numbers.
How to find the number of dimensions?

Shape Function in NumPy

The shape attribute returns a tuple with the number of elements in each dimension:
0-D does not have any columns or rows, so its shape is “()”.
1-D has one row with 5 elements, so its shape is “(5,)”.
2-D has two rows with 4 elements each, so its shape is “(2, 4)”.
3-D has 3 different clusters, each cluster has 2 rows, and each row has 4 elements, so its shape is “(3, 2, 4)”.
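A small sketch that reproduces the four shapes listed above. The arrays are illustrative stand-ins, and the 3-D one is built with np.zeros only to get the right shape.

```python
import numpy as np

zero_d = np.array(42)
one_d = np.array([1, 2, 3, 4, 5])
two_d = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8]])
three_d = np.zeros((3, 2, 4))  # 3 clusters, 2 rows each, 4 elements per row

shapes = [a.shape for a in (zero_d, one_d, two_d, three_d)]
print(shapes)
```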
More information in shape function

In the second example above, the shape tuple has the value 4 at index 4, so we can say that the 5th (4 + 1) dimension has 4 elements.
Reshape
Reshaping means changing the shape of an array. The shape of an array is the number of elements in each dimension. By reshaping we can add or remove dimensions or change the number of elements in each dimension.
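As a sketch of both cases covered in this section, the example below reshapes a 12-element 1-D array into 2-D and 3-D forms. The target shapes are arbitrary choices whose sizes multiply to 12.

```python
import numpy as np

arr = np.arange(1, 13)           # 12 values: 1 .. 12

two_d = arr.reshape(4, 3)        # 4 rows of 3 elements
three_d = arr.reshape(2, 3, 2)   # 2 blocks, 3 rows each, 2 elements per row

print(two_d.shape, three_d.shape)
```

The total number of elements must stay the same, so a shape such as (5, 3) would raise an error for this array.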
1-D to 2-D

1-D to 3-D

Size function in NumPy

The size attribute gives the total number of elements in an array.
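A minimal sketch of counting elements, using an illustrative 2 x 4 array. Both the size attribute and the np.size() function give the same number.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8]])

count_attr = arr.size       # attribute on the array
count_func = np.size(arr)   # equivalent function call

print(count_attr, count_func)
```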
N — Dimensional Arrays
An array may have any number of dimensions. When creating an array, you can set the minimum number of dimensions with the “ndmin” argument of array(), and you can read the number of dimensions from the “ndim” attribute.

In this array the innermost dimension (5th dim) has 4 elements, the 4th dim has 1 element that is the vector, the 3rd dim has 1 element that is the matrix with the vector, the 2nd dim has 1 element that is 3D array and 1st dim has 1 element that is a 4D array.
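The 5-dimensional case described above can be sketched like this, assuming the array is created from four values with ndmin=5.

```python
import numpy as np

# ndmin pads the shape with leading dimensions of length 1.
arr = np.array([1, 2, 3, 4], ndmin=5)

print(arr)
print("number of dimensions:", arr.ndim)
print("shape:", arr.shape)
```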
Indexing Arrays
Access Array Elements
Array indexing is the same as accessing an array element. You can access an array element by referring to its index number. The indexes in NumPy arrays start with 0, meaning that the first element has index 0, and the second has index 1 etc.
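A short sketch of indexing in 1-D, 2-D and 3-D arrays, plus a negative index that counts from the end. The arrays and names are only illustrative.

```python
import numpy as np

one_d = np.array([10, 20, 30, 40])
two_d = np.array([[1, 2, 3],
                  [4, 5, 6]])
three_d = np.array([[[1, 2], [3, 4]],
                    [[5, 6], [7, 8]]])

first = one_d[0]          # first element
cell = two_d[1, 2]        # second row, third column
deep = three_d[1, 0, 1]   # second block, first row, second element
last = one_d[-1]          # negative index: last element

print(first, cell, deep, last)
```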

Mathematical Operations

An indexed element keeps the data type of its array. When the elements are numbers (int, float or Boolean), adding two of them is mathematical addition. When the elements are strings, mathematical addition is not possible; the “+” operator joins the two strings instead. So the addition in the table below does not fail, but it is not “mathematical addition”: it is a combination of the two values, known as concatenation. If you check the examples below carefully, the difference becomes clear.
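To make the two behaviours of “+” concrete, here is a small sketch with one numeric array and one string array. The values are invented for illustration.

```python
import numpy as np

numbers = np.array([1, 2, 3, 4])
total = numbers[0] + numbers[1]   # arithmetic addition of two integers

words = np.array(["New", "York"])
joined = words[0] + words[1]      # string concatenation, not arithmetic

print(total)
print(joined)
```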

2-D Arrays

3-D Arrays


Negative Indexing

Slicing Arrays
Slicing in Python means taking elements from one given index to another given index.
We pass a slice instead of an index, like this: [start:end].
We can also define the step, like this: [start:end:step].
If we don’t pass start, it is considered 0.
If we don’t pass end, it is considered the length of the array in that dimension.
If we don’t pass step, it is considered 1.
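The rules above can be sketched on a small array. The example also includes a negative slice and a slice inside one row of a 2-D array; all values are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

middle = arr[1:5]      # index 1 up to, but not including, index 5
tail = arr[4:]         # no end: runs to the end of the array
head = arr[:4]         # no start: begins at index 0
from_end = arr[-3:-1]  # negative indexes count from the end

grid = np.array([[1, 2, 3, 4, 5],
                 [6, 7, 8, 9, 10]])
row_part = grid[1, 1:4]  # second row, index 1 up to index 4

print(middle, tail, head, from_end, row_part)
```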

Negative Slicing

STEP
Use the step value to determine the step of the slicing.

Table 1. Return every other element from index 1 to index 5. In the first example the step is 2 between index 1 and index 5; in the second one the step is 3 between index 1 and index 8.
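A sketch of the two step values described above, assuming a ten-element array (the original table is not reproduced here).

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

step_two = arr[1:5:2]    # indexes 1 and 3
step_three = arr[1:8:3]  # indexes 1, 4 and 7
every_other = arr[::2]   # whole array, every second element

print(step_two, step_three, every_other)
```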

Slicing multi-dimension



Checking the Data Type of an Array

Creating Arrays With a Defined Data Type


In the example below, creating the array fails, because a string that is not a number cannot be converted to a numeric data type in the same array.

The last example demonstrates changing the data type of an array, for example from float to integer or from Boolean to float. The examples can be extended to other types.
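The steps in this part (checking a data type, setting one, a failing conversion, and changing the type) can be sketched as follows. The values are illustrative, and the failing case assumes a non-numeric string such as "a".

```python
import numpy as np

arr = np.array([1, 2, 3, 4])
kind = arr.dtype.kind                         # 'i' for integer

as_text = np.array([1, 2, 3, 4], dtype="S")   # force a string data type

# A non-numeric string cannot be converted to an integer.
try:
    np.array(["a", "2", "3"], dtype="i")
    failed = False
except ValueError:
    failed = True

floats = np.array([1.1, 2.1, 3.1])
as_int = floats.astype("i")                   # float to integer (truncates)
as_bool = np.array([1, 0, 3]).astype(bool)    # integer to Boolean

print(kind, as_text.dtype, failed, as_int, as_bool)
```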
Copy and View
The main difference between a copy and a view of an array is that the copy is a new array, and the view is just a view of the original array.
Copy

The copy owns the data and any changes made to the copy will not affect original array, and any changes made to the original array will not affect the copy.
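A minimal sketch of this behaviour, using illustrative names: the copy keeps its own data when the original changes.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
duplicate = arr.copy()   # independent data

arr[0] = 42              # change the original only

print(arr)
print(duplicate)
```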
View

The view does not own the data and any changes made to the view will affect the original array, and any changes made to the original array will affect the view.
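The same idea for a view, again with illustrative names: changes travel in both directions because the data is shared.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
window = arr.view()   # shares data with arr

arr[0] = 42           # change made through the original
window[1] = 31        # change made through the view

print(arr)
print(window)
```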
Status of data in arrays
Every NumPy array has the attribute base that returns None if the array owns the data. Otherwise, the base attribute refers to the original object.

The copy returns None. The view returns the original array.
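A short sketch of checking ownership with the base attribute. The variable names are only illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
duplicate = arr.copy()
window = arr.view()

copy_base = duplicate.base   # None: the copy owns its data
view_base = window.base      # the original array: the view borrows its data

print(copy_base)
print(view_base)
```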
Joining
We pass a sequence of arrays that we want to join to the concatenate() function, along with the axis. If axis is not explicitly passed, it is taken as 0.
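As an illustration, the sketch below joins two 1-D arrays with the default axis, and two 2-D arrays along axis 1. The values are made up.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
joined = np.concatenate((a, b))            # axis defaults to 0

m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])
side_by_side = np.concatenate((m1, m2), axis=1)  # join along the rows

print(joined)
print(side_by_side)
```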

Joining with “ Stack “ function
Stacking is the same as concatenation; the only difference is that stacking is done along a new axis.
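A sketch of stacking two 1-D arrays with np.stack(). The result gains a new dimension, whichever axis is chosen; the values are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

as_rows = np.stack((a, b), axis=0)   # each input becomes a row
as_cols = np.stack((a, b), axis=1)   # each input becomes a column

print(as_rows)
print(as_cols)
```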


Splitting
Splitting is the reverse operation of joining. We use array_split() for splitting arrays.
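A brief sketch of array_split() on a six-element array, including a split that cannot be even. The numbers of parts are arbitrary choices.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

thirds = np.array_split(arr, 3)    # three equal parts
quarters = np.array_split(arr, 4)  # uneven: sizes are adjusted automatically

print([part.tolist() for part in thirds])
print([len(part) for part in quarters])
```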

Searching
You can search an array for a certain value, and return the indexes that get a match. To search an array, use the where() method.
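A minimal sketch of searching with np.where(). It returns a tuple of index arrays, one per dimension, so the first item holds the matches for a 1-D array. The values are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 4, 4])

matches = np.where(arr == 4)[0]     # indexes where the value is 4
evens = np.where(arr % 2 == 0)[0]   # indexes of the even values

print(matches)
print(evens)
```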

Search Sorted and Search from right side
There is a method called searchsorted() which performs a binary search in a sorted array, and returns the index where the specified value would be inserted to maintain the sort order.

Example explained: The number 7 should be inserted at index 1 to maintain the sort order.
The method starts the search from the left and returns the first index where the number 7 is no longer larger than the next value.
By default the leftmost index is returned, but we can pass side='right' to return the rightmost index instead.
As you can see, with side='right' the returned index falls to the right of any existing value equal to 7.
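The explanation above can be checked with a small sketch, assuming the sorted array is [6, 7, 8, 9].

```python
import numpy as np

arr = np.array([6, 7, 8, 9])   # must already be sorted

left = np.searchsorted(arr, 7)                 # default side='left'
right = np.searchsorted(arr, 7, side="right")  # after existing 7s

print(left, right)
```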
Find
The numpy.char.find() function searches each string in a given array of strings for a substring, within the optional range [start, end], and returns the lowest index where the substring starts, or -1 if it is not found.
This function calls str.find internally in an element-wise manner.
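A small sketch, assuming the function meant here is numpy.char.find. The strings are invented for illustration.

```python
import numpy as np

words = np.array(["hello world", "numpy"])

# Lowest index of "o" in each string, or -1 when it does not occur.
positions = np.char.find(words, "o")

# Restrict the search to start at index 5 of each string.
later = np.char.find(words, "o", 5)

print(positions)
print(later)
```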

Sort
Sorting means putting elements in an ordered sequence.
Ordered sequence is any sequence that has an order corresponding to elements, like numeric or alphabetical, ascending or descending.
NumPy has a function called sort() that returns a sorted copy of a specified array. The ndarray object also has a sort() method, which sorts the array in place.
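A sketch of sorting numbers, strings and a 2-D array with np.sort(), followed by the in-place ndarray.sort() method. The values are illustrative.

```python
import numpy as np

numbers = np.array([3, 2, 0, 1])
fruits = np.array(["banana", "cherry", "apple"])
grid = np.array([[3, 2, 4],
                 [5, 0, 1]])

sorted_numbers = np.sort(numbers)   # returns a sorted copy
sorted_fruits = np.sort(fruits)     # alphabetical order
sorted_grid = np.sort(grid)         # each row is sorted separately

numbers.sort()                      # sorts the original array in place

print(sorted_numbers, sorted_fruits, sorted_grid, numbers)
```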



Filtering
Getting some elements out of an existing array and creating a new array out of them is called filtering.
In NumPy, you filter an array using a boolean index list.
A boolean index list is a list of booleans corresponding to indexes in the array.
If the value at an index is True, that element is contained in the filtered array; if the value at that index is False, that element is excluded from the filtered array.

The examples above return [11, 22, 77] and [11, 33, 77, ‘Toronto’]. Why?
Because the new array contains only the values where the filter array had the value True: in this case indexes 0, 2, and 4 in the first example, and indexes 0, 2, 4, and 5 in the second.
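A minimal sketch of filtering with a boolean index list. The array and the list are invented for illustration and are not the ones from the examples above.

```python
import numpy as np

arr = np.array([41, 42, 43, 44])
keep = [True, False, True, False]   # one Boolean per element

filtered = arr[keep]                # keeps indexes 0 and 2

print(filtered)
```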
Creating filter with a condition
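Instead of writing the boolean list by hand, a comparison on the array produces it directly. The sketch below uses illustrative values and an arbitrary threshold.

```python
import numpy as np

arr = np.array([41, 42, 43, 44])

mask = arr > 42        # Boolean array: one True/False per element
filtered = arr[mask]   # keep only the elements where the mask is True

print(mask)
print(filtered)
```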


Zeros
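np.zeros() returns a new array of the given shape filled with zeros. A short sketch with arbitrary shapes:

```python
import numpy as np

flat = np.zeros(3)                    # 1-D, float by default
grid = np.zeros((2, 3), dtype=int)    # 2-D, integer zeros

print(flat)
print(grid)
```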

Ones
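np.ones() returns a new array of the given shape filled with ones. A short sketch with arbitrary shapes:

```python
import numpy as np

flat = np.ones(4)                    # 1-D, float by default
grid = np.ones((2, 2), dtype=int)    # 2-D, integer ones

print(flat)
print(grid)
```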

Arange
numpy.arange([start, ]stop, [step, ]dtype=None, *, like=None)
Return evenly spaced values within a given interval.
Values are generated within the half-open interval [start, stop) (in other words, the interval including start but excluding stop). For integer arguments the function is equivalent to the Python built-in range function, but returns an ndarray rather than a list.
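A sketch of the half-open interval in practice, with arbitrary start, stop and step values:

```python
import numpy as np

upto = np.arange(5)              # 0 .. 4, stop is excluded
stepped = np.arange(2, 10, 2)    # start 2, stop 10 (excluded), step 2
fractional = np.arange(0, 1, 0.25)

print(upto, stepped, fractional)
```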

Linspace
numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)
Return evenly spaced numbers over a specified interval.
Returns num evenly spaced samples, calculated over the interval [start, stop].
The endpoint of the interval can optionally be excluded.
Changed in version 1.16.0: Non-scalar start and stop are now supported.
Changed in version 1.20.0: Values are rounded towards -inf instead of 0 when an integer dtype is specified. The old behavior can still be obtained with np.linspace(start, stop, num).astype(int)
Parameters :
start : The starting value of the sequence.
stop : The end value of the sequence, unless endpoint is set to False.
num : [int, optional] Number of samples to generate. Default is 50. Must be non-negative.
endpoint : [bool, optional] If True, stop is the last sample. Otherwise, it is not included. Default is True.
retstep : [bool, optional] If True, return (samples, step), where step is the spacing between samples. Default is False.
dtype : [optional] The type of the output array.
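A sketch of the num, endpoint and retstep parameters, using the interval from 0 to 1 as an arbitrary example:

```python
import numpy as np

samples = np.linspace(0, 1, num=5)                     # stop is included
open_end = np.linspace(0, 1, num=5, endpoint=False)    # stop is excluded
with_step, step = np.linspace(0, 1, num=5, retstep=True)

print(samples)
print(open_end)
print(step)
```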

What is the difference between Arange and Linspace
arange is similar to linspace, but uses a step size (instead of the number of samples).

arange() returns values within a range, separated by a fixed step.
linspace() returns a set number of samples within a given interval.
In sum, arange allows you to define the size of the step, while linspace allows you to define the number of samples.
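The difference can be seen side by side in a small sketch over the same interval, from 0 to 10:

```python
import numpy as np

by_step = np.arange(0, 10, 2)     # step of 2, stop excluded
by_count = np.linspace(0, 10, 5)  # 5 samples, stop included

print(by_step)
print(by_count)
```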
Random
The Generator provides access to a wide range of distributions, and serves as a replacement for RandomState. The main difference between the two is that Generator relies on an additional BitGenerator to manage state and generate the random bits, which are then transformed into random values from useful distributions. The default BitGenerator used by Generator is PCG64. The BitGenerator can be changed by passing an instantiated BitGenerator to Generator.
numpy.random.default_rng()
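A minimal sketch of the Generator interface. The seed value 42 is an arbitrary choice, used only so that repeated runs give the same numbers.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # Generator backed by PCG64

floats = rng.random(3)                        # floats in [0.0, 1.0)
ints = rng.integers(low=0, high=10, size=5)   # integers in [0, 10)

# Passing an instantiated BitGenerator explicitly.
explicit = np.random.Generator(np.random.PCG64(42))

print(floats)
print(ints)
print(type(explicit.bit_generator).__name__)
```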
Random Sample
random.rand(d0, d1, …, dn): Random values in a given shape.
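A short sketch of numpy.random.rand(), where each argument is the length of one dimension. The shape (2, 3) is an arbitrary choice.

```python
import numpy as np

# Two rows and three columns of uniform random floats in [0.0, 1.0).
values = np.random.rand(2, 3)

print(values.shape)
print(values)
```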

Generate Random Floats
In Python's built-in random module, the random.random() function returns a random float from 0.0 up to, but not including, 1.0. The function doesn't need any arguments.

Generate Random Integers
The random.randint() method returns a random integer between the specified integers.

Generate Random Numbers within Range
The random.randrange() method returns a randomly selected element from the range created by the start, stop and step arguments. The value of start is 0 by default. Similarly, the value of step is 1 by default.

Select Random Elements
The random.choice() method returns a randomly selected element from a non-empty sequence. An empty sequence as argument raises an IndexError.


Shuffle Elements Randomly
The random.shuffle() method randomly reorders the elements in a list.
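The five functions described in the sections above match the behaviour of Python's built-in random module, so this sketch assumes that module (import random) rather than numpy.random. The seed and the sample values are arbitrary.

```python
import random

random.seed(0)   # fixed seed so repeated runs give the same values

a_float = random.random()            # float in [0.0, 1.0)
a_die = random.randint(1, 6)         # integer from 1 to 6, both included
an_even = random.randrange(0, 10, 2) # one of 0, 2, 4, 6, 8

options = ["a", "b", "c"]
pick = random.choice(options)        # one element of the list

deck = [1, 2, 3, 4, 5]
random.shuffle(deck)                 # reorders the list in place

# An empty sequence raises IndexError.
try:
    random.choice([])
    raised = False
except IndexError:
    raised = True

print(a_float, a_die, an_even, pick, deck, raised)
```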

Full
numpy.full(shape, fill_value, dtype = None, order = ‘C’) : Return a new array of the given shape and type, filled with fill_value.
Parameters :
shape : [int or sequence of ints] Shape of the new array, e.g. (2, 3) or 2.
fill_value : [scalar or array_like] Value to fill the array with.
dtype : [optional] Data type of the returned array. By default (None) it is taken from fill_value.
order : [optional] C_contiguous (‘C’) or F_contiguous (‘F’) memory layout.
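A sketch of numpy.full() with arbitrary shapes and fill values, showing that the data type follows the fill value unless dtype is given:

```python
import numpy as np

sevens = np.full((2, 3), 7)              # integer fill value, integer array
halves = np.full(4, 0.5)                 # float fill value, float array
forced = np.full((2, 2), 7, dtype=float) # dtype given explicitly

print(sevens)
print(halves)
print(forced)
```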

Empty
Return a new array of given shape and type, without initializing entries.
empty, unlike zeros, does not set the array values to zero, and may therefore be marginally faster. On the other hand, it requires the user to manually set all the values in the array, and should be used with caution.
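A minimal sketch of using numpy.empty() safely: allocate first, then assign every value before reading the array. The shape and the fill values are arbitrary.

```python
import numpy as np

buffer = np.empty((2, 3))   # contents are arbitrary at this point

# Set every value before using the array.
for row in range(buffer.shape[0]):
    buffer[row, :] = row + 1

print(buffer)
```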

What is the difference between an Array and a List
An array can store elements of only one data type, but a list can store elements of different data types too. Hence, an array stores homogeneous data values, and a list can store heterogeneous data values.
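A small sketch of the difference: the list keeps each element's own type, while NumPy converts mixed input to one common type, here a string type. The values are illustrative.

```python
import numpy as np

mixed = [1, 2.5, "a"]                            # heterogeneous list
list_types = [type(x).__name__ for x in mixed]

arr = np.array(mixed)                            # one common data type
array_kind = arr.dtype.kind                      # 'U' = unicode string

print(list_types)
print(arr, array_kind)
```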
What is the difference between ndarray and array in numpy?
1- Using the array(), zeros() or empty() functions: Arrays should be constructed using array, zeros or empty. The parameters of ndarray(…) refer to a low-level method for instantiating an array.
2- From the ndarray class directly: There are two modes of creating an array using __new__: If buffer is None, then only shape, dtype, and order are used. If buffer is an object exposing the buffer interface, then all keywords are interpreted.
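In short, array() is a function that returns an ndarray, while ndarray is the class itself. The sketch below shows both routes; the buffer example reuses the memory of another array, and all names are illustrative.

```python
import numpy as np

# 1- The usual route: build the array from data.
made = np.array([1, 2, 3])
is_ndarray = isinstance(made, np.ndarray)

# 2- The low-level route without a buffer: only shape, dtype and order
#    are used, and the contents are uninitialized.
raw = np.ndarray(shape=(2, 2), dtype=float)

# 2- The low-level route with a buffer: the new array reads existing memory.
source = np.array([1, 2, 3, 4], dtype=np.int64)
over_buffer = np.ndarray(shape=(2, 2), dtype=np.int64, buffer=source)

print(is_ndarray, raw.shape)
print(over_buffer)
```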