PYTHON 101: INTRODUCTION TO PYTHON FOR DATA ANALYTICS
Python is a versatile and powerful programming language, widely used in data analytics due to its simplicity and the vast ecosystem of libraries tailored for data processing. In this guide, we’ll cover the essentials you need to get started with Python for data analytics, including variables, data types, control structures, functions, and an introduction to NumPy, a fundamental library for numerical computing.
INTRODUCTION TO PYTHON FOR DATA ANALYTICS CONCEPTS
Variables
Variables are containers for storing data values. In Python, you don’t need to declare the type of a variable explicitly, as it is inferred based on the value you assign.
CODE:
age = 30
name = "John"
Data Types
Python has various built-in data types:
Integers: Whole numbers (10, -5)
Floats: Decimal numbers (3.14, -2.5)
Strings: Text data ("Hello", "123")
Booleans: True or False values (True, False)
Lists: Ordered, mutable collections of items ([1, 2, 3])
Dictionaries: Key-value pairs ({"name": "John", "age": 30})
CODE:
x = 10
print(type(x))
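To see type inference across the types listed above, here is a minimal sketch that checks one sample value of each. The names `samples` and `type_names` are placeholders chosen for this illustration.

```python
# One sample value for each built-in type listed above.
samples = [10, 3.14, "Hello", True, [1, 2, 3], {"name": "John", "age": 30}]

# type(value).__name__ gives the short name of each value's type.
type_names = [type(value).__name__ for value in samples]
print(type_names)  # ['int', 'float', 'str', 'bool', 'list', 'dict']
```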
Lists vs. Tuples
Lists are mutable, meaning you can modify their elements after creation.
CODE:
my_list = [1, 2, 3]
my_list[0] = 4
print(my_list) # The Output is [4, 2, 3]
Tuples are immutable, meaning once they are created, their values cannot be changed.
CODE:
my_tuple = (1, 2, 3)
# my_tuple[0] = 4
print(my_tuple) # The Output is (1, 2, 3)
Uncommenting the line my_tuple[0] = 4 would raise the following error:
TypeError Traceback (most recent call last)
Cell In[12], line 2
1 my_tuple = (1, 2, 3)
----> 2 my_tuple[0] = 4
3 print(my_tuple) # The Output is (1, 2, 3)
TypeError: 'tuple' object does not support item assignment
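The error can also be observed without stopping the script. The sketch below catches the TypeError and then shows one common way to get a "modified" tuple, which is to build a new one. The names `message` and `updated_tuple` are illustrative.

```python
my_tuple = (1, 2, 3)

try:
    my_tuple[0] = 4  # tuples do not support item assignment
except TypeError as error:
    message = str(error)
    print("Caught:", message)

# To "change" a tuple, build a new tuple; the original stays untouched.
updated_tuple = (4,) + my_tuple[1:]
print(updated_tuple)  # (4, 2, 3)
print(my_tuple)       # (1, 2, 3)
```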
Comparison Operators
Comparison operators allow you to compare values:
==: Equal to
!=: Not equal to
>: Greater than
<: Less than
CODE:
x = 5
y = 10
print(x > y) # The Output Is False
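For completeness, this short sketch evaluates all four operators from the list on the same pair of values. The dictionary `results` is only a convenient way to collect the outcomes for this example.

```python
x = 5
y = 10

# Each comparison evaluates to a Boolean.
results = {
    "x == y": x == y,
    "x != y": x != y,
    "x > y": x > y,
    "x < y": x < y,
}

for expression, outcome in results.items():
    print(expression, "->", outcome)
```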
Logical Operators
Logical operators are used to combine conditional statements:
and: True if both conditions are true
or: True if at least one condition is true
not: Reverses the result (True becomes False)
CODE:
x = 5
y = 10
print(x < 10 and y > 5) # The Output Is True
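The example above only uses and. A small sketch covering all three operators, with illustrative variable names, might look like this:

```python
x = 5
y = 10

both = x < 10 and y > 5    # both conditions are true
either = x > 10 or y > 5   # only the second condition is true
negated = not (x < 10)     # reverses a true condition

print(both, either, negated)  # True True False
```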
Membership Operators
Membership operators check if an item is present in a sequence (list, tuple, string):
in: True if the item is found
not in: True if the item is not found
CODE:
my_list = [1, 2, 3]
print(3 in my_list) # The Output Is True
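Membership tests behave slightly differently depending on the container. The sketch below, using made-up sample values, shows not in on a list, a substring check on a string, and the fact that in on a dictionary looks at keys rather than values.

```python
my_list = [1, 2, 3]
missing = 5 not in my_list              # 5 is absent from the list

substring = "data" in "data analytics"  # strings support substring checks

person = {"name": "John", "age": 30}
has_key = "name" in person              # dictionaries are searched by key
has_value = "John" in person            # values are not searched

print(missing, substring, has_key, has_value)  # True True True False
```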
If-Else Statements
Conditional statements allow decision-making:
CODE:
x = 5
if x > 5:
    print("x is greater than 5")
else:
    print("x is less than or equal to 5") # The Output Is x is less than or equal to 5
For Loops
Loops allow you to iterate over sequences:
CODE:
for i in range(5):
    print(i)
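Loops are not limited to range. As a rough sketch, the code below iterates directly over a list to accumulate a total, and uses the built-in enumerate to get each position alongside its value. The list `readings` holds made-up sample values.

```python
readings = [3, 8, 5, 12]  # hypothetical sample values

# Iterate over the items directly to accumulate a total.
total = 0
for reading in readings:
    total += reading

# enumerate yields (index, value) pairs.
indexed = []
for index, reading in enumerate(readings):
    indexed.append((index, reading))

print(total)    # 28
print(indexed)  # [(0, 3), (1, 8), (2, 5), (3, 12)]
```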
Functions
Functions enable code reuse. You define a function using the def keyword.
CODE:
def greet(name):
    return f"Hello, {name}!"

print(greet("John")) # The Output Is Hello, John!
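Functions can also take default values for their parameters. The variant below is a sketch, not part of the original example: the `greeting` parameter is an addition made here to show how a default is overridden.

```python
def greet(name, greeting="Hello"):
    # greeting is a hypothetical extra parameter with a default value.
    return f"{greeting}, {name}!"

default_message = greet("John")                     # uses the default
custom_message = greet("John", greeting="Welcome")  # overrides it

print(default_message)  # Hello, John!
print(custom_message)   # Welcome, John!
```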
NUMPY
Python alone is powerful, but for large-scale data analytics and mathematical operations, NumPy is essential. NumPy introduces a high-performance, multi-dimensional array object known as ndarray, which is much more efficient for numerical computations than Python’s built-in lists.
NumPy Arrays vs. Python Lists
- Lists: Flexible, can store mixed data types, but are slower for numerical operations.
CODE:
my_list_1 = [11, 21, 31, 41]
- NumPy Arrays: Homogeneous (all elements are of the same type) and optimized for performance.
CODE:
import numpy as np
my_array = np.array([10, 20, 30, 40])
NumPy arrays are faster and more efficient because they use contiguous memory. Python lists store each element as an independent object in memory, whereas NumPy arrays store data in a block of memory, making it easier and faster to perform operations like matrix multiplication and element-wise arithmetic.
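The "homogeneous" property can be seen directly. In this sketch, which uses made-up values, a list keeps each element's own type, while np.array converts everything to a single common dtype.

```python
import numpy as np

mixed_list = [1, 2.5, 3]

# A list keeps each element as its own Python object and type.
list_types = [type(item).__name__ for item in mixed_list]

# np.array picks one dtype for all elements; the integers become floats.
mixed_array = np.array(mixed_list)

print(list_types)         # ['int', 'float', 'int']
print(mixed_array.dtype)  # float64
```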
Creating NumPy Arrays
You can create arrays in NumPy using various functions.
CODE:
import numpy as np
# Creating a simple array
arr = np.array([1, 2, 3, 4])
# Creating an array of zeros
zeros = np.zeros(5)
# Creating an array with a range of values
range_arr = np.arange(1, 10, 2)
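It can help to inspect what these functions return. The sketch below repeats two of the calls and prints the contents and dtype. Note that np.zeros produces floats by default and that the stop value of np.arange is excluded.

```python
import numpy as np

zeros = np.zeros(5)              # five floating-point zeros
range_arr = np.arange(1, 10, 2)  # start at 1, stop before 10, step 2

print(zeros)        # [0. 0. 0. 0. 0.]
print(zeros.dtype)  # float64
print(range_arr)    # [1 3 5 7 9]
```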
Operations with NumPy Arrays
NumPy allows you to perform element-wise operations on whole arrays with a single expression. With Python lists, the same result requires a loop or a list comprehension.
CODE:
arr = np.array([1, 2, 3, 4])
arr2 = arr * 2 # Element-wise multiplication
print(arr2) # The Output Is [2 4 6 8]
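To make the contrast with lists concrete, here is a small sketch. Multiplying a list by 2 repeats it rather than doubling each element, so a comprehension is needed to match what the array does. The variable names are illustrative.

```python
import numpy as np

numbers = [1, 2, 3, 4]

repeated = numbers * 2                   # list repetition, not arithmetic
doubled_list = [n * 2 for n in numbers]  # element-wise via a comprehension
doubled_array = np.array(numbers) * 2    # element-wise in one expression

print(repeated)       # [1, 2, 3, 4, 1, 2, 3, 4]
print(doubled_list)   # [2, 4, 6, 8]
print(doubled_array)  # [2 4 6 8]
```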
Memory Efficiency in NumPy
NumPy arrays typically consume less memory than lists because every element has the same data type and a fixed size, which allows compact storage. A Python list stores a reference to each item, and each item is a separate object with its own overhead, while a NumPy array stores the raw values directly in contiguous memory, making operations faster and more memory-efficient.
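A rough way to compare the two is sketched below. It assumes a 64-bit integer dtype for the array, and it estimates the list's footprint as the list container plus the size of each integer object. Exact figures vary by Python version and platform, so treat the list number as an approximation.

```python
import sys
import numpy as np

count = 1000
numbers = list(range(count))
array = np.arange(count, dtype=np.int64)

# List: the container of references plus every integer object it points to.
list_bytes = sys.getsizeof(numbers) + sum(sys.getsizeof(n) for n in numbers)

# Array: 8 bytes per int64 element, stored in one block.
array_bytes = array.nbytes

print("list (approx.):", list_bytes, "bytes")
print("array data:", array_bytes, "bytes")
```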
Converting Data Types in NumPy
NumPy makes it easy to convert data types for numerical computations.
CODE:
arr = np.array([1.0, 2.0, 3.0])
arr_int = arr.astype(int) # Convert array to integers
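One detail worth knowing is that converting floats to integers with astype discards the fractional part instead of rounding. The sketch below, using made-up values, compares plain conversion with rounding first.

```python
import numpy as np

values = np.array([1.9, -1.9, 2.6])

truncated = values.astype(int)          # fractional part is dropped
rounded = np.round(values).astype(int)  # round first, then convert

print(truncated)  # [ 1 -1  2]
print(rounded)    # [ 2 -2  3]
```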
Functions in Data Analytics Scripts
In addition to Python’s built-in functions, you will often define custom functions for specific tasks like data cleaning, analysis, and transformation. When working with data analytics, functions help modularize your code and make it reusable across different datasets.
CODE:
def normalize_data(data):
    # Min-max scaling: maps the smallest value to 0 and the largest to 1
    max_value = np.max(data)
    min_value = np.min(data)
    return (data - min_value) / (max_value - min_value)

# Usage with NumPy array
data = np.array([10, 20, 30, 40, 50])
normalized_data = normalize_data(data)
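The function above divides by zero when every value is identical. One possible guard is sketched below. The name `normalize_data_safe` and the choice to return all zeros for constant input are assumptions made for this illustration, not part of the original function.

```python
import numpy as np

def normalize_data_safe(data):
    # Hypothetical variant of normalize_data that handles constant input.
    data = np.asarray(data, dtype=float)
    min_value = data.min()
    value_range = data.max() - min_value
    if value_range == 0:
        # All values are equal; returning zeros is one possible convention.
        return np.zeros_like(data)
    return (data - min_value) / value_range

normalized = normalize_data_safe([10, 20, 30, 40, 50])
constant = normalize_data_safe([7, 7, 7])

print(normalized)  # [0.   0.25 0.5  0.75 1.  ]
print(constant)    # [0. 0. 0.]
```

Depending on the dataset, raising an error or returning NaN for constant input may be more appropriate than zeros.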
Final Thoughts
Python, combined with libraries like NumPy, provides a solid foundation for data analytics. Understanding key concepts such as variables, data types, loops, and functions, alongside NumPy’s efficient array manipulation, prepares you to handle large datasets with ease. As you progress, you’ll unlock more sophisticated tools in Python’s data analytics ecosystem, including Pandas for data manipulation and Matplotlib for visualization.