Module 09 — Pandas and Data Structures
Graduate MSBA Module Overview
Everything you’ve learned so far — variables, containers, loops, functions, file handling — has been building toward this: working with real datasets at analytical scale. Pandas is Python’s primary data analysis library, and the DataFrame is its central data structure.
A DataFrame is a two-dimensional table of data — rows and columns — that can hold thousands or millions of records and supports a vast range of analytical operations. With Pandas, you can load a dataset, inspect it, filter rows, select columns, aggregate values, sort results, and generate summary statistics — all with clean, readable code.
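The workflow described above (load, inspect, filter, select, aggregate, sort, summarize) fits in a few lines. This is a minimal illustration, not a prescribed recipe. The order data is hypothetical and is read from an in-memory string via `io.StringIO` so the example runs without a file. With a real dataset you would pass a file path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical sample data; with a real dataset, pass a file path to pd.read_csv
csv_text = """order_id,region,product,units,revenue
1001,Northwest,Laptop,2,2400.00
1002,Southwest,Monitor,5,1250.00
1003,Northwest,Keyboard,10,450.00
1004,Southeast,Laptop,1,1200.00
1005,Southwest,Keyboard,4,180.00
"""

# Load: parse the CSV text into a DataFrame
orders = pd.read_csv(io.StringIO(csv_text))

# Inspect: dimensions, column types, and the first few rows
print(orders.shape)  # (5, 5)
print(orders.dtypes)
print(orders.head(3))

# Filter rows: keep only orders with revenue above 1000
large_orders = orders[orders["revenue"] > 1000]

# Select columns: a narrower table with just the fields we need
slim = large_orders[["order_id", "region", "revenue"]]

# Aggregate: total units and revenue per region
by_region = orders.groupby("region")[["units", "revenue"]].sum()

# Sort: regions from highest to lowest total revenue
by_region = by_region.sort_values("revenue", ascending=False)

# Summary statistics (count, mean, std, min, quartiles, max) for numeric columns
summary = orders[["units", "revenue"]].describe()

print(slim)
print(by_region)
print(summary)
```

Each step returns a new DataFrame rather than modifying `orders` in place, so the original data stays available for the next question you want to ask.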
Course Connections
The DataFrame embodies all four programming principles in concert: iteration happens automatically across rows and columns, inference drives filtering and selection, abstraction makes complex operations readable and concise, and polymorphism lets the same methods work across different data types and structures.
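Here is a small sketch of two of these principles, using hypothetical price and quantity values. The explicit loop and the single vectorized expression produce the same line totals, which is one way to see "iteration happens automatically" in practice. The calls to `max()` show one method name adapting to numbers, strings, and a whole DataFrame.

```python
import pandas as pd

# Hypothetical order lines
prices = pd.Series([19.99, 5.49, 12.00])
quantities = pd.Series([3, 10, 2])

# Explicit loop: element-by-element iteration, as in earlier modules
line_totals_loop = []
for p, q in zip(prices, quantities):
    line_totals_loop.append(p * q)

# Vectorized: Pandas pairs up and multiplies the elements internally
line_totals = prices * quantities

# Polymorphism: the same method name works on different types and structures
names = pd.Series(["Carol", "alice", "Bob"])
frame = pd.DataFrame({"price": prices, "qty": quantities})

print(line_totals)
print(prices.max())  # numeric maximum
print(names.max())   # lexicographic maximum of strings
print(frame.max())   # column-wise maximum, returned as a Series
```

String comparison is case-sensitive, so `'alice'` ranks above `'Bob'` and `'Carol'` (lowercase letters have higher character codes than uppercase ones).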
Quick Code Example
```python
import pandas as pd

# Sample customer records: each key becomes a column, each list holds that column's values
data = {
    'customer_name': ['Alice Johnson', 'Bob Martinez', 'Carol Chen', 'David Kim', 'Eve Torres'],
    'region': ['Northwest', 'Southwest', 'Northwest', 'Southeast', 'Southwest'],
    'total_spent': [1257.30, 430.50, 890.75, 125.00, 1450.00],
    'purchase_count': [12, 4, 9, 2, 15],
    'is_premium': [True, False, True, False, True]
}
df = pd.DataFrame(data)

# Derived column: computed for every row at once, with no explicit loop
df['avg_purchase'] = df['total_spent'] / df['purchase_count']

# Boolean mask: keep only the rows where is_premium is True
premium_customers = df[df['is_premium']]

# Group rows by region, then compute sum, mean, and count of total_spent per group
regional_summary = df.groupby('region')['total_spent'].agg(['sum', 'mean', 'count'])

print('Premium Customers:')
print(premium_customers[['customer_name', 'total_spent', 'avg_purchase']].to_string(index=False))
print('\nRegional Summary:')
print(regional_summary.round(2))
```

Expected Output:
```
Premium Customers:
customer_name  total_spent  avg_purchase
Alice Johnson      1257.30    104.775000
   Carol Chen       890.75     98.972222
   Eve Torres      1450.00     96.666667

Regional Summary:
               sum     mean  count
region
Northwest  2148.05  1074.02      2
Southeast   125.00   125.00      1
Southwest  1880.50   940.25      2
```

Learning Progression
| Platform | Student Experience |
|---|---|
| NotebookLM | Explore Pandas through business storytelling that shows how DataFrames transform raw data into analytical insights |
| Google Colab | Create DataFrames, filter and select data, aggregate and summarize, and explore real datasets |
| zyBooks | Structured exercises build fluency with Pandas syntax and data manipulation operations |
Module Pages
- Concept: Deep narrative on DataFrames and analytical thinking
- Advanced: Extended code with groupby, merge, and apply operations
- Notebook: Jupyter notebook lab description