Module 09 — Pandas and Data Structures

Graduate MSBA Module Overview

Everything you’ve learned so far — variables, containers, loops, functions, file handling — has been building toward this: working with real datasets at analytical scale. Pandas is Python’s primary data analysis library, and the DataFrame is its central data structure.

A DataFrame is a two-dimensional table of data — rows and columns — that can hold thousands or millions of records and supports a vast range of analytical operations. With Pandas, you can load a dataset, inspect it, filter rows, select columns, aggregate values, sort results, and generate summary statistics — all with clean, readable code.
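The operations listed above can be sketched in a few lines. This is a minimal illustration, not course material: the `orders` data, its column names, and the threshold of 18 are all invented for the example, and the CSV text is read from an in-memory string so the snippet runs without an external file.

```python
import io

import pandas as pd

# Hypothetical CSV content, invented for illustration only
csv_text = """order_id,category,amount
101,Books,20.0
102,Games,55.0
103,Books,35.0
104,Toys,15.0
"""

# Load: read_csv accepts a file path or any file-like object
orders = pd.read_csv(io.StringIO(csv_text))

# Inspect: dimensions and column types
print(orders.shape)
print(orders.dtypes)

# Filter rows with a condition, then select two columns
large_orders = orders[orders['amount'] > 18][['order_id', 'amount']]

# Sort by a column, largest first
sorted_orders = orders.sort_values('amount', ascending=False)

# Aggregate: total amount per category
category_totals = orders.groupby('category')['amount'].sum()

# Summary statistics for a numeric column
amount_stats = orders['amount'].describe()
print(amount_stats)
```

With a real dataset, the `io.StringIO(csv_text)` argument would typically be replaced by a path to a CSV file.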


Course Connections

The DataFrame embodies all four programming principles in concert: iteration happens automatically across rows and columns, inference drives filtering and selection, abstraction makes complex operations readable and concise, and polymorphism lets the same methods work across different data types and structures.
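One way to see the four principles side by side is the short sketch below. The product data is hypothetical, and reading "inference" as conditional logic expressed through a boolean mask is an assumption made for this example.

```python
import pandas as pd

# Hypothetical data, invented for illustration only
df = pd.DataFrame({
    'product': ['Desk', 'Lamp', 'Chair'],
    'price': [200.0, 40.0, 120.0],
    'units': [3, 10, 5],
})

# Iteration: the multiplication is applied to every row without an explicit loop
df['revenue'] = df['price'] * df['units']

# Inference (assumed here to mean conditional logic): a boolean mask picks the rows to keep
high_revenue = df[df['revenue'] >= 600]

# Abstraction: one method call hides the details of computing several statistics
revenue_stats = df['revenue'].describe()

# Polymorphism: max() works on a numeric column and on a text column
highest_price = df['price'].max()    # largest number
last_product = df['product'].max()   # last name in alphabetical order

print(high_revenue)
print(highest_price, last_product)
```

The same `max()` call returns a number for `price` and a string for `product`, which is the sense in which one method serves different data types.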


Quick Code Example

import pandas as pd

data = {
    'customer_name': ['Alice Johnson', 'Bob Martinez', 'Carol Chen', 'David Kim', 'Eve Torres'],
    'region': ['Northwest', 'Southwest', 'Northwest', 'Southeast', 'Southwest'],
    'total_spent': [1257.30, 430.50, 890.75, 125.00, 1450.00],
    'purchase_count': [12, 4, 9, 2, 15],
    'is_premium': [True, False, True, False, True]
}

df = pd.DataFrame(data)

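# Derived column: average amount spent per purchase
df['avg_purchase'] = df['total_spent'] / df['purchase_count']
# Boolean column used directly as a row filter
premium_customers = df[df['is_premium']]
# Total, average, and number of customers per region
regional_summary = df.groupby('region')['total_spent'].agg(['sum', 'mean', 'count'])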

print('Premium Customers:')
print(premium_customers[['customer_name', 'total_spent', 'avg_purchase']].to_string(index=False))
print('\nRegional Summary:')
print(regional_summary.round(2))

Expected Output:

Premium Customers:
customer_name  total_spent  avg_purchase
Alice Johnson      1257.30    104.775000
   Carol Chen       890.75     98.972222
   Eve Torres      1450.00     96.666667

Regional Summary:
               sum     mean  count
region
Northwest  2148.05  1074.02      2
Southeast   125.00   125.00      1
Southwest  1880.50   940.25      2

Learning Progression

Platform | Student Experience
NotebookLM | Explore Pandas through business storytelling that shows how DataFrames transform raw data into analytical insights
Google Colab | Create DataFrames, filter and select data, aggregate and summarize, and explore real datasets
Zybooks | Structured exercises build fluency with Pandas syntax and data manipulation operations

Module Pages

  • Concept → — Deep narrative on DataFrames and analytical thinking
  • Advanced → — Extended code with groupby, merge, and apply operations
  • Notebook → — Jupyter notebook lab description