Let's see RDDs in action! Don't worry, these examples are designed to be crystal clear:
```python
# 🚀 Creating your first RDD - like getting your delivery team ready!
from pyspark import SparkContext

# Start the Spark engine (hire your delivery manager)
sc = SparkContext("local", "Pizza Delivery App")

# Create an RDD from a list (your pizza orders)
pizza_orders = sc.parallelize([
    "Margherita", "Pepperoni", "Hawaiian",
    "Supreme", "Veggie", "BBQ Chicken",
])
print("📝 Total orders:", pizza_orders.count())  # count() is an action, so it runs a job
# Output: 📝 Total orders: 6

# 🔄 Transformation Example - Preparing special orders
# Add "Premium" to each pizza name (lazy: nothing runs yet)
premium_pizzas = pizza_orders.map(lambda pizza: f"Premium {pizza}")
# Keep only pizzas whose original name starts with "P" (still lazy!)
p_pizzas = premium_pizzas.filter(lambda pizza: pizza.startswith("Premium P"))

# 🚀 Action - Actually execute the plan!
result = p_pizzas.collect()
print("🍕 P-Pizzas:", result)
# Output: 🍕 P-Pizzas: ['Premium Pepperoni']

sc.stop()  # Shut down the SparkContext when you're done
```
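If you want to double-check what that pipeline should return without starting Spark, here is a plain-Python mirror of the same logic. It is only a sketch for verifying the expected output: list comprehensions run eagerly on one machine, so this shows *what* `map()` and `filter()` compute, not *how* Spark distributes the work.

```python
# Plain-Python mirror of the RDD pipeline (no Spark needed).
# List comprehensions run immediately, unlike Spark transformations.
pizza_orders = [
    "Margherita", "Pepperoni", "Hawaiian",
    "Supreme", "Veggie", "BBQ Chicken",
]

premium_pizzas = [f"Premium {pizza}" for pizza in pizza_orders]           # like .map(...)
p_pizzas = [p for p in premium_pizzas if p.startswith("Premium P")]       # like .filter(...)

print("📝 Total orders:", len(pizza_orders))  # like .count()
print("🍕 P-Pizzas:", p_pizzas)               # like .collect()
```

Only "Pepperoni" starts with "P", so the result is `['Premium Pepperoni']`, matching the Spark output.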
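🎯 What Just Happened? `map()` and `filter()` are transformations: they only recorded a plan, and Spark didn't compute anything for them until we called `collect()`, which is an action. (The earlier `count()` is also an action, so it ran a job of its own.) It's like planning your entire day but not getting out of bed until you absolutely have to!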
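To make "record a plan, run it later" concrete, here is a toy model in plain Python. `LazyPipeline`, `make_premium`, and the `calls` counter are hypothetical names made up for this illustration; this is a sketch of the lazy-evaluation idea, not how Spark is actually implemented. The counter shows that the mapping function does no work until the action is called.

```python
from typing import Any, Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[Any], Any]]


class LazyPipeline:
    """Hypothetical toy model of lazy evaluation (NOT Spark's real implementation)."""

    def __init__(self, data: List[Any], steps: Optional[List[Step]] = None):
        self._data = list(data)
        self._steps: List[Step] = steps if steps is not None else []  # the recorded "plan"

    def map(self, fn: Callable[[Any], Any]) -> "LazyPipeline":
        # Transformation: record the step and return a new pipeline; no work happens.
        return LazyPipeline(self._data, self._steps + [("map", fn)])

    def filter(self, pred: Callable[[Any], bool]) -> "LazyPipeline":
        # Transformation: also just recorded.
        return LazyPipeline(self._data, self._steps + [("filter", pred)])

    def collect(self) -> List[Any]:
        # Action: replay every recorded step, starting from the original data.
        items = list(self._data)
        for kind, fn in self._steps:
            if kind == "map":
                items = [fn(x) for x in items]
            else:
                items = [x for x in items if fn(x)]
        return items


calls = {"map": 0}  # counts how many times the map function actually ran


def make_premium(pizza: str) -> str:
    calls["map"] += 1
    return f"Premium {pizza}"


orders = LazyPipeline(["Margherita", "Pepperoni", "Hawaiian",
                       "Supreme", "Veggie", "BBQ Chicken"])
plan = orders.map(make_premium).filter(lambda p: p.startswith("Premium P"))

print("Work before the action:", calls["map"])  # → 0
result = plan.collect()                         # the "action"
print("Work after collect():", calls["map"])    # → 6
print("Result:", result)                        # → ['Premium Pepperoni']
```

Calling `plan.collect()` again would run `make_premium` six more times, because the toy replays its whole plan on every action. Spark behaves in a broadly similar way for an RDD you haven't marked with `cache()` or `persist()`: each action re-evaluates its lineage.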