🛠️ Creating a Pool via Databricks CLI
```shell
# Install the (legacy) Databricks CLI
pip install databricks-cli

# Configure authentication (prompts for your workspace URL and a personal access token)
databricks configure --token
```
Create a pool configuration file, `my-pool.json`. Note that the Instance Pools API expects the field `instance_pool_name`:

```json
{
  "instance_pool_name": "my-awesome-data-pool",
  "min_idle_instances": 2,
  "max_capacity": 10,
  "node_type_id": "i3.xlarge",
  "idle_instance_autotermination_minutes": 60,
  "enable_elastic_disk": true,
  "disk_spec": {
    "disk_type": {
      "ebs_volume_type": "GENERAL_PURPOSE_SSD"
    },
    "disk_size": 100
  },
  "custom_tags": {
    "team": "data-science",
    "environment": "development",
    "project": "customer-analytics"
  }
}
```

Then create the pool:

```shell
databricks instance-pools create --json-file my-pool.json
```
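Before submitting the file, it can be worth sanity-checking it locally. The sketch below is a hypothetical helper (not part of the CLI) that verifies the fields the create call requires and catches an inverted sizing mistake:

```python
import json

# Fields the Instance Pools create call requires.
REQUIRED_FIELDS = {"instance_pool_name", "node_type_id"}

def validate_pool_config(config):
    """Return a list of human-readable problems; an empty list means the config looks sane."""
    problems = ["missing required field: " + f
                for f in sorted(REQUIRED_FIELDS - config.keys())]
    if config.get("min_idle_instances", 0) > config.get("max_capacity", float("inf")):
        problems.append("min_idle_instances exceeds max_capacity")
    return problems

# In practice you would load the real file: config = json.load(open("my-pool.json"))
config = {"instance_pool_name": "my-awesome-data-pool",
          "node_type_id": "i3.xlarge",
          "min_idle_instances": 2,
          "max_capacity": 10}
print(validate_pool_config(config))  # → []
```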
🐍 Python API Example
This example uses the community `databricks-api` package, whose method names mirror the underlying REST services (`create_instance_pool`, `create_cluster`):

```python
from databricks_api import DatabricksAPI

# Initialize the API client
db = DatabricksAPI(
    host="https://your-workspace.cloud.databricks.com",
    token="your-access-token",
)

# Pool configuration (field names follow the Instance Pools REST API)
pool_config = {
    "instance_pool_name": "student-learning-pool",
    "min_idle_instances": 1,
    "max_capacity": 5,
    "node_type_id": "i3.large",
    "idle_instance_autotermination_minutes": 30,
    "custom_tags": {
        "purpose": "learning",
        "created_by": "nishant_chandravanshi",
    },
}

# Create the pool
pool = db.instance_pool.create_instance_pool(**pool_config)
print(f"Pool created with ID: {pool['instance_pool_id']}")

# Create a cluster that draws its nodes from the pool
cluster_config = {
    "cluster_name": "my-pool-cluster",
    "spark_version": "11.3.x-scala2.12",
    "instance_pool_id": pool["instance_pool_id"],
    "num_workers": 2,
    "autotermination_minutes": 60,
}
cluster = db.cluster.create_cluster(**cluster_config)
print(f"Cluster created: {cluster['cluster_id']}")
```
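The create call returns as soon as the request is accepted; the cluster itself still takes a little while to spin up (faster when nodes come from a warm pool). A small polling helper can wait for it. This is a hypothetical sketch, not part of `databricks-api`; it takes any zero-argument function that returns the current state string:

```python
import time

def wait_for_cluster(get_state, timeout_s=600, poll_s=10):
    """Poll get_state() until the cluster reports RUNNING, or fail on a terminal state/timeout."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        state = get_state()
        if state == "RUNNING":
            return state
        if state in ("TERMINATED", "ERROR"):
            raise RuntimeError(f"cluster ended in state {state}")
        time.sleep(poll_s)
    raise TimeoutError("cluster did not reach RUNNING in time")
```

With the client above, `get_state` could be something like `lambda: db.cluster.get_cluster(cluster_id=cluster["cluster_id"])["state"]` (hedged: verify the exact response shape against the Clusters API docs for your workspace).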
🎯 Pro Tips for Beginners:
- Start Small: Begin with `min_idle_instances = 1` for learning; one warm instance keeps startup fast without paying for spares
- Use Auto-termination: Set `idle_instance_autotermination_minutes` to 30-60 to avoid paying for idle capacity
- Tag Everything: Add custom tags so costs and usage are easy to attribute later
- Monitor Usage: Check your pool's utilization metrics regularly and adjust `min_idle_instances` and `max_capacity` as demand changes
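The cost trade-off behind these tips can be made concrete. The sketch below estimates the worst-case daily cost of keeping warm instances that are never used (the hourly rate is a made-up placeholder; check your cloud provider's pricing for your node type):

```python
# Placeholder rate, NOT a real price; look up your provider's on-demand pricing.
HOURLY_RATE_USD = 0.312

def idle_cost_per_day(min_idle_instances, hourly_rate):
    """Worst-case daily cost of warm instances that sit completely idle."""
    return min_idle_instances * hourly_rate * 24

print(f"${idle_cost_per_day(2, HOURLY_RATE_USD):.2f}/day")  # → $14.98/day
```

Halving `min_idle_instances` halves this floor, which is why starting with 1 is the usual advice for learning workspaces.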