Colocation for AI Workloads: Power, Cooling & GPU Density Requirements Explained

  • Updated on May 18, 2026

Executive Summary

Artificial intelligence and machine learning workloads are fundamentally different from traditional enterprise applications. Training large language models, running inference at scale, and processing massive datasets require infrastructure that most data centers and cloud deployments cannot provide efficiently.

The numbers tell the story: A single NVIDIA H100 GPU can consume 700 watts. A training cluster with 256 such GPUs draws roughly 180 kilowatts for the GPUs alone (256 × 700 W = 179.2 kW), before servers and networking are counted, and generates heat that would overwhelm conventional cooling systems. Meanwhile, organizations running these workloads in public cloud face costs of $3-5 million or more annually, while colocation can deliver comparable capacity for a fraction of that expense.

This comprehensive guide explains exactly what AI workloads demand from infrastructure, why traditional data centers fall short, and how purpose-built colocation environments solve the power, cooling, and density challenges that make or break AI initiatives.

Understanding AI Infrastructure Requirements

The AI Workload Revolution

AI workloads differ fundamentally from traditional enterprise applications in three critical ways:

Computational Intensity: Training a modern large language model requires petaflops of computing power sustained over weeks or months. Inference serving processes millions of requests requiring immediate responses.

Data Movement: AI applications constantly move massive datasets between storage, memory, and processors. A single training run might process petabytes of data.

Resource Concentration: Unlike distributed web applications, AI workloads concentrate enormous compute density in small physical spaces, often 10-50x the power density of traditional servers.

These characteristics create infrastructure demands that expose the limitations of both traditional data centers and public cloud platforms.

Why Cloud Fails for AI at Scale

Public cloud providers market themselves as ideal for AI workloads. The reality is more complex:

Cost Structure Breakdown: A typical AI training cluster with 8 NVIDIA H100 GPUs costs approximately $30-50 per hour in major cloud platforms. Running continuously, that’s $260,000-$440,000 annually. Scale to realistic production requirements of 64+ GPUs, and annual costs easily exceed $2-3 million.

Performance Tax: Cloud virtualization adds overhead that matters enormously for AI. GPU passthrough, network latency, and storage I/O limitations reduce effective performance by 15-30% compared to bare-metal deployments.

Availability Constraints: GPU instances face constant availability issues. Organizations report waiting days or weeks to access required capacity, disrupting research timelines and production deployments.

Data Egress Economics: Training data and model updates generate massive data movement. Cloud egress fees, often costing $0.08-$0.12 per gigabyte, add tens of thousands in unexpected costs monthly.
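
The cost figures above are straightforward to reproduce. The following is a minimal sketch, not a pricing tool: the function names are hypothetical, continuous operation over a 365-day year (8,760 hours) is assumed, and the 200 TB per month egress volume is an illustrative assumption that does not come from the article.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760; assumes the cluster never idles


def annual_compute_cost(hourly_usd: float, nodes: int = 1) -> float:
    """Yearly cost of `nodes` identical GPU nodes billed by the hour."""
    return hourly_usd * HOURS_PER_YEAR * nodes


def monthly_egress_cost(terabytes_out: float, usd_per_gb: float) -> float:
    """Monthly egress charge, using decimal units (1 TB = 1,000 GB)."""
    return terabytes_out * 1_000 * usd_per_gb


# One 8-GPU node at $30-50/hour: $262,800 to $438,000 per year.
print(annual_compute_cost(30), annual_compute_cost(50))

# Eight such nodes (64 GPUs): $2,102,400 to $3,504,000 per year.
print(annual_compute_cost(30, nodes=8), annual_compute_cost(50, nodes=8))

# Hypothetical 200 TB/month of egress at $0.08-$0.12/GB: about $16,000 to $24,000.
print(monthly_egress_cost(200, 0.08), monthly_egress_cost(200, 0.12))
```

Reserved-instance discounts, spot pricing, and partial utilization would all lower the compute figure, so treat the output as an upper bound for on-demand pricing.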

These factors drive sophisticated AI organizations toward colocation-based infrastructure where they control costs, performance, and availability.

Power Requirements for AI Infrastructure

Understanding GPU Power Consumption

Modern AI accelerators consume dramatically more power than traditional servers:

  • NVIDIA H100: 700W per GPU
  • NVIDIA A100: 400W per GPU
  • AMD MI300X: 750W per GPU
  • Google TPU v5: 450W per chip

A single 42U rack populated with 8 GPU servers (4 GPUs each) can easily exceed 25-30 kilowatts, which is 5-10x typical server rack power draw.

Power Density Challenges

Traditional data centers are designed for 5-8 kilowatts per rack. AI infrastructure routinely requires:

  • Standard AI Deployment: 15-25 kW per rack
  • High-Density AI: 30-50 kW per rack
  • Extreme Density: 60-100+ kW per rack (liquid-cooled systems)

Most existing facilities cannot support these densities without major electrical infrastructure upgrades costing millions and taking months to complete.

Power Distribution Architecture

AI infrastructure requires robust electrical distribution:

Redundant Power Feeds: N+1 or 2N redundancy ensures uptime during maintenance or failures

High-Voltage Distribution: Distributing power at 415V or 480V lowers the current needed for a given load, which reduces conductor size and improves efficiency

Intelligent PDUs: Real-time monitoring and remote switching capability

Busway Systems: Flexible power distribution supporting changing rack configurations

Calculating Your Power Requirements

Step 1: Determine GPU quantity and model 

Step 2: Add server infrastructure power (motherboard, CPU, memory, storage) 

Step 3: Include networking equipment (switches typically 500-1000W each) 

Step 4: Apply power supply efficiency factor (typically 90-95%) 

Step 5: Add 20% headroom for growth and redundancy

Example Calculation:

  • 32 NVIDIA H100 GPUs = 22,400W
  • 8 servers with dual CPUs, memory, storage = 4,800W
  • Network switching = 2,000W
  • Subtotal = 29,200W
  • Efficiency factor (÷0.92) = 31,739W
  • 20% headroom = 38,087W (38 kW)

This cluster requires a sustained 38 kW of capacity. Even split across two racks, that is 19 kW per rack, well above the 5-8 kW per rack that traditional colocation environments are designed for.
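
The five steps can be sketched as a single function. This is an illustration under stated assumptions: the function name and parameters are hypothetical, and the defaults (92% power supply efficiency, 20% headroom) are simply the values used in the example calculation.

```python
def required_power_watts(
    gpu_count: int,
    gpu_watts: float,
    server_overhead_watts: float,
    network_watts: float,
    psu_efficiency: float = 0.92,  # Step 4: value used in the example
    headroom: float = 0.20,        # Step 5: growth and redundancy margin
) -> float:
    """Estimate the facility power to provision for a GPU cluster."""
    if not 0 < psu_efficiency <= 1:
        raise ValueError("psu_efficiency must be in (0, 1]")
    # Steps 1-3: sum GPU, server, and network loads
    it_load = gpu_count * gpu_watts + server_overhead_watts + network_watts
    # Step 4: power drawn at the wall is higher than power delivered
    at_the_wall = it_load / psu_efficiency
    # Step 5: add headroom on top
    return at_the_wall * (1 + headroom)


total = required_power_watts(32, 700, 4_800, 2_000)
print(f"{total / 1000:.1f} kW")  # prints "38.1 kW"
```

Swapping in a different accelerator is a matter of changing `gpu_watts`; for instance, the A100 figure of 400W quoted earlier would lower the GPU share of the load accordingly.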

Cooling Requirements for AI Workloads

The Heat Generation Problem

Every watt of power consumed generates heat that must be removed. High-density AI infrastructure generates concentrated heat loads that overwhelm traditional cooling approaches.

Traditional Air Cooling Limits: Conventional raised-floor cooling works up to approximately 15-20 kW per rack. Beyond this threshold, hot spots develop even with containment systems.

The Physics Challenge: Air has limited heat capacity. Moving enough air to cool 30+ kW racks requires massive airflow, creating noise, turbulence, and inefficiency.
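
To put a rough number on the airflow problem, the sensible-heat relation P = ρ · c_p · Q · ΔT can be solved for the volumetric flow Q. The sketch below is a back-of-the-envelope estimate only: the air properties are approximate room-condition values, and the 12 K temperature rise across the rack is an assumption chosen for illustration, not a figure from the article.

```python
AIR_DENSITY_KG_M3 = 1.2     # approximate, near sea level at about 20 °C
AIR_SPECIFIC_HEAT = 1005.0  # J/(kg·K), approximate
M3S_TO_CFM = 2118.88        # cubic metres per second to cubic feet per minute


def airflow_m3_per_s(heat_watts: float, delta_t_kelvin: float) -> float:
    """Airflow needed to carry away `heat_watts` at a given air temperature rise."""
    if delta_t_kelvin <= 0:
        raise ValueError("delta_t_kelvin must be positive")
    return heat_watts / (AIR_DENSITY_KG_M3 * AIR_SPECIFIC_HEAT * delta_t_kelvin)


for rack_kw in (8, 30):
    flow = airflow_m3_per_s(rack_kw * 1_000, delta_t_kelvin=12)
    print(f"{rack_kw} kW rack: {flow:.2f} m^3/s ({flow * M3S_TO_CFM:,.0f} CFM)")
# The 30 kW rack needs roughly 2.07 m^3/s, or about 4,400 CFM,
# which is 3.75 times the airflow of the 8 kW rack at the same temperature rise.
```

Because the required flow scales linearly with heat load, the only ways to cool a denser rack with air are to move proportionally more of it or to accept a larger temperature rise, and both have practical limits.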

Cooling Technology Options

1. Optimized Air Cooling (Up to 25 kW/rack)

Enhanced air cooling with hot/cold aisle containment, in-row cooling units, and optimized airflow can support moderate AI density:

  • In-Row Cooling: Supplemental cooling units positioned within server rows
  • Rear Door Heat Exchangers: Cooling coils attached to rack doors intercept exhaust air
  • Raised Floor Optimization: Directed airflow with perforated tiles positioned precisely

Advantages: Familiar technology, lower upfront cost

Limitations: Maximum ~25 kW per rack, higher operational costs, noise

2. Direct-to-Chip Liquid Cooling (30-60 kW/rack)

Cold plates mounted directly on processors transfer heat to circulating liquid:

  • Coolant Distribution Units (CDUs): Transfer heat from the server coolant loop to facility chilled water while keeping the coolant at a temperature, pressure, and quality compatible with server components
  • Quick-Disconnect Fittings: Enable server maintenance without draining systems
  • Dual-Loop Systems: Separate coolant and facility water for reliability

Advantages: Supports extreme density, quieter operation, improved energy efficiency

Limitations: Higher complexity, specialized maintenance skills required

3. Immersion Cooling (60-100+ kW/rack)

Servers submerged in dielectric fluid that doesn’t conduct electricity:

  • Single-Phase Immersion: Fluid circulates through heat exchangers
  • Two-Phase Immersion: Fluid boils, carrying heat away as vapor

Advantages: Maximum density, minimal acoustic signature, extreme efficiency

Limitations: Specialized equipment, complex operations, limited vendor ecosystem
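
As a rough aid, the density ranges in the three headings above can be turned into a lookup. This is a sketch, not a design rule: the function name is hypothetical, and because the article leaves a gap between 25 and 30 kW and lists 60 kW under two options, the boundaries below are assumptions (the gap is assigned to direct-to-chip, and immersion starts above 60 kW).

```python
def suggest_cooling(kw_per_rack: float) -> str:
    """Map a rack power density to the cooling option described for that range."""
    if kw_per_rack <= 0:
        raise ValueError("kw_per_rack must be positive")
    if kw_per_rack <= 25:
        return "optimized air cooling"
    if kw_per_rack <= 60:  # assumption: covers the 25-30 kW gap as well
        return "direct-to-chip liquid cooling"
    return "immersion cooling"


for density in (18, 38, 80):
    print(f"{density} kW/rack -> {suggest_cooling(density)}")
```

In practice the choice also depends on facility water availability, staff skills, and vendor support, so the output is a starting point for a conversation with the provider rather than an answer.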

Cooling Infrastructure Requirements

Effective AI cooling requires facility-level capabilities:

Chilled Water Capacity: Minimum 2-5 megawatts of cooling capacity 

Redundancy: N+1 chillers and pumps ensure continuous operation 

Temperature Control: Precision cooling maintaining narrow temperature bands 

Monitoring Systems: Real-time temperature sensing with automatic alerts 

Emergency Procedures: Clear protocols for cooling system failures

 

Space and Density Considerations

Rack Configuration Options

Standard Racks (42U): Traditional 19-inch racks accommodate most AI servers but may limit cooling options

Deep Racks (48″+ depth): Accommodate larger servers and rear-door heat exchangers

Open Racks: Improved airflow for air-cooled high-density deployments

Enclosed Racks: Better containment for liquid-cooled systems

Floor Space Requirements

AI deployments require more than just rack space:

Hot/Cold Aisle Containment: Enclosed aisles separating cold supply air from hot exhaust

Cooling Infrastructure: Space for in-row cooling units or CDUs

Maintenance Clearance: Adequate space for accessing both front and rear of equipment

Cable Management: Overhead or underfloor pathways for power and network cabling

A 32-rack AI deployment might require 2,000-3,000 square feet once support infrastructure is included, compared with the 500-600 square feet occupied by the racks themselves.

Network Infrastructure for AI

AI workloads generate extreme network traffic:

Training Workloads: Multi-terabit internal connectivity for distributed training

Inference Serving: High-throughput, low-latency connections for request processing 

Data Loading: Fast storage network for dataset access

Network Requirements:

  • 100-400 Gbps per server connectivity
  • Sub-microsecond latency for GPU-to-GPU communication
  • Lossless Ethernet or InfiniBand for distributed training
  • Dedicated storage networks (NVMe-oF, etc.)

This demands:

  • High-density switches supporting 400G optics
  • Structured cabling supporting short-reach optics
  • Network redundancy for production workloads 
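
To see what these link speeds mean in practice, the sketch below estimates aggregate cluster bandwidth and the time to move a dataset over a single link. The function names are hypothetical, decimal units are used throughout, and the transfer time assumes the link runs at full line rate with no protocol overhead, which real networks will not achieve.

```python
def aggregate_bandwidth_tbps(servers: int, gbps_per_server: float) -> float:
    """Total injection bandwidth across all servers, in terabits per second."""
    return servers * gbps_per_server / 1_000


def transfer_hours(petabytes: float, link_gbps: float) -> float:
    """Hours to move a dataset over one link at full line rate."""
    bits = petabytes * 1e15 * 8
    seconds = bits / (link_gbps * 1e9)
    return seconds / 3600


# Eight servers at 400 Gbps each give 3.2 Tbps of aggregate bandwidth.
print(aggregate_bandwidth_tbps(8, 400))

# Moving 1 PB takes about 5.6 hours at 400 Gbps and about 22.2 hours at 100 Gbps.
print(round(transfer_hours(1, 400), 1), round(transfer_hours(1, 100), 1))
```

The comparison shows why per-server link speed matters for data loading: a fourfold slower link turns a dataset move of a few hours into most of a day.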

Real-World AI Infrastructure Scenarios

Scenario 1: AI Startup Training Foundation Models

Requirements:

  • 64 NVIDIA H100 GPUs for model training
  • High-performance storage for training datasets
  • Development environment for data scientists

Infrastructure Design:

  • 8 GPU servers (8x H100 each) = 56 kW
  • Direct-to-chip liquid cooling with CDUs
  • 400 Gbps InfiniBand networking
  • 2 petabytes NVMe storage
  • Colocation: 1/4 cage (10 racks)

Cost Comparison:

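  • Cloud (AWS p5.48xlarge, 8x H100 per instance): ~$98/hour per instance = ~$858,000/year per instance; 64 GPUs require eight instances
  • Colocation: ~$25,000/month = $300,000/year
  • Savings: $558,000 annually (65% reduction) relative to that single-instance cloud figure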
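
The savings arithmetic can be checked with a few lines. This is a minimal sketch with a hypothetical function name; it assumes continuous operation (8,760 hours per year) and compares only the two recurring rates quoted in this scenario, leaving hardware purchase, staffing, and any pricing discounts out of scope.

```python
HOURS_PER_YEAR = 24 * 365


def annual_savings(cloud_hourly_usd: float, colo_monthly_usd: float) -> tuple[float, float]:
    """Return (absolute savings, fractional savings) over one year."""
    cloud_annual = cloud_hourly_usd * HOURS_PER_YEAR
    colo_annual = colo_monthly_usd * 12
    savings = cloud_annual - colo_annual
    return savings, savings / cloud_annual


savings, fraction = annual_savings(98, 25_000)
print(f"${savings:,.0f} saved per year ({fraction:.0%})")
# prints "$558,480 saved per year (65%)"; the scenario rounds this to $558,000
```

Passing a cloud rate that covers every instance in the cluster, rather than a single instance, keeps the comparison like for like.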

Scenario 2: Enterprise AI Inference Platform

Requirements:

  • 128 NVIDIA L4 GPUs for inference serving
  • Low-latency user access
  • High availability with redundancy

Infrastructure Design:

  • 16 inference servers (8x L4 each) = 38 kW
  • Enhanced air cooling with rear-door heat exchangers
  • 100 Gbps networking
  • Load balancing and orchestration
  • Colocation: Private cage (8 racks)

Benefits:

  • 5ms response time (vs. 20-50ms cloud)
  • Predictable costs
  • Control over deployment and updates

Scenario 3: Research Institution HPC Cluster

Requirements:

  • 256 NVIDIA A100 GPUs for research computing
  • Shared resource across multiple research groups
  • Budget-conscious deployment

Infrastructure Design:

  • 32 GPU servers (8x A100 each) = 140 kW
  • Combination air + liquid cooling
  • Job scheduling and resource management
  • Tiered storage architecture
  • Colocation: Private suite (25 racks)

Advantages:

  • 70% cost savings vs. cloud
  • On-demand access for researchers
  • No cloud quota limitations
  • Data sovereignty for sensitive research 

How DataBank Supports AI Infrastructure

Purpose-Built for High-Density Computing

DataBank’s Data Center Evolved™ platform addresses AI infrastructure challenges:

Power Capacity: Facilities designed from the ground up support 30-60+ kW per rack with room for growth. Advanced electrical infrastructure includes high-voltage distribution and intelligent monitoring.

Advanced Cooling: DataBank supports multiple cooling technologies:

  • Optimized air cooling with containment
  • Direct-to-chip liquid cooling with CDU integration
  • Rear-door heat exchangers
  • Custom cooling solutions for extreme density

Flexible Deployment Options: Start with a few racks and scale to private suites or dedicated facilities as AI initiatives grow. No long-term lock-in or forced migration.

Strategic Locations: With 75+ data centers across key U.S. metros, DataBank positions your AI infrastructure near:

  • Talent pools in tech hubs
  • Users requiring low-latency access
  • Research institutions and partners
  • Cloud on-ramps for hybrid architectures

AI-Ready Infrastructure Components

High-Density Racks: Support for extreme power densities with appropriate cooling

GPU-Optimized Networking: High-speed switches and structured cabling

Storage Solutions: SAN and object storage for training data and model repositories

Cloud Connectivity: Direct connections to major cloud providers for hybrid AI workflows

Security: Physical and network security meeting enterprise and regulatory requirements

Real Customer Success: University of Maryland

The University of Maryland needed HPC infrastructure for AI research, but faced:

  • Insufficient power in existing facilities
  • Budget constraints limiting cloud usage
  • Need for liquid cooling support

DataBank Solution:

  • Custom build-out in Northern Virginia facility
  • Direct-to-chip liquid cooling infrastructure
  • Flexible lease terms spreading costs
  • Expert support managing specialized cooling

Results:

  • Deployed 128+ GPUs for research computing
  • Achieved target power density of 45 kW per rack
  • Eliminated cloud budget overruns
  • Provided researchers with dedicated, performant infrastructure 

Selecting an AI-Ready Colocation Provider

Critical Evaluation Criteria

1. Power Infrastructure

  • What is the maximum power per rack?
  • Is electrical capacity available now, or does it require upgrades?
  • What is the power redundancy level (N, N+1, 2N)?
  • Can you scale power as deployment grows?

2. Cooling Capabilities

  • What cooling technologies are supported?
  • Has the provider deployed liquid cooling for customers?
  • What is the facility’s cooling redundancy?
  • Are there cooling specialists on staff?

3. Network Ecosystem

  • What network carriers are available?
  • Are there direct cloud connections?
  • Can the provider support high-speed InfiniBand or 400G Ethernet?
  • What is the network redundancy?

4. Physical Security

  • What access controls protect equipment?
  • Are there 24/7 security personnel?
  • How are visitor access and escorts managed?

5. Compliance Certifications

  • Does the facility meet relevant compliance standards?
  • Are SOC 2 reports available?
  • For research: Can the facility support ITAR compliance or FedRAMP authorization?

6. Technical Expertise

  • Does the provider have AI deployment experience?
  • Are there engineers who understand GPU infrastructure?
  • What level of support is included?

7. Financial Stability

  • Is the provider financially sound for long-term partnership?
  • What happens to your equipment if the provider has issues?

Planning Your AI Infrastructure Deployment

Phase 1: Requirements Assessment (Weeks 1-2)

  • Document current and 6-12 month GPU requirements
  • Calculate power and cooling needs
  • Identify network connectivity requirements
  • Define budget parameters

Phase 2: Provider Selection (Weeks 3-6)

  • Issue RFP to qualified providers
  • Conduct facility tours
  • Review technical specifications and SLAs
  • Validate reference customers with similar deployments
  • Negotiate contract terms

Phase 3: Design and Planning (Weeks 7-10)

  • Finalize rack layouts and power distribution
  • Design cooling solution
  • Plan network architecture
  • Coordinate with provider on infrastructure preparation

Phase 4: Deployment (Weeks 11-14)

  • Equipment procurement and staging
  • Installation and cabling
  • Network configuration
  • Testing and validation

Phase 5: Operations (Ongoing)

  • Performance monitoring
  • Capacity planning
  • Optimization and tuning
  • Scaling as requirements evolve 

The Future of AI Infrastructure

Emerging Trends

Increased Power Density: Next-generation GPUs will push power requirements even higher. NVIDIA’s upcoming architectures are expected to draw 900-1000W per GPU.

Liquid Cooling Becomes Standard: As densities exceed air cooling limits, direct-to-chip and immersion cooling will become mainstream rather than exotic.

Edge AI: Inference workloads move closer to users, requiring distributed AI infrastructure in more locations.

Quantum Integration: Early quantum computing systems will integrate with classical AI infrastructure for hybrid quantum-classical algorithms.

Sustainability Focus: Energy efficiency and renewable power become critical differentiators as AI power consumption grows. 

Conclusion: Building for AI Success

AI infrastructure isn’t traditional IT at higher density; it’s a fundamentally different challenge requiring specialized facilities, cooling technologies, and expertise. Organizations that underestimate these requirements face deployment delays, cost overruns, and performance limitations that handicap their AI initiatives.

Colocation with an AI-capable provider offers the best of both worlds: infrastructure purpose-built for extreme density without the capital expense and long timelines of building your own facility, and without the cost explosion and performance compromises of public cloud.

DataBank’s AI-Ready Infrastructure delivers the power capacity, advanced cooling, network connectivity, and expert support that make AI initiatives successful. With 75+ facilities nationwide and proven experience deploying extreme-density computing environments, DataBank is the partner sophisticated AI organizations trust.

Ready to deploy your AI infrastructure? Contact DataBank to discuss your requirements and schedule a tour of our AI-ready facilities. Our infrastructure architects will work with you to design the optimal deployment for your specific needs.

Frequently Asked Questions


  • How are data center operations being transformed by AI?
    AI is revolutionizing data center operations by introducing intelligent automation, predictive analytics, and real-time monitoring. Through machine learning algorithms, AI can anticipate system failures, optimize server workloads, and dynamically manage power and cooling systems. It enables proactive maintenance and resource allocation, reducing downtime and operational costs. AI also enhances decision-making by analyzing massive datasets to uncover performance trends and inefficiencies. Overall, AI-driven management shifts data centers from reactive to predictive operations, improving reliability, scalability, and performance while minimizing human intervention.
  • How does AI improve efficiency and sustainability in data centers?
    AI improves efficiency by continuously analyzing sensor data to optimize cooling, power usage, and server performance. Predictive algorithms adjust environmental conditions to reduce energy waste while maintaining ideal temperatures. Machine learning models also consolidate underutilized servers, reducing idle power consumption. This intelligent optimization can cut energy use by up to 40%, significantly lowering carbon emissions and operational costs. AI further supports sustainability goals by predicting peak loads, scheduling tasks during off-peak hours, and integrating renewable energy sources. The result is a smarter, greener, and more cost-effective data center ecosystem.
  • What future trends are shaping AI-driven data centers?
    Future trends in AI-driven data centers include the rise of fully autonomous operations, deeper integration with edge computing, and the use of digital twin technology for predictive optimization. AI will increasingly drive energy-efficient “green” data centers through real-time sustainability analytics and intelligent cooling systems. Quantum computing and neuromorphic processors may further accelerate data processing capabilities. Enhanced interoperability between AI systems and multi-cloud environments will also become standard. Ultimately, these trends point toward self-managing, carbon-conscious data centers that adapt dynamically—balancing performance, cost, and sustainability to meet growing global data demands.
