Graceful Draining
Graceful draining enables zero-downtime deployments by allowing old handlers to complete in-flight requests while new handlers receive new requests.
The Problem
Without graceful draining, updating a handler can cause:
- Dropped requests - In-flight requests are terminated
- Connection resets - Clients see connection errors
- Data corruption - Partial operations may leave inconsistent state
The Solution
Graceful draining solves this by:
- Loading the new handler before removing the old one
- Routing new requests to the new handler immediately
- Allowing old requests to complete on the old handler
- Unloading the old handler only when fully drained
Timeline
Time ──────────────────────────────────────────────────────────▶
Old Handler: ████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
(handling) (draining) (unloaded)
New Handler: ░░░░░░░░░░░░████████████████████████████████████████
(handling new requests)
▲
│ Swap point
How It Works
1. Request Tracking
Each handler tracks active requests using atomic counters:
#![allow(unused)] fn main() { pub struct LoadedHandler { active_requests: AtomicU64, draining: AtomicBool, } }
2. Request Guards
When a request starts, a guard is acquired:
#![allow(unused)] fn main() { let guard = handler.acquire_request()?; // Request is processed... // Guard is dropped when request completes }
The guard:
- Increments
active_requestson creation - Decrements
active_requestson drop - Returns
Noneif handler is draining
3. Graceful Swap
When swapping handlers:
#![allow(unused)] fn main() { let result = registry.swap_graceful( "my-endpoint", new_library_path, Duration::from_secs(30) // drain timeout ).await?; }
This:
- Loads the new handler
- Atomically swaps the active handler
- Marks the old handler as draining
- Spawns a background task to monitor draining
- Returns immediately (non-blocking)
4. Drain Monitoring
A background task monitors the old handler:
#![allow(unused)] fn main() { while !old_handler.is_drained() { if elapsed > drain_timeout { // Force unload after timeout break; } tokio::time::sleep(Duration::from_millis(100)).await; } // Old handler is now safe to unload }
API
Swap with Draining
#![allow(unused)] fn main() { let result = registry.swap_graceful( endpoint_id, new_path, drain_timeout ).await?; // Result contains: // - swapped: bool - Whether swap succeeded // - old_requests_pending: u64 - Requests still in flight // - draining: bool - Whether old handler is draining }
Check Draining Status
#![allow(unused)] fn main() { // Is handler accepting new requests? let accepting = !handler.is_draining(); // Is handler fully drained? let drained = handler.is_drained(); // How many requests are in flight? let active = handler.active_request_count(); }
Get Statistics
#![allow(unused)] fn main() { let stats = registry.stats().await; println!("Active handlers: {}", stats.loaded_count); println!("Draining handlers: {}", stats.draining_count); println!("Active requests: {}", stats.active_requests); println!("Draining requests: {}", stats.draining_requests); }
Drain Timeout
The drain timeout determines how long to wait for requests to complete:
| Timeout | Use Case |
|---|---|
| 5s | Fast APIs with quick responses |
| 30s | Standard web applications |
| 60s | Long-running operations |
| 300s | File uploads, batch processing |
If the timeout expires, the old handler is forcefully unloaded. Any remaining requests will fail.
Best Practices
1. Set Appropriate Timeouts
Match your drain timeout to your longest expected request:
#![allow(unused)] fn main() { // For a file upload endpoint registry.swap_graceful( "upload-endpoint", new_path, Duration::from_secs(300) // 5 minutes for large uploads ).await?; }
2. Monitor Draining
Log draining status for observability:
#![allow(unused)] fn main() { if result.draining { tracing::info!( endpoint = endpoint_id, pending = result.old_requests_pending, "Handler draining" ); } }
3. Handle Drain Rejection
When a handler is draining, new requests are rejected:
#![allow(unused)] fn main() { match handler.acquire_request() { Some(guard) => { // Process request } None => { // Handler is draining, return 503 return Response::service_unavailable("Handler updating, retry shortly"); } } }
4. Cleanup Drained Handlers
Periodically clean up fully drained handlers:
#![allow(unused)] fn main() { // In a background task loop { registry.cleanup_drained().await; tokio::time::sleep(Duration::from_secs(60)).await; } }