Hardware Sizing
Understand the hardware requirements for running ticket classification models at different scales.
Overview
Hardware requirements for ticket classification depend on several factors:
- Model size and complexity
- Number of tickets processed
- Classification frequency
- Response time requirements
- Budget constraints
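As a rough starting point, translate your daily ticket volume into a peak-throughput target before picking a row from the quick-reference table below. The figures in this sketch are illustrative assumptions, not measurements:

```bash
# Back-of-envelope peak throughput estimate (all numbers are assumptions).
TICKETS_PER_DAY=10000    # expected daily ticket volume
PEAK_SHARE_PCT=80        # share of tickets arriving in the busiest window
WINDOW_MINUTES=480       # length of that window (8 business hours)

# Integer arithmetic is accurate enough for sizing purposes.
echo "Peak load: ~$(( TICKETS_PER_DAY * PEAK_SHARE_PCT / 100 / WINDOW_MINUTES )) tickets/minute"
```

Compare the result against the throughput figures quoted for each deployment model in the sections that follow.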
Quick Reference
| Scale | Tickets/Day | Min RAM | Min CPU | GPU | Model Type |
|---|---|---|---|---|---|
| Small | <1,000 | 512 MB | 1 core | No | Simple ML |
| Medium | 1,000-10,000 | 2 GB | 2 cores | Optional | BERT-based |
| Large | 10,000-100,000 | 8 GB | 4 cores | Recommended | BERT/Large |
| Enterprise | >100,000 | 16+ GB | 8+ cores | Required | Custom/Fine-tuned |
Deployment Models
CPU-Only Deployment
Best for:
- Small to medium ticket volumes (<10,000/day)
- Budget-conscious deployments
- Simpler models (distilled BERT, small transformers)
Recommended Specs:
```
Small Scale:
  CPU: 1-2 cores (2.0+ GHz)
  RAM: 512 MB - 2 GB
  Storage: 5 GB
  Network: Standard

Medium Scale:
  CPU: 2-4 cores (2.5+ GHz)
  RAM: 2-4 GB
  Storage: 10 GB
  Network: Standard
```
Expected Performance:
- Classification time: 200-500ms per ticket
- Throughput: 100-500 tickets/minute
- Model loading time: 5-30 seconds
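If the classifier runs in Docker, the small-scale budget above can be enforced directly on the container. A minimal sketch, assuming the openticketai/engine:latest image used elsewhere in this guide and that the service listens on port 8080 as in the benchmarking examples below:

```bash
# Run a CPU-only classifier capped at 2 cores and 2 GB of RAM.
docker run -d \
  --name ticket-classifier \
  --cpus="2" \
  --memory="2g" \
  -p 8080:8080 \
  openticketai/engine:latest
```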
GPU-Accelerated Deployment
Best for:
- Large ticket volumes (>10,000/day)
- Real-time classification requirements
- Large transformer models
- Fine-tuning and retraining
Recommended Specs:
```
Medium-Large Scale:
  CPU: 4-8 cores
  RAM: 8-16 GB
  GPU: NVIDIA T4 or better (16 GB VRAM)
  Storage: 20 GB SSD
  Network: High bandwidth

Enterprise Scale:
  CPU: 8-16 cores
  RAM: 16-32 GB
  GPU: NVIDIA A10/A100 (24-80 GB VRAM)
  Storage: 50+ GB NVMe SSD
  Network: High bandwidth, low latency
```
Expected Performance:
- Classification time: 10-50ms per ticket
- Throughput: 1,000-10,000 tickets/minute
- Model loading time: 2-10 seconds
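Before investing in GPU hardware, confirm that containers can actually see the GPU. A quick check, assuming the NVIDIA driver and the NVIDIA Container Toolkit are installed on the host:

```bash
# Driver and GPU visible on the host?
nvidia-smi

# GPU visible from inside a container? (uses a public CUDA base image)
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```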
Model Size Impact
Small Models (50-150 MB)
Examples:
- DistilBERT
- MiniLM
- TinyBERT
Requirements:
- RAM: 512 MB - 1 GB
- CPU: 1-2 cores sufficient
- GPU: Not required
Use Cases:
- Low-volume environments
- Cost-sensitive deployments
- Edge deployments
Medium Models (300-500 MB)
Examples:
- BERT-base
- RoBERTa-base
- Custom fine-tuned models
Requirements:
- RAM: 2-4 GB
- CPU: 2-4 cores recommended
- GPU: Optional, improves performance 5-10x
Use Cases:
- Most production deployments
- Balanced accuracy/performance
- Standard ticket volumes
Large Models (1-5 GB)
Examples:
- BERT-large
- RoBERTa-large
- GPT-based models
- Custom ensemble models
Requirements:
- RAM: 8-16 GB
- CPU: 4-8 cores minimum
- GPU: Highly recommended (T4 or better)
Use Cases:
- High-accuracy requirements
- Complex classification tasks
- Multi-label classification
- High-volume processing
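Before committing to one of these tiers, compare the model's actual on-disk footprint with the resources available on the host; memory usage at inference is usually somewhat higher than the file size. A minimal check, assuming models are stored under /models/ as in the disk layout recommended later on this page:

```bash
# On-disk size of each installed model.
du -sh /models/*

# Memory and CPU cores available on the host.
free -h
nproc
```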
Containerized Deployments
Docker Resource Limits
Configure appropriate resource limits:
```yaml
services:
  ticket-classifier:
    image: openticketai/engine:latest
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G
```
Kubernetes Pod Sizing
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ticket-classifier
spec:
  containers:
    - name: classifier
      image: openticketai/engine:latest
      resources:
        requests:
          memory: '2Gi'
          cpu: '1000m'
        limits:
          memory: '4Gi'
          cpu: '2000m'
```
Resource Monitoring
Monitor these metrics:
- CPU Usage: Should be <80% average
- Memory Usage: Should have 20% headroom
- Classification Latency: P95 latency under target
- Queue Depth: Tickets waiting for classification
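These metrics can be spot-checked from the command line before a full monitoring stack is in place; adjust the container name and label selector to match your deployment:

```bash
# One-shot CPU/memory snapshot of the classifier container.
docker stats --no-stream ticket-classifier

# On Kubernetes (requires metrics-server).
kubectl top pod -l app=ticket-classifier
```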
Scaling Strategies
Vertical Scaling
Increase resources on a single instance:
```
# Start
RAM: 2 GB, CPU: 2 cores

# Scale up
RAM: 4 GB, CPU: 4 cores

# Further scaling
RAM: 8 GB, CPU: 8 cores
```
Pros:
- Simple to implement
- No code changes required
- Easy to manage
Cons:
- Limited by hardware maximums
- Single point of failure
- Potentially expensive
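On Kubernetes, a vertical scale-up is just a change to the pod's resource requests and limits. A sketch, assuming a Deployment named ticket-classifier as in the examples above (the change triggers a rolling restart):

```bash
# Raise the classifier's CPU and memory budget in place.
kubectl set resources deployment/ticket-classifier \
  --requests=cpu=2,memory=4Gi \
  --limits=cpu=4,memory=8Gi
```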
Horizontal Scaling
Deploy multiple instances:
```
# Load balancer
├── Classifier Instance 1 (2 GB, 2 cores)
├── Classifier Instance 2 (2 GB, 2 cores)
└── Classifier Instance 3 (2 GB, 2 cores)
```
Pros:
- Better reliability
- Handles traffic spikes
- More cost-effective at scale
Cons:
- More complex setup
- Requires load balancer
- Shared state considerations
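Adding instances is a one-line operation on most platforms. Two hedged examples, reusing the service and Deployment names from earlier sections:

```bash
# Docker Compose: run three classifier replicas
# (the service must not pin a fixed container_name or host port).
docker compose up -d --scale ticket-classifier=3

# Kubernetes: scale the Deployment to three replicas.
kubectl scale deployment/ticket-classifier --replicas=3
```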
Auto-Scaling
Dynamic scaling based on load:
```yaml
# Kubernetes HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ticket-classifier
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ticket-classifier
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
Storage Requirements
Model Storage
- Base models: 100 MB - 5 GB
- Fine-tuned models: +100-500 MB
- Cache: 1-5 GB
- Logs: 100 MB - 1 GB/day
Recommended Setup
```
Disk Layout:
├── /models/ (10-20 GB, SSD)
├── /cache/  (5 GB, SSD)
├── /logs/   (rotating, 10 GB)
└── /data/   (variable, standard storage)
```
Network Requirements
Bandwidth
- Model downloads: Initial 1-5 GB, then minimal
- API traffic: 1-10 KB per ticket
- Monitoring: 1-5 MB/hour
Latency
- Internal: <10ms ideal
- External APIs: <100ms acceptable
- Model serving: <50ms target
Cost Optimization
Development Environment
Minimal cost setup for testing:
```
Cloud Instance:
  Type: t3.small (AWS) / e2-small (GCP)
  vCPU: 2
  RAM: 2 GB
  Cost: ~$15-20/month
```
Production Small Scale
Cost-effective production:
```
Cloud Instance:
  Type: t3.medium (AWS) / e2-medium (GCP)
  vCPU: 2
  RAM: 4 GB
  Cost: ~$30-40/month
```
Production Large Scale
High-performance production:
```
Cloud Instance:
  Type: c5.2xlarge (AWS) / c2-standard-8 (GCP)
  vCPU: 8
  RAM: 16 GB
  GPU: Optional T4
  Cost: ~$150-300/month (CPU) or ~$400-600/month (GPU)
```
Performance Testing
Benchmarking Your Setup
Test classification performance:
```bash
# Load test with 100 concurrent requests
ab -n 1000 -c 100 http://localhost:8080/classify

# Monitor during test
docker stats ticket-classifier

# Check response times
curl -w "@curl-format.txt" -o /dev/null -s http://localhost:8080/classify
```
Performance Targets
| Metric | Target | Measurement |
|---|---|---|
| Latency P50 | <200ms | Median response time |
| Latency P95 | <500ms | 95th percentile |
| Latency P99 | <1000ms | 99th percentile |
| Throughput | >100/min | Tickets classified |
| CPU Usage | <80% | Average utilization |
| Memory Usage | <80% | Peak utilization |
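If your load-test tool can export per-request latencies, the percentile targets above can be checked with a short shell pipeline. The file latencies.txt (one latency in milliseconds per line) is a hypothetical export, not something the engine produces:

```bash
# Rough P50/P95/P99 from a list of per-request latencies in milliseconds.
sort -n latencies.txt | awk '
  { v[NR] = $1 }
  END {
    print "P50:", v[int(NR * 0.50)] "ms"
    print "P95:", v[int(NR * 0.95)] "ms"
    print "P99:", v[int(NR * 0.99)] "ms"
  }'
```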
Troubleshooting
Out of Memory Errors
Symptoms:
```
MemoryError: Unable to allocate array
Container killed (OOMKilled)
```
Solutions:
- Increase memory allocation
- Use smaller model variant
- Reduce batch size
- Enable model quantization
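Before resizing, confirm that the failure really was an out-of-memory kill rather than an unrelated crash; adjust the container name and label selector to your setup:

```bash
# Docker: was the container OOM-killed?
docker inspect --format '{{.State.OOMKilled}}' ticket-classifier

# Kubernetes: check the last container state for "OOMKilled".
kubectl describe pod -l app=ticket-classifier | grep -A3 "Last State"

# Host level: kernel OOM-killer entries (may require sudo).
dmesg | grep -i "out of memory"
```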
Slow Classification
Symptoms:
- Latency >1 second per ticket
- Growing processing queue
Solutions:
- Enable GPU acceleration
- Use model distillation
- Optimize batch processing
- Add more replicas
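On hosts that do have a GPU, first verify that inference is actually running on it rather than silently falling back to the CPU:

```bash
# Watch GPU utilization and memory while sending test traffic.
# Near-zero utilization during classification suggests CPU fallback.
nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv -l 2
```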
High CPU Usage
Symptoms:
- CPU constantly >90%
- Throttled performance
Solutions:
- Add more CPU cores
- Optimize model inference
- Implement request queuing
- Scale horizontally
Best Practices
- Start with CPU-only for testing
- Monitor resource usage continuously
- Set appropriate resource limits
- Plan for 2x current load
- Use caching where possible
- Implement health checks (minimal example below)
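A health check does not need to be elaborate. A minimal sketch, assuming the engine exposes HTTP on port 8080; the /health path is hypothetical, so substitute whatever status endpoint your deployment provides:

```bash
# Simple liveness probe: exit non-zero if the service does not respond in time.
curl -fsS --max-time 5 http://localhost:8080/health > /dev/null || exit 1
```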
DON’T ❌
- Under-provision memory (causes OOM)
- Skip performance testing
- Ignore monitoring metrics
- Over-provision unnecessarily
- Mix production and development workloads
Next Steps
After sizing your hardware:
- Deploy Infrastructure: Set up servers/containers
- Install Model: Download and configure classification model
- Performance Test: Validate against your requirements
- Monitor: Set up metrics and alerting
Related Documentation
- Using Model - Configure and deploy classification models
- Taxonomy Design - Design your classification taxonomy
- Tag Mapping - Map classifications to ticket fields