Hardware Sizing

Understand the hardware requirements for running ticket classification models at different scales.

Hardware requirements for ticket classification depend on several factors:

  • Model size and complexity
  • Number of tickets processed
  • Classification frequency
  • Response time requirements
  • Budget constraints

| Scale | Tickets/Day | Min RAM | Min CPU | GPU | Model Type |
| --- | --- | --- | --- | --- | --- |
| Small | <1,000 | 512 MB | 1 core | No | Simple ML |
| Medium | 1,000-10,000 | 2 GB | 2 cores | Optional | BERT-based |
| Large | 10,000-100,000 | 8 GB | 4 cores | Recommended | BERT/Large |
| Enterprise | >100,000 | 16+ GB | 8+ cores | Required | Custom/Fine-tuned |

CPU-Only Deployment

Best for:

  • Small to medium ticket volumes (<10,000/day)
  • Budget-conscious deployments
  • Simpler models (distilled BERT, small transformers)

Recommended Specs:

```yaml
Small Scale:
  CPU: 1-2 cores (2.0+ GHz)
  RAM: 512 MB - 2 GB
  Storage: 5 GB
  Network: Standard

Medium Scale:
  CPU: 2-4 cores (2.5+ GHz)
  RAM: 2-4 GB
  Storage: 10 GB
  Network: Standard
```

Expected Performance:

  • Classification time: 200-500ms per ticket
  • Throughput: 100-500 tickets/minute (see the capacity sketch after this list)
  • Model loading time: 5-30 seconds
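
As a rough sanity check, the sketch below converts a daily ticket volume into a required per-minute throughput and compares it with the per-ticket latency above. The volume, peak factor, and latency values are illustrative assumptions, not measurements from the engine.

```python
# Rough capacity estimate for a CPU-only deployment (illustrative numbers).
tickets_per_day = 8_000   # assumed daily volume
peak_factor = 3           # assume peak traffic is 3x the daily average
avg_latency_s = 0.35      # assumed classification time per ticket (350 ms)

avg_per_minute = tickets_per_day / (24 * 60)
peak_per_minute = avg_per_minute * peak_factor

# A single worker handles 60 / avg_latency_s tickets per minute.
per_worker_throughput = 60 / avg_latency_s
workers_needed = peak_per_minute / per_worker_throughput

print(f"Average load: {avg_per_minute:.1f} tickets/minute")
print(f"Peak load:    {peak_per_minute:.1f} tickets/minute")
print(f"Per worker:   {per_worker_throughput:.0f} tickets/minute")
print(f"Workers:      {workers_needed:.2f} (round up and leave headroom)")
```

If the required worker count stays near 1 even at peak, the CPU-only tier is usually sufficient.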

GPU-Accelerated Deployment

Best for:

  • Large ticket volumes (>10,000/day)
  • Real-time classification requirements
  • Large transformer models
  • Fine-tuning and retraining

Recommended Specs:

```yaml
Medium-Large Scale:
  CPU: 4-8 cores
  RAM: 8-16 GB
  GPU: NVIDIA T4 or better (16 GB VRAM)
  Storage: 20 GB SSD
  Network: High bandwidth

Enterprise Scale:
  CPU: 8-16 cores
  RAM: 16-32 GB
  GPU: NVIDIA A10/A100 (24-80 GB VRAM)
  Storage: 50+ GB NVMe SSD
  Network: High bandwidth, low latency
```

Expected Performance:

  • Classification time: 10-50ms per ticket
  • Throughput: 1,000-10,000 tickets/minute
  • Model loading time: 2-10 seconds

Small Models

Examples:

  • DistilBERT
  • MiniLM
  • TinyBERT

Requirements:

  • RAM: 512 MB - 1 GB
  • CPU: 1-2 cores sufficient
  • GPU: Not required (see the CPU-only loading sketch below)

Use Cases:

  • Low-volume environments
  • Cost-sensitive deployments
  • Edge deployments
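
As an illustration, a distilled model such as the ones listed above can be served on CPU with the Hugging Face Transformers pipeline. The model name below is a placeholder, not the project's bundled classifier; treat this as a sketch of the deployment pattern rather than the product's loading code.

```python
from transformers import pipeline

# Placeholder model name - substitute your own fine-tuned ticket classifier.
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"

# device=-1 forces CPU inference; distilled models typically fit well under 1 GB of RAM.
classifier = pipeline("text-classification", model=MODEL_NAME, device=-1)

print(classifier("My laptop cannot connect to the office VPN."))
```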

Medium Models

Examples:

  • BERT-base
  • RoBERTa-base
  • Custom fine-tuned models

Requirements:

  • RAM: 2-4 GB
  • CPU: 2-4 cores recommended
  • GPU: Optional, improves performance 5-10x

Use Cases:

  • Most production deployments
  • Balanced accuracy/performance
  • Standard ticket volumes

Large Models

Examples:

  • BERT-large
  • RoBERTa-large
  • GPT-based models
  • Custom ensemble models

Requirements:

  • RAM: 8-16 GB
  • CPU: 4-8 cores minimum
  • GPU: Highly recommended (T4 or better)

Use Cases:

  • High-accuracy requirements
  • Complex classification tasks
  • Multi-label classification
  • High-volume processing

Container Resource Limits

Configure appropriate resource limits:

docker-compose.yml

```yaml
services:
  ticket-classifier:
    image: openticketai/engine:latest
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G
```

kubernetes-pod.yaml

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ticket-classifier
spec:
  containers:
    - name: classifier
      image: openticketai/engine:latest
      resources:
        requests:
          memory: '2Gi'
          cpu: '1000m'
        limits:
          memory: '4Gi'
          cpu: '2000m'
```

Monitor these metrics (a basic check script is sketched after the list):

  • CPU Usage: Should be <80% average
  • Memory Usage: Should have 20% headroom
  • Classification Latency: P95 latency under target
  • Queue Depth: Tickets waiting for classification
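
The sketch below shows one way to spot-check these thresholds from the host: psutil for CPU and memory, and a percentile over recorded request latencies. The latency values are placeholders for whatever your metrics pipeline actually collects.

```python
import statistics

import psutil  # pip install psutil

# Placeholder latencies in milliseconds - in practice, read these from your
# metrics store or application logs.
latencies_ms = [120, 180, 210, 250, 300, 410, 95, 160, 480, 220]

cpu_percent = psutil.cpu_percent(interval=1)            # sampled over 1 second
mem_percent = psutil.virtual_memory().percent
p95_ms = statistics.quantiles(latencies_ms, n=20)[18]   # 95th percentile

print(f"CPU usage:    {cpu_percent:.0f}%  (target < 80% average)")
print(f"Memory usage: {mem_percent:.0f}%  (keep ~20% headroom)")
print(f"Latency P95:  {p95_ms:.0f} ms (keep under your latency target)")
```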

Vertical Scaling

Increase resources on a single instance:

```
# Start
RAM: 2 GB, CPU: 2 cores

# Scale up
RAM: 4 GB, CPU: 4 cores

# Further scaling
RAM: 8 GB, CPU: 8 cores
```

Pros:

  • Simple to implement
  • No code changes required
  • Easy to manage

Cons:

  • Limited by hardware maximums
  • Single point of failure
  • Potentially expensive

Horizontal Scaling

Deploy multiple instances:

```
Load balancer
├── Classifier Instance 1 (2 GB, 2 cores)
├── Classifier Instance 2 (2 GB, 2 cores)
└── Classifier Instance 3 (2 GB, 2 cores)
```

Pros:

  • Better reliability
  • Handles traffic spikes
  • More cost-effective at scale

Cons:

  • More complex setup
  • Requires load balancer
  • Shared state considerations

Auto-Scaling

Dynamic scaling based on load:

```yaml
# Kubernetes HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ticket-classifier
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ticket-classifier
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Storage Requirements:

  • Base models: 100 MB - 5 GB
  • Fine-tuned models: +100-500 MB
  • Cache: 1-5 GB
  • Logs: 100 MB - 1 GB/day

Disk Layout:

```
├── /models/ (10-20 GB, SSD)
├── /cache/ (5 GB, SSD)
├── /logs/ (rotating, 10 GB)
└── /data/ (variable, standard storage)
```

Network Requirements:

Bandwidth:

  • Model downloads: Initial 1-5 GB, then minimal
  • API traffic: 1-10 KB per ticket
  • Monitoring: 1-5 MB/hour

Latency:

  • Internal: <10ms ideal
  • External APIs: <100ms acceptable
  • Model serving: <50ms target

Minimal cost setup for testing:

```yaml
Cloud Instance:
  Type: t3.small (AWS) / e2-small (GCP)
  vCPU: 2
  RAM: 2 GB
  Cost: ~$15-20/month
```

Cost-effective production:

```yaml
Cloud Instance:
  Type: t3.medium (AWS) / e2-medium (GCP)
  vCPU: 2
  RAM: 4 GB
  Cost: ~$30-40/month
```

High-performance production:

```yaml
Cloud Instance:
  Type: c5.2xlarge (AWS) / c2-standard-8 (GCP)
  vCPU: 8
  RAM: 16 GB
  GPU: Optional T4
  Cost: ~$150-300/month (CPU) or ~$400-600/month (GPU)
```

Test classification performance:

```bash
# Load test: 1,000 requests, 100 concurrent
ab -n 1000 -c 100 http://localhost:8080/classify

# Monitor resource usage during the test
docker stats ticket-classifier

# Check response times
curl -w "@curl-format.txt" -o /dev/null -s http://localhost:8080/classify
```

| Metric | Target | Measurement |
| --- | --- | --- |
| Latency P50 | <200ms | Median response time |
| Latency P95 | <500ms | 95th percentile |
| Latency P99 | <1000ms | 99th percentile |
| Throughput | >100/min | Tickets classified per minute |
| CPU Usage | <80% | Average utilization |
| Memory Usage | <80% | Peak utilization |
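
If you prefer to measure these percentiles directly rather than parsing ab output, a small script along the following lines can drive the endpoint and report P50/P95/P99. The URL, JSON payload, and POST semantics are assumptions about your deployment; adjust them to the classifier's actual API.

```python
import json
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

URL = "http://localhost:8080/classify"  # assumed endpoint
PAYLOAD = json.dumps({"text": "Printer on floor 3 is offline"}).encode()

def classify_once(_):
    """Send one request and return its latency in milliseconds."""
    start = time.perf_counter()
    req = Request(URL, data=PAYLOAD, headers={"Content-Type": "application/json"})
    urlopen(req).read()
    return (time.perf_counter() - start) * 1000

wall_start = time.perf_counter()
with ThreadPoolExecutor(max_workers=20) as pool:
    latencies = list(pool.map(classify_once, range(200)))
wall_s = time.perf_counter() - wall_start

cuts = statistics.quantiles(latencies, n=100)
print(f"P50: {statistics.median(latencies):.0f} ms")
print(f"P95: {cuts[94]:.0f} ms")
print(f"P99: {cuts[98]:.0f} ms")
print(f"Throughput: {len(latencies) / wall_s * 60:.0f} tickets/minute")
```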

Troubleshooting

Out of Memory Errors

Symptoms:

```
MemoryError: Unable to allocate array
Container killed (OOMKilled)
```

Solutions:

  1. Increase memory allocation
  2. Use smaller model variant
  3. Reduce batch size
  4. Enable model quantization (see the sketch below)
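
For step 4, dynamic quantization is a common way to shrink a transformer classifier's memory footprint for CPU inference. The sketch below uses PyTorch's built-in dynamic quantization on a Hugging Face model; the model name is a placeholder for your deployed classifier.

```python
import torch
from transformers import AutoModelForSequenceClassification

# Placeholder model name - substitute your deployed classifier.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english"
)

# Convert Linear layer weights to int8; activations remain float.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# quantized_model is a drop-in replacement for CPU inference and usually
# needs noticeably less RAM than the float32 original.
```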

Slow Classification

Symptoms:

  • Latency >1 second per ticket
  • Growing processing queue

Solutions:

  1. Enable GPU acceleration
  2. Use model distillation
  3. Optimize batch processing (see the sketch below)
  4. Add more replicas
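
Steps 1 and 3 often go together: move inference to a GPU when one is available and classify tickets in batches rather than one at a time. A sketch using the Transformers pipeline, with a placeholder model name and an arbitrarily chosen batch size:

```python
import torch
from transformers import pipeline

# Use the GPU if present, otherwise fall back to CPU.
device = 0 if torch.cuda.is_available() else -1

# Placeholder model name - substitute your deployed classifier.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=device,
    batch_size=32,  # batching amortizes per-call overhead
)

tickets = [
    "VPN keeps disconnecting",
    "Invoice #1042 shows the wrong amount",
    "Please reset my password",
]
print(classifier(tickets))  # one call classifies the whole batch
```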

High CPU Usage

Symptoms:

  • CPU constantly >90%
  • Throttled performance

Solutions:

  1. Add more CPU cores
  2. Optimize model inference
  3. Implement request queuing (see the sketch below)
  4. Scale horizontally
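
For step 3, a bounded in-process queue in front of the model keeps bursts from saturating the CPU: requests beyond the queue's capacity are rejected (or deferred) instead of degrading every in-flight classification. A minimal sketch, where classify() is a stand-in for the real model call:

```python
import queue
import threading

request_queue = queue.Queue(maxsize=100)  # bound the backlog

def classify(text: str) -> str:
    # Stand-in for the real model inference call.
    return "category"

def worker():
    while True:
        text, on_done = request_queue.get()
        try:
            on_done(classify(text))
        finally:
            request_queue.task_done()

# A small, fixed worker pool keeps CPU usage predictable.
for _ in range(2):
    threading.Thread(target=worker, daemon=True).start()

def submit(text: str, on_done) -> bool:
    """Enqueue a ticket; return False when the system is saturated."""
    try:
        request_queue.put_nowait((text, on_done))
        return True
    except queue.Full:
        return False

submit("Cannot access the shared drive", lambda label: print("classified as", label))
request_queue.join()  # wait for queued work to finish
```
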
Best Practices

Do:

  • Start with CPU-only for testing
  • Monitor resource usage continuously
  • Set appropriate resource limits
  • Plan for 2x current load
  • Use caching where possible
  • Implement health checks

Don't:

  • Under-provision memory (causes OOM)
  • Skip performance testing
  • Ignore monitoring metrics
  • Over-provision unnecessarily
  • Mix production and development workloads

Next Steps

After sizing your hardware:

  1. Deploy Infrastructure: Set up servers/containers
  2. Install Model: Download and configure classification model
  3. Performance Test: Validate against your requirements
  4. Monitor: Set up metrics and alerting