
Hardware Sizing

Understand the hardware requirements for running ticket classification models at different scales.

Hardware requirements for ticket classification depend on several factors:

  • Model size and complexity
  • Number of tickets processed
  • Classification frequency
  • Response time requirements
  • Budget constraints
Scale       Tickets/Day      Min RAM   Min CPU    GPU          Model Type
Small       <1,000           512 MB    1 core     No           Simple ML
Medium      1,000-10,000     2 GB      2 cores    Optional     BERT-based
Large       10,000-100,000   8 GB      4 cores    Recommended  BERT/Large
Enterprise  >100,000         16+ GB    8+ cores   Required     Custom/Fine-tuned
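
To make the table above actionable, here is a minimal Python sketch that maps an expected daily ticket volume onto the corresponding sizing tier. The thresholds mirror the table; the function name and returned fields are illustrative and not part of any shipped API.

def recommend_tier(tickets_per_day: int) -> dict:
    """Map expected daily ticket volume to a sizing tier from the table above."""
    tiers = [
        (1_000,   {"tier": "Small",  "ram": "512 MB", "cpu": "1 core",  "gpu": "No"}),
        (10_000,  {"tier": "Medium", "ram": "2 GB",   "cpu": "2 cores", "gpu": "Optional"}),
        (100_000, {"tier": "Large",  "ram": "8 GB",   "cpu": "4 cores", "gpu": "Recommended"}),
    ]
    for upper_bound, spec in tiers:
        if tickets_per_day < upper_bound:
            return spec
    return {"tier": "Enterprise", "ram": "16+ GB", "cpu": "8+ cores", "gpu": "Required"}

print(recommend_tier(5_000))   # {'tier': 'Medium', 'ram': '2 GB', ...}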

CPU-Only Deployment

Best for:

  • Small to medium ticket volumes (<10,000/day)
  • Budget-conscious deployments
  • Simpler models (distilled BERT, small transformers)

Recommended Specs:

Small Scale:
CPU: 1-2 cores (2.0+ GHz)
RAM: 512 MB - 2 GB
Storage: 5 GB
Network: Standard

Medium Scale:
CPU: 2-4 cores (2.5+ GHz)
RAM: 2-4 GB
Storage: 10 GB
Network: Standard

Expected Performance:

  • Classification time: 200-500ms per ticket
  • Throughput: 100-500 tickets/minute
  • Model loading time: 5-30 seconds
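
As a rough capacity check, the throughput figures above translate directly into a daily ceiling. The sketch below assumes a single CPU instance sustaining the low end of the stated rate over an 8-hour busy window; both the rate and the window are illustrative assumptions.

def daily_capacity(tickets_per_minute: float, busy_hours: float = 8.0) -> float:
    """Approximate tickets one instance can classify during the busy window."""
    return tickets_per_minute * 60 * busy_hours

# Even 100 tickets/minute covers ~48,000 tickets in an 8-hour window,
# so for CPU-only deployments the constraint is usually latency, not throughput.
print(daily_capacity(100))   # 48000.0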

GPU-Accelerated Deployment

Best for:

  • Large ticket volumes (>10,000/day)
  • Real-time classification requirements
  • Large transformer models
  • Fine-tuning and retraining

Recommended Specs:

Medium-Large Scale:
CPU: 4-8 cores
RAM: 8-16 GB
GPU: NVIDIA T4 or better (16 GB VRAM)
Storage: 20 GB SSD
Network: High bandwidth

Enterprise Scale:
CPU: 8-16 cores
RAM: 16-32 GB
GPU: NVIDIA A10/A100 (24-80 GB VRAM)
Storage: 50+ GB NVMe SSD
Network: High bandwidth, low latency

Expected Performance:

  • Classification time: 10-50ms per ticket
  • Throughput: 1,000-10,000 tickets/minute
  • Model loading time: 2-10 seconds

Small Models

Examples:

  • DistilBERT
  • MiniLM
  • TinyBERT

Requirements:

  • RAM: 512 MB - 1 GB
  • CPU: 1-2 cores sufficient
  • GPU: Not required

Use Cases:

  • Low-volume environments
  • Cost-sensitive deployments
  • Edge deployments
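
A model in this class can be served with the Hugging Face transformers library on CPU alone. The sketch below is generic, not the engine's internal API, and the checkpoint name is a placeholder for whatever fine-tuned ticket classifier you actually deploy.

from transformers import pipeline

# Placeholder checkpoint: substitute your own fine-tuned ticket classifier.
MODEL_NAME = "distilbert-base-uncased"

# device=-1 keeps inference on the CPU, which is sufficient for this tier.
classifier = pipeline("text-classification", model=MODEL_NAME, device=-1)

tickets = ["Printer on 3rd floor is offline", "Cannot reset my VPN password"]
for ticket, result in zip(tickets, classifier(tickets)):
    print(ticket, "->", result["label"], round(result["score"], 3))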

Medium Models

Examples:

  • BERT-base
  • RoBERTa-base
  • Custom fine-tuned models

Requirements:

  • RAM: 2-4 GB
  • CPU: 2-4 cores recommended
  • GPU: Optional, improves performance 5-10x

Use Cases:

  • Most production deployments
  • Balanced accuracy/performance
  • Standard ticket volumes

Large Models

Examples:

  • BERT-large
  • RoBERTa-large
  • GPT-based models
  • Custom ensemble models

Requirements:

  • RAM: 8-16 GB
  • CPU: 4-8 cores minimum
  • GPU: Highly recommended (T4 or better)

Use Cases:

  • High-accuracy requirements
  • Complex classification tasks
  • Multi-label classification
  • High-volume processing
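
Before committing to this tier, verify that the recommended GPU is actually visible to the runtime. A minimal check, assuming a PyTorch-based stack on a CUDA-capable host:

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name}, {vram_gb:.1f} GB VRAM")
    if vram_gb < 16:
        print("Warning: below the 16 GB recommended for large models")
else:
    print("No CUDA device found - large models will fall back to CPU and be slow")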

Configure appropriate resource limits:

docker-compose.yml

services:
  ticket-classifier:
    image: openticketai/engine:latest
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G

kubernetes-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: ticket-classifier
spec:
  containers:
    - name: classifier
      image: openticketai/engine:latest
      resources:
        requests:
          memory: '2Gi'
          cpu: '1000m'
        limits:
          memory: '4Gi'
          cpu: '2000m'

Monitor these metrics:

  • CPU Usage: Should be <80% average
  • Memory Usage: Should have 20% headroom
  • Classification Latency: P95 latency under target
  • Queue Depth: Tickets waiting for classification
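
As a lightweight starting point, the CPU and memory thresholds can be checked from a sidecar script. The sketch below uses psutil to watch host-level utilization; queue depth and latency would come from the classifier's own metrics, which are not shown here.

import psutil

CPU_LIMIT = 80.0      # percent, average target from the list above
MEM_HEADROOM = 20.0   # percent of RAM that should stay free

cpu = psutil.cpu_percent(interval=1)
mem = psutil.virtual_memory()

if cpu > CPU_LIMIT:
    print(f"CPU at {cpu:.0f}% - consider scaling up or out")
if 100 - mem.percent < MEM_HEADROOM:
    print(f"Only {100 - mem.percent:.0f}% memory free - risk of OOM kills")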

Increase resources on a single instance:

# Start
RAM: 2 GB, CPU: 2 cores
# Scale up
RAM: 4 GB, CPU: 4 cores
# Further scaling
RAM: 8 GB, CPU: 8 cores

Pros:

  • Simple to implement
  • No code changes required
  • Easy to manage

Cons:

  • Limited by hardware maximums
  • Single point of failure
  • Potentially expensive

Deploy multiple instances:

# Load balancer
├── Classifier Instance 1 (2 GB, 2 cores)
├── Classifier Instance 2 (2 GB, 2 cores)
└── Classifier Instance 3 (2 GB, 2 cores)

Pros:

  • Better reliability
  • Handles traffic spikes
  • More cost-effective at scale

Cons:

  • More complex setup
  • Requires load balancer
  • Shared state considerations

Dynamic scaling based on load:

# Kubernetes HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ticket-classifier
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ticket-classifier
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
Storage requirements:

  • Base models: 100 MB - 5 GB
  • Fine-tuned models: +100-500 MB
  • Cache: 1-5 GB
  • Logs: 100 MB - 1 GB/day

Disk Layout:

├── /models/ (10-20 GB, SSD)
├── /cache/ (5 GB, SSD)
├── /logs/ (rotating, 10 GB)
└── /data/ (variable, standard storage)

Network bandwidth:

  • Model downloads: Initial 1-5 GB, then minimal
  • API traffic: 1-10 KB per ticket
  • Monitoring: 1-5 MB/hour

Latency targets:

  • Internal: <10ms ideal
  • External APIs: <100ms acceptable
  • Model serving: <50ms target
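
The storage figures above translate directly into a disk budget. A minimal sketch, assuming a 30-day log retention window and worst-case values from the list:

def disk_budget_gb(model_gb=5.0, finetune_gb=0.5, cache_gb=5.0,
                   log_gb_per_day=1.0, retention_days=30) -> float:
    """Worst-case disk estimate (GB) from the storage figures above."""
    return model_gb + finetune_gb + cache_gb + log_gb_per_day * retention_days

print(f"{disk_budget_gb():.1f} GB")   # ~40.5 GB, comfortably inside a 50 GB volume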

Minimal cost setup for testing:

Cloud Instance:
Type: t3.small (AWS) / e2-small (GCP)
vCPU: 2
RAM: 2 GB
Cost: ~$15-20/month

Cost-effective production:

Cloud Instance:
Type: t3.medium (AWS) / e2-medium (GCP)
vCPU: 2
RAM: 4 GB
Cost: ~$30-40/month

High-performance production:

Cloud Instance:
Type: c5.2xlarge (AWS) / c2-standard-8 (GCP)
vCPU: 8
RAM: 16 GB
GPU: Optional T4
Cost: ~$150-300/month (CPU) or ~$400-600/month (GPU)
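
To compare the tiers, it helps to express them as cost per ticket rather than cost per month. A quick illustrative calculation; the mid-range prices and daily volumes used here are assumptions, not benchmarks.

def cost_per_1k_tickets(monthly_cost: float, tickets_per_day: int) -> float:
    """Monthly instance cost spread over monthly ticket volume, per 1,000 tickets."""
    monthly_tickets = tickets_per_day * 30
    return monthly_cost / monthly_tickets * 1000

print(f"${cost_per_1k_tickets(17.5, 800):.2f}")     # starter:    ~$0.73 per 1k tickets
print(f"${cost_per_1k_tickets(35, 5_000):.2f}")     # small prod: ~$0.23 per 1k tickets
print(f"${cost_per_1k_tickets(225, 50_000):.2f}")   # large prod: ~$0.15 per 1k tickets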

Test classification performance:

# Load test with 100 concurrent requests
ab -n 1000 -c 100 http://localhost:8080/classify

# Monitor during test
docker stats ticket-classifier

# Check response times
curl -w "@curl-format.txt" -o /dev/null -s http://localhost:8080/classify
Metric        Target    Measurement
Latency P50   <200ms    Median response time
Latency P95   <500ms    95th percentile
Latency P99   <1000ms   99th percentile
Throughput    >100/min  Tickets classified
CPU Usage     <80%      Average utilization
Memory Usage  <80%      Peak utilization
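
The percentile targets above are easy to verify from raw latency samples collected during a load test. A minimal sketch using nearest-rank percentiles; the sample values are made up for illustration.

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples (milliseconds)."""
    ordered = sorted(samples)
    index = max(0, round(pct / 100 * len(ordered)) - 1)
    return ordered[index]

latencies_ms = [120, 180, 150, 310, 95, 480, 210, 175, 260, 450]  # example samples
for pct, target in [(50, 200), (95, 500), (99, 1000)]:
    value = percentile(latencies_ms, pct)
    status = "OK" if value < target else "MISS"
    print(f"P{pct}: {value} ms (target <{target} ms) {status}")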

Out of Memory

Symptoms:

MemoryError: Unable to allocate array
Container killed (OOMKilled)

Solutions:

  1. Increase memory allocation
  2. Use smaller model variant
  3. Reduce batch size
  4. Enable model quantization (see the sketch below)
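
For the quantization option, PyTorch's dynamic quantization is a common low-effort approach: linear-layer weights become int8 while activations stay float, which usually reduces memory use noticeably. A minimal sketch, assuming a PyTorch/transformers classifier; the checkpoint name is a placeholder and exact savings depend on the architecture.

import torch
from transformers import AutoModelForSequenceClassification

# Placeholder checkpoint: substitute the model you actually serve.
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
model.eval()

# Dynamic quantization: nn.Linear weights become int8, activations stay float.
# CPU-oriented, and needs no retraining or calibration data.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# The quantized model is a drop-in replacement for CPU inference.
inputs = {"input_ids": torch.tensor([[101, 102]]), "attention_mask": torch.tensor([[1, 1]])}
with torch.no_grad():
    logits = quantized(**inputs).logits
print(logits.shape)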

Slow Classification

Symptoms:

  • Latency >1 second per ticket
  • Growing processing queue

Solutions:

  1. Enable GPU acceleration
  2. Use model distillation
  3. Optimize batch processing (see the sketch below)
  4. Add more replicas
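
For the batch-processing option, grouping tickets before inference amortizes per-call overhead and makes far better use of a GPU. A generic sketch; classify_batch stands in for whatever inference callable your deployment exposes and is not a real API of the engine.

from typing import Callable, Iterable, List

def classify_in_batches(
    tickets: Iterable[str],
    classify_batch: Callable[[List[str]], List[str]],
    batch_size: int = 32,
) -> List[str]:
    """Feed tickets to the model in fixed-size batches instead of one at a time."""
    tickets = list(tickets)
    labels: List[str] = []
    for start in range(0, len(tickets), batch_size):
        labels.extend(classify_batch(tickets[start:start + batch_size]))
    return labels

# Usage with a dummy classifier that labels everything "general":
print(classify_in_batches(["vpn down", "printer jam"], lambda batch: ["general"] * len(batch)))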

High CPU Usage

Symptoms:

  • CPU constantly >90%
  • Throttled performance

Solutions:

  1. Add more CPU cores
  2. Optimize model inference
  3. Implement request queuing
  4. Scale horizontally
Do:

  • Start with CPU-only for testing
  • Monitor resource usage continuously
  • Set appropriate resource limits
  • Plan for 2x current load
  • Use caching where possible
  • Implement health checks

Don't:

  • Under-provision memory (causes OOM)
  • Skip performance testing
  • Ignore monitoring metrics
  • Over-provision unnecessarily
  • Mix production and development workloads

After sizing your hardware:

  1. Deploy Infrastructure: Set up servers/containers
  2. Install Model: Download and configure classification model
  3. Performance Test: Validate against your requirements
  4. Monitor: Set up metrics and alerting