Microservice Deployment (Single or Multiple Replicas)

The Spice Runtime operates as an independent microservice. Multiple replicas may be deployed behind a load balancer to achieve high availability and handle spikes in demand.
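Distributing requests across replicas is normally the job of a load balancer or service mesh, not application code. As a hedged illustration of that behavior, the following minimal Python sketch cycles through replicas in round-robin order and skips any replica marked unhealthy. The `ReplicaPool` class, its methods, and the replica URLs are hypothetical and are not part of the Spice Runtime. Health checking is reduced to manual `mark_down`/`mark_up` calls.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class ReplicaPool:
    """Hypothetical round-robin pool over Spice Runtime replica URLs.

    Illustrates what a load balancer does in front of multiple replicas;
    it is not a Spice Runtime API.
    """
    urls: list[str]
    unhealthy: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # Endless iterator over the replicas in a fixed order.
        self._cycle = itertools.cycle(self.urls)

    def mark_down(self, url: str) -> None:
        # Stand-in for a failed health check.
        self.unhealthy.add(url)

    def mark_up(self, url: str) -> None:
        # Stand-in for a recovered health check.
        self.unhealthy.discard(url)

    def next_replica(self) -> str:
        # Visit each replica at most once per call, so a fully
        # unhealthy pool fails fast instead of looping forever.
        for _ in range(len(self.urls)):
            url = next(self._cycle)
            if url not in self.unhealthy:
                return url
        raise RuntimeError("no healthy Spice Runtime replicas available")


# Usage with hypothetical internal hostnames.
pool = ReplicaPool([
    "http://spice-0.internal",
    "http://spice-1.internal",
    "http://spice-2.internal",
])
print(pool.next_replica())  # http://spice-0.internal
pool.mark_down("http://spice-1.internal")
print(pool.next_replica())  # http://spice-2.internal (spice-1 is skipped)
```

Because a failed replica is skipped rather than removed, traffic resumes to it as soon as it is marked healthy again, which is how replicas provide redundancy without manual reconfiguration.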

Benefits

Loose coupling between the application and the Spice Runtime.
Independent scaling and upgrades.
Can serve multiple applications or services within an organization.
Helps achieve high availability and redundancy.

Considerations

An additional network hop adds latency compared to a sidecar deployment.
More complex infrastructure, requiring service discovery and load balancing.
Potentially higher cost due to additional infrastructure components.

Use This Approach When

A loosely coupled architecture is desired, with the AI service scaled independently of the applications that use it.
Multiple services or teams need to share the same AI engine.
Heavy or variable traffic is expected, so the Spice Runtime must scale on its own schedule.
Resiliency and redundancy are prioritized over simplicity.

Example Use Case

Consider a large organization in which multiple services (recommendations, analytics, etc.) need to share AI insights. A centralized Spice Runtime microservice cluster lets separate teams consume AI outputs without each team duplicating the effort of running its own runtime.