
Securing Microservices with Istio: A Self-Hosted Journey

Our Kubernetes-based architecture included several components that needed to be secured when deployed to customer environments. Learn how we improved self-hosted deployments by adopting Istio as our service mesh.
Vivek Nair
January 23, 2025

At Gentrace, we help companies test and evaluate their AI systems at scale.

As our platform grew more sophisticated, we found ourselves managing an increasingly complex microservices architecture, ranging from our core API and web application to specialized components like WebSocket servers and task runners. Enterprise customers required self-hosted deployments with strict security requirements, so we needed a robust way to secure service-to-service communication that would work reliably in their own infrastructure.

The Initial Challenge

Our Kubernetes-based architecture included several components that needed to be secured when deployed to customer environments:

  • Frontend application
  • API server
  • WebSocket server for real-time updates
  • Task runners for evaluation jobs
  • Task scheduler for evaluation jobs
  • Multiple databases (PostgreSQL, ClickHouse)
  • Kafka for event streaming
Architecture diagram for deploying Gentrace in customer environments

Initially, we relied on Kubernetes' built-in networking and basic TLS termination at the ingress level. But as we onboarded more enterprise customers, we faced increasing demands for:

  • End-to-end encryption between services (mTLS)
  • Fine-grained access control
  • Certificate management
  • Network policy enforcement
  • Service-to-service traffic visibility

Enter Istio

After evaluating various options, we chose Istio as our service mesh solution because it provides robust security features while remaining transparent to our application code.

For example, Istio automatically encrypted traffic between services without any code changes. Our applications simply connected to hostnames like kafka:9092 or postgres:5432, while Istio handled the security behind the scenes.
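To make the "behind the scenes" part concrete: by default, Istio accepts both plaintext and mTLS traffic between sidecars, and a PeerAuthentication resource can require mTLS. The manifest below is a minimal sketch, not taken from our chart; the namespace name gentrace is an assumption for illustration.

```yaml
# Sketch: require mTLS for every workload in one namespace.
# The namespace "gentrace" is a hypothetical name, not from the chart.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: gentrace
spec:
  mtls:
    # STRICT rejects plaintext connections to sidecar-injected workloads.
    mode: STRICT
```

With a policy like this in place, the application still connects to kafka:9092 or postgres:5432 as before; the sidecars negotiate the encrypted connection.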

The implementation was surprisingly straightforward. Here's what it involved:

  1. Installing Istio and its Custom Resource Definitions (CRDs) in the Kubernetes cluster
  2. Labeling the namespace for automatic sidecar injection
  3. Adding conditional annotations to our services
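Step 2 relies on a namespace label that Istio's injection webhook looks for. A minimal sketch, assuming a namespace named gentrace (a hypothetical name; use whatever namespace the chart is installed into):

```yaml
# Sketch: namespace opted in to automatic sidecar injection.
# "gentrace" is a hypothetical namespace name.
apiVersion: v1
kind: Namespace
metadata:
  name: gentrace
  labels:
    # Istio's sidecar injector watches for this label.
    istio-injection: enabled
```

Pods created in a labeled namespace get a sidecar unless they opt out individually, which is where the per-workload sidecar.istio.io/inject annotation comes in.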

Here's how we integrated Istio into our Helm chart:

# values.yaml
istio:
  # -- Enable Istio integration
  enabled: true
  # -- Istio injection label value
  injection: "true"
  # -- Additional Istio annotations
  annotations: {}

For each deployment, we added a conditional block:

metadata:
  labels:
    app: {{ .Values.app.name }}
  {{- if .Values.istio.enabled }}
  annotations:
    sidecar.istio.io/inject: {{ .Values.istio.injection | quote }}
    {{- with .Values.istio.annotations }}
    {{- toYaml . | nindent 4 }}
    {{- end }}
  {{- end }}

Making It Optional

A key requirement was making Istio optional for customers who might want to use their own service mesh or none at all. Our Helm chart makes this straightforward:

  1. Istio can be enabled/disabled via a single value
  2. All Istio-specific configurations are conditionally applied
  3. The application works identically whether Istio is enabled or not

For example, here's how we handle the API deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  labels:
    app: api
spec:
  template:
    metadata:
      labels:
        app: api
      {{- if .Values.istio.enabled }}
      annotations:
        sidecar.istio.io/inject: {{ .Values.istio.injection | quote }}
      {{- end }}
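As a sketch of the opt-out path, a customer could supply an override file like the one below. The file name is hypothetical, and the rendered fragment in the comments is what the template above should produce when the flag is off.

```yaml
# custom-values.yaml (hypothetical file name)
# Sketch: turn off all Istio-specific configuration in the chart.
istio:
  enabled: false

# With enabled: false, the pod template above renders without the
# annotations block:
#
#   template:
#     metadata:
#       labels:
#         app: api
```

The same switch can be flipped on the command line with Helm's --set istio.enabled=false flag instead of a separate file.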

The Benefits

While zero-trust security with mTLS encryption was our primary goal, Istio provided additional benefits that delighted both us and our customers:

  1. Simplified Certificate Management: Istio handles certificate generation, rotation, and distribution automatically.
  2. Fine-grained Access Control: We can define precise rules about which services can communicate with each other.
  3. Observability: The Istio sidecar proxies automatically collect detailed metrics about service-to-service communication, which Prometheus can scrape for monitoring (Istio ships a Prometheus addon for this).
  4. Service Topology Visualization: Istio integrates with Kiali, a visualization tool that teams can use to monitor and audit service interactions in real time. With the Kiali addon installed, running istioctl dashboard kiali brings up an interactive view of your service mesh.
  5. Easy Customer Adoption: Customers can enable Istio with a single configuration value:
istio:
  enabled: true
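To illustrate the fine-grained access control mentioned in item 2, here is a sketch of an Istio AuthorizationPolicy that lets only the API workload reach PostgreSQL. The namespace, the app: postgres label, and the api service account are assumptions for illustration, not names from our chart.

```yaml
# Sketch: only the API's service account may connect to PostgreSQL.
# Namespace, label, and service account names are hypothetical.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: postgres-allow-api
  namespace: gentrace
spec:
  selector:
    matchLabels:
      app: postgres
  action: ALLOW
  rules:
    - from:
        - source:
            # Identity taken from the caller's mTLS certificate.
            principals:
              - cluster.local/ns/gentrace/sa/api
      to:
        - operation:
            ports: ["5432"]
```

Once an ALLOW policy selects a workload, traffic that matches none of its rules is denied, so a policy like this also blocks other services from reaching the database. Matching on principals depends on mTLS being in effect between the two workloads.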

Implementation Details

Our implementation covers all components of our stack:

  1. Core Services: API, Web App, and WebSocket server
  2. Data Services: PostgreSQL, ClickHouse, and Kafka
  3. Background Workers: Task runners and schedulers
  4. Object Storage: Built-in MinIO for S3-compatible storage, with support for customer-provided S3 credentials. This handles file ingestion needs for AI evaluations, including PDFs and other documents.

Each component gets Istio sidecar injection when enabled:

template:
  metadata:
    labels:
      app: {{ .Values.taskrunner.name }}
    {{- if .Values.istio.enabled }}
    annotations:
      sidecar.istio.io/inject: {{ .Values.istio.injection | quote }}
    {{- end }}

Lessons Learned

  1. Keep It Optional: Not every customer needs or wants a service mesh, especially since each sidecar consumes additional CPU and memory. Making it optional via configuration was crucial.
  2. Consistent Implementation: Apply the same pattern across all services to maintain predictability.
  3. Documentation Matters: Clear documentation about prerequisites and setup steps helps customers succeed.
  4. Resource Planning: Account for the additional resources needed by Istio sidecars.
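For resource planning, one option is to pass Istio's sidecar resource annotations through the chart's istio.annotations value. This is a sketch under assumptions: the numbers are illustrative rather than recommendations, and it only applies to deployments whose templates include the istio.annotations block.

```yaml
# custom-values.yaml (hypothetical file name)
# Sketch: set CPU/memory requests and limits for the injected sidecar.
# The values below are illustrative, not tuned recommendations.
istio:
  enabled: true
  injection: "true"
  annotations:
    sidecar.istio.io/proxyCPU: "100m"
    sidecar.istio.io/proxyMemory: "128Mi"
    sidecar.istio.io/proxyCPULimit: "500m"
    sidecar.istio.io/proxyMemoryLimit: "256Mi"
```

Because every pod carries its own sidecar, these per-pod figures should be multiplied by the replica count when sizing the cluster.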

Looking Forward

The modular nature of our Helm chart means we can easily extend our service mesh capabilities. Future enhancements might include:

  • Custom Istio configurations for specific deployments (e.g., traffic policies, network rules, and routing behaviors to meet IT department requirements)
  • Integration with external certificate authorities and X.509 certificate management systems like HashiCorp Vault for automated certificate signing and distribution

Info: Want to try this yourself? Check out our self-hosted repository with complete Helm charts and documentation. You'll need to contact [email protected] for access tokens to our Quay registry.

Join Us

We're always looking for engineers who enjoy solving complex systems challenges. If you:

  • Build secure, scalable architectures
  • Care about AI safety and reliability

then check out our open positions at gentrace.ai/eng.

The journey to securing microservices communication doesn't have to be complex. With tools like Istio and thoughtful implementation, you can achieve robust security while maintaining simplicity for your users.
