High Availability (HA)

PLTFRMS is designed to remain available under failure conditions, from the loss of an individual service instance to a complete datacenter outage.

High availability is achieved through a combination of replication within each datacenter, distribution across datacenters, and anycast routing across the entire platform.


Multi-replica services

Within each datacenter, services run as clusters with multiple replicas.

This ensures that:

  • Individual instance failures do not impact availability
  • Load can be distributed across replicas
  • Deployments can happen without downtime

Each product is built to operate in this distributed model, allowing it to scale horizontally while remaining resilient.
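
As an illustration of the first two points, here is a minimal sketch of round-robin selection over healthy replicas. It is not PLTFRMS code: the Replica and ReplicaPool names, the replica names, and the round-robin policy are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Replica:
    """One instance of a service (hypothetical model)."""
    name: str
    healthy: bool = True


class ReplicaPool:
    """Round-robin over the replicas that are currently healthy."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self._counter = count()

    def pick(self) -> Replica:
        # Only healthy replicas are candidates, so a failed instance
        # simply drops out of the rotation.
        healthy = [r for r in self.replicas if r.healthy]
        if not healthy:
            raise RuntimeError("no healthy replicas available")
        return healthy[next(self._counter) % len(healthy)]


pool = ReplicaPool([Replica("api-1"), Replica("api-2"), Replica("api-3")])
pool.replicas[1].healthy = False  # simulate an instance failure
served_by = [pool.pick().name for _ in range(4)]
```

In this sketch the failed replica never receives a request, while the remaining two share the load evenly.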


Multi-datacenter architecture

PLTFRMS is deployed across multiple datacenters.

Services are not limited to a single location. Instead, they run across multiple datacenters (DCs) simultaneously, providing:

  • Redundancy in case of datacenter failure
  • Geographic distribution for improved latency
  • Continuous availability during maintenance or incidents

Anycast routing

PLTFRMS uses anycast routing, both externally and internally.

Services are exposed via single-address (host) prefixes:

  • /32 (IPv4)
  • /128 (IPv6)

These prefixes are announced from multiple locations. Traffic is automatically routed to the closest or best available datacenter, based on network conditions.

This provides:

  • Built-in failover (traffic shifts automatically)
  • Load distribution across locations
  • No dependency on a single entry point

Internally, anycast is also used between services, ensuring efficient and resilient communication within the platform.
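
The failover behavior described above can be sketched as a toy model: the same host prefix is announced from several sites, a client reaches the site with the lowest path cost, and a withdrawn announcement moves traffic to the next site. The addresses (taken from documentation-only ranges), the site names, and the single "cost" number are assumptions for illustration; real BGP best-path selection considers many more attributes.

```python
import ipaddress

# Hypothetical service addresses from the documentation-only ranges
# (RFC 5737 and RFC 3849); these are not real PLTFRMS prefixes.
SERVICE_V4 = ipaddress.ip_network("192.0.2.10/32")
SERVICE_V6 = ipaddress.ip_network("2001:db8::10/128")

# Sites currently announcing each prefix, with an assumed path cost
# as seen from one particular client network (lower is better).
announcements = {
    SERVICE_V4: {"dc-a": 10, "dc-b": 25},
    SERVICE_V6: {"dc-a": 10, "dc-b": 25},
}


def best_site(prefix):
    """Return the announcing site with the lowest path cost."""
    sites = announcements[prefix]
    if not sites:
        raise LookupError(f"{prefix} is not announced anywhere")
    return min(sites, key=sites.get)


def withdraw(site):
    """Simulate a site withdrawing all of its announcements."""
    for sites in announcements.values():
        sites.pop(site, None)


site_before = best_site(SERVICE_V4)
withdraw("dc-a")  # simulate the preferred site going offline
site_after = best_site(SERVICE_V4)
```

Because the service address stays the same before and after the withdrawal, clients do not need to change anything; only the path their traffic takes changes.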


Network ownership

PLTFRMS operates on a fully owned and managed network.

This includes:

  • Our own ASN and routing policies
  • Direct connectivity to IXPs, PNIs, and upstream providers
  • Full control over traffic flow and routing decisions

Because we control the network layer, we can optimize for performance, reliability, and failover without relying on third-party abstractions.


Inter-datacenter connectivity

Datacenters are connected through a private inter-DC network.

This backbone is used for:

  • Service-to-service communication across locations
  • Data replication and synchronization
  • Internal routing and failover mechanisms

By operating our own interconnects, we reduce dependency on the public internet and maintain predictable, low-latency communication between sites.


Failure handling

High availability in PLTFRMS is not a single feature; it is the result of multiple layers working together.

In practice, this means:

  • If a service instance fails → traffic is routed to another replica
  • If a node fails → workloads are rescheduled automatically
  • If a datacenter fails → traffic shifts to other locations via BGP

All of this happens without manual intervention.
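
To make the node-failure case concrete, here is a minimal sketch of automatic rescheduling. The scheduler PLTFRMS actually uses is not described in this page, so the reschedule function, the node and workload names, and the "least-loaded node first" policy are assumptions chosen only to show the idea.

```python
def reschedule(placement, failed_node):
    """Move workloads off failed_node onto the least-loaded remaining nodes.

    placement maps a node name to the list of workloads running on it.
    Returns a new placement that no longer contains failed_node.
    """
    remaining = {
        node: list(workloads)
        for node, workloads in placement.items()
        if node != failed_node
    }
    if not remaining:
        raise RuntimeError("no nodes left to reschedule onto")

    for workload in placement.get(failed_node, []):
        # Pick the node with the fewest workloads; break ties by name
        # so the result is deterministic.
        target = min(remaining, key=lambda n: (len(remaining[n]), n))
        remaining[target].append(workload)
    return remaining


placement = {
    "node-a": ["web-1", "web-2"],
    "node-b": ["db-1"],
    "node-c": [],
}
new_placement = reschedule(placement, "node-a")
```

Every workload from the failed node ends up on a surviving node, and the original placement is left unmodified so the two states can be compared.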


Why it matters

This architecture ensures that the platform remains:

  • Available: even during failures
  • Resilient: no single point of failure
  • Scalable: capacity grows horizontally
  • Predictable: consistent behavior under load or failure

High availability is built into the foundation of PLTFRMS, so you don’t have to design it from scratch.