High Availability

(OBJ 3.4)

High Availability
- Aims to keep services continuously available by minimizing downtime
- Achieved through load balancing, clustering, redundancy, and multi-cloud strategies

Uptime
- The time (minutes/hours) a system remains online over a given period, typically expressed as a percentage
Five nines
- Refers to 99.999% uptime, allowing only about 5 minutes of downtime per year
- This is already a very small downtime budget
Six nines
- Refers to 99.9999% uptime, which allows only about 31 seconds of downtime per year
- Most organizations realistically need more downtime than this each year in order to implement upgrades and security patches
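The downtime figures above follow directly from the uptime percentage. The sketch below works through the arithmetic; the function name `allowed_downtime_seconds` and the choice of a 365.25-day average year are assumptions made for illustration, not part of the original notes.

```python
# Assumption: an average year of 365.25 days (accounts for leap years).
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def allowed_downtime_seconds(uptime_percent: float) -> float:
    """Return the yearly downtime budget, in seconds, for an uptime percentage."""
    return SECONDS_PER_YEAR * (1 - uptime_percent / 100)


for label, pct in [("five nines", 99.999), ("six nines", 99.9999)]:
    seconds = allowed_downtime_seconds(pct)
    print(f"{label} ({pct}%): {seconds:.1f} seconds, or {seconds / 60:.2f} minutes, per year")
```

Under these assumptions, five nines works out to roughly 5.26 minutes per year and six nines to roughly 31.6 seconds, which matches the rounded figures in the notes.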

Load Balancing
The process of distributing workloads across multiple computing resources
Optimizes resource use and throughput, and minimizes response time
Prevents overloading of any single resource
Load balancers achieve this by using algorithms (such as round robin or least connections) to distribute incoming requests to servers capable of handling them
Example:
- All requests go to the load balancer first, which then forwards each user's request to one of three servers so that it is answered more quickly.
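The three-server example can be sketched with round robin, one of the simplest distribution algorithms. This is a minimal illustration, not how any particular load balancer product works; the class name `RoundRobinBalancer` and the server names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical balancer that hands requests to servers in rotation."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        # cycle() repeats the server list endlessly, one server per call.
        self._rotation = cycle(list(servers))

    def route(self, request):
        """Return the server chosen to handle this request."""
        return next(self._rotation)


balancer = RoundRobinBalancer(["server-a", "server-b", "server-c"])
assignments = [balancer.route(f"request-{i}") for i in range(6)]
print(assignments)
```

Each server receives every third request, so no single server is overloaded. Real load balancers often also weigh current connection counts or server health, which this sketch leaves out.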

Clustering
Uses multiple computers, storage devices, and redundant network connections that work together as a single system
Provides high availability, reliability, and scalability
Maintains continuity of service when hardware fails by eliminating single points of failure
Can be combined with load balancing for a more robust solution and higher levels of availability
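The idea that a cluster behaves as a single system and keeps serving after a hardware failure can be sketched as follows. This is a simplified model under stated assumptions: the `Cluster` class, its method names, and the node names are hypothetical, and failures are marked by hand rather than detected by a real health check.

```python
class Cluster:
    """Hypothetical cluster that hides individual node failures from callers."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("a cluster needs at least one node")
        self._healthy = {node: True for node in nodes}
        self._next = 0  # rotating index used to spread requests across nodes

    def mark_failed(self, node):
        self._healthy[node] = False

    def mark_recovered(self, node):
        self._healthy[node] = True

    def active_nodes(self):
        return [node for node, ok in self._healthy.items() if ok]

    def handle(self, request):
        """Return the healthy node that serves the request."""
        nodes = self.active_nodes()
        if not nodes:
            raise RuntimeError("no healthy nodes left: the service is down")
        node = nodes[self._next % len(nodes)]
        self._next += 1
        return node


cluster = Cluster(["node-1", "node-2"])
first = cluster.handle("req-1")
cluster.mark_failed("node-1")   # simulate a hardware failure
second = cluster.handle("req-2")
print(first, second)
```

Callers only ever talk to the cluster, so the loss of one node is invisible to them as long as at least one healthy node remains.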

Redundancy
Involves duplicating critical components to increase system reliability
Redundancy can be implemented by adding multiple:
- Power supplies
- Network connections
- Servers
- Software services
- Service providers
Prevents single points of failure in systems
Examples
- Redundant power supplies
  - The failure of one power source does not interrupt the services that depend on it
  - Achieved by installing two power supply units inside a server or network device
  - Also achieved by using a UPS, a backup generator, etc.
- Network connections
  - Maintain multiple network connections or pathways by using multiple cabled connections
- Backup servers
  - Servers and services can be configured to operate in a load balanced or clustered architecture
  - Helps prevent downtime in case of failure
  - If one instance fails, its workload can be redistributed to the remaining servers
- Multiple service providers
  - To safeguard the organization against outages, use two or more service providers to ensure a constant backup
  - If one provider fails, the organization will be able to continue by using services from a secondary provider
  - Examples:
    - A backup credit card processor
    - Two domain controllers
      - A primary and a secondary controller
Each design decision should be weighed to determine whether redundancy is required for a particular piece of hardware.
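One way to weigh whether redundancy is worth adding is to estimate how much availability it buys. The sketch below uses the standard parallel-availability formula, 1 - (1 - a)^n, under two loudly stated assumptions: the copies fail independently of one another, and any single working copy is enough to keep the service up. The function name `combined_availability` is hypothetical.

```python
def combined_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components, any one of which suffices.

    Assumption: failures are independent, which real systems only approximate
    (shared power, shared networks, or shared bugs can take out several copies).
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    # The system is down only when every copy is down at the same time.
    return 1 - (1 - component_availability) ** copies


for copies in (1, 2, 3):
    print(copies, f"{combined_availability(0.99, copies):.6f}")
```

Under these assumptions, a single component that is available 99% of the time reaches about 99.99% when duplicated, which shows why duplicating a critical component can remove it as a single point of failure.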

Multi-cloud Approach
Distributes data, applications, and services across multiple cloud providers
Mitigates the risk of a single point of failure
Offers flexibility
- Scaling operations and cost optimization
Aids in avoiding vendor lock-in issues
- Provides more options and leverage when negotiating service terms, or when services must be migrated to another cloud provider because of an outage.
Requires proper data management, unified threat management, and consistent policy enforcement for security and compliance
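Falling back from one provider to another during an outage can be sketched as an ordered list of providers that is tried in turn. Everything named here is hypothetical: `ProviderError`, `call_with_failover`, and the two stand-in provider functions simulate cloud services and do not correspond to any real provider API.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a (simulated) provider when it cannot serve a request."""


def call_with_failover(
    providers: Sequence[tuple[str, Callable[[str], str]]],
    payload: str,
) -> tuple[str, str]:
    """Try each provider in order and return (provider name, result)."""
    errors = []
    for name, send in providers:
        try:
            return name, send(payload)
        except ProviderError as exc:
            # Record the failure and move on to the next provider.
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


def primary_cloud(payload: str) -> str:
    # Simulated outage at the primary provider.
    raise ProviderError("regional outage")


def secondary_cloud(payload: str) -> str:
    return f"stored {payload}"


used, result = call_with_failover(
    [("primary", primary_cloud), ("secondary", secondary_cloud)],
    "invoice-42",
)
print(used, result)
```

The sketch covers only the failover path. Keeping data synchronized and security policies consistent across providers, as noted above, is a separate and usually harder problem.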

Design a robust system architecture to achieve high availability
Utilize load balancing, clustering, redundancy, and multi-cloud approaches
Proactive measures reduce the risk of service disruptions and downtime costs
Safeguard organizational continuity and reliability in a competitive environment