High Availability
(OBJ 3.4)
High Availability Basics
- High Availability
- Aims to keep services continuously available by minimizing downtime
- Achieved through load balancing, clustering, redundancy, and multi-cloud strategies
Uptime and Availability Standards
- Uptime
- The time (minutes/hours) a system remains online over a given period, typically expressed as a percentage
- Five nines
- Refers to 99.999% uptime, allowing only about 5 minutes of downtime per year
- Even this standard permits very little downtime
- Six nines
- Refers to 99.9999% uptime, allowing only about 31 seconds of downtime per year
- Most organizations realistically need more downtime than this each year to implement upgrades and security patches
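The downtime figures above follow directly from the uptime percentage. A minimal sketch of the arithmetic, assuming a 365-day year (the function name is illustrative, not taken from any standard):

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds, ignoring leap years


def allowed_downtime_seconds(uptime_percent: float) -> float:
    """Return the maximum yearly downtime, in seconds, for an uptime percentage."""
    return SECONDS_PER_YEAR * (1 - uptime_percent / 100)


for label, percent in [("five nines", 99.999), ("six nines", 99.9999)]:
    seconds = allowed_downtime_seconds(percent)
    print(f"{label}: {seconds:.1f} seconds per year ({seconds / 60:.2f} minutes)")
```

Five nines works out to roughly 315 seconds (a little over 5 minutes) per year, and six nines to roughly 31.5 seconds.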
Load Balancing
- The process of distributing workloads across multiple computing resources
- Optimizes resource use, maximizes throughput, and minimizes response time
- Prevents overloading of any single resource
- Load balancers achieve this by using complex algorithms to distribute incoming requests to capable servers
- Example:
- All requests go to a load balancer first, which then forwards each user's request to one of several servers (three, for example) so that the request is answered in a more timely manner.
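The example above can be sketched with round robin, one of the simplest distribution algorithms. This is only an illustration: real load balancers may also weigh server health, current connections, or response times, and the class and server names here are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(list(servers))

    def route(self, request: str) -> str:
        # Pick the next server in the rotation for this request.
        server = next(self._rotation)
        return f"{request} -> {server}"


balancer = RoundRobinBalancer(["server-1", "server-2", "server-3"])
assignments = [balancer.route(f"request-{i}") for i in range(1, 5)]
for line in assignments:
    print(line)
```

With three servers, the fourth request wraps around to the first server again, so no single server receives all the traffic.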
Clustering
- Uses multiple computers, storage devices, and redundant network connections as a single system
- Provides high availability, reliability, and scalability
- Ensures continuity of service even when hardware fails by eliminating single points of failure
- Can be combined with load balancing to build robust solutions and maintain higher levels of availability
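The idea that a cluster keeps serving requests after a hardware failure can be sketched as follows. This is a simplified model under stated assumptions: the `Cluster` class and node names are hypothetical, and a real cluster would detect failures through heartbeats or health checks rather than a manual call.

```python
class Cluster:
    """Treats several nodes as one system and skips any node marked as failed."""

    def __init__(self, node_names):
        # Every node starts out healthy.
        self.healthy = {name: True for name in node_names}

    def mark_failed(self, name: str) -> None:
        self.healthy[name] = False

    def handle(self, request: str) -> str:
        # Serve the request from the first node that is still healthy.
        for name, is_healthy in self.healthy.items():
            if is_healthy:
                return f"{name} handled {request}"
        raise RuntimeError("no healthy nodes remain")


demo_cluster = Cluster(["node-a", "node-b"])
print(demo_cluster.handle("request-1"))  # served by node-a
demo_cluster.mark_failed("node-a")       # simulate a hardware failure
print(demo_cluster.handle("request-2"))  # service continues on node-b
```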
Redundancy
- Involves duplicating critical components to increase system reliability
- Redundancy can be implemented by adding multiple:
- Power supplies
- Network connections
- Servers
- Software services
- Service providers
- Prevents single points of failure in systems
- Examples
- Redundant power supplies
- The failure of one power source will not affect the continuity of your other services
- Achieved by installing two power supply units inside of a server or network device.
- A UPS, a backup generator, or a similar backup power source can also be used
- Network connections
- Maintain multiple network connections or pathways by using multiple cabled connections
- Backup servers
- Servers and services can be configured to operate in a load balanced or clustered architecture
- Helps prevent downtime in case of failure
- If one instance fails, its workload can be redistributed to the remaining servers
- Multiple service providers
- To safeguard the organization against outages, use two or more service providers to ensure a constant backup
- If one provider fails, the organization will be able to continue by using services from a secondary provider
- Examples:
- Backup credit card processor
- Having two domain controllers
- A primary and a secondary controller
- Redundant power supplies
- Evaluate each design decision to determine whether redundancy is required for a particular piece of hardware.
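When deciding whether a component needs redundancy, it can help to estimate how much availability the extra copies buy. A minimal sketch, assuming each copy fails independently of the others (a simplification, since shared power or network faults can take out several copies at once); the function name is illustrative:

```python
def combined_availability(component_availability: float, copies: int) -> float:
    """Availability of a group that stays up while at least one copy is up.

    Assumes failures are independent, so the group is down only when
    every copy is down at the same time.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1 - (1 - component_availability) ** copies


for copies in (1, 2, 3):
    availability = combined_availability(0.99, copies)
    print(f"{copies} copies at 99% each: {availability * 100:.4f}% combined")
```

Under this assumption, two components that are each 99% available yield about 99.99% combined, which is why duplicating the most critical parts is usually the first step.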
Multi-Cloud Approach
- Distributes data, applications, and services across multiple cloud providers
- Mitigates the risk of a single point of failure
- Offers flexibility for scaling operations and optimizing costs
- Aids in avoiding vendor lock-in issues
- Provides more options and leverage when negotiating service terms or when migrating services to another cloud provider because of an outage.
- Requires proper data management, unified threat management, and consistent policy enforcement for security and compliance
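Falling back to a second provider during an outage can be sketched as trying providers in priority order. Everything here is hypothetical: the provider functions are stand-ins that simulate an outage, not calls to any real cloud or payment API.

```python
def call_with_failover(providers, payload):
    """Try each (name, send) pair in order and return the first success."""
    errors = []
    for name, send in providers:
        try:
            return name, send(payload)
        except ConnectionError as exc:
            # Record the failure and move on to the next provider.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def primary_provider(payload):
    # Stand-in for a provider that is currently experiencing an outage.
    raise ConnectionError("primary unreachable")


def secondary_provider(payload):
    # Stand-in for a healthy backup provider.
    return f"processed {payload}"


used, result = call_with_failover(
    [("primary", primary_provider), ("secondary", secondary_provider)],
    "order-42",
)
print(used, result)
```

Keeping the provider list as data means a third provider can be added, or the priority changed, without touching the failover logic.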
Strategic Planning
- Design a robust system architecture to achieve high availability
- Utilize load balancing, clustering, redundancy, and multi-cloud approaches
- Proactive measures reduce the risk of service disruptions and downtime costs
- Safeguard organizational continuity and reliability in a competitive environment