Misuse of the term “fault tolerant” is laying companies open to business risk and financial losses from system downtime.
This is according to Stratus Technologies, which says that separate research conducted by TheInfoPro and by ITIC/Stratus Technologies reveals that the absence of clear definitions, and the interchangeable use of terms that specify levels of uptime, are pronounced throughout the IT community.
Confusing vendor claims and the industry’s incorrect use of terminology are major contributing factors, Stratus asserts.
Stratus defines true fault tolerance as the term was originally understood and measured by a user community that demanded the highest possible uptime protection for their mission-critical applications. This definition is: fault tolerant computing is the ability to provide highly demanding enterprise application workloads with 99.999% (five nines) system uptime or better, zero failover time and no data loss.
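The five-nines figure translates directly into a downtime budget: a non-leap year has 525,600 minutes, and 0.001% of that is roughly 5.26 minutes – consistent with the "five minutes or less of unscheduled downtime per year" cited in the survey findings. A minimal sketch of that arithmetic (the function name and the list of availability levels are illustrative, not Stratus's methodology):

```python
# Downtime budget implied by each availability level: a system that is
# up a fraction `a` of the time may be down at most (1 - a) of the time.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year

def downtime_minutes_per_year(availability: float) -> float:
    """Maximum unscheduled downtime per year at a given availability."""
    return (1 - availability) * MINUTES_PER_YEAR

for label, availability in [("three nines", 0.999),
                            ("four nines", 0.9999),
                            ("five nines", 0.99999)]:
    print(f"{label} ({availability:.3%}): "
          f"{downtime_minutes_per_year(availability):.2f} min/year")
```

At five nines the budget works out to about 5.26 minutes a year; each additional nine cuts the allowance by a factor of ten, which is why "high availability" at three or four nines is a materially weaker guarantee.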
Anything less, the company says, is high availability computing at best, which is suitable for meeting the uptime requirements of many less critical applications. However, applications whose failure could damage a company’s external reputation, cause compliance violations, or result in unacceptable financial cost or life safety risk typically require better uptime than high-availability solutions can deliver.
It stresses that fault tolerant computing is not: failover; hot standby; replication; mirroring; or recovery.
TheInfoPro’s September QuickTip report, “Users Demand High Availability, But How Good is ‘Good Enough’?”, included results from its most recent server study which reveal that almost 60% of IT users don’t understand the difference between hardware-based fault tolerance and availability, high availability or software-based fault tolerance – none of which meets the definition of fault tolerance.
“When users discuss cloud computing or virtualisation, one benefit consistently mentioned is ‘improved availability’. This may be expressed either as fault tolerance, disaster recovery or high availability. Each of these is technically very different from the other, and they are achieved using very different solutions,” says TheInfoPro MD Bob Gill.
“In other words, there is significant confusion in the marketplace, as organisations want higher levels of availability, but are not always clear on the best method to meet the objective because a number of different terms are often used interchangeably with no commonly accepted definition.”
A Stratus Technologies-ITIC survey of almost 250 IT professionals bears out that confusion – 53% of respondents say they are using fault tolerant technology, yet an almost equal number define less than 99.999% system availability as fault tolerant. With currently available technology, only fully redundant hardware running in lockstep can provide true fault tolerance, with five minutes or less of unscheduled downtime per year, no failover and no data loss on x86-based systems.
“When customers buy what they think is a fault tolerant solution, they should expect a system that, for all practical purposes, has no unscheduled downtime,” says Dick Sharod, African regional director at Stratus Technologies. “Clusters, virtualisation and software-based availability solutions require failover recovery, which means data loss, or they can’t scale to support a demanding application and maintain transactional integrity. All this leaves many customers confused as to what they are getting or what they should expect to get when someone says ‘fault tolerant’.”