The data center industry continues to evolve with mergers, acquisitions, and a healthy crop of emerging companies. New data center products and services are hitting the street, there is an aggressive debate on the model of selling space vs. power, and cloud-based alternatives to physical data center space are appearing. Together they give us a confusing maze of alternatives to meet our outsourcing needs.
The data center market is not unique. For example, in Southern California we have a wide variety of supermarkets and grocery stores including Vons, Ralphs, Albertsons, Jons, Trader Joe's, Whole Foods, and lots of others. All grocery stores basically sell the same kinds of products, with very few exceptions.
What makes you go to Vons, rather than Whole Foods? Is it location? Prices? Image? A social issue?
The data center industry is not significantly different. In a city such as Los Angeles you have Equinix, Switch and Data, Savvis, BT Infonet, CoreSite, US Colo, Digital Realty, Level 3 – just to name a few. What makes one facility more attractive than another to fulfill your collocation needs?
Data centers, at the lowest common denominator, have traditionally offered:
- Concrete (space for cabinets, racks, cages, suites, etc)
- Power
- Air conditioning
- Interconnections
If all data centers offer the basic components listed above, then what differentiates one data center from another?
Now we can add further alternatives to the basic data center model – the public cloud services provider (CSP) and Software as a Service (SaaS).
As a potential data center tenant (this includes “virtual” data center tenants living in a CSP infrastructure) we have to evaluate all the above components, and determine which collocation or data center provider will best meet our facility, budget, and connectivity needs.
The Sense of Urgency
The CIO of the United States, Vivek Kundra, recently pressed the case for data center consolidation within the US government, and strongly recommended that the US data industry consider either moving its operations into consolidated data centers or virtualizing them within a cloud provider.
It is clear that small and medium companies, as well as most content delivery companies, find better efficiencies by bringing the eCommerce and Internet-facing parts of their business into the data center and interconnecting locally with the Internet service provider community.
Building a data center, staffing it, and ensuring the efficiency of its power and cooling usage are beyond the core competence of most companies. Disaster recovery plans, offsite storage, and other business continuity planning are just a few items on the long list we need to consider as part of an overall information technology (IT) or general business plan.
The potential waste of operational expenses, capital budgets, and resulting market “opportunity cost” justifies all companies at least considering outsourcing all or some of their IT operations – particularly as data centers and CSPs increase their capabilities.
With the availability of netbooks, online applications (SaaS), and server-based office automation products, all companies should put this on their annual review list. Even the Los Angeles Police Department (LAPD) recently announced its decision to outsource its email to Google. This model does not appear to be going away anytime soon.
The “Selecting Your Data Center” Series
This series will walk through the process of identifying the need for outsourcing, identifying the best location for your data center, discriminating between the alternatives, and finally getting to your decision.
We welcome all comments, experiences, and discussions related to the data center community that would provide productive feedback for a potential data center or CSP tenant.
John Savageau, Long Beach
A 40 year old building with much of the original mechanical and electrical infrastructure. A 40 year old 4000 amp, 480 volt aluminum electrical buss duct, which had been modified and “tapped” often during its life, with much of the work done in violation of equipment specifications. With old materials such as buss insulation gradually deteriorating, the duct expanding and contracting over the years, and the fact that aluminum was used during the initial installation to either save money or test a new technology vision, it all becomes a risk. A risk of buss failure, or at worst a buss failing to the point it results in a massive electrical explosion.
Sound extreme? Now add a couple of additional factors. The building is a mixed-use telecom carrier hotel, with additional space used for commercial collocation and standard commercial office space. This narrows it down to most of the carrier hotel facilities in the US and Europe. These are old buildings, converted to mixed-use carrier hotel and collocation facilities, due mainly to an abundance of vacant space during the mid-1990s and a need for telecom interconnection space following the Telecommunications Act of 1996.
Over the past four years the telecom, Internet, and data center industries have suffered several major electrical events. Some have resulted in complete facility outages. Others have been saved by backup systems which operated as designed, preventing significant disruption to tenants and the services operated within the building.
A partial list of recent carrier hotel and data center facility outages or significant events includes some of the most important facilities in the telecom and Internet-connected industry:
- 365 Main in San Francisco
- RackSpace hosting facilities in Dallas
- Equinix facilities in Australia and France
- MPT in San Jose
- IBM facility in NZ
- Fisher Plaza in Seattle
- Cincinnati Bell
And the list goes on. These facilities are managed by good companies, but they have many issues in common. Most of those issues are human issues. The resulting outages caused havoc or chaos throughout a wide range of commercial companies, telecom companies, Internet services, and content providers.
The Human Factor in Facility Failures
Building a modern data center or carrier interconnection point follows a fairly simple series of tasks. Following a data center design and construction checklist, with strict compliance to the process and individual steps, can often mean the difference between a well-run facility and one that is at risk of failure during a commercial power outage, or systems failure.
In the design/construction phase, data center operators follow a system of:
- Determining the scope of the project
- Developing a data center design specification based on both company and industry standards
- Designing a specific facility based on business scope and budget, which will comply with the standard design specification
- Publishing the design specification and distributing it to several candidate construction management companies and engineering companies
- Using a strong project manager to drive the construction, permitting, certification, and vendor management process
- Completing systems integration and commissioning prior to actual operations
Of all the above tasks, a complete commissioning plan and integration test is essential to building confidence that the data center or telecom facility will operate as planned. Many outages in the past have resulted from systems that were not fully tested or integrated prior to operations.
An example may be a breaker coordination study. This is the process of ensuring switch gear and panel breakers, from the point of electrical presentation by the local power utility down to individual breaker panels, are set, tested, and integrated according to vendor specification. Without a complete coordination study, there is no assurance that components within an electrical system will operate correctly either during normal conditions or during equipment failures. The study is an essential component of a complete systems integration test. Failure to complete a simple breaker coordination study during commissioning has resulted in major electrical failures in data centers as recently as 2008.
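As a rough illustration of what a coordination check is looking for, the sketch below walks a chain of breakers from the utility side down to a branch panel and flags any pair where the downstream breaker would not trip before the upstream one. It is a heavily simplified sketch: a real study compares full time-current curves from the vendor, and the `Breaker` fields, names, and settings here are hypothetical values chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class Breaker:
    name: str
    pickup_amps: float  # hypothetical setting: current at which the breaker begins to trip
    delay_s: float      # hypothetical setting: intentional trip delay in seconds


def coordination_problems(chain):
    """Return a list of problems for a chain ordered from utility entrance to branch panel.

    Simplifying assumption: a downstream breaker is "coordinated" with its
    upstream neighbor if both its pickup current and its delay are lower.
    """
    problems = []
    for upstream, downstream in zip(chain, chain[1:]):
        if downstream.pickup_amps >= upstream.pickup_amps:
            problems.append(f"{downstream.name} pickup is not below {upstream.name}")
        if downstream.delay_s >= upstream.delay_s:
            problems.append(f"{downstream.name} delay is not below {upstream.name}")
    return problems


# Made-up settings for illustration only.
chain = [
    Breaker("main switchgear", 4000, 0.5),
    Breaker("distribution panel", 1200, 0.3),
    Breaker("branch panel", 400, 0.3),
]

for problem in coordination_problems(chain):
    print(problem)
```

In this made-up chain the branch panel and the distribution panel share the same delay, so a fault at the branch could trip both and take down far more load than necessary.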
The InterNational Electrical Testing Association (NETA) provides guidance on electrical commissioning for data centers under “full design load” conditions. This includes testing recommendations to test performance and operations including the sequence of operations for electrical, mechanical, building management systems/BMS, and power monitoring/management. The actual levels of NETA testing are:
- Level 1: Submittal Review and Factory Testing
- Level 2: Site Inspection and Verification to Submittal
- Level 3: Installation Inspections and Verifications to Design Drawings
- Level 4: Component Testing to Design Loads
- Level 5: System Integration Tests at Full Design Loads
No company should consider collocation within a facility that cannot produce complete documentation that integration testing and commissioning were completed prior to facility operations – and that testing should be at NETA Level 5. In some cases, documentation of “retro” testing is acceptable; however, potential tenants in a facility should be aware that it is still a compromise, as it is almost impossible to complete a retro-commissioning test in a live facility.
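A prospective tenant could track this requirement with something as simple as the sketch below, which compares the commissioning documents an operator has produced against the five levels listed above. It assumes, for illustration, that you want documentation for every level up to Level 5; the function and variable names are hypothetical.

```python
# Testing levels as listed in this article.
NETA_LEVELS = {
    1: "Submittal Review and Factory Testing",
    2: "Site Inspection and Verification to Submittal",
    3: "Installation Inspections and Verifications to Design Drawings",
    4: "Component Testing to Design Loads",
    5: "System Integration Tests at Full Design Loads",
}


def missing_levels(documented_levels):
    """Return the levels (in order) for which no documentation was produced."""
    return [level for level in sorted(NETA_LEVELS) if level not in documented_levels]


# Hypothetical facility that can only document testing through Level 3.
documented = {1, 2, 3}

for level in missing_levels(documented):
    print(f"No documentation for Level {level}: {NETA_LEVELS[level]}")
```

A facility with an empty result from `missing_levels` has at least produced paperwork for every level; whether that paperwork is credible is still a question for your own engineers or consultant.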
Bottom Line – even a multi-million dollar facility has no integrity without a detailed design specification and complete integration/commissioning test.
The Human Factor in Continuing Facility Operations
Assuming the facility adequately completes integration and commissioning at NETA Level 5, the next step is ensuring the facility has a comprehensive continuing operations plan to manage its electrical (and mechanical/air conditioning) systems. There are two main recommendations for ensuring the annual, monthly, and even daily equipment maintenance and inspection plans are being completed.
Computerized Maintenance Management System (CMMS)
Data centers and central offices are complex operations, with thousands of moving parts and thousands of things that can potentially break or go wrong. A CMMS tries to bring all those components together into an integrated resource that includes (according to Wikipedia):
- Work orders: Scheduling jobs, assigning personnel, reserving materials, recording costs, and tracking relevant information such as the cause of the problem (if any), downtime involved (if any), and recommendations for future action
- Preventive maintenance (PM): Keeping track of PM inspections and jobs, including step-by-step instructions or check-lists, lists of materials required, and other pertinent details. Typically, the CMMS schedules PM jobs automatically based on schedules and/or meter readings. Different software packages use different techniques for reporting when a job should be performed.
- Asset management: Recording data about equipment and property including specifications, warranty information, service contracts, spare parts, purchase date, expected lifetime, and anything else that might be of help to management or maintenance workers. The CMMS may also generate Asset Management metrics such as the Facility Condition Index, or FCI.
- Inventory control: Management of spare parts, tools, and other materials including the reservation of materials for particular jobs, recording where materials are stored, determining when more materials should be purchased, tracking shipment receipts, and taking inventory.
- Safety: Management of permits and other documentation required for the processing of safety requirements. These safety requirements can include lockout-tagout, confined space, foreign material exclusion (FME), electrical safety, and others.
We can also add steps such as daily equipment inspections, facility walkthroughs, and staff training.
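To make the preventive maintenance idea concrete, here is a minimal sketch of how a CMMS might decide a PM job is due, either because enough calendar time has passed or because a meter (such as generator run hours) has advanced far enough. It is not modeled on any particular CMMS product; the `PMTask` fields, assets, and intervals are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class PMTask:
    asset: str
    last_done: date
    interval_days: Optional[int] = None       # calendar-based trigger, if any
    meter_at_last_pm: Optional[float] = None  # meter reading when PM was last done
    meter_interval: Optional[float] = None    # meter-based trigger, if any


def is_due(task, today, meter_now=None):
    """A task is due if either its calendar or its meter interval has elapsed."""
    by_calendar = (
        task.interval_days is not None
        and today - task.last_done >= timedelta(days=task.interval_days)
    )
    by_meter = (
        task.meter_interval is not None
        and task.meter_at_last_pm is not None
        and meter_now is not None
        and meter_now - task.meter_at_last_pm >= task.meter_interval
    )
    return by_calendar or by_meter


# Hypothetical assets and intervals, for illustration only.
generator = PMTask("backup generator", date(2024, 1, 10),
                   interval_days=365, meter_at_last_pm=1200.0, meter_interval=250.0)
crac_filter = PMTask("CRAC unit filter", date(2024, 5, 15), interval_days=30)

today = date(2024, 6, 1)
print(generator.asset, is_due(generator, today, meter_now=1460.0))
print(crac_filter.asset, is_due(crac_filter, today))
```

Here the generator comes due on run hours well before its annual date, which is the kind of case a purely calendar-based schedule would miss.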
SAS 70 Audits
The SAS 70 Audit is becoming more popular with companies as a way to force the data center operator to provide documentation, audited by a neutral evaluator, that it is actually completing the maintenance, security, staffing, and permitting activities promised in marketing material and sales negotiations.
Wikipedia defines a SAS70 Audit as:
“… the professional standards used by a service auditor to assess the internal controls of a service organization and issue a service auditor’s report. Service organizations are typically entities that provide outsourcing services that impact the control environment of their customers. Examples of service organizations are insurance and medical claims processors, trust companies, hosted data centers, application service providers (ASPs), managed security providers, credit processing organizations and clearinghouses.
There are two types of service auditor reports. A Type I service auditor’s report includes the service auditor’s opinion on the fairness of the presentation of the service organization’s description of controls that had been placed in operation and the suitability of the design of the controls to achieve the specified control objectives. A Type II service auditor’s report includes the information contained in a Type I service auditor’s report and also includes the service auditor’s opinion on whether the specific controls were operating effectively during the period under review.”
Many companies in the financial services industries now treat a SAS70 audit as essential when evaluating candidate data center facilities to host their data and applications. Startup companies with savvy investors are demanding SAS70 audits. In fact, any company considering outsourcing its data or applications into a commercial data center should demand to obtain or review SAS70 audits for each facility considered.
Otherwise, you are forced to “believe” the words of a marketer’s spin, a salesman’s desperate pitch, or the words of others to provide confidence your business will be protected in another company’s facility.
One thing to keep in mind about SAS70 audits… The audit only reviews items the data center operator chooses to audit. Thus, a company may have very nice and polished SAS70 audit documentation, yet the contents may not include every item you need to ensure the data center operator has a comprehensive operations plan. You may consider finding an experienced consultant to review the SAS70 document, and provide any additional guidance on whether or not the audit actually includes all facility maintenance and management items needed to ensure continuing protection from mechanical, monitoring/management, electrical, security, or human staffing failures.
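One simple way to keep that review honest is to write down the control areas you need covered and compare them against what the audit report actually addresses. The sketch below does only that comparison; the control-area names are hypothetical labels based on the categories mentioned in this article, not terms from the SAS70 standard.

```python
def audit_gaps(required_areas, audited_areas):
    """Return the required control areas that the audit report does not cover."""
    return sorted(set(required_areas) - set(audited_areas))


# Areas the tenant wants covered (illustrative labels).
required = [
    "electrical maintenance",
    "mechanical maintenance",
    "monitoring/management",
    "physical security",
    "staffing",
    "permitting",
]

# Areas the operator chose to include in its audit (illustrative labels).
audited = ["physical security", "staffing", "permitting"]

for area in audit_gaps(required, audited):
    print("Not covered by audit:", area)
```

Any area that shows up in the result is something to raise with the operator before signing, no matter how polished the rest of the report looks.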
Finally, Know Your Facility
Facility operators are traditionally reluctant to show a potential customer or tenant their electrical and mechanical diagrams and “as-built” documentation for the facility. This is where you would find a 40 year old aluminum buss duct, single points of failure, and other infrastructure designs and realities you should know about before putting your business into a data center or carrier hotel.
So, when all other data center and carrier hotel facilities appear equal in geography and interconnections, look at the facilities which will incur the least impact if your interconnections are disrupted, and demand that your candidate data center operator and hosting provider provide complete documentation on the facility, commissioning, CMMS, and SAS70.
Your business, the global marketplace, and network-connected world depend on forcing the highest possible standards of facility design and operation.
John Savageau, Long Beach
Other articles in this series include: