
The long-term approach to high reliability in the data centre

Mon 18 Dec 2017 | Darren Hardy

Darren Hardy, regional senior service manager at DencoHappel, explains why equipment manufacturers are best placed to effectively maintain data centres’ cooling assets and advises on best operational practice to maximise the availability and reliability of critical IT services

Once a cooling plant has been installed in a data centre, responsibility for its upkeep is often handed over to an IT manager. The IT manager will then enter into a maintenance agreement with a building facilities provider, a choice often based solely on cost.

But the quality of the servicing and maintenance affects how well the equipment works, under both normal and unplanned conditions. What’s more, how the climate control systems are managed daily significantly impacts how well back-up plans can be executed.

Here are the reasons why it pays to work with a specialist manufacturer in the long term to keep indoor temperature and humidity levels within the recommended range, and ensure that these remain unaffected should other critical services fail.

Specialist servicing for reliability

Although engineers who have attended short training programmes such as F-Gas courses can work with air conditioning and refrigeration systems, close control units in mission-critical environments are best maintained by manufacturer-trained engineers.

General air conditioning systems differ from the computer room air conditioning (CRAC) units used in data centres. The former are designed to deliver a comfortable interior environment in areas with high footfall. The sole purpose of the latter is to control a data centre’s indoor climate so that IT hardware can operate reliably and efficiently around the clock.

If this control fails, critical infrastructure can go offline, resulting in major financial losses. This is why close control cooling is engineered to maintain temperature, humidity and air filtration levels within a much tighter range than comfort cooling.

It is therefore important for servicing and maintenance technicians to have a detailed knowledge of how precision control units work, and to understand their impact in a mission-critical setting.

What’s more, products vary from one manufacturer to the next, which is why many develop detailed servicing and maintenance work schedules tailored to the needs of their ranges. It is crucial to pay attention to the following elements, which tend to get overlooked.

Refrigerant charges within CRAC units are critical, and the old-school methods of simply charging to the sight glass or gauges are long since past. More robust methods are now required, including reading suction superheat, liquid subcooling and discharge superheat throughout the compressor operating envelope.
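
As a rough illustration of the three readings, the sketch below derives each one from measured line temperatures and saturation temperatures. It assumes the technician has already looked up the saturation temperatures on the pressure-temperature chart for the refrigerant in use. The class, the function names and the acceptance limits are all hypothetical; real limits should come from the unit manufacturer’s documentation.

```python
from dataclasses import dataclass


@dataclass
class ChargeReadings:
    """Readings at a single compressor operating point, all in deg C."""
    suction_line_temp: float    # measured on the suction line
    sat_evap_temp: float        # saturation temp at suction pressure (PT chart)
    liquid_line_temp: float     # measured on the liquid line
    sat_cond_temp: float        # saturation temp at condensing pressure (PT chart)
    discharge_line_temp: float  # measured on the discharge line


def charge_figures(r: ChargeReadings) -> dict:
    """Return the three temperature differences used to judge the charge."""
    return {
        "suction_superheat": r.suction_line_temp - r.sat_evap_temp,
        "liquid_subcooling": r.sat_cond_temp - r.liquid_line_temp,
        "discharge_superheat": r.discharge_line_temp - r.sat_cond_temp,
    }


# Illustrative (min, max) limits in kelvin. These are placeholder values,
# not manufacturer data.
EXAMPLE_LIMITS = {
    "suction_superheat": (5.0, 10.0),
    "liquid_subcooling": (3.0, 8.0),
    "discharge_superheat": (20.0, 40.0),
}


def out_of_limits(r: ChargeReadings, limits: dict) -> list:
    """List the figures that fall outside their (min, max) limits."""
    return [
        name
        for name, value in charge_figures(r).items()
        if not (limits[name][0] <= value <= limits[name][1])
    ]
```

Running `out_of_limits` on a set of readings taken at several points across the operating envelope, rather than at a single point, is closer to the practice described above.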

Overcharging of these systems has become commonplace and is usually due to a lack of experience and knowledge. Failure to get this element correct can result in compressor failure.

Although sensors and control algorithms are now incorporated into most modern precision climate control products to protect them from such mistakes, refrigerant undercharging or overcharging will both reduce a compressor’s lifespan and are therefore best avoided in the first place.

Cleaning condenser coils is an essential task in any planned, preventative maintenance regime. Depending on the type of dirt and contaminants that need to be removed, the coils can be cleaned by one of three methods: chemical cleaning, manual brushing or a pressure wash.

Despite the simplicity of this procedure, it is often overlooked by technicians who don’t fully understand the principles of close control cooling units and the role they play in a data centre. If left uncleaned, the coils will clog up, resulting in refrigerant high-pressure trips and ultimately, system downtime.

It may sound obvious, but parts like air filters and humidifier cylinders should be replaced regularly. Air filters should be changed twice a year, and humidifier cylinders two to four times a year, depending on water quality.
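
To show how these intervals translate into dates, here is a minimal sketch that works out when a part is next due for replacement. The function names are hypothetical, and dividing a 365-day year evenly between changes is a simplifying assumption; a real schedule would follow the manufacturer’s maintenance plan.

```python
from datetime import date, timedelta


def next_change_due(last_changed: date, changes_per_year: int) -> date:
    """Date of the next replacement, spacing changes evenly over 365 days."""
    if changes_per_year < 1:
        raise ValueError("changes_per_year must be at least 1")
    return last_changed + timedelta(days=365 // changes_per_year)


def is_overdue(last_changed: date, changes_per_year: int, today: date) -> bool:
    """True once the due date has passed (the due date itself is not overdue)."""
    return today > next_change_due(last_changed, changes_per_year)
```

An air filter would use `changes_per_year=2`, and a humidifier cylinder a value between 2 and 4 chosen according to the site’s water quality.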

Old humidifier cylinders prevent optimum system performance, raising energy usage and costs. Clogged air filters can lead to system failures, which can be expensive to rectify. By correctly maintaining components that affect compressor operation, the cost of repairs, not to mention risk of downtime, can be minimised.

Redundancy strategy

In addition to proper maintenance, best practice should also be applied during normal operation of the cooling facility. This ensures backup equipment kicks in when needed, and protects any management systems that control the cooling from intrusions.

One or two additional units are often specified to provide back-up should any unit fail. To ensure that duplicate units can function and help meet the cooling load needed, these should be used regularly in the same way normal units are employed.

This can be achieved by networking all the units so that the operator can set them up to work on a rotational basis and control changeovers. By spreading the load, the risk of breakdowns in the normal units can also be reduced.
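
The rotation idea can be sketched as a simple round-robin, in which each period a different set of units takes the standby role. This is only an illustration: the unit names and the function are hypothetical, and in practice the changeover logic is handled by the networked unit controllers rather than by a separate script.

```python
def rotation_schedule(units: list, standby_count: int, periods: int) -> list:
    """Return one (duty_units, standby_units) pair per period.

    The standby role moves through the unit list in round-robin order so
    that every unit regularly takes a turn at carrying the cooling load.
    """
    n = len(units)
    if not 0 < standby_count < n:
        raise ValueError("standby_count must be between 1 and len(units) - 1")
    schedule = []
    for period in range(periods):
        start = (period * standby_count) % n
        standby = [units[(start + i) % n] for i in range(standby_count)]
        duty = [u for u in units if u not in standby]
        schedule.append((duty, standby))
    return schedule


# Example: four units with one on standby, rotated over four periods.
for duty, standby in rotation_schedule(["CRAC-1", "CRAC-2", "CRAC-3", "CRAC-4"], 1, 4):
    print("duty:", duty, "standby:", standby)
```

With one standby unit and as many periods as there are units, each unit rests exactly once and runs in every other period, which is the load-spreading effect described above.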

To protect the network of cooling systems from cyberattacks, all units, including duplicate ones, should be linked to the data centre’s IT infrastructure. Some solutions can be connected to a building’s network via a pCOWEB card, which restricts access to the units’ IP addresses, and allows the data centre operator to monitor the system’s security.

If the power supply to cooling equipment is interrupted, fans, compressors and humidifiers will stop working. Therefore, any cooling solution used in data centres should be able to switch to a different feed in the event of a mains power outage.

The best units have a remote start/stop signal that can be connected to a backup generator via the building management system. An auto transfer switch can also be built into the units so that they have dual power supply and can run on electricity from a UPS or generator if needed.

Non-IT infrastructure plays a crucial role in keeping a data centre operational and represents a significant proportion of hardware investment. Maintaining, servicing and running cooling facilities on the cheap may cost less in the short term, but will jeopardise the availability of both regular and redundant equipment in the long run, which could lead to much higher losses later down the line.

By working with a manufacturer from the start, data centre managers can be advised on best operational practices and have greater reassurance that redundant units can take on the required load during unplanned outages.


This post originated at Data Centre Management magazine, from the same publisher as The Stack.

 
