Data center operators face a long list of challenges they must solve on an ongoing basis, and one of them is the dreaded cooling. It’s among the most important aspects of running a data center, because it has an outsized impact on overall performance and costs.
It’s also among the most complex, energy-intensive and expensive parts of a facility, especially as computing demand keeps rising along with rack density, which means much more heat to remove and less free space to help remove it.
Creating and installing efficient cooling in data centers is not an easy task. That’s why the entire industry is working on better solutions, and why major government institutions around the world are pitching in as well. This is where the COOLERCHIPS initiative comes in. It’s a project by the US Department of Energy that can, and most probably will, have implications for the global data center industry. Let’s explore what the project is about and some of the interesting innovations it’s already producing.
What is COOLERCHIPS?
Are you ready for this? COOLERCHIPS is an Advanced Research Projects Agency–Energy (ARPA-E) program whose name stands for Cooling Operations Optimized for Leaps in Energy, Reliability, and Carbon Hyperefficiency for Information Processing Systems. Now that’s a name, right? It probably took a lot of meetings and brainstorming to land on that acronym, but in the end it was worth it.
At the end of June 2023, the Department of Energy announced a group of 15 projects that received a total of $40 million to fund their data center cooling technologies over the next one to three years. All of these projects seek to achieve at least a tenfold improvement in cooling efficiency. Yes, that much. We are talking about massive improvements and near-moonshot expectations, but all of them are very, very real.
Cooling is the biggest consumer of energy in a data center. It accounts for about 40% of all the electricity a facility requires, and data centers already draw about 2% of all electricity consumed in the US. Naturally, COOLERCHIPS aims to reduce this figure, especially with the widespread adoption of artificial intelligence (AI) expected to dramatically increase data center usage and thus heat generation. More AI means more computing demand, which means more cooling, higher energy consumption and bigger bills.
Servers running hot were a challenge even before AI, Forbes notes. Processor thermal design power is expected to reach 500W by 2025, for example, while GPUs are already approaching 700W. The ever-growing need for cooling is becoming the main bottleneck to achieving maximum power and efficiency, so good cooling solutions are much, much needed. COOLERCHIPS aims to deliver them within three years. If that happens, it will be a much-needed breath of fresh air for data centers that are already running quite hot. However, it’s also important that those solutions are sustainable.
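To put those percentages in perspective, here’s a quick back-of-the-envelope calculation in Python using the rough figures above (the shares are illustrative approximations from this article, not official DOE accounting):

```python
# Back-of-the-envelope estimate of how much US electricity goes to data
# center cooling, using the rough figures cited above. All numbers are
# illustrative, not official ARPA-E or DOE targets.

datacenter_share_of_us_power = 0.02   # data centers: roughly 2% of US electricity
cooling_share_of_datacenter = 0.40    # cooling: roughly 40% of a facility's power

cooling_share_of_us_power = datacenter_share_of_us_power * cooling_share_of_datacenter
print(f"Cooling alone: ~{cooling_share_of_us_power:.1%} of US electricity")  # ~0.8%

# A hypothetical 10x improvement in cooling efficiency would shrink that to:
improved = cooling_share_of_us_power / 10
print(f"After a 10x efficiency gain: ~{improved:.2%}")  # ~0.08%
```

Even at these rough numbers, cooling alone is a measurable slice of national electricity use, which is why a tenfold efficiency target is worth the moonshot framing.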
The 15 chosen projects are focusing on different ideas, solutions and areas of cooling. Together they should help solve most use cases and give data center operators multiple choices to mix and match according to their needs. It’s definitely a very ambitious goal to completely revolutionize data center cooling while at the same time addressing sustainability issues and costs. So, let’s see some of the ideas.
Nvidia is again very ambitious
The 15 grant recipients are companies and institutions of various sizes and capabilities. Among them are big names like Intel and Nvidia – and while most people think of Nvidia as the maker of video cards, over the past few years the company has been expanding into multiple segments, including cloud, AI and data center technologies. The company’s CEO, Jensen Huang, even says he wants Nvidia to become a data center company, envisioning the data center as a new unit of computing, just like the CPU or the GPU.
So it should come as no surprise that Nvidia is not only among the grant recipients but also the holder of the largest grant – $5 million. Nvidia has devised a new approach that combines two well-known cooling methods: direct liquid cooling and immersion cooling. Both are widely used, but neither is being pushed to its full potential, and Nvidia has a new idea on how to change that.
The Nvidia COOLERCHIPS idea is to take the best of both methods and blend them. It starts with a traditional direct liquid cooling system attached to the CPUs and accelerators, but it also hermetically seals the entire server and fills it with cooling liquid. After adding a few connectors, cold plates and flow regulators, you basically have a submerged server that is also directly cooled.
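To get a feel for how such a hybrid system splits the work, here is a minimal heat-balance sketch in Python. This is not Nvidia’s actual design: the server power, the 80/20 split between cold plates and immersion fluid, and the coolant properties are all assumed, illustrative values.

```python
# Minimal heat-balance sketch of a hybrid-cooled server: cold plates pull
# heat directly off the CPUs/GPUs, while the immersion fluid absorbs the
# rest (VRMs, memory, NICs). All figures are hypothetical, not Nvidia's.

def coolant_flow_lpm(heat_w: float, delta_t_c: float,
                     cp_j_per_kg_c: float, density_kg_per_l: float) -> float:
    """Volumetric flow needed to carry `heat_w` watts at a given coolant
    temperature rise, from Q = m_dot * cp * dT."""
    mass_flow_kg_s = heat_w / (cp_j_per_kg_c * delta_t_c)
    return mass_flow_kg_s / density_kg_per_l * 60  # litres per minute

server_heat_w = 2_000        # hypothetical total server heat load
cold_plate_fraction = 0.8    # assume cold plates capture ~80% of it

plate_heat = server_heat_w * cold_plate_fraction
bath_heat = server_heat_w - plate_heat

# Water-glycol loop through the cold plates (cp ~3,600 J/kg*C, ~1.05 kg/L)
print(f"Cold-plate loop: {coolant_flow_lpm(plate_heat, 10, 3600, 1.05):.2f} L/min")
# Dielectric immersion fluid for the rest (cp ~1,300 J/kg*C, ~1.6 kg/L)
print(f"Immersion loop:  {coolant_flow_lpm(bath_heat, 10, 1300, 1.6):.2f} L/min")
```

The appeal of the split is that the cold plates handle the dense, concentrated heat of the processors, while the immersion bath quietly mops up everything else without needing fans.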
There have been similar solutions before, but their setups were much bigger and required larger footprints. Nvidia’s solution keeps the size to a minimum, so the double-cooled server can be placed in a traditional rack with no further modifications.
Nvidia is working with several technology and research partners to realize the idea, each specializing in a specific area. Boyd Corporation, for example, provides the cold plate technology, Durbin Group oversees the pumping system, and Honeywell is helping with cooling fluid selection.
Nvidia has set out a precise roadmap for the project, with a milestone each year: component testing wraps up in the first year, a partial rack is built and evaluated in the second, and a fully built and tested system should arrive at the end of the third. That system will be used in a data center housed in a mobile container and should provide sufficient cooling at ambient temperatures of up to 40°C.
Intel inside, vapor outside
As mentioned, Intel is among the other grant recipients. The company’s idea for COOLERCHIPS earned it a grant totaling $1.71 million. It features a two-phase immersion cooling system which will use 3D printing to create some of the components.
3D printing allows for the creation of more complex, nature-like structures, and it can be cheaper. Nature-inspired structures can be much more efficient at shedding heat, so Intel will create a coral-like heat sink inside a 3D vapor chamber. “The team will use computational methods to identify the optimal design for the coral-shaped heat sinks. (For comparison, today’s heat sinks are typically made of long, parallel ribs.),” Intel says. The chamber will also feature new coatings that reduce thermal resistance and improve the overall cooling.
The setup is then immersed in liquid, and the combination of the chamber, 3D-printed structures and coatings produces vapor that carries heat away from the chips and quickly transfers it to the outside air, where additional cooling solutions can take over. Overall, Intel aims for a significant reduction in thermal resistance, from 0.025°C per watt today to less than 0.01°C per watt.
Naturally, Intel says the main benefit of the solution will be the continuation of Moore’s Law: better cooling means more cores and transistors per processor. “Immersion cooling is used for its simplicity, sustainability and ease of upgrades. This proposal will enable two-phase immersion cooling to align with the exponential increase in power expected by processors over the next decade,” says Tejas Shah, principal engineer and lead thermal architect for Intel’s Super Compute Platforms Group.
“To meet the growing demands for computing capacity and performance, future data center processors are expected to require power in excess of 2 kilowatts (kW), which would be challenging to cool with existing technologies. (Today’s most powerful chips are fast approaching 1 kW of power use.) The cooling solutions developed through the program will enhance the capabilities of Intel’s processors and those produced through Intel Foundry Services, enable the continuation of Moore’s law and further Intel’s commitment to energy efficiency and sustainable solutions,” Intel adds.
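A quick worked example shows why those thermal-resistance and power figures matter together. Using delta_T = R_th × P (the chip’s temperature rise above the coolant) and an assumed 45°C coolant temperature, you can compare today’s roughly 1 kW chips with the 2 kW parts Intel expects (a rough sketch based on the figures above, not Intel data):

```python
# Rough illustration of why thermal resistance matters at kilowatt-class
# chip power. delta_T = R_th * P gives the temperature rise of the chip
# above the coolant. The 45 C coolant temperature is an assumed value.

coolant_temp_c = 45  # assumed facility coolant temperature

for power_w in (1_000, 2_000):            # today's chips vs. expected ~2 kW parts
    for r_th_c_per_w in (0.025, 0.010):   # current vs. targeted thermal resistance
        chip_temp = coolant_temp_c + r_th_c_per_w * power_w
        print(f"{power_w} W chip, R_th={r_th_c_per_w} C/W -> ~{chip_temp:.0f} C")

# At 2 kW, 0.025 C/W puts the chip ~50 C above the coolant (around 95 C here),
# while 0.01 C/W keeps the rise to ~20 C: the difference between throttling
# and comfortable headroom.
```

In other words, halving (or better) the thermal resistance is what keeps a 2 kW processor in the same temperature range as today’s 1 kW parts.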
The company is not working alone on the project. It’s collaborating with several academic and industry partners who will handle different aspects of the work. Intel will oversee the effort, provide test configurations and define the form factor and constraints along with hot spot locations.
More cool ideas
Other companies have good ideas for cooling data centers, too. HP is working on a liquid cooling solution that reduces thermal resistance, allowing heat to dissipate into outside air even at 40 degrees Celsius and 60% humidity.
Flexnode is developing a modular data center that also uses liquid cooling, including a new microchannel heat sink and a hybrid immersion cooling approach. The goal is to reduce costs, and the design adds a dry-cooling heat exchanger system.
Next, the University of Texas at Arlington is working on a hybrid cooling technology for high-power data centers. The project pairs a direct-to-chip cooling module with an air-cooled heat exchanger, an approach the team says is robust, easy to extend and straightforward to retrofit into legacy data centers.
All of these projects are briefly described here (PDF). They include solutions for brand-new data centers, older ones, edge setups, modular ones and more; between them, the 15 projects cover virtually every type of data center and cooling scenario. Hopefully most, if not all, of them will be completed successfully and on time, because data centers certainly need them already.