Data center management is a form of science. There are many variables, controls, and factors to track, and operators must act and react accordingly at all times. As data centers become bigger and more complex, managing their operations will become more of a challenge.
This is why we are entering a new “sub-era” for data center management: the era of the AAA, meaning AI, Automation, and Algorithms. This “AAA team” already plays a vital role and will play an even bigger one in the future. It will be very difficult, almost impossible, to run a modern data center, especially a hyperscale one, without applying all three technologies.
And no, the “AAA team” is not three different names for the same technology. These are three different technologies that can and should be used together to reach their full potential and manage a data center in the best possible way.
To do so, we need to get to know the three technologies better. Once we know the differences, we can better understand which use cases each one suits best. So, let’s begin exploring.
What is AI in data centers?
AI in data centers is quite a vast topic, and the technology touches data centers in more than one way. For example, the facility may simply provide the hardware resources for AI services that are delivered to businesses and end users. Or AI may actually be used to run the data center’s own operations efficiently.
There are also multiple kinds of AI, depending on the use. Generative AI, for example, can help analyze operational data and offer insights and ideas to improve the overall work. There can also be predictive AI setups that run models to calculate and offer various solutions or warnings depending on the outcomes. Here digital twins can be used to run a mirror of the data center in a virtual world and see how different changes would affect the actual operations.
What is data center Automation?
Automation in data centers is a bit different. It involves no machine learning, Large Language Models, or any other AI abilities. Automation simply performs predefined tasks in a certain way. It may sound crude, but it’s much closer to the basic “if this, then that” approach.
Automation is the oldest concept of the three, as it’s the simplest. Of course, today the instructions can be quite complex and are much more capable than ten or even five years ago. Automation is a key way for data center operators to optimize repetitive operations and free up resources and employees’ time for more important or more specific tasks.
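To make the “if this, then that” idea concrete, here is a minimal sketch of rule-based automation. The metric names, thresholds, and action names are all hypothetical and chosen only for illustration; a real facility would wire such rules to its own monitoring and management tools.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """One predefined 'if this, then that' instruction."""
    name: str
    condition: Callable[[dict], bool]  # the "if this" part
    action: str                        # the "then that" part (just a label here)


# Hypothetical rules; thresholds are illustrative, not recommendations.
RULES = [
    Rule("high_inlet_temp", lambda r: r["inlet_temp_c"] > 27.0, "increase_fan_speed"),
    Rule("disk_nearly_full", lambda r: r["disk_used_pct"] >= 90.0, "rotate_logs"),
]


def evaluate(reading: dict, rules: list = RULES) -> list:
    """Return the actions whose conditions match the given sensor reading."""
    return [rule.action for rule in rules if rule.condition(reading)]
```

For instance, `evaluate({"inlet_temp_c": 30.0, "disk_used_pct": 50.0})` returns `["increase_fan_speed"]`. Nothing here learns or adapts: the rules do exactly what they were told, which is the point of plain automation.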
What are data center Algorithms?
Algorithms are technically the base of it all. Every instruction is an algorithm. But in this case the term mostly refers to a set of procedures that have to be followed to achieve a certain goal. The procedure can change and vary depending on the conditions and on the results of previous steps. Algorithms are more than software and can include different programs, actions, physical changes to setups and more.
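As a rough illustration of a procedure whose next step depends on the result of the previous one, the sketch below runs a list of remediation steps in order and stops as soon as a health check passes. The step names and the health check are hypothetical stand-ins, not any specific product’s workflow.

```python
from typing import Callable


def run_procedure(steps: list, is_healthy: Callable[[], bool]) -> tuple:
    """Run (name, action) steps in order until the system reports healthy.

    Returns the names of the steps that were executed and a final status.
    """
    executed = []
    for name, action in steps:
        if is_healthy():
            # The previous step fixed the problem, so later steps are skipped.
            return executed, "resolved"
        action()
        executed.append(name)
    # All steps were tried; hand over to a person if the issue remains.
    return executed, ("resolved" if is_healthy() else "escalate_to_operator")


# Simulated system state, for demonstration only.
state = {"healthy": False}
steps = [
    ("restart_service", lambda: None),                         # has no effect here
    ("fail_over", lambda: state.update(healthy=True)),         # fixes the issue
    ("power_cycle_host", lambda: state.update(healthy=True)),  # never reached
]
result = run_procedure(steps, lambda: state["healthy"])
```

In this simulated run, `result` is `(["restart_service", "fail_over"], "resolved")`: the third step is never executed because the outcome of the second step changed the path of the procedure.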
How can the data center AAA team help?
As we can see, each member of the AAA team has a different role and different abilities. While there’s a lot of common ground, the differences are key. Bringing the three together makes it possible to use their overlapping features and to let each one cover the areas the others don’t.
Using the AAA team is becoming a trend in the industry, DataCenterKnowledge reports. It gives an example of how to use them together: generative AI tools depend on algorithms to perform the training that allows them to simulate human interactions and reactions. The whole process can automate some of the tasks and AI can program that automation and then monitor it to make sure it’s performing as intended. It can then make changes if and when needed.
Implementing the AAA team in a data center can be a bit of a challenge. It will require a lot of effort and careful consideration of how to combine the technologies in order to achieve positive results. Otherwise, a lot of money, time and effort could go to waste. If that happens, companies most often blame the technologies. Data center operators can avoid this by preparing ahead of time and taking into account several key points.
Define the roles
The first step is to clearly define the goals and roles for each member of the AAA team. There’s no point in throwing massive AI resources at workloads that can simply be automated, or in trying to automate tasks that would benefit from the flexibility and responsiveness of AI.
So, how to define the roles? “AI can help monitor the health and configuration of the network, identifying anomalies and potentially taking corrective actions automatically… For the industry to deliver on the promise of a self-healing or self-correcting WAN, AI tools can help automate routine network operation tasks, set policies, measure network performance against set targets, and respond to and rectify the networks as needed,” says Marc Herren, network advisory director with the technology research and advisory firm ISG, to DataCenterKnowledge.
“Although human operators can more effectively triage complex and multi-step problems, AI is a powerful tool that can supplement the work of network engineers to add robust controls and automation to mature networks. AI should be considered an addition to a company’s network team rather than a replacement, accelerating the work of engineers and creating new efficiency improvements to developed workflows,” says David Brauchler, a principal security consultant with cybersecurity and software assurance services firm NCC Group.
As established, automation can handle most basic tasks. This benefits employees, as it frees them up to work on other matters. Algorithms are a happy medium between automation and AI.
Know the limitations
It’s tempting to rush into researching suitable services and solutions and start implementing the AAA team in the data center. But in reality there is still more preparation to do, and it’s an important step: discovering the limitations of the AAA technologies.
For example, for AI there are three challenges that the data center operator must overcome before being “ready for AI,” says DataCenterFrontier. The first challenge is cooling. AI will require a lot of resources even if just for local use, and as workloads increase in intensity and quantity, cooling is becoming even more important.
“At some point in the not-distant future, heat fluxes for the most powerful processors will be too high to manage with direct air cooling. Simply put, air is not nearly as effective a heat transfer medium as liquid, and at some point, air is unable to remove all the heat generated by high-power chips, resulting in artificial performance limits or equipment damage,” says Stuart Lawrence, VP of product innovation and sustainability at Stream Data Centers.
Next is the reconfiguration of the facilities for liquid cooling. A lot of data center operators have already started to redo their facilities so that they can add this type of cooling. This can be an expensive task, but it will be needed in the long run anyway.
The third challenge is choosing the hardware, power and cooling specifications. “AI workloads perform optimally at densities of at least 20 kW per rack. But they’re not going to stop there. Density will keep increasing, rapidly and by huge margins, as each new generation of chips is increasingly power-hungry and IT infrastructure leaders are designing ever-denser configurations,” says Lawrence.
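The density figure in the quote is easy to check against a planned configuration with simple arithmetic. The sketch below uses the 20 kW per rack figure from the quote as its threshold; the server counts and per-server power draws are hypothetical examples, not vendor data.

```python
def rack_density_kw(servers_per_rack: int, watts_per_server: float) -> float:
    """IT power drawn by one rack, in kilowatts."""
    return servers_per_rack * watts_per_server / 1000.0


def meets_ai_density(density_kw: float, threshold_kw: float = 20.0) -> bool:
    """Compare a rack against the 20 kW per rack figure quoted above."""
    return density_kw >= threshold_kw


# Hypothetical configurations, for illustration only.
gpu_rack = rack_density_kw(servers_per_rack=4, watts_per_server=6500)      # 26.0 kW
general_rack = rack_density_kw(servers_per_rack=20, watts_per_server=500)  # 10.0 kW
```

Under these assumed numbers, the rack of four GPU servers reaches 26 kW and clears the threshold, while the rack of twenty general-purpose servers draws 10 kW and does not. Since nearly all of that power ends up as heat, the same figures give a first estimate of the cooling load per rack.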
There are also some limitations for automation. Physical server deployment can’t be automated. Provisioning can be automated for virtual services, but in a data center someone has to physically install the machines. The same goes for hardware maintenance: someone has to go to the server and replace or fix the faulty hardware. That said, AI and algorithms could forecast possible issues and flag them ahead of time, thus limiting the possible damage and shortening downtime.
Disaster recovery is another process that can’t be automated, at least not fully. There’s sound reason to automate the initial tasks, or to use AI for faster incident detection and response. The main disaster recovery work, though, has to be carried out by the employees.
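Flagging issues ahead of time does not always require a large model. As a deliberately simple stand-in for the forecasting described above, the sketch below uses a plain statistical check (a z-score against recent history) to flag a reading that departs sharply from the norm. The sample temperatures and the threshold of 3 standard deviations are assumptions for illustration.

```python
from statistics import mean, stdev


def flag_anomaly(history: list, latest: float, z_threshold: float = 3.0) -> bool:
    """Flag `latest` if it is far from the recent history, measured in standard deviations."""
    if len(history) < 2:
        return False  # not enough data to judge
    sigma = stdev(history)
    if sigma == 0:
        # History is perfectly flat, so any change at all is unusual.
        return latest != history[0]
    return abs(latest - mean(history)) / sigma > z_threshold


# Hypothetical drive temperatures in degrees Celsius.
recent_temps = [40, 41, 39, 40, 41, 40]
```

With this sample history, `flag_anomaly(recent_temps, 55)` returns `True` and `flag_anomaly(recent_temps, 41)` returns `False`. A flagged component can then be scheduled for inspection before it fails, although the physical replacement still needs a technician.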
There are a lot of options and possibilities for the data center AAA team. By examining them carefully, data center operators can make the most out of these technologies and be ready for the next generation.