Data Center Infrastructure Management (DCIM) Solutions

By Robert Eastman

This article is part of Wood Harbinger’s newsletter series.

Put aside the capital expense of building your data center. Put aside the operational expense of energy consumption, system maintenance and refreshment, property maintenance and staffing. The cost of an outage or system downtime alone can be staggering, in both monetary and reputational terms. The Ponemon Institute’s “2013 Cost of Data Center Outages” report, sponsored by Emerson Network Power, estimates that an unplanned data center outage costs an average of $7,900 a minute. You can’t afford to lose.

No matter the size of the data center, its systems and their parts must be reliable and available, efficiently operated, and effectively managed. Well-designed systems are important to business continuity, as are well-maintained and well-managed systems. Business continuity is paramount to success, whether you design or manage a server room, a small data center with space for 20 to 250 equipment cabinets, or a massive operation with thousands of equipment cabinets.

The depth and scale of resources available to efficiently maintain and support the operations of your data center will vary with its size. That’s where data center infrastructure management (DCIM) solutions come in. This suite of stand-alone and integrated business tools supports your data center’s operation and can play a central role in the maintenance and management of its infrastructure as a whole and of many of the systems deployed within it.

Real-Time Information

DCIM is a holistic way of thinking about an ongoing problem: balancing the data center’s mission-critical components with efficient and economical application of the building systems that keep them in continuous operation, namely the available capacity of cabinet and floor space, power systems and cooling systems. Reliable and available systems management is key, but so is efficient and sustainable building and energy management. DCIM software integrates with existing monitoring and alert programs, allowing operators to collect, centralize, store and analyze real-time usage information to optimize decisions and actions for peak performance, efficiency and value.

Predictive Modeling for a Dynamic Institution

The operation of a data center is dynamic. Something is always changing, because your business environment is always evolving. At any given time on any given day, someone in a data center is deciding how to add, upgrade or replace some of the myriad network or information system components, such as optical fiber cable patch panels, twisted pair cable patch panels, core switches, aggregation switches, access switches, power strips/PDUs, application servers or storage devices. Whether the change succeeds will depend on the constraints of the building systems. DCIM solutions can give managers access to accurate, real-time system information that helps them plan a successful component change, proactively maintain systems before potential failures occur, and respond quickly to failures and outages for resolution and recovery.
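As a rough illustration of the planning side, the sketch below fits a straight line to a short history of monthly peak power readings and estimates how many months remain before a capacity limit is reached. The linear trend, the function name `months_until_limit` and the sample figures are all assumptions made for illustration; commercial DCIM products use their own, typically richer, models.

```python
def months_until_limit(monthly_peak_kw, limit_kw):
    """Estimate months until a linear trend in peak load reaches limit_kw.

    monthly_peak_kw: peak load per month, oldest first (hypothetical data).
    Returns None when the trend is flat or falling, since no crossing
    can be projected from it.
    """
    n = len(monthly_peak_kw)
    if n < 2:
        raise ValueError("need at least two monthly readings")

    # Ordinary least-squares fit of load against month index 0..n-1.
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(monthly_peak_kw) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, monthly_peak_kw))
    slope = sxy / sxx

    if slope <= 0:
        return None

    intercept = mean_y - slope * mean_x
    fitted_latest = intercept + slope * (n - 1)

    # Months from the latest reading until the fitted line hits the limit;
    # zero means the trend line is already at or above the limit.
    return max(0.0, (limit_kw - fitted_latest) / slope)


# Hypothetical history: load growing by about 10 kW a month.
history_kw = [100.0, 110.0, 120.0, 130.0]
remaining = months_until_limit(history_kw, limit_kw=160.0)
```

A projection like this is only as good as the readings behind it, which is why the real-time data collection described above matters for planning.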

Comprehensive Asset Management

A core element of DCIM is the ability for operators to build and maintain a database of the data center’s inventory of assets and accurately capture the physical and logistical attributes of each asset or device, including: manufacturer, model, component option or build, purchase date, installation date, cabinet location, rack unit positions, power connection (circuit, plug type, PDU port), electrical draw (wattage, amperage), network connection (quantity, switch port, LAN), responsible party, resident applications, service level agreement (SLA) details, and so forth. Such detailed information allows operators to rigorously manage the collective system capacity of the data center: what can be added, deleted, upgraded and replaced and, just as importantly, how the resulting changes can be managed. The authors of The Visible Ops Handbook: Implementing ITIL in 4 Practical and Auditable Steps note that 80% of all outages are caused by change, and that 80% of MTTR (mean time to repair) is spent trying to find out what changed. With these statistics as a guide, it is critical to enact any change through a purposeful and documented process. Disaster avoidance is better than disaster recovery. For outage and downtime events not associated with a deliberate change (equipment or support system failures and human errors), having the tools to quickly isolate an adverse condition and plan a recovery is valuable.
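To make the idea concrete, here is a minimal sketch of how a few of these attributes might be recorded and used to check a proposed addition against a cabinet's space and power limits. The `Asset` and `Cabinet` classes, the `can_add` function and all of the sample values are hypothetical and carry only a handful of the attributes listed above; they do not represent any particular DCIM product.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    manufacturer: str
    model: str
    cabinet: str           # cabinet location
    rack_unit_start: int   # lowest rack unit occupied (1-based)
    rack_units: int        # height in rack units
    power_watts: float     # electrical draw


@dataclass
class Cabinet:
    name: str
    total_rack_units: int
    power_budget_watts: float


def can_add(cabinet, installed, candidate):
    """Return (ok, reasons) for placing candidate in cabinet."""
    in_cabinet = [a for a in installed if a.cabinet == cabinet.name]
    reasons = []

    # Power check: existing draw plus the candidate must fit the budget.
    used_watts = sum(a.power_watts for a in in_cabinet)
    if used_watts + candidate.power_watts > cabinet.power_budget_watts:
        reasons.append("power budget exceeded")

    # Space check: requested rack units must be free and inside the cabinet.
    occupied = set()
    for a in in_cabinet:
        occupied.update(range(a.rack_unit_start, a.rack_unit_start + a.rack_units))
    first = candidate.rack_unit_start
    last = first + candidate.rack_units - 1
    if occupied & set(range(first, last + 1)):
        reasons.append("rack units already occupied")
    if first < 1 or last > cabinet.total_rack_units:
        reasons.append("outside cabinet rack unit range")

    return (not reasons, reasons)


# Hypothetical inventory for one 42U cabinet with a 5,000 W budget.
cab = Cabinet("A03", total_rack_units=42, power_budget_watts=5000.0)
inventory = [
    Asset("app-01", "ExampleCo", "S100", "A03", 1, 2, 400.0),
    Asset("san-01", "ExampleCo", "D400", "A03", 3, 4, 1200.0),
]
new_switch = Asset("sw-07", "ExampleCo", "N48", "A03", 7, 1, 150.0)
ok, reasons = can_add(cab, inventory, new_switch)
```

Because every proposed change is evaluated against the recorded inventory, the same records also answer the "what changed?" question after the fact.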

Real-Time Monitoring

Electrical and mechanical system components can be integrated with a DCIM solution, enabling the collection of real-time system usage and performance information. Trends are captured, analyzed and compared to established usage and performance metrics. Variances from the ranges set in those metrics are monitored, and alerts of alarm conditions are sent to the appropriate decision makers as usage thresholds are approached and reached. All of this can be managed from dashboards installed on Operation Center workstations or even from managers’ mobile devices. Monitoring can be localized to a system, a room, a specific area, a certain row or a single cabinet. What you can measure can be monitored and managed, making it possible to predict and plan for the impact of potential changes to the business.
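The threshold logic described here can be sketched in a few lines. The example below assumes each metric has a warning level (threshold approached) and an alarm level (threshold reached); the metric names, threshold values and cabinet labels are invented for illustration and are not recommendations.

```python
def classify(value, warn_at, alarm_at):
    """Classify a reading as 'ok', 'warning' or 'alarm'."""
    if value >= alarm_at:
        return "alarm"
    if value >= warn_at:
        return "warning"
    return "ok"


def evaluate(readings, thresholds):
    """Return alerts for readings that approach or reach their thresholds.

    readings:   {(location, metric): value}
    thresholds: {metric: (warn_at, alarm_at)}
    """
    alerts = []
    for (location, metric), value in readings.items():
        warn_at, alarm_at = thresholds[metric]
        status = classify(value, warn_at, alarm_at)
        if status != "ok":
            alerts.append((location, metric, value, status))
    return alerts


# Hypothetical thresholds and cabinet-level readings.
thresholds = {
    "inlet_temp_c": (27.0, 32.0),
    "pdu_load_pct": (70.0, 80.0),
}
readings = {
    ("Row A / Cab 03", "inlet_temp_c"): 28.5,
    ("Row A / Cab 03", "pdu_load_pct"): 65.0,
    ("Row B / Cab 11", "pdu_load_pct"): 84.0,
}
alerts = evaluate(readings, thresholds)
```

Keying readings by location means the same check works whether monitoring is scoped to a room, a row or a single cabinet.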

Governance and Compliance

Even a data center that has been operated, managed and maintained without a memorable outage or failure incident will be scrutinized when an outage does occur. Business leaders will critically review the event and ask why it happened, and why it was allowed to happen. DCIM solutions provide a framework for data center governance and the documentation needed to demonstrate compliance with it. A system outage can damage an organization’s reputation and cause monetary losses, especially if the outage is allowed to linger. Depending on the circumstances, an organization could face regulatory review and fines. A DCIM system aligned with formal system operational policies and procedures can provide a defense against inquiries and complaints.

The end goal of data center infrastructure management is streamlined operations, energy efficiency and reduced downtime risk through proactive and accurate monitoring, asset information management and predictive modeling. Data centers, and the devices and systems they support, are becoming more critical and more pervasive in our increasingly technology-driven environment. Mitigating downtime by eliminating vulnerability is priority number one and requires a committed investment in measures to achieve it. DCIM solutions provide a solid and scalable option for maximizing the uptime of a data center of any size.
