
Product VP at Cloudcy.cn Introduced How AIOps and Observability Streamline Cloud-native Oper


While cloud computing brings a number of benefits, such as intensification, efficiency, elasticity, and business agility, it also poses unprecedented challenges for cloud operations. Adapting to new technology trends, building an intelligent monitoring platform for the cloud era, and better protecting cloud-based applications have therefore become imperative for enterprises today.

In this article, we invited Mr. Zhang Huaipeng, product VP of Cloudcy.cn, to share his experience and expertise on what it takes to build a digital observation tool in the era of cloud computing.


Operational challenges under digital transformation

We live in a digital age where digitalization influences all aspects of our daily lives, including how we work, consume, shop, and travel. It is fair to say that we have now shifted from the IT era into the "DT" (digital transformation) era.

Since the advent of digital transformation, enterprises and their customers have experienced a fundamental change in how they conduct business. It should be noted, however, that as the digital revolution advances across a range of industries, a growing number of incidents involving digital applications have been reported.

A recent survey reported that 60% of CEOs believe digital transformation is essential and that businesses should make significant progress towards digital transformation and artificial intelligence evolution. By contrast, 95% of enterprise applications are not monitored effectively and may therefore cause problems.

It is important to note that most current digital operation tools were developed during the era of traditional data centers, and a significant number of tools and technologies do not consider cloud computing scenarios. With the popularity of cloud computing, the information technology landscape has changed dramatically. Increasingly complex, distributed, and interdependent applications are evolving at a rapid pace. Thus, enterprises must develop DT-oriented solutions based on business and data flows to succeed in a competitive market.

Cloud-native technology is among the many new technologies and scenarios emerging during the DT era. With the introduction of cloud-native technologies, operating practices have evolved at an accelerated pace. Traditional scenarios involve a large amount of physical infrastructure, which may require enterprises to care about conventional server room management, weak power management, hardware monitoring of bare metal, UPS power distribution, and temperature and humidity.

However, once the business runs in the cloud, infrastructure is managed by operators or providers, so enterprises no longer need to worry about these issues.

Thus, traditional equipment operations have evolved into site reliability work, which means that enterprise investments in old-fashioned operations will diminish over time.

Currently, we are undergoing a transition to AIOps. Now is the time to make digital operations and IT operations more lightweight, more efficient, and less costly. Operations teams must focus on the enterprise business, which is the key to success.


The road to AIOps for enterprises

What is AIOps?

As defined by research firms such as Forrester and Gartner, AIOps is a software system that uses artificial intelligence and data science in business and operations to establish data correlations and provide real-time prescriptive and predictive guidance. As a software system, AIOps can be delivered as a commercial product. AIOps may enhance and partially replace traditional critical IT operations functions, such as monitoring availability and performance, organizing and analyzing events, and automatically managing IT services.

AIOps, as the name suggests, is concerned with operations, such as observation, management, and disposal. As Forrester suggested in their reports, AIOps promises greater observability and stability, both of which are important in this field.

According to Forrester, one of the core values of current AIOps is the enhancement and extension of ex-ante capabilities.

What is observability?

The term "observability" was first used in the field of cybernetics, where it describes how well the internal state of a system can be inferred from the system's outputs. IT research firm Gartner defines observability as a characteristic of software and systems. In particular, it refers to the ability to determine a system's current state and condition based on the telemetry data it generates.

Why is observability a key concept?

Observability is essential to improving the control of complex systems. Traditional monitoring techniques and tools have difficulty tracking the communication paths and dependencies of today's increasingly distributed architectures. Meanwhile, cloud-native or cloud-based application dependencies are much more complex than those in traditional monolithic applications. In addition, the three pillars of observability (metrics, logs, and traces) facilitate an intuitive understanding of all aspects of a complex system.

Operations, development, SRE, marketing, and business teams can all benefit from observability. Therefore, if AIOps and observability can be combined in a single integrated platform, one product can accomplish two things at once.


The paths to AIOps for enterprises

AIOps can be achieved exogenously and endogenously in enterprises. An exogenous AIOps platform is integrated into an IT operations environment as a sidecar platform. In this case, AIOps is an independent algorithm platform that accesses heterogeneous data from various sources. Data engineers process the data using big data analytics to resolve interdependencies between data sources and produce project-based results.

Endogenous AIOps, by contrast, emphasize an integrated technology route. They can close the loop on the whole data processing pipeline without involving data engineers. This is like sending a courier package, but with data as the "items". Data is encapsulated, stored, scheduled, and transported by the "courier", eliminating any need for the sender or final recipient to be involved in these tasks. Endogenous AIOps deliver this capability by embedding AI capabilities into one single integrated observation platform.

Exogenous vs. endogenous AIOps in technological implementation

Generally, exogenous AIOps use traditional machine learning techniques. They are essentially statistical approaches that correlate and analyze information such as metrics, logs, and events to reduce alert noise. Machine learning yields a set of correlated alerts, but doing so requires a certain period of time. Exogenous AIOps need manual input or historical records to establish a recommended or probable root cause.
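To make the idea of noise reduction through correlation concrete, here is a minimal sketch. It uses plain time-window grouping as a stand-in for the statistical correlation described above; the Alert fields, the 60-second window, and the sample alerts are all assumptions made for illustration, not any vendor's actual algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    timestamp: float  # seconds since an arbitrary start (assumed unit)
    source: str       # hypothetical entity name
    message: str


def group_alerts(alerts, window_seconds=60.0):
    """Group alerts that arrive within window_seconds of the previous alert."""
    groups = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if groups and alert.timestamp - groups[-1][-1].timestamp <= window_seconds:
            groups[-1].append(alert)   # close in time: same candidate incident
        else:
            groups.append([alert])     # gap too large: start a new group
    return groups


alerts = [
    Alert(0.0, "db-1", "connection pool exhausted"),
    Alert(12.0, "api-1", "latency above threshold"),
    Alert(30.0, "web-1", "HTTP 500 rate rising"),
    Alert(600.0, "batch-1", "job overran schedule"),
]
groups = group_alerts(alerts)
for group in groups:
    print([a.source for a in group])
```

Four raw alerts collapse into two groups here. Note that the grouping only says which alerts belong together; it does not say which one is the cause, which is why this approach still leans on manual input or historical records.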

Meanwhile, exogenous AIOps depend heavily on external data, and vendors usually design only the algorithms. Data cleansing, dependencies between entities within a CMDB, and so on all rely on external data. Exogenous AIOps, therefore, require a mature information technology operations system, products with APM, call data as a prerequisite, and excellent observability.

Endogenous AIOps provide deterministic AI analysis that targets deterministic results. The root cause can therefore be determined in real time once a problem occurs. Endogenous AIOps maintain a matrix dependency map in real time. The technology does not depend on a static CMDB but on a dependency map that acts like a real-time CMDB, so dependencies can change in real time and analysis is carried out over these endogenous relationships.
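The following sketch illustrates how a live dependency map can give a deterministic answer. It assumes a simple rule chosen for illustration: an unhealthy entity is a root cause if none of the entities it depends on is unhealthy. The DependencyMap class, its method names, and the service names are hypothetical and are not taken from any product.

```python
class DependencyMap:
    """A tiny in-memory dependency map; edges point from caller to callee."""

    def __init__(self):
        self._deps = {}  # entity -> set of entities it depends on

    def add_dependency(self, caller, callee):
        self._deps.setdefault(caller, set()).add(callee)
        self._deps.setdefault(callee, set())

    def remove_dependency(self, caller, callee):
        self._deps.get(caller, set()).discard(callee)

    def root_causes(self, unhealthy):
        """Return unhealthy entities none of whose dependencies are unhealthy."""
        unhealthy = set(unhealthy)
        return sorted(
            entity for entity in unhealthy
            if not (self._deps.get(entity, set()) & unhealthy)
        )


dep_map = DependencyMap()
dep_map.add_dependency("web", "api")
dep_map.add_dependency("api", "db")
dep_map.add_dependency("api", "cache")

first = dep_map.root_causes({"web", "api", "db"})
print(first)

# The topology changes at runtime: api no longer calls db.
dep_map.remove_dependency("api", "db")
second = dep_map.root_causes({"web", "api"})
print(second)
```

Because the map is updated as the environment changes, the same rule gives a different answer after the edge is removed, with no static CMDB to refresh. This simple rule does not handle dependency cycles, which a real platform would need to address.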

How to choose your technological pathway?

For managers, the trade-offs between cost, stability, and efficiency must be considered along with fundamental issues such as budget and team. AIOps can help solve these problems rationally, optimizing the stability and efficiency of the enterprise business while keeping costs to a minimum.

According to a report from Forrester, enterprises should focus on the following key aspects when implementing AIOps:

• Whether the AIOps platform integrates seamlessly with the ITOM toolchain and is highly automated;

• AIOps will place great emphasis on native data, including cloud-native dependencies and machine data;

• An automated and comprehensive mapping of full-service dependencies;

• AIOps will require intelligent observation awareness and automation implementation;

• Automating the analysis of root causes and the planning of remediation for incidents;

• Technical operations today require intelligence and automation.

Data processing differences:

Traditional AIOps platforms (exogenous AIOps platforms) have used various tools to build up a rickety big data processing system. In such a system, team members who resign are likely to leave new employees with a significant amount of technical debt.

Data collection begins with the use of a variety of open source and commercial tools.

After the data has been collected, the next step is to inject it into the big data platform.

The data relationships will be manually sorted and cleaned. These require a lot of time and effort.

Identify the issues in which the AIOps vendor would be involved in the field. Vendors will ask for specifications and provide services following those specifications.

Develop the dashboard.

Scale up the system. The system will, therefore, grow linearly as the application system scales.

It is common to see data engineers spend almost 80% of their time cleaning, collecting, and organizing data. The process requires cutting-edge ops talent with an understanding of ops, algorithms, and development. AIOps is meant to be a tool that helps solve problems, but exogenous AIOps may increase the ops workload and require a dedicated maintenance team.

As for endogenous AIOps, their data processing is much simpler, and one tool can handle all aspects of data collection. As a highly commercialized product, endogenous AIOps have out-of-the-box capabilities, such as a dashboard, engine, etc., and do not require business engineers to understand algorithms or SRE.

In addition, as the enterprise business system scales, endogenous AIOps will grow non-linearly. It is important to note that the entire system, including the user's team and the product, increases non-linearly. Using Cloudcy as an example, once the solution has been deployed, enterprises only need to install Databuff OneAgent, and many of the subsequent tasks can be automated. Consequently, operations personnel can devote their attention to the enterprise's core business.

Rather than presenting raw data, the industry requires a new generation of software AI platforms that can cover the entire data processing process. Of the two paths, exogenous and endogenous, endogenous AIOps belongs to this new paradigm and is the recommended one.


Endogenous AIOps facilitate cloud-native operations

The objective of the endogenous AIOps platform is to build an integrated platform that combines AIOps and observability. For it to be observable, it needs to be centered on application monitoring, which is the phenomenon layer for end users. Meanwhile, it is necessary to integrate infrastructure monitoring, including monitoring of cloud platforms as well as black boxes. Lastly, it is essential to provide a digital experience that is focused on the front end.

The new AIOps platform should provide continuous automation from data access to results output. In addition, it needs to be capable of predicting and warning.

AIOps platforms must provide high-level observability to enterprises: not just raw data and raw parts, but a focus on phenomena and experience, with accurate results that minimize the impact of massive noise on enterprises.

An endogenous AIOps platform can have many different data processing models; one example is the strength that Databuff OneAgent brings to the data collection process. Data processing emphasizes the metrics system, and our implementation of the metrics system differs from that of traditional AIOps platforms, resulting in a truly endogenous AIOps.

Endogenous AIOps platforms will simplify cloud-native operations in the following five areas:

• Direct access to high-quality observation data;

• Developing continuous automation that is more efficient for operations;

• Platforms can construct real-time matrix topologies for querying;

• An instant assessment of the impact surface;

• Disclose root causes to prove results.

1. Direct access to high-quality observation data

High-quality back-end analysis requires high-quality front-end telemetry data. Trace data, metrics, log data, and critical topology and code data are essential for high-level observability and endogenous AIOps analysis. Data quality directly determines the ceiling of what a model can achieve.

Monitoring data that can be directly accessed must be collected non-invasively and automatically, be related to business and applications, combine context with automation, and require no modification of source code. Context is an indispensable component of real root cause analysis. It can help extract accurate background information and help the platform build real-time service flow and topology diagrams for dependencies, including the matrix relationship topology.

These diagrams display the dependencies of the application environment, primarily in the form of vertical and horizontal stacks. A service flow diagram provides an overview of the entire transaction from the perspective of a service or request. Either a service flow diagram or a topology diagram can show the calling sequence among services. The service flow diagram lays out all transactions in order, whereas the topology diagram represents dependencies more abstractly.
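The difference between the two views can be sketched with a few trace spans. This is a minimal illustration under assumed data: the Span fields, trace IDs, and service names are hypothetical, and real tracing data carries far more context than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    trace_id: str   # identifies one transaction
    start: float    # start time in seconds (assumed unit)
    caller: str
    callee: str


def service_flow(spans, trace_id):
    """Ordered list of calls for one transaction (the service flow view)."""
    own = [s for s in spans if s.trace_id == trace_id]
    return [(s.caller, s.callee) for s in sorted(own, key=lambda s: s.start)]


def topology(spans):
    """Deduplicated caller-to-callee edges across all traces (the topology view)."""
    return {(s.caller, s.callee) for s in spans}


spans = [
    Span("t1", 0.00, "web", "api"),
    Span("t1", 0.02, "api", "db"),
    Span("t1", 0.05, "api", "cache"),
    Span("t2", 1.00, "web", "api"),
    Span("t2", 1.03, "api", "db"),
]
flow = service_flow(spans, "t1")
edges = topology(spans)
print(flow)
print(sorted(edges))
```

The flow keeps the order of calls within one transaction, while the topology discards order and repetition and keeps only which services depend on which.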

While there are already many powerful free and open-source monitoring tools on the market, the commercial Databuff OneAgent technology has several advantages that open-source tools lack, including:

• Agent probes are guaranteed to be stable, secure, and reliable;

• Guaranteed limits on resource overhead and performance impact on core business servers;

• A reduced amount of manual labor is required for deployment, insertion, and changes;

• Dynamic methods and container classes can be automatically monitored;

• High-fidelity native sampling of metrics;

• Obtaining sufficient information and context to construct a unified data model.

These are some of the advantages that many free tools don't have. Endogenous AIOps rely on OneAgent technology, which is designed to perform a great deal of aggregation and cleaning work at the endpoints using edge computing.
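As a rough illustration of aggregation and cleaning at the endpoint, the sketch below collapses raw samples into per-interval summaries before anything is sent upstream. It does not describe how OneAgent actually works; the function name, the 10-second interval, the cleaning rule, and the sample data are assumptions chosen for the example.

```python
from collections import defaultdict


def aggregate_at_edge(samples, interval_seconds=10):
    """Collapse raw (timestamp, metric, value) samples into per-interval summaries."""
    buckets = defaultdict(list)
    for timestamp, metric, value in samples:
        if value is None or value < 0:   # basic cleaning: drop invalid readings
            continue
        bucket_start = int(timestamp // interval_seconds) * interval_seconds
        buckets[(bucket_start, metric)].append(value)
    return {
        key: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
        }
        for key, values in buckets.items()
    }


samples = [
    (0.5, "cpu_percent", 40.0),
    (3.0, "cpu_percent", 60.0),
    (7.2, "cpu_percent", -1.0),   # invalid reading, dropped during cleaning
    (11.0, "cpu_percent", 50.0),
]
summary = aggregate_at_edge(samples)
for key in sorted(summary):
    print(key, summary[key])
```

Doing this work on the host means the back end receives two compact summaries instead of four raw samples, one of which was invalid anyway.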

2. Continuous automation

Endogenous AIOps platforms are designed to enable continuous automation, which is essential to the monitoring of cloud-native environments. This includes automating deployment, adaptation, discovery, monitoring, injection, cleaning, etc. It isn't easy to understand the end-to-end business process in a complex cloud-native environment with human intelligence alone, so automated operations are necessary as an additional tool.

3. Construct a real-time matrix diagram

Endogenous AIOps platforms are capable of building real-time topology matrices. You can read the diagram horizontally to view the dependencies at the service, container, host, and process levels. Reading it vertically shows which container the service is running on, which process this container corresponds to, and on which cloud host the service is running.
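One way to picture the two reading directions is the small sketch below. The inventory, the entity names, and the two function names are hypothetical; the sketch only shows how a vertical lookup and a horizontal projection could be derived from the same data.

```python
# Hypothetical inventory: each entry is one vertical stack for a service.
STACKS = {
    "api":   {"process": "java-1234", "container": "api-pod-a",   "host": "cloud-host-1"},
    "db":    {"process": "mysqld-88", "container": "db-pod-a",    "host": "cloud-host-2"},
    "cache": {"process": "redis-51",  "container": "cache-pod-a", "host": "cloud-host-1"},
}

# Hypothetical service-level calls as (caller, callee) pairs.
SERVICE_CALLS = [("api", "db"), ("api", "cache")]


def vertical_view(service):
    """Which process, container, and host a service runs on."""
    return STACKS[service]


def horizontal_view(level):
    """Dependencies projected onto one level: service, process, container, or host."""
    def name(service):
        return service if level == "service" else STACKS[service][level]
    return [(name(caller), name(callee)) for caller, callee in SERVICE_CALLS]


print(vertical_view("db"))
print(horizontal_view("container"))
print(horizontal_view("host"))
```

The host-level projection shows that the api and cache services share a host, which is the kind of relationship that is easy to miss when looking at services alone.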

4. Instant assessment of the impact surface

This is similar to the analysis performed in network security, but it pertains to operations. If a system failure occurs, the team should analyze which users, services, and applications are affected and what the root cause of the failure is. By automating the process, users can view the results without performing manual analysis.
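An automated impact assessment can be sketched as a reverse walk over the call graph: starting from the failed entity, collect everything that depends on it directly or transitively. The function name and the sample calls are hypothetical, and a real platform would also map services to affected users and applications.

```python
from collections import deque


def impact_surface(calls, failed):
    """Return every entity that directly or transitively depends on `failed`."""
    dependents = {}  # callee -> set of callers
    for caller, callee in calls:
        dependents.setdefault(callee, set()).add(caller)

    affected, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for caller in dependents.get(current, ()):
            if caller not in affected:
                affected.add(caller)
                queue.append(caller)
    affected.discard(failed)  # only relevant if the graph contains a cycle
    return affected


calls = [
    ("mobile-app", "web"),
    ("web", "api"),
    ("api", "db"),
    ("reports", "db"),
    ("api", "cache"),
]
db_impact = impact_surface(calls, "db")
cache_impact = impact_surface(calls, "cache")
print(sorted(db_impact))
print(sorted(cache_impact))
```

A database failure reaches the reports service as well as the whole web path, while a cache failure leaves reports untouched.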

5. Disclose root causes to prove results

Lastly, it is vital to identify the root causes of the problem to prove the results. AIOps offers a solution based on endogenous root cause location instead of traditional methods such as knowledge base, CMDB, and causal inference. In addition, it can bridge data dependencies between objects and data types, such as call chains, logs, and metrics. With low overhead, it provides a real-time root cause location with high adaptability and high accuracy. Furthermore, its unsupervised learning capability requires little human intervention.
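To show what bridging call chains, logs, and metrics can look like in the simplest case, the sketch below joins the three data types on a shared entity name and a time window around the incident. This is a plain join and not the unsupervised method mentioned above; the record fields, the 30-second window, and the sample data are assumptions for illustration.

```python
def collect_evidence(entity, incident_time, spans, logs, metrics, window_seconds=30.0):
    """Gather spans, logs, and metrics for one entity close to the incident time."""

    def relevant(record):
        return (
            record["entity"] == entity
            and abs(record["ts"] - incident_time) <= window_seconds
        )

    return {
        "spans": [r for r in spans if relevant(r)],
        "logs": [r for r in logs if relevant(r)],
        "metrics": [r for r in metrics if relevant(r)],
    }


spans = [
    {"entity": "db", "ts": 100.0, "duration_ms": 4200},
    {"entity": "api", "ts": 101.0, "duration_ms": 4300},
]
logs = [
    {"entity": "db", "ts": 99.0, "line": "too many connections"},
    {"entity": "db", "ts": 10.0, "line": "checkpoint complete"},
]
metrics = [
    {"entity": "db", "ts": 95.0, "name": "active_connections", "value": 500},
]

evidence = collect_evidence("db", incident_time=100.0, spans=spans, logs=logs, metrics=metrics)
for kind, records in evidence.items():
    print(kind, len(records))
```

Presenting the slow call, the error log line, and the connection count together is what lets a reported root cause be checked against the evidence rather than taken on trust.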


Conclusion

For digital transformation to be successful, enterprises must ensure that all applications, digital services, and the dynamic multi-cloud platforms that underpin them work flawlessly.

Compared to traditional scenarios, these highly dynamic, distributed, cloud-native technologies present different challenges. Microservices, containers, and software-defined cloud infrastructure contribute to the current complexity. These complexities far exceed the capacity of a team to manage, and they are growing exponentially. Therefore, it is necessary to increase observability and AIOps capabilities to keep pace with these rapidly changing environments.

Cloud-native operations must be made more lightweight, more efficient, and less costly using highly automated and artificial intelligence technologies so that enterprise teams can focus on the core business and truly transition into the era of AI-assisted operations.


Guest Introduction

Mr. Zhang Huaipeng is the Product Vice President for Cloudcy.cn. He joined the company in 2017 and is responsible for the daily management of the DataBuff Integrated Observation and AIOps product line. As manager of the IPD integrated product development team, he is in charge of market management, requirements analysis, team collaboration, process structuring, quality assurance, etc.
