Wednesday, January 14, 2015

Azure Operational Insights (AOI): What Does The MMA Do? The Curtains Partially Unveiled

Why this posting?
Azure Operational Insights (AOI), still in ‘beta’ (AKA preview mode), is a cloud-based monitoring solution with its own mode of operation.

Even though I do see the added value of a service like this, IMHO more transparency is needed about WHAT exactly happens on your servers, and HOW, when using AOI. No matter how you look at it, sensitive data from your servers, services and so on is collected and sent over the internet to the Azure cloud, where it’s processed and presented in your own ‘partition’ of the AOI console.

Therefore, the deeper the insights, the better the understanding of all the mechanisms in place and the easier it becomes to sell this solution.

So far, the information Microsoft provides about what’s really happening is high level. Yes, we know that an Agent, internet access, a Workspace ID and a Workspace Key are required. But how about the details?

Information already available
Of course, Microsoft doesn’t leave you totally in the dark. On this MSDN website you can find a lot of information about Operational Insights. For instance, on this webpage Microsoft explains at a high level how this Azure-based service collects data and secures it. This picture, found on the same webpage, is a high-level overview:
[Image: high-level overview of how Operational Insights collects and secures data]

  1. You sign up for AOI, install the Agent and data collection starts. Whether the Agent connects directly to AOI or through your on-prem SCOM solution, the data collection process is the same;
  2. The collected data is sent to the Operational Insights service over a secure channel (HTTPS, port 443);
  3. The data received by the Operational Insights service is validated to check whether it comes from a trusted source and hasn’t been tampered with. When all is okay, the unprocessed raw data is stored as a blob in Azure Storage. The raw data is then processed by the Operational Insights service and stored as aggregated data in an Azure SQL database. Now this data is available to you in the Operational Insights console;
  4. By accessing the AOI portal (AKA ‘console’) and logging on, you can view the data. This connection uses HTTPS as well.
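
The validate, store and aggregate part of step 3 can be pictured with a toy Python model. This is a sketch under stated assumptions, not how AOI actually works: Microsoft doesn’t document the validation mechanism, and the findings below point to certificates for authentication. Here an HMAC-SHA256 tag with a shared key stands in for "trusted source, not tampered with", a list stands in for the raw blob in Azure Storage and a dictionary of counters stands in for the aggregated Azure SQL data. The record format `type:payload` and the function names are hypothetical.

```python
import hashlib
import hmac


def sign(payload: bytes, key: bytes) -> str:
    """Hex HMAC-SHA256 tag; a stand-in for AOI's (undocumented) validation."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def ingest(payload: bytes, tag: str, key: bytes,
           raw_store: list, aggregated: dict) -> bool:
    """Toy version of step 3: validate, keep the raw data, then aggregate."""
    # Reject data from an unknown sender or data modified in transit.
    if not hmac.compare_digest(sign(payload, key), tag):
        return False
    # Stand-in for storing the unprocessed raw data as a blob.
    raw_store.append(payload)
    # Stand-in for processing: count records per (hypothetical) record type.
    record_type = payload.split(b":", 1)[0].decode()
    aggregated[record_type] = aggregated.get(record_type, 0) + 1
    return True
```

A record whose tag doesn’t match is dropped before anything is stored, which mirrors the order of the steps in the overview.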

Security is paramount for Microsoft
As you can see, security is paramount for Microsoft: certificates are used for both SSL transmission and authentication. On top of it all, Microsoft provides this PDF document, stating how your data is securely managed. It can easily be found via the System Security link on the main page of AOI:
[Screenshot: System Security link on the AOI main page]

Still ‘some’ questions are left unanswered…
But how about the data collection process itself on your servers? How about the sending of the collected data? How often does that happen? And how? How about those certificates? Where do they come from and what do they look like? How about those Intelligence Packs? Where are they stored? How about the events? How about the registry?

And these are just some of the questions I’ve got. For this posting I’ve decided NOT to contact the people at Microsoft. Of course, I can do that (and will do so later on), but as an MVP I am bound by an NDA, which I take very seriously. So if I were to get answers from Microsoft, chances are they would be a mix of ‘share and DON’T share’, resulting in a blog posting with a real chance of breaking that NDA by accident.

Therefore I’ve decided to investigate by myself what happens on a server which isn’t monitored by an on-premise SCOM environment (just to make it easier to see what Azure Operational Insights does) and which connects to AOI.

So this posting came to be by using some basic investigation methods AND some good guessing. Chances are that some items aren’t spot on. Feel free to comment on this posting to point out those items OR to add your own findings. I’ll update this posting accordingly.

Let’s start!

Information which I found
As stated before, the server I am going to connect directly to AOI isn’t monitored at all by an on-premise SCOM MG, so there is no Microsoft Monitoring Agent in place. The server runs the latest & greatest Windows Server OS, WS 2012 R2.

  1. Certificate
    Before the installation there is no certificate present on the server in the Local Computer certificate store:
    [Screenshot: Local Computer certificate store before installation, no certificate present]

    When the installation of the Microsoft Monitoring Agent (MMA) is finished (AND the MMA is configured for connecting with Azure Operational Insights), a new self-signed certificate is in place:
    [Screenshot: the new self-signed certificate in the Personal store]

    When looking at that newly created certificate, I see that it’s not a plain self-signed certificate but a CA Root certificate(!):
    [Screenshot: certificate details, error statement highlighted in yellow, validity period circled in red]
    Since it’s placed in the wrong store (Personal and not Trusted Root Certification Authorities), the error statement highlighted in yellow is shown. The certificate is valid for three months, counted from the date the MMA is installed (see the red circle).
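
    To estimate the expiry date shown in the red circle, a small Python helper can add three calendar months to the installation date. This is a sketch assuming the validity counts from the install date in whole calendar months, clamped to the last day of a shorter month; the function names are hypothetical, and whether and how the MMA renews the certificate is not covered here.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the target month's length."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def certificate_expiry(install_date: date, validity_months: int = 3) -> date:
    """Expected 'Valid to' date, assuming validity counts from the install date."""
    return add_months(install_date, validity_months)
```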


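    A small side step:
    When this certificate is exported in the PKCS #7 format
    [Screenshot: certificate export in PKCS #7 format]
    and imported into the Trusted Root Certification Authorities store,
    [Screenshot: import into the Trusted Root Certification Authorities store]
    the certificate in the Personal store appears to be healthy:
    [Screenshot: the certificate without error statement]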


    This certificate is issued by the MMA itself, with Microsoft as the organization:
    [Screenshot: certificate issuer details]
    The first CN is the GUID assigned to the server on which the MMA is installed and the second CN is the Workspace ID provided by Azure Operational Insights when you sign up for that service.
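
    Given the subject string of this certificate, the two CNs can be split out with a few lines of Python. This is a sketch assuming a comma-separated subject in the order shown in the dialog (first CN = server GUID, second CN = Workspace ID) and assuming both values are GUIDs; the function names are hypothetical. `is_self_signed` only compares the names, which is the visible hallmark of a self-signed certificate (issuer equals subject); it doesn’t verify the signature.

```python
import uuid


def parse_mma_subject(subject: str) -> dict:
    """Split 'CN=<server GUID>, CN=<Workspace ID>, O=Microsoft' into its parts."""
    parts = [p.strip() for p in subject.split(",") if p.strip()]
    common_names = [p[3:] for p in parts if p.upper().startswith("CN=")]
    organizations = [p[2:] for p in parts if p.upper().startswith("O=")]
    if len(common_names) < 2:
        raise ValueError("expected two CN entries")
    return {
        # uuid.UUID validates the GUID and normalizes it to lowercase.
        "agent_guid": str(uuid.UUID(common_names[0])),
        "workspace_id": str(uuid.UUID(common_names[1])),
        "organization": organizations[0] if organizations else None,
    }


def is_self_signed(subject: str, issuer: str) -> bool:
    """Name check only: a self-signed certificate names itself as issuer."""
    return subject == issuer
```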

    This certificate is used for communication with AOI. It encrypts the data sent to AOI and authenticates the sender as well.

    So much for the certificate itself. I am anything but a PKI specialist, but the questions I do have so far are:
    - Why create a self-signed CA Root certificate instead of a self-signed basic SSL certificate, and import the required root certificate as well?
    - When a self-signed CA Root certificate is required, which eludes me, why not import it into the proper stores so the certificate shows no error state?


  2. Registry
    Of course, when an MMA is installed, many entries are added to the registry as well. Let’s take a closer look at the most important ones.

    HKLM\Software\Microsoft\Microsoft Operations Manager\Agent Management Groups:
    [Screenshot: the Agent Management Groups key]
    Normally you’ll find an entry here (up to a maximum of 4) for every on-premise SCOM MG the MMA reports to. However, since this MMA is connected directly to AOI, there is no entry here.

    All other regkeys located under HKLM\Software\Microsoft\Microsoft Operations Manager\ are almost empty as well. So how does the MMA know to communicate with AOI? Perhaps the settings of the related service (HealthService) will tell us more?

    HKLM\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Management Groups\AOI-<Workspace ID>
    [Screenshot: the AOI-<Workspace ID> management group key]
    Aha! Here we find more details about the configuration. Apparently the MMA is configured to communicate with AOI at the level of the MMA service (HealthService).

    So the name of the ‘SCOM Management Group’ for AOI consists of the prefix AOI, a dash and the AOI Workspace ID, like this: AOI-<Workspace ID>.

    Please note that the registry value Service Connector Health Service Id is the GUID assigned to the server on which the MMA is installed. So it matches the issuer of the certificate.
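
    The naming convention above is easy to express in code. The Python sketch below builds the expected registry path for a given Workspace ID and tells an AOI management group apart from an on-premise SCOM MG by its prefix. The helper names are hypothetical, and the prefix check assumes no on-premise MG name starts with AOI- itself.

```python
# Naming convention as observed on a directly connected test server.
MG_PREFIX = "AOI-"
MG_ROOT = (r"HKLM\SYSTEM\CurrentControlSet\Services\HealthService"
           r"\Parameters\Management Groups")


def management_group_name(workspace_id: str) -> str:
    """MG name used for AOI: the prefix AOI- followed by the Workspace ID."""
    return MG_PREFIX + workspace_id


def management_group_key(workspace_id: str) -> str:
    """Full registry path of the management group key under HealthService."""
    return MG_ROOT + "\\" + management_group_name(workspace_id)


def workspace_id_from_mg(name: str):
    """Workspace ID for an AOI MG name, or None for any other (e.g. on-prem) MG."""
    return name[len(MG_PREFIX):] if name.startswith(MG_PREFIX) else None
```

    On the server itself, the subkey names could be enumerated with the standard `winreg` module (`OpenKey`, `EnumKey`) and passed to `workspace_id_from_mg`.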

    HKLM\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Service Connector Services\Azure Operational Insights
    [Screenshot: the Azure Operational Insights service connector key]

    Wow! It gets more interesting!!! The regkey Topology Request Url shows exactly which AOI URL the HealthService connects to, for example: https://<Workspace ID>.oms.opinsights.azure.com/AgentService.svc/AgentTopologyRequest.

    Again you see how important the Workspace ID (provided when signing up for AOI) is. It enables Azure to partition the service per customer. Without it, AOI wouldn’t know how to differentiate between all the signed-up customers.
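
    The same Workspace ID shows up in the host name of the Topology Request Url. A minimal Python sketch, assuming the URL format shown above and assuming a Workspace ID is a GUID, can build that URL and recover the Workspace ID from it with the standard `urllib.parse.urlsplit`. The function names are hypothetical.

```python
import uuid
from urllib.parse import urlsplit

# URL format as found in the Topology Request Url registry value.
TOPOLOGY_TEMPLATE = ("https://{workspace_id}.oms.opinsights.azure.com"
                     "/AgentService.svc/AgentTopologyRequest")


def topology_request_url(workspace_id: str) -> str:
    """Build the per-workspace topology URL; rejects IDs that aren't GUIDs."""
    uuid.UUID(workspace_id)  # raises ValueError for a malformed Workspace ID
    return TOPOLOGY_TEMPLATE.format(workspace_id=workspace_id)


def workspace_id_from_url(url: str) -> str:
    """Return the first label of the host name, i.e. the Workspace ID."""
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError("expected an HTTPS URL with a host name")
    return parts.hostname.split(".", 1)[0]
```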

    Also notice HTTPS, so a secure channel is used. The certificate mentioned earlier is the one used here, as recorded in the registry value Authentication Certificate Thumbprint.
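
    To confirm which certificate is meant, its thumbprint can be recomputed: the thumbprint Windows shows is the SHA-1 hash of the certificate’s DER encoding. The Python sketch below compares that hash with the registry value, ignoring case and any spaces; the function names are hypothetical, and obtaining the DER bytes (for example by exporting the certificate in DER format) is left out.

```python
import hashlib


def certificate_thumbprint(der_bytes: bytes) -> str:
    """SHA-1 over the DER-encoded certificate, as uppercase hex."""
    return hashlib.sha1(der_bytes).hexdigest().upper()


def matches_registry_thumbprint(der_bytes: bytes, registry_value: str) -> bool:
    """Compare with 'Authentication Certificate Thumbprint', ignoring case/spaces."""
    normalized = registry_value.replace(" ", "").upper()
    return certificate_thumbprint(der_bytes) == normalized
```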

    HKLM\SOFTWARE\Microsoft\System Center Operations Manager\12\Advisor
    [Screenshot: the System Center Operations Manager\12\Advisor key]
    Here we find the configuration for System Center Advisor (SCA, the old product).

    These settings are for SCA only so they don’t affect the MMA reporting to AOI.

  3. Event log
    Since it’s an MMA, there is also a dedicated event log, titled Operations Manager. Let’s take a look at the important EventIDs here, in chronological order.

    EventID 3000, source Service Connector
    This event states that a new online Management Group has been added, example: Adding a new online service management group "AOI-<Workspace ID>".

    EventID 2002, source HealthService
    This event states that the newly found MG has started, example: Management Group "AOI-<Workspace ID>" was started.

    EventID 7006, source HealthService
    This event states that the public key for communication with the online MG has been published, example: The Health Service has published the public key [16 82 DC FD 69 B6 F7 97 4C 89 3F BA 1C 90 9A DB ] used to send it secure messages to management group AOI-<Workspace ID>. This message only indicates that the key is scheduled for delivery, not that delivery has been confirmed.

    EventID 7019, source HealthService
    This event states all RunAs accounts are validated, example:
    The Health Service has validated all RunAs accounts for management group AOI-<Workspace ID>. Number of accounts or passwords near expiration: 0.

    EventID 5000, source Service Connector
    This event states that there is no configuration yet and that the MMA requests it, example: Management group "AOI-<Workspace ID>" has no configuration and will request configuration from the service.

    EventID 5002, source Service Connector
    This event states that new configuration has been received, example: Management group "AOI-<Workspace ID>" has received new configuration from the service. Previous Configuration Cookie:  New Configuration Cookie: 0x8D1FA8D0F9ED2F2 Public 9ba540dd-ebdd-437b-9737-b7bdeb086238 4469A136-D323-45A8-862E-C2132424C612 True antimalware;changetracking;logmanagement.1.0.1;sqlassessment;updates.
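
    The EventID 5002 message is handy for a quick check of what the service sent down. The Python sketch below pulls out the management group name, the new configuration cookie and the trailing semicolon-separated list. Treating that list as the solutions enabled in AOI (antimalware, change tracking, log management, SQL assessment and updates in the example above) is an assumption based on the names; the fields in between (Public, two GUIDs, True) are not interpreted. The function name is hypothetical.

```python
import re


def parse_config_event(message: str) -> dict:
    """Extract MG name, new cookie and the trailing list from an EventID 5002 text."""
    mg = re.search(r'Management group "([^"]+)"', message)
    cookie = re.search(r"New Configuration Cookie:\s*(\S+)", message)
    # The last whitespace-separated token holds the list; drop the final period.
    last_token = message.strip().split()[-1].rstrip(".")
    return {
        "management_group": mg.group(1) if mg else None,
        "new_cookie": cookie.group(1) if cookie else None,
        "packs": last_token.split(";"),
    }
```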

    EventIDs 1200, 1201 & 1210, source HealthService
    These events are exactly the same as when the MMA reports to an on-prem SCOM MG. EventID 1200: new MPs are requested. EventID 1201: logged per received MP; with AOI it also covers the Intelligence Packs received by the MMA. EventID 1210: the new configuration, based on the loaded MPs & Intelligence Packs, has become active.

    EventID 9999, source Health Service Script
    Per executed script this event is logged, example: AntiMalware Collection Script Finished : AntiMalware Collection Script Returned.
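
    The order above can double as a checklist when troubleshooting a new AOI connection. This Python sketch checks whether the observed EventIDs (oldest first, for example copied from the Operations Manager event log) contain these milestones in that order, and reports the ones that are missing. The sequence reflects what I saw on a single test server, so treat it as an assumption rather than a guaranteed order; the names are hypothetical.

```python
# Milestones in the order observed on one test server (an assumption, not a spec).
ONBOARDING_SEQUENCE = [
    (3000, "Service Connector", "online management group added"),
    (2002, "HealthService", "management group started"),
    (7006, "HealthService", "public key published"),
    (7019, "HealthService", "RunAs accounts validated"),
    (5000, "Service Connector", "configuration requested"),
    (5002, "Service Connector", "new configuration received"),
    (1200, "HealthService", "new MPs requested"),
    (1201, "HealthService", "MP or Intelligence Pack received"),
    (1210, "HealthService", "new configuration active"),
    (9999, "Health Service Script", "collection script executed"),
]


def missing_milestones(observed_ids: list) -> list:
    """Return (EventID, meaning) for milestones not found in order.

    observed_ids must be ordered oldest first. Repeated IDs (one 1201 per
    pack, one 9999 per script) are fine: each milestone is searched for
    after the last milestone that was found.
    """
    position = 0
    missing = []
    for event_id, _source, meaning in ONBOARDING_SEQUENCE:
        try:
            position = observed_ids.index(event_id, position) + 1
        except ValueError:
            missing.append((event_id, meaning))
    return missing
```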

  4. Files and their locations
    Since it’s the MMA doing the work (collecting the required information based on the Intelligence Packs enabled in AOI), the folder layout and placement of the corresponding files are the same as when the MMA reports to an on-premise SCOM MG.

    Folder ~:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State\Management Packs
    Contains all the MPs downloaded by the MMA from AOI. This folder mostly contains Intelligence Packs since those are used by AOI, all using the prefix Microsoft.IntelligencePacks.
    [Screenshot: contents of the Management Packs folder]
    The network footprint is really small: 25 MPs and Intelligence Packs, 3.86 MB in total!

    As you can see, these MPs and Intelligence Packs correspond directly to the Intelligence Packs I enabled in AOI for my test servers:
    [Screenshot: Intelligence Packs enabled in AOI]
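
    To compare the footprint on your own servers, a short Python sketch can total the files in the Management Packs folder and list those with the Microsoft.IntelligencePacks prefix. It assumes MB means 1,048,576 bytes (binary megabytes, as Windows Explorer reports sizes); the function names are hypothetical.

```python
from pathlib import Path

IP_PREFIX = "Microsoft.IntelligencePacks"  # prefix seen in the folder listing
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def scan_folder(folder):
    """Yield (file name, size in bytes) for each file directly in the folder."""
    for path in Path(folder).iterdir():
        if path.is_file():
            yield path.name, path.stat().st_size


def summarize(entries) -> dict:
    """Count the files, total their size in MB and list the Intelligence Packs."""
    entries = list(entries)
    total_bytes = sum(size for _name, size in entries)
    return {
        "file_count": len(entries),
        "total_mb": round(total_bytes / BYTES_PER_MB, 2),
        "intelligence_packs": sorted(
            name for name, _size in entries if name.startswith(IP_PREFIX)
        ),
    }
```

    Pass the full path of the Management Packs folder on your server to `scan_folder` and its result to `summarize`.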

    Folder ~:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State\Resources
    Contains all the resources (DLL files) used by the MMA in order to carry out the monitoring as required, defined in AOI.
  5. Data collection & sending
    This is done as configured per Intelligence Pack.

    For instance, when AOI is configured to collect system events with a critical state, that kind of information will be sent to AOI ASAP when such an event is triggered. When AOI also assesses your SQL servers, that kind of information will be collected and sent to AOI on a schedule, once every given number of days.

    As stated before, all data will be sent over a secure channel using SSL, with the previously mentioned certificate. The authentication process uses that certificate as well.
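
The two collection modes described in item 5 can be modelled in a few lines of Python. This is a hypothetical sketch, not the MMA’s scheduler: the severity name and the interval in days are placeholders, since the actual values are defined per Intelligence Pack, and the function names are made up for illustration.

```python
from datetime import datetime, timedelta


def should_send_now(event_level: str) -> bool:
    """Event-driven mode: critical events are sent as soon as they occur."""
    return event_level.strip().lower() == "critical"


def next_assessment_run(last_run: datetime, interval_days: int) -> datetime:
    """Scheduled mode: an assessment runs again after a fixed number of days."""
    return last_run + timedelta(days=interval_days)


def is_assessment_due(now: datetime, last_run: datetime, interval_days: int) -> bool:
    """True once the interval since the last assessment has passed."""
    return now >= next_assessment_run(last_run, interval_days)
```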

Wrapping up
As you can see, the MMA is still the MMA, doing what a regular MMA does. Only when an MMA reports to AOI are there some new events to be found in the event log and some additional registry keys. And yes, instead of only MPs we mostly find Intelligence Packs.

It’s good to see that while installing the MMA (and having it configured to report to AOI), it automatically creates a self-signed certificate. As we all know, certificates are a challenge in every on-premise SCOM version. Too bad this functionality has never seen the light of day in those versions, but it’s good to see it in the cloud-based version, since certificate hassles would have been a huge show stopper for Microsoft in pushing this service.

Later postings
In a later posting I’ll take a closer look at those Intelligence Packs, for instance at what kind of load they create on your servers. I’ll also take a look inside those Intelligence Packs. To be continued.
