Alerting when traffic manager endpoint is down | Rasmus' Ramblings on IT and other stuff
Aug 09, 2019

Azure has the capability to create alerts on all sorts of events. If it can be expressed as a metric on a resource in Azure, then you can get an alert if the metric value reaches a state you don’t like.

One of those metrics is the “Endpoint Status by Endpoint” of Azure Traffic Manager.

Traffic Manager has a number of backend endpoints to which it directs traffic. If an endpoint is down for some reason, it doesn’t get any traffic. This is part of what makes Traffic Manager smart.

But you probably want to know when an endpoint is down. That is what alerting on the “Endpoint Status by Endpoint” metric can do.

The chart for it can look like this:

endpointstatus-chart

While it is possible to go to the Azure Portal and create an alert on the metric, I prefer having my environments in complete sync, so I use ARM templates for all resource deployments. That includes alert configuration.

There is documentation on how to define metric alerts in ARM templates. However, it is missing one important thing: what if we decide later to add a new endpoint? The samples on the documentation page require you to know the exact resources to monitor.

In the Portal, it is possible to check “Select *” when creating a metric alert. This means the alert will monitor all current and future resources of that type.

The alert configuration in the Portal should look like this:

endpoints-status-portal-alert

Note the “Select *” checkbox.

When all endpoints are online, the value is 1. So anything less than that should trigger an alert.

Since the documentation leaves out how to configure the “Select *” checkbox, I tried generating the ARM template using the Azure Portal, but the generated template leaves out the alert criteria completely. Luckily, the metric’s resource ID is documented, so using that and some google-fu, I was able to create an ARM template which creates an alert like the one above:

{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {},
    "variables": {
        "actiongroup-name": "arm-action-group-name",
        "actiongroup-short-name": "arm-ag-short",
        "email-receiver": "alerts@example.com",
        "trafficManagerProfile": "prod-site-tm"
    },
    "resources": [
        {
            "type": "microsoft.insights/actionGroups",
            "apiVersion": "2019-03-01",
            "name": "[variables('actiongroup-name')]",
            "location": "Global",
            "properties": {
                "groupShortName": "[variables('actiongroup-short-name')]",
                "enabled": true,
                "emailReceivers": [
                    {
                        "name": "arm-email-action-name",
                        "emailAddress": "[variables('email-receiver')]",
                        "useCommonAlertSchema": true
                    }
                ],
                "smsReceivers": [],
                "webhookReceivers": [],
                "itsmReceivers": [],
                "azureAppPushReceivers": [],
                "automationRunbookReceivers": [],
                "voiceReceivers": [],
                "logicAppReceivers": [],
                "azureFunctionReceivers": []
            }
        },
        {
            "type": "microsoft.insights/metricAlerts",
            "apiVersion": "2018-03-01",
            "name": "arm-test-alert",
            "location": "global",
            "dependsOn": [
                // "[resourceId('Microsoft.Network/trafficManagerProfiles', variables('trafficManagerProfile'))]",
                "[resourceId('microsoft.insights/actionGroups', variables('actiongroup-name'))]"
            ],
            "properties": {
                "description": "alert-rule-description",
                "severity": 2,
                "enabled": true,
                "scopes": [
                    "[resourceId('Microsoft.Network/trafficManagerProfiles', variables('trafficManagerProfile'))]"
                ],
                "evaluationFrequency": "PT1M",
                "windowSize": "PT1M",
                "criteria": {
                    "odata.type": "Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria",
                    "allOf": [
                        {
                            "name": "ProbeAgentCurrentEndpointStateByProfileResourceId",
                            "metricName": "ProbeAgentCurrentEndpointStateByProfileResourceId",
                            "metricNamespace": "Microsoft.Network/trafficmanagerprofiles",
                            "threshold": 1,
                            "operator": "LessThan",
                            "dimensions": [
                                {
                                    "name": "EndpointName",
                                    "operator": "Include",
                                    "values": [ "*" ]
                                }
                            ],
                            "timeAggregation": "Maximum"
                        }
                    ]
                },
                "actions": [
                    {
                        "actionGroupId": "[resourceId('microsoft.insights/actionGroups', variables('actiongroup-name'))]",
                        "webHookProperties": {}
                    }
                ]
            }
        }
    ]
}

Note the criteria section near the bottom. The metric name must be ProbeAgentCurrentEndpointStateByProfileResourceId, and the metricNamespace must be Microsoft.Network/trafficmanagerprofiles, as that is what defines the endpoint status metric.

The threshold, operator and timeAggregation properties define the alert criteria – in case the “Maximum” value is less than 1, the alert is triggered.

The hard to find “select *” configuration is in the dimensions object. Instead of listing each of the Traffic Manager Endpoints in the values property, it is possible to simply put “*”.
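For comparison, here is a rough sketch of what the dimensions object might look like if each endpoint were listed explicitly instead of using the wildcard. The endpoint names are made up for illustration and would have to match the names of the endpoints in your own Traffic Manager profile. With this form, a newly added endpoint is not monitored until the list is updated.

```json
"dimensions": [
    {
        "name": "EndpointName",
        "operator": "Include",
        // Hypothetical endpoint names; replace with your own, or use "*" to cover all endpoints.
        "values": [ "endpoint-westeurope", "endpoint-northeurope" ]
    }
]
```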

I have left a commented-out line in the template. That line is needed if you create the Traffic Manager in the same template as the alert. In that case, you want to be sure that the Traffic Manager is created first. That is what the dependsOn property declares: you want those resources to be provisioned before this one.
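As a sketch, assuming the Traffic Manager profile is declared as a resource in the same template, the dependsOn section of the alert would then look like this with that line enabled:

```json
"dependsOn": [
    // Only valid when the Traffic Manager profile is defined in this same template.
    "[resourceId('Microsoft.Network/trafficManagerProfiles', variables('trafficManagerProfile'))]",
    "[resourceId('microsoft.insights/actionGroups', variables('actiongroup-name'))]"
]
```

If the profile is deployed elsewhere, keep that line commented out, since dependsOn can only refer to resources deployed in the same template.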
