Today we will discuss how client management is done internally at Microsoft. At a high level, we will share how clients are onboarded and managed, and the custom reports we use to track SLAs and KPIs.
Onboarding:
We have two environments, so to speak: devices that are joined to an Active Directory (AD) domain and devices that are joined to Azure Active Directory (AAD).
Domain Joined:
A logon script deployed via Group Policy is the only method we use to install the Configuration Manager client. Because we support multiple regions, ADM templates are used to stamp the region-specific parameters required for installing the agent.
Command line: CCMSetup.exe /MP:XXXXXXXXXXX /MP:XXXXXXXXXXXX SMSSITECODE=XXX FSP=XXXXXXXXXXX CCMLOGMAXSIZE=2000000 CCMLOGLEVEL=1 DISABLESITEOPT=TRUE DISABLECACHEOPT=TRUE CCMLOGMAXHISTORY=10 SMSCACHESIZE=10000 IGNOREAPPVVERSIONCHECK=TRUE CCMEVALSENDALWAYS=TRUE
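To illustrate how region-specific values could be combined with the shared properties, here is a minimal Python sketch that assembles the argument list shown above. The `RegionParams` type and `build_ccmsetup_args` function are hypothetical names introduced for illustration; in our environment the stamping is done by ADM templates and the logon script, not by this code. The property names and values are copied from the command line above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionParams:
    """Hypothetical container for the values that differ per region."""
    management_points: tuple[str, ...]
    site_code: str
    fallback_status_point: str


# Properties shared by every region, taken from the command line above.
COMMON_PROPERTIES = {
    "CCMLOGMAXSIZE": "2000000",
    "CCMLOGLEVEL": "1",
    "DISABLESITEOPT": "TRUE",
    "DISABLECACHEOPT": "TRUE",
    "CCMLOGMAXHISTORY": "10",
    "SMSCACHESIZE": "10000",
    "IGNOREAPPVVERSIONCHECK": "TRUE",
    "CCMEVALSENDALWAYS": "TRUE",
}


def build_ccmsetup_args(region: RegionParams) -> list[str]:
    """Return the CCMSetup.exe argument list for one region."""
    args = ["CCMSetup.exe"]
    # One /MP: switch per management point, in the order given.
    args += [f"/MP:{mp}" for mp in region.management_points]
    args.append(f"SMSSITECODE={region.site_code}")
    args.append(f"FSP={region.fallback_status_point}")
    # Shared properties keep the order in which they are declared.
    args += [f"{name}={value}" for name, value in COMMON_PROPERTIES.items()]
    return args
```

Keeping the shared properties in one place means a new region only needs its management points, site code, and fallback status point.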
All domain-joined devices running Windows 10 RS3 (version 1709) or later are onboarded to Intune to leverage Co-Management (Co-Mgmt) capabilities. ConfigMgr is the primary management authority in this scenario, except for workloads such as Compliance policies that have been transitioned to Intune.
Fig 1: Current workload configuration
Azure AD devices:
All devices connected to AAD are onboarded into Intune for management. Intune is the primary management authority in this scenario. We also deploy the ConfigMgr client as an app to support deploying Win32 applications and to benefit from its rich reporting capabilities.
Command line: msiexec /i "ccmsetup.msi" CCMSETUPCMD="CCMHOSTNAME=XXXXXXXXX.CLOUDAPP.NET/CCM_Proxy_MutualAuth/XXXXXXXXX SMSSiteCode=XXX CCMLOGLEVEL=1 CCMLOGMAXHISTORY=5 SMSCACHESIZE=10000 FSP=XXXXXXXXXXXXX /nocrlcheck" /qn
Monitoring and Metrics:
Monitoring ConfigMgr client health is a critical aspect of client management. We track various metrics for agent health and reach daily. We also automatically detect and remediate known issues, provided the mitigation is safe and the issue can be detected programmatically, and we keep expanding this functionality based on issues found during investigations. These remediations are performed by a logon script on domain-joined devices, which handles both client install failures and health issues. Below are a few of the most frequently triggered remediations:
- WMI repository remediation
- Policy provider issues
- Provisioning mode configuration
- Client registration issues
- Dependent service misconfiguration (we see these often given the nature of our environment)
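The detect-then-remediate approach described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Remediation` type, the `run_remediations` function, and the status strings are hypothetical, and the real remediations run from a logon script rather than from this code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Remediation:
    """Hypothetical description of one known issue."""
    name: str
    detect: Callable[[], bool]     # returns True when the issue is present
    remediate: Callable[[], None]  # applies the fix
    safe: bool                     # only safe mitigations are automated


def run_remediations(checks: list[Remediation]) -> dict[str, str]:
    """Run each check and return a status per issue for telemetry."""
    results: dict[str, str] = {}
    for check in checks:
        if not check.detect():
            results[check.name] = "healthy"
        elif not check.safe:
            # Detected, but the fix is not safe to automate.
            results[check.name] = "needs investigation"
        else:
            check.remediate()
            # Detect again to confirm that the fix actually worked.
            fixed = not check.detect()
            results[check.name] = "remediated" if fixed else "remediation failed"
    return results
```

Re-running the detection after each fix gives a simple signal for how effective a remediation is.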
We recently started expanding this functionality to Co-Managed devices by using Proactive remediation scripts in Intune. We capture telemetry via these scripts to understand issues, track the scripts' effectiveness, and make improvements. You can refer to the blog my colleague recently wrote about how these capabilities can be used in various scenarios.
Agent Reach:
To ensure that ConfigMgr client coverage meets the SLA (95%), we closely track overall reach by comparing it against the overall number of discovered devices. For AAD devices, we compare against the overall number of devices registered to AAD. We use a few reports to track this; figures 2 and 3 below show examples of the datapoints we monitor.
Fig 2: Report to track client install status per domain in the last 24 hours
Fig 3: Reports for tracking onboarding methods. In this case most of them are auto-upgrades because the site is going through an upgrade
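The reach comparison described in this section amounts to a simple ratio checked against the 95% SLA. A minimal Python sketch follows, assuming devices are identified by name; the function names and the use of sets are illustrative assumptions, not how our reports are implemented.

```python
SLA_TARGET = 0.95  # the 95% SLA mentioned above


def agent_reach(discovered: set[str], with_client: set[str]) -> float:
    """Fraction of discovered (or AAD-registered) devices that have a client."""
    if not discovered:
        return 0.0
    # Count only clients that belong to discovered devices.
    return len(discovered & with_client) / len(discovered)


def meets_sla(reach: float, target: float = SLA_TARGET) -> bool:
    """True when reach is at or above the SLA target."""
    return reach >= target
```

For example, 96 devices with a client out of 100 discovered devices gives a reach of 0.96, which meets the SLA, while 90 out of 100 does not.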
Agent Health:
For tracking agent health, we look at several aspects such as policy, heartbeat, and hardware inventory to follow day-to-day trends; we call these operational metrics. For reporting health numbers, we rely on CCMEVAL data. We use a Power BI dashboard (fig:4) during daily standup calls to track these metrics and trigger investigations accordingly. Note that these are custom dashboards built on top of Power BI using transformations that aggregate data into Azure SQL.
Fig 4: Dashboard for tracking operational metrics
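One way to think about operational metrics is as a staleness check per signal. The Python sketch below flags which of a device's signals (policy, heartbeat, hardware inventory) are missing or older than a threshold. The thresholds, signal keys, and function name are all hypothetical; this post does not state the values or logic our dashboards use.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds chosen for illustration only.
MAX_AGE = {
    "policy_request": timedelta(days=7),
    "heartbeat": timedelta(days=7),
    "hardware_inventory": timedelta(days=14),
}


def stale_signals(last_seen: dict[str, datetime], now: datetime) -> list[str]:
    """Return the signals that are missing or older than their threshold."""
    return [
        signal
        for signal, max_age in MAX_AGE.items()
        if signal not in last_seen or now - last_seen[signal] > max_age
    ]
```

Aggregating the result per day across all devices would give the kind of trend line a dashboard can plot.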
Co-Mgmt enables us to look at different data points proactively and remediate issues from Intune. Because these devices communicate with both ConfigMgr and Intune, we can look at interesting datapoints such as devices that are active in one system but not in the other. We use the dashboard below to track some of these aspects, and we will add more datapoints to it in the future. In fig:5, Intune CoMgmt means Intune is the primary management authority and SCCM CoMgmt means Configuration Manager is the primary management authority.
Fig 5: Co-Mgmt devices data points
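Finding devices that are active in one system but not the other is a set comparison. Here is a minimal Python sketch, assuming each system can export the identifiers of its recently active devices; the function name and result keys are illustrative assumptions.

```python
def compare_activity(
    configmgr_active: set[str], intune_active: set[str]
) -> dict[str, set[str]]:
    """Split co-managed devices by which system has seen them recently."""
    return {
        "active_in_both": configmgr_active & intune_active,
        "configmgr_only": configmgr_active - intune_active,
        "intune_only": intune_active - configmgr_active,
    }
```

Devices in the `configmgr_only` and `intune_only` groups are natural candidates for investigation or remediation from whichever system can still reach them.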
On a weekly basis, we track the insights below and share them with leadership, partners, and broader teams.
Fig 6: Weekly Insights
Log Collection:
As everyone can relate to, the ability to capture logs is critical for understanding the issues at hand. In our environment, we use various methods for capturing the required logs without contacting users.
- Through Client Diagnostics in the ConfigMgr console, which can be used for both domain-joined and AADJ (Azure Active Directory joined) devices
- Through Just Enough Administration (JEA)
- Through Intune Client Diagnostics
JEA (Just Enough Administration) is PowerShell functionality for granting restricted access to remote machines. We implemented a custom module to enable a few functions such as log copy and client remediations. Refer to the JEA public documentation for additional information.
Conclusion:
I hope this blog gave you a high-level understanding of how we track and address client health and reach trends internally. Most of our reporting is built in-house, using ETLs and ADF pipelines to capture aggregated data from both ConfigMgr and Intune so that we can determine trends, set up alerting, and take appropriate actions. This is unique to our environment and takes considerable hours to maintain, which is outside the scope of this post; we plan to cover it in a future blog post. We are working towards expanding health and reach tracking to devices managed by Intune as the primary management authority across all platforms, and we plan to blog about it once we operationalize it. Thank you for reading! Please share your feedback in the comments section.
Posted at https://sl.advdat.com/30P7dvU