Monitoring connectivity pattern, errors, disconnections, connection drill down or their duration for a PostgreSQL Flexible Server is not a straightforward process. The purpose of this blog is to guide you how to deploy a dashboard using an ARM template that will show you the connectivity insights for you Azure Database for PostgreSQL Flexible Server. I will also explain the meaning of the charts and tables, as well as how to interpret the data for troubleshooting or monitoring purpose.
The ARM template provided incorporates multiple Kusto Queries that leverages Azure Monitor Log to search through the server logs for the related information to connectivity. In the chapter How to deploy the connectivity dashboard you can walk through all the steps needed to deploy the dashboard in few minutes. Please make sure you have configured first the requirements described in chapter Prerequisites needed for the dashboard before deploying the actual dashboard.
After you deployed the dashboard, to have a better understanding how to interpret the charts follow the chapter How to use the dashboard and understand its metrics
In the last chapter you can see all the Kusto Queries used in the background for the dashboard. If you want to customize it even further or to adapt based on your use case the queries can be used directly in Azure Monitor Logs.
Prerequisites needed for the dashboard
-
The Kusto Queries that are running in the background of the dashboard relies on Azure Monitor Logs, therefore you need to configure from the PostgreSQL blade the Diagnostic settings to send PostgreSQL server logs to Azure Log Analytics workspace. For more details see how to set Azure Monitor diagnostic settings in Azure Database for PostgreSQL Flexible Server. After configuring the Diagnostic Settings you should see a similar picture as shown below:
- For the corresponding messages related to connections and disconnections to be emitted in the server logs the parameters log_connections and log_disconnections must be set to ON. These parameters should already have these values by default and be read-only.
How to deploy the connectivity dashboard
In order to deploy the connectivity dashboard for a PostgreSQL Flexible Server use one of the following ARM template based on the PostgreSQL version:
- ConnectivityDashboardPostgresqlFlexibleServerTemplate.json for PostgreSQL version greater than 11
- ConnectivityDashboardPostgreSQLv11FlexibleServerTemplate.json only for PostgreSQL version 11
If you jumped straight away to this chapter, please make sure you have followed the prerequisites described in previous chapter Prerequisites needed for the dashboard before deploying the dashboard.
Follow below steps to deploy the dashboard:
-
Get the resource URI of the Azure Database for PostgreSQL Flexible Server for which the dashboard is intended to be created.
-
From the PostgreSQL server blade in Azure portal click on JSON view.
-
Copy the Resource ID as shown below
-
-
From the Azure Portal search and click on "Deploy a custom template"
-
Click on "Build your own template in the editor"
-
Depending on the PostgreSQL version use the appropriate ARM template. If you downloaded the ARM template click on the "Load File" and upload it. Alternatively, you can just copy and paste the ARM template content in the input textarea field. Afterwards press the "Save" button from the left bottom side of the screen.
-
Complete required fields as indicated below. Afterwards press "Review + create" button.
- Choose Resource Group where the dashboard will be deployed
- Provide the "Postgresql Flexible Server URI" using the Resource ID retrieved in step 1 from this tutorial.
- Enter "Dashboard Name". Note that the name is restricted to only alphanumeric and hyphen characters. The ARM template will automatically replace spaces with hyphens.
-
After validation passed press on "Create" button.
-
After the deployment is complete press "Go to resource". From the resource press on "Go to dashboard" link to view the newly deployed dashboard.
How to use the dashboard and understand its metrics
The dashboard itself is structured to present all the insights in 3 main ways:
-
Data presented in Charts
All charts' data is summarized on an adaptive aggregation that is based on the result set for the chosen time range in Azure UI as follow:
- If resultant time range is less or equal with 12h the data is summarized on 1 minute aggregation.
- If resultant time range is greater then 12h but less or equal with 24h, then the data is summarized on 5 minutes aggregation.
- If resultant time range is greater then 24h but less or equal with 48h, then the data is summarized on 10 minutes aggregation.
- If resultant time range is greater then 48h but less or equal with 7 days, then the data is summarized on 30 minutes aggregation.
- For any resultant time range grater then 7 days the data is summarized on 1h aggregation.
-
Summary of data
All the summary tables of connectivity is aggregated for the entire time range selected in the Azure UI.
-
Connectivity insights listing
The listing tables will present all the respective data in descending order for the entire time range selected in the Azure UI and it will include IP, exact time of the event, PID and were appropriate user, database/application_name or error message.
Because the dashboard contains multiple charts in each widget title automatically will be displayed the server’s name in case you are using multiple dashboards to monitor the connectivity for more than a server at a time.
Understanding the Charts.
-
Connections drill down by IP
In the chart "All Connections by IP" are displayed all connections by IP including Azure internal ones. This chart is useful to understand from which IP the connections are established to your PostgreSQL Flexible Server.
-
Comparison between successful Azure connections vs Non-Azure connections
Since Azure Database for PostgreSQL Flexible Server is a managed service, Azure is connecting to your server for monitoring and management purpose. To distinguish between successful connection created by Azure and your own connections this chart is extremely useful.
-
Comparison between failed and successful connections
This chart is useful to understand the connection pattern specially to identify failed connections time. The successful connections will exclude azure connections, therefore it will be an accurate picture of your own connections. Also, in successful connections are included regular and replication connections. Nevertheless, replication connection for the hot standby in case HA is enabled are included though might be considered azure connections.
-
Replication vs Non-Replication connections
The chart represents the comparison between successful replication connections and regular connections. Azure connections are excluded, thus the chart represents only your connections. In a similar fashion replication connection from the hot standby if HA is enabled are included as well.
-
Successful Connections drilled down by user or database
These 2 charts will represent the summary of your own regular connections by user or database and are useful to understand the pattern of connections especially which users are creating most of the connections or which databases are mostly used. In these charts replication connections from your side and Azure connections are excluded.
-
Successful Replication connections drilled down by user or application_name
These 2 charts will represent the summary of replication connections by user or application_name and are useful to understand the pattern of replication connections in terms of the user or application_name that is creating those connections. You can also understand the number and the pattern in time when those replication connections are created.
-
Connection Lifetime drilldown
This chart will indicate the duration pattern of the connections. The data presented is taken in the moment of the disconnection and is grouped in 3 categories by the duration as follow:
- Short-Lived Connections are connections that got disconnected in less then 1 minute.
- Normal-Lived Connections are connections that had a life span between 1 minute and 20 minutes.
- Long-Lived Connections are connections that had a life span grater then 20 minutes.
This chart is extremely useful to understand the life span of your connections and for better performance we suggest avoiding short-lived connections. Azure disconnections are excluded, while it will include all your regular and replication connections. Another important thing that must be taken into consideration is that connection failures also emit a disconnection message, thus are included in the chart and since the duration of such connections is couple of seconds, connection failures will be reported as short-lived connections.
-
Average connection duration in seconds by IP
If using previous chart "Connection Lifetime drilldown" you can understand the duration pattern, using this chart you can deep dive on the average duration in seconds for the connections based on the IP. The average duration is computed on the adaptive aggregation based on the result set for the chosen time range in Azure UI. Therefore, for troubleshooting purpose it is advised to choose a narrow time range, preferrable less then 12 hours to have an average on 1 minute aggregation. The aggregation time of the charts is explained in the beginning of this chapter. If the duration is 0 (zero) it means that at that particular time was no disconnection from the corresponding IP. The chart itself is mostly intended to see from which IP the average duration is small or to detect dips in connectivity duration that might indicate some issues.
-
Connection Errors by SQL Error Code
The chart represents the summary of the connection failures based on the SQL error code, hence you can understand the reason and the amount of the failed connections. It is understandable that the error code might not mean much, nevertheless if this chart is used in conjunction with Summary table of connection failures, than in that table you can see the actual error message that corresponds to the SQL error code.
-
Connection Errors by IP
The chart represents the summary of connection failures based on the IP and will help you to understand from which IP the connection failures are originating.
Understanding Summary tables.
-
Summary of new connections by IP
It will represent the summary of all connections (including Azure ones) created on you PostgreSQL Flexible Server in the selected time range from Azure UI. The data presented is grouped by IP and will report the total number of connections denoted by Counter as well as the first and last connection within the time range.
-
Summary of Azure Successful connections
For transparency reasons you can use this table to see all the successful connections created by Azure. The data presented is grouped by IP, user and database and will report the total number of connections denoted by Count as well as the first and last connection within the time range.
-
Summary of Successful connections
This table will represent the summary of all successful connections created in the selected time range from Azure UI. Replication and Azure successful connections are not included. The data is grouped by IP, user and database, therefore you can understand the total number of connections denoted by Count within the time frame for each pair (IP, user, database).
-
Summary of Replication connections
This table will represent only the summary of replication connections created in the selected time range from Azure UI. The data is grouped by IP and application_name and you can use the summary to understand the total number of replication connections denoted by Count within the time frame.
-
Summary of disconnections
The table will represent the summary of all disconnections in the selected time range from Azure UI. The summary includes disconnections for successful and failed connections, while Azure disconnections are excluded. In successful connections are included both types: replication and regular connections. The data is grouped by IP, user and database. Therefore, for each pair (IP, user and database) is reported the total number of disconnections denoted by Count, minimum duration in seconds encountered denoted by minDurationSec, maximum duration in seconds encountered denoted by maxDurationSec and the average duration in seconds of the connections denoted by AvgDurationSec for the time window between FirstDisconnection and LastDisconnection.
-
Summary of all failed logins
The data in here is grouped by IP, error message and SqlErrorCode, where Count represents the total number of failures within the time frame. The SqlErrorCode presented in this table can be used to map the error message with the Chart that represent the connections failures by the SQL Error Code.
Understanding Listing tables.
-
All successful connections
This table will include all successful connections. Azure and replication connections are excluded. The data will present the IP, user, database, ConnRecvTime, ConnAuthTime, SetupTime and PID of each connection. The ConnRecvTime is the connection received time, ConnAuthTime is the time when the authentication processes completed and the connection was established. ConnSetupTime is the amount of time it took to establish the connections: ConnSetupTime – ConnRecvTime.
-
All replication connections
This table will include all replication connections. The data will present the IP, user, application_name, ConnRecvTime, ConnAuthTime, ConnSetupTime and PID for each replication connection. Like in previous table ConnRecvTime, ConnAuthTime and ConnSetupTime have the same meaning.
-
All disconnections
This table will include all the disconnections for all connection types including failures, whereas Azure disconnections are excluded. The data will present IP, user, database, DisconnectTime, duration, DurationInSec and PID for each disconnection. The DisconnectTime is the time when the respective connection got disconnected. The duration is the duration string as logged in the server log, whereas the DurationInSec is the conversion in seconds from the duration string for better understanding.
-
All login failures
This table will include all login failures. The data will present IP, ErrorMessage, SqlErrorCode, ConnectTime, AuthFailedTime, ConnSetupTime and PID for each login failure. The ErrorMessage is the error for the failure and SqlErrorCode is the respective PostgreSQL Error Code for the respective message. The ConnecTime is the time when the connection was received, while the AuthFailedTime is the time when the authentication failed and was returned the error. Respectively the ConnSetupTime is the amount of time it took from the establishment of the connection till the failure, in other words: AuthFailedTime – ConnSetupTime.
Limitation towards PostgreSQL versions.
PostgreSQL v11 does not emit in the server logs the application_name for replication connections. For this reason, a separate ARM template is provided specially for PostgreSQL v11 that will not include the chart "Replication connections by application_name" and in the tables "Summary of Replication connections" and "All replication connections" the application_name is not included.
Further Customization of the Dashboard.
The dashboard is intended to capture all connection types, however if for example you are not having replication connections you can edit the dashboard and remove the corresponding widget by clicking on the "…" sign from the upper right corner of the widget then click on "Remove from dashboard". In a similar way if you want to reduce the number of charts or tables you also can remove them. Afterwards you can re-arrange the widgets based on your own preferences or screen resolution.
Caveats.
In some corner cases the same connection PID can be reported as a successful and failed connection at the same time. This situation can happen for example for following login failure:
database "xxxx" does not exist
The explanation is that the authentication completed successful for the user that connected to the PostgreSQL server and the corresponding log message was emitted. Further on as the database to which the connection was already established did not exist a corresponding fatal error was emitted and connection terminated. Nevertheless, in such cases having a connection successful emitted is not an issue as the main purpose is to detect the reason of the failure and to take appropriate actions.
Kusto Queries Language (KQL) used in the dashboard
In this section you will find all Kusto queries used in the background. As already explained for the charts will be used an adaptive aggregation that is based on the result set for the chosen time range in Azure UI. This is accomplished using the following code:
let TimeRange = endtime - starttime;
let Intervals = iff(
TimeRange <= 12h, 1m,
iff(TimeRange <= 24h, 5m,
iff(TimeRange <= 48h, 10m,
iff(TimeRange <= 7d, 30m,1h)
)
)
);
The chart data is displayed using make-series
function where the step is the adaptive interval "Intervals" computed above. Missing points (no data recorded in that particular time interval) will be populated with zero values. As make-series
function requires a start and end time these values are always computed in each Kusto Query to be able to integrate with Azure time range selector in the UI.
As the server can be stopped or at times for the selected time interval in Azure UI can be no record, than the start and end time might be null. This behavior will lead to an error as the bin()
function run by make-series
cannot be run on a null value. To overcome this issue, it will be used a dummy start and end time for such cases:
let dummystart=datetime('2000-01-01 00:00');
let dummyend=datetime('2000-01-01 00:01');
…
| extend mintime=iff(isnotempty(xmintime), xmintime, dummystart)
| extend maxtime=iff(isnotempty(xmaxtime), xmaxtime, dummyend)
To make it clear which time interval was used for the aggregation it will plotted in the legend of each chart. This is accomplished using the following code:
let IntervalStr=iff (
Intervals==1m,"1 minute",
iff (Intervals==5m, "5 minutes",
iff (Intervals==10m, "10 minutes",
iff (Intervals==30m, "30 minutes", "1 hour")
)
)
);
…
| render timechart with ( xtitle=strcat("Time (aggregation interval ",IntervalStr,")") )
Chart: All Connections by IP
The Kusto Query searches in the server logs for messages of type "connection received" that denotes the entry for each connection in the PostgreSQL log. Using a regular expression, the IP is extracted from the log entry: IP=extract(@'host=(.\S+)', 1, Message)
. It is used count()
function to sum up all the connections from the given Ip in the Interval. This query will plot a chart will all the Ips that created a connection to the PostgreSQL Flexible server (including Azure internal connections).
//All new connections (including replication and azure internal connections) regardless of failed or successful by Ip
let dummystart=datetime('2000-01-01 00:00');
let dummyend=datetime('2000-01-01 00:01');
let AllConnections=AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "connection received"
| extend RealTime=todatetime(timestamp_s);
let starttime=toscalar(
AllConnections
| summarize mintime=min(RealTime)
| project xmintime=bin(mintime, 1m)
| extend mintime=iff(isnotempty(xmintime), xmintime, dummystart)
| project mintime
);
let endtime=toscalar(
AllConnections
| summarize xmaxtime=max(RealTime)
| extend maxtime=iff(isnotempty(xmaxtime), xmaxtime, dummyend)
| project maxtime
);
let TimeRange = endtime - starttime;
let Intervals = iff(
TimeRange <= 12h, 1m,
iff(TimeRange <= 24h, 5m,
iff(TimeRange <= 48h, 10m,
iff(TimeRange <= 7d, 30m,1h)
)
)
);
let IntervalStr=iff (
Intervals==1m,"1 minute",
iff (Intervals==5m, "5 minutes",
iff (Intervals==10m, "10 minutes",
iff (Intervals==30m, "30 minutes", "1 hour")
)
)
);
AllConnections
| extend IP=extract(@'host=(.\S+)', 1, Message)
| make-series Connections=count() default=0 on RealTime from starttime to endtime step Intervals by IP
| render timechart with ( xtitle=strcat("Time (aggregation interval ",IntervalStr,")") )
Chart: Azure vs Non-Azure Successful Connections
The Kusto Query searches in the server logs for messages of type "connection authorized" that denotes the entry for a successful connection in the PostgreSQL log. Using a regular expression the user and the database is extracted from the log entry: extract_all(@'user=(.+)database=(.\S+)', Message)
. If the user is "azuresu" then the connection is marked as Azure connection, otherwise as a Non-Azure connection. Lastly is used count()
function to sum up all the connections in the Interval. This query will plot a chart that is a comparison of successful connections created by Azure vs Non-Azure (your own connections).
//Successful Azure vs NonAzure Connections
let dummystart=datetime('2000-01-01 00:00');
let dummyend=datetime('2000-01-01 00:01');
let SuccessConn=AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "connection authorized"
| extend RealTime=todatetime(timestamp_s);
let starttime=toscalar(
SuccessConn
| summarize mintime=min(RealTime)
| project xmintime=bin(mintime, 1m)
| extend mintime=iff(isnotempty(xmintime), xmintime, dummystart)
| project mintime
);
let endtime=toscalar(
SuccessConn
| summarize xmaxtime=max(RealTime)
| extend maxtime=iff(isnotempty(xmaxtime), xmaxtime, dummyend)
| project maxtime
);
let TimeRange = endtime - starttime;
let Intervals = iff(
TimeRange <= 12h, 1m,
iff(TimeRange <= 24h, 5m,
iff(TimeRange <= 48h, 10m,
iff(TimeRange <= 7d, 30m,1h)
)
)
);
let IntervalStr=iff (
Intervals==1m,"1 minute",
iff (Intervals==5m, "5 minutes",
iff (Intervals==10m, "10 minutes",
iff (Intervals==30m, "30 minutes", "1 hour")
)
)
);
SuccessConn
| extend allmatches=extract_all(@'user=(.+)database=(.\S+)', Message)
| extend user=trim(@" ",tostring(allmatches[0][0]) )
| extend Connection=iff(user == "azuresu","AzureConnections","NonAzureConnections")
| make-series Connections=count() default=0 on RealTime from starttime to endtime step Intervals by Connection
| render timechart with ( xtitle=strcat("Time (aggregation interval ",IntervalStr,")") )
Chart: Failed vs Successful Connections
Below chart is generated from the union of 2 queries: one query is searching for successful connections and the other for failed ones. Successful logins will include both replication and non-replication connections, while azure connections denoted by user "azuresu" are excluded. The failed connection will search for "authentication failed" messages or all FATAL errors with any of the below PostgreSQL error codes:
Error Code | Condition Name |
---|---|
Class 28 — Invalid Authorization Specification | |
28000 | invalid_authorization_specification |
28P01 | invalid_password |
Class 3D — Invalid Catalog Name | |
3D000 | invalid_catalog_name |
Class 08 — Connection Exception | |
08P01 | protocol_violation |
Class 53 — Insufficient Resources | |
53300 | too_many_connections |
Class 42 — Syntax Error or Access Rule Violation | |
42501 | insufficient_privilege |
The resultant query will plot a chart that is a comparison of successful connections vs Failed connections aggregated based on the Interval.
//Failed vs Successful Connections [including replication]
let dummystart=datetime('2000-01-01 00:00');
let dummyend=datetime('2000-01-01 00:01');
let MAX_PID_JOIN_SEC=30;
let SuccessConn=AzureDiagnostics
| where ResourceProvider == 'MICROSOFT.DBFORPOSTGRESQL'
| where Category == 'PostgreSQLLogs'
| where Message contains 'connection authorized'
| extend ConnectTime=todatetime(timestamp_s);
let PreFailedConn=AzureDiagnostics
| where ResourceProvider == 'MICROSOFT.DBFORPOSTGRESQL'
| where Category == 'PostgreSQLLogs'
| where Message contains 'authentication failed' or sqlerrcode_s in ('28000', '28P01', '3D000', '53300', '42501', '08P01')
| where errorLevel_s =='FATAL'
| extend AuthFailedTime=todatetime(timestamp_s);
let searchpids=PreFailedConn
| distinct processId_d;
let FailedConnectionReceived=AzureDiagnostics
| where ResourceProvider == 'MICROSOFT.DBFORPOSTGRESQL'
| where Category == 'PostgreSQLLogs'
| where Message contains 'connection received'
| where processId_d in (searchpids)
| extend ConnectTime=todatetime(timestamp_s);
let FailedConn=PreFailedConn
| join kind=leftouter FailedConnectionReceived on $left.processId_d == $right.processId_d
| extend deltatime=datetime_diff('second', AuthFailedTime, ConnectTime)
| where deltatime >= 0 and deltatime < MAX_PID_JOIN_SEC;
let AllConnections=SuccessConn
| union FailedConn;
let starttime=toscalar(
AllConnections
| summarize mintime=min(ConnectTime)
| project xmintime=bin(mintime, 1m)
| extend mintime=iff(isnotempty(xmintime), xmintime, dummystart)
| project mintime
);
let endtime=toscalar(
AllConnections
| summarize xmaxtime=max(ConnectTime)
| extend maxtime=iff(isnotempty(xmaxtime), xmaxtime, dummyend)
| project maxtime
);
let TimeRange = endtime - starttime;
let Intervals = iff(
TimeRange <= 12h, 1m,
iff(TimeRange <= 24h, 5m,
iff(TimeRange <= 48h, 10m,
iff(TimeRange <= 7d, 30m,1h)
)
)
);
let IntervalStr=iff (
Intervals==1m,'1 minute',
iff (Intervals==5m, '5 minutes',
iff (Intervals==10m, '10 minutes',
iff (Intervals==30m, '30 minutes', '1 hour')
)
)
);
AllConnections
| extend allmatches=extract_all(@'user=(.+)database=(.\S+)', Message)
| extend user=trim(@' ', tostring(allmatches[0][0]))
| where user != 'azuresu'
| extend Connection=iff(Message contains 'connection authorized', 'SuccessfulConnection', 'FailedConnection')
| make-series Connections=count() default=0 on ConnectTime from starttime to endtime step Intervals by Connection
| render timechart with (xtitle=strcat('Time (aggregation interval ', IntervalStr, ')'))
Chart: Replication vs Non-Replication Successful connections
The Kusto Query searches initially for all messages type "connection authorized" that denotes the entry for a successful connection in the PostgreSQL log. Afterwards if the message matches "replication connection authorized" that indicates a replication connection than the respective connection is marked as a replication one, otherwise as a non-replication connection. Azure connections denoted by user "azuresu" are excluded. The chart generated will be a comparison of replication vs non-replication connections aggregated based on the Interval.
// Replication vs Non-Replication connections
let dummystart=datetime('2000-01-01 00:00');
let dummyend=datetime('2000-01-01 00:01');
let AllConnections=AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "connection authorized"
| extend RealTime=todatetime(timestamp_s);
let starttime=toscalar(
AllConnections
| summarize mintime=min(RealTime)
| project xmintime=bin(mintime, 1m)
| extend mintime=iff(isnotempty(xmintime), xmintime, dummystart)
| project mintime
);
let endtime=toscalar(
AllConnections
| summarize xmaxtime=max(RealTime)
| extend maxtime=iff(isnotempty(xmaxtime), xmaxtime, dummyend)
| project maxtime
);
let TimeRange = endtime - starttime;
let Intervals = iff(
TimeRange <= 12h, 1m,
iff(TimeRange <= 24h, 5m,
iff(TimeRange <= 48h, 10m,
iff(TimeRange <= 7d, 30m,1h)
)
)
);
let IntervalStr=iff (
Intervals==1m,"1 minute",
iff (Intervals==5m, "5 minutes",
iff (Intervals==10m, "10 minutes",
iff (Intervals==30m, "30 minutes", "1 hour")
)
)
);
AllConnections
| extend allmatches=extract_all(@'user=(.+)database=(.\S+)', Message)
| extend user=trim(@" ",tostring(allmatches[0][0]) )
| where user != 'azuresu' // exclude azure connections from successful connections.
| extend Connection=iff(Message contains "replication connection authorized","ReplicationConnection","Non-ReplicationConnection")
| make-series Connections=count() default=0 on RealTime from starttime to endtime step Intervals by Connection
| render timechart with ( xtitle=strcat("Time (aggregation interval ",IntervalStr,")") )
Chart: Successful Non-replication connections by database
The Kusto Query searches for messages of type "connection authorized" that denotes the entry for a successful connection and excludes "replication connection authorized" messages, therefore it will include only successful non-replication connections. Subsequently using regular expression the user and the database to which the connection was established are retrieved extract_all(@'user=(.+)database=(.\S+)', Message)
. All connections denoted by user "azuresu" are excluded. The chart generated will represent a counter of successful connection aggregated by database based on the Interval.
// Successful Non-replication connections by database
let dummystart=datetime('2000-01-01 00:00');
let dummyend=datetime('2000-01-01 00:01');
let SuccessConnNonReplication=AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "connection authorized"
| where Message !contains "replication connection authorized"
| extend RealTime=todatetime(timestamp_s);
let starttime=toscalar(
SuccessConnNonReplication
| summarize mintime=min(RealTime)
| project xmintime=bin(mintime, 1m)
| extend mintime=iff(isnotempty(xmintime), xmintime, dummystart)
| project mintime
);
let endtime=toscalar(
SuccessConnNonReplication
| summarize xmaxtime=max(RealTime)
| extend maxtime=iff(isnotempty(xmaxtime), xmaxtime, dummyend)
| project maxtime
);
let TimeRange = endtime - starttime;
let Intervals = iff(
TimeRange <= 12h, 1m,
iff(TimeRange <= 24h, 5m,
iff(TimeRange <= 48h, 10m,
iff(TimeRange <= 7d, 30m,1h)
)
)
);
let IntervalStr=iff (
Intervals==1m,"1 minute",
iff (Intervals==5m, "5 minutes",
iff (Intervals==10m, "10 minutes",
iff (Intervals==30m, "30 minutes", "1 hour")
)
)
);
SuccessConnNonReplication
| extend allmatches=extract_all(@'user=(.+)database=(.\S+)', Message)
| extend user=trim(@" ",tostring(allmatches[0][0]) )
| extend database=tostring(allmatches[0][1])
| where user != 'azuresu' // exclude azure connections
| make-series Connections=count() default=0 on RealTime from starttime to endtime step Intervals by database
| render timechart with ( xtitle=strcat("Time (aggregation interval ",IntervalStr,")") )
Chart: Successful Non-replication connections by user
The Kusto Query searches for messages of type "connection authorized" that denotes the entry for a successful connection and excludes "replication connection authorized" messages, therefore it will include only successful non-replication connections. Subsequently using regular expression the user and the database to which the connection was established are retrieved extract_all(@'user=(.+)database=(.\S+)', Message)
. All connections denoted by user "azuresu" are excluded. The chart generated will represent a counter of successful connection aggregated by user based on the Interval.
// Successful Non-replication connections by user
let dummystart=datetime('2000-01-01 00:00');
let dummyend=datetime('2000-01-01 00:01');
let SuccessConnNonReplication=AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "connection authorized"
| where Message !contains "replication connection authorized"
| extend RealTime=todatetime(timestamp_s);
let starttime=toscalar(
SuccessConnNonReplication
| summarize mintime=min(RealTime)
| project xmintime=bin(mintime, 1m)
| extend mintime=iff(isnotempty(xmintime), xmintime, dummystart)
| project mintime
);
let endtime=toscalar(
SuccessConnNonReplication
| summarize xmaxtime=max(RealTime)
| extend maxtime=iff(isnotempty(xmaxtime), xmaxtime, dummyend)
| project maxtime
);
let TimeRange = endtime - starttime;
let Intervals = iff(
TimeRange <= 12h, 1m,
iff(TimeRange <= 24h, 5m,
iff(TimeRange <= 48h, 10m,
iff(TimeRange <= 7d, 30m,1h)
)
)
);
let IntervalStr=iff (
Intervals==1m,"1 minute",
iff (Intervals==5m, "5 minutes",
iff (Intervals==10m, "10 minutes",
iff (Intervals==30m, "30 minutes", "1 hour")
)
)
);
SuccessConnNonReplication
| extend allmatches=extract_all(@'user=(.+)database=(.\S+)', Message)
| extend user=trim(@" ",tostring(allmatches[0][0]) )
| extend database=tostring(allmatches[0][1])
| where user != 'azuresu' // exclude azure connections
| make-series Connections=count() default=0 on RealTime from starttime to endtime step Intervals by user
| render timechart with ( xtitle=strcat("Time (aggregation interval ",IntervalStr,")") )
Chart: Replication Connections by user
The Kusto Query searches for messages of type "replication connection authorized" that denotes the entry for a replication connection. Subsequently using regular expression the user and the application_name for the replication connection are retrieved extract_all(@'user=(.+)application_name=(.\S+)', Message)
. The chart generated will represent a counter of replication connection aggregated by user based on the Interval.
// Replication Connections by user
let dummystart=datetime('2000-01-01 00:00');
let dummyend=datetime('2000-01-01 00:01');
let SuccessConn=AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "connection authorized"
| extend RealTime=todatetime(timestamp_s);
let starttime=toscalar(
SuccessConn
| summarize mintime=min(RealTime)
| project xmintime=bin(mintime, 1m)
| extend mintime=iff(isnotempty(xmintime), xmintime, dummystart)
| project mintime
);
let endtime=toscalar(
SuccessConn
| summarize xmaxtime=max(RealTime)
| extend maxtime=iff(isnotempty(xmaxtime), xmaxtime, dummyend)
| project maxtime
);
let TimeRange = endtime - starttime;
let Intervals = iff(
TimeRange <= 12h, 1m,
iff(TimeRange <= 24h, 5m,
iff(TimeRange <= 48h, 10m,
iff(TimeRange <= 7d, 30m,1h)
)
)
);
let IntervalStr=iff (
Intervals==1m,"1 minute",
iff (Intervals==5m, "5 minutes",
iff (Intervals==10m, "10 minutes",
iff (Intervals==30m, "30 minutes", "1 hour")
)
)
);
SuccessConn
| where Message contains "replication connection authorized"
| extend allmatches=extract_all(@'user=(.+)application_name=(.\S+)', Message)
| extend user=tostring(allmatches[0][0])
| extend application_name=tostring(allmatches[0][1])
| make-series Connections=count() default=0 on RealTime from starttime to endtime step Intervals by user
| render timechart with ( xtitle=strcat("Time (aggregation interval ",IntervalStr,")") )
Chart: Replication Connections by application_name
The Kusto Query searches for messages of type "replication connection authorized" that denotes the entry for a replication connection. Subsequently using regular expression the user and the application_name for the replication connection are retrieved extract_all(@'user=(.+)application_name=(.\S+)', Message)
. The chart generated will represent a counter of replication connection aggregated by application_name based on the Interval.
// Replication Connections by application_name
let dummystart=datetime('2000-01-01 00:00');
let dummyend=datetime('2000-01-01 00:01');
let SuccessConn=AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "connection authorized"
| extend RealTime=todatetime(timestamp_s);
let starttime=toscalar(
SuccessConn
| summarize mintime=min(RealTime)
| project xmintime=bin(mintime, 1m)
| extend mintime=iff(isnotempty(xmintime), xmintime, dummystart)
| project mintime
);
let endtime=toscalar(
SuccessConn
| summarize xmaxtime=max(RealTime)
| extend maxtime=iff(isnotempty(xmaxtime), xmaxtime, dummyend)
| project maxtime
);
let TimeRange = endtime - starttime;
let Intervals = iff(
TimeRange <= 12h, 1m,
iff(TimeRange <= 24h, 5m,
iff(TimeRange <= 48h, 10m,
iff(TimeRange <= 7d, 30m,1h)
)
)
);
let IntervalStr=iff (
Intervals==1m,"1 minute",
iff (Intervals==5m, "5 minutes",
iff (Intervals==10m, "10 minutes",
iff (Intervals==30m, "30 minutes", "1 hour")
)
)
);
SuccessConn
| where Message contains "replication connection authorized"
| extend allmatches=extract_all(@'user=(.+)application_name=(.\S+)', Message)
| extend user=tostring(allmatches[0][0])
| extend application_name=tostring(allmatches[0][1])
| make-series Connections=count() default=0 on RealTime from starttime to endtime step Intervals by application_name
| render timechart with ( xtitle=strcat("Time (aggregation interval ",IntervalStr,")") )
Chart: Connection Lifetime drill-down
The Kusto Query searches for messages of type "disconnection" that denotes the entry for a disconnection in PostgreSQL log. Subsequently using regular expression the duration, user, database and Ip are retrieved extract_all(@'session time:(.+)user=(.+)database=(.+)host=(.\S+)', Message)
. Initially the duration is reported as a string and is converted in seconds. Disconnections from Azure denoted by user "azuresu" are excluded. Further on the connection is marked as Short-Lived if had a duration less then 60 sec, as Normal-Lived if the duration was between 60 sec and 1200 sec and as Long-Lived for a duration grater then 1200 sec. The chart generated will represent a counter of Connection Duration Type aggregated based on the Interval.
//Connection durations drill-down
let dummystart=datetime('2000-01-01 00:00');
let dummyend=datetime('2000-01-01 00:01');
let Disconnections=AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "disconnection"
| extend DisconnectTime=todatetime(timestamp_s);
let starttime=toscalar(
Disconnections
| summarize mintime=min(DisconnectTime)
| project xmintime=bin(mintime, 1m)
| extend mintime=iff(isnotempty(xmintime), xmintime, dummystart)
| project mintime
);
let endtime=toscalar(
Disconnections
| summarize xmaxtime=max(DisconnectTime)
| extend maxtime=iff(isnotempty(xmaxtime), xmaxtime, dummyend)
| project maxtime
);
let TimeRange = endtime - starttime;
let Intervals = iff(
TimeRange <= 12h, 1m,
iff(TimeRange <= 24h, 5m,
iff(TimeRange <= 48h, 10m,
iff(TimeRange <= 7d, 30m,1h)
)
)
);
let IntervalStr=iff (
Intervals==1m,"1 minute",
iff (Intervals==5m, "5 minutes",
iff (Intervals==10m, "10 minutes",
iff (Intervals==30m, "30 minutes", "1 hour")
)
)
);
Disconnections
| extend allmatches=extract_all(@'session time:(.+)user=(.+)database=(.+)host=(.\S+)', Message)
| extend duration=totime(allmatches[0][0])
| extend user=trim(@" ",tostring(allmatches[0][1]) )
| extend database=tostring(allmatches[0][2])
| extend ip=tostring(allmatches[0][3])
| where user != 'azuresu'
| extend ts = totimespan(duration)
| extend DurationInSeconds = ts / 1s
| extend ConnType=iff(
DurationInSeconds <= 60, 'Short-Lived Connection' ,
iff(DurationInSeconds > 60 and DurationInSeconds < 1200, 'Normal-Lived Connection','Long-Lived Connection')
)
| make-series Connections=count() default=0 on DisconnectTime from starttime to endtime step Intervals by ConnType
| render timechart with ( xtitle=strcat("Time (aggregation interval ",IntervalStr,")") )
Chart: Average Connection Duration in Seconds by Ip
The Kusto Query searches for messages of type "disconnection" that denotes the entry for a disconnection in PostgreSQL log. Subsequently using regular expression the duration, user, database and Ip are retrieved extract_all(@'session time:(.+)user=(.+)database=(.+)host=(.\S+)', Message)
. Disconnections from Azure denoted by user "azuresu" are excluded. As Initially the duration is reported as a string it is converted in seconds. The chart generated will represent the average connection duration in seconds aggregated on the Ip based on the Interval.
//Average connection duration by IP
let dummystart=datetime('2000-01-01 00:00');
let dummyend=datetime('2000-01-01 00:01');
let Disconnections=AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "disconnection"
| extend DisconnectTime=todatetime(timestamp_s);
let starttime=toscalar(
Disconnections
| summarize mintime=min(DisconnectTime)
| project xmintime=bin(mintime, 1m)
| extend mintime=iff(isnotempty(xmintime), xmintime, dummystart)
| project mintime
);
let endtime=toscalar(
Disconnections
| summarize xmaxtime=max(DisconnectTime)
| extend maxtime=iff(isnotempty(xmaxtime), xmaxtime, dummyend)
| project maxtime
);
let TimeRange = endtime - starttime;
let Intervals = iff(
TimeRange <= 12h, 1m,
iff(TimeRange <= 24h, 5m,
iff(TimeRange <= 48h, 10m,
iff(TimeRange <= 7d, 30m,1h)
)
)
);
let IntervalStr=iff (
Intervals==1m,"1 minute",
iff (Intervals==5m, "5 minutes",
iff (Intervals==10m, "10 minutes",
iff (Intervals==30m, "30 minutes", "1 hour")
)
)
);
Disconnections
| extend allmatches=extract_all(@'session time:(.+)user=(.+)database=(.+)host=(.\S+)', Message)
| extend duration=totime(allmatches[0][0])
| extend user=trim(@" ",tostring(allmatches[0][1]) )
| extend database=tostring(allmatches[0][2])
| extend ip=tostring(allmatches[0][3])
| where user != 'azuresu'
| extend ts = totimespan(duration)
| extend DurationInSeconds = ts / 1s
| make-series avgConnectionDurationInSec=avg(DurationInSeconds) on DisconnectTime from starttime to endtime step Intervals by ip
| render timechart with ( xtitle=strcat("Time (aggregation interval ",IntervalStr,")") )
Chart: Connection Errors by SqlErrorCode
To get all login failures, the Kusto query first searches for "authentication failed" messages or all FATAL errors with any of the below PostgreSQL error codes:
Error Code | Condition Name |
---|---|
Class 28 — Invalid Authorization Specification | |
28000 | invalid_authorization_specification |
28P01 | invalid_password |
Class 3D — Invalid Catalog Name | |
3D000 | invalid_catalog_name |
Class 08 — Connection Exception | |
08P01 | protocol_violation |
Class 53 — Insufficient Resources | |
53300 | too_many_connections |
Class 42 — Syntax Error or Access Rule Violation | |
42501 | insufficient_privilege |
The chart generated will represent a counter of Connection SQL Error codes aggregated based on the Interval and will indicate the pattern of SQL error code in time.
//Connection errors counter by SqlErrorCode
let dummystart=datetime('2000-01-01 00:00');
let dummyend=datetime('2000-01-01 00:01');
let MAX_PID_JOIN_SEC=30;
let ConnectErrors=AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "authentication failed" or sqlerrcode_s in ('28000', '28P01', '3D000', '53300', '42501','08P01')
| where errorLevel_s =='FATAL'
| extend SqlErrorCode=sqlerrcode_s
| extend AuthFailedTime=todatetime(timestamp_s);
let starttime=toscalar(
ConnectErrors
| summarize mintime=min(AuthFailedTime)
| project xmintime=bin(mintime, 1m)
| extend mintime=iff(isnotempty(xmintime), xmintime, dummystart)
| project mintime
);
let endtime=toscalar(
ConnectErrors
| summarize xmaxtime=max(AuthFailedTime)
| extend maxtime=iff(isnotempty(xmaxtime), xmaxtime, dummyend)
| project maxtime
);
let TimeRange = endtime - starttime;
let Intervals = iff(
TimeRange <= 12h, 1m,
iff(TimeRange <= 24h, 5m,
iff(TimeRange <= 48h, 10m,
iff(TimeRange <= 7d, 30m,1h)
)
)
);
let IntervalStr=iff (
Intervals==1m,"1 minute",
iff (Intervals==5m, "5 minutes",
iff (Intervals==10m, "10 minutes",
iff (Intervals==30m, "30 minutes", "1 hour")
)
)
);
let searchpids=ConnectErrors
| distinct processId_d;
let ConnectionReceived=AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "connection received"
| where processId_d in (searchpids)
| extend IP=extract(@'host=(.\S+)', 1, Message)
| extend ConnectTime=todatetime(timestamp_s);
ConnectErrors
| join kind=leftouter ConnectionReceived on $left.processId_d == $right.processId_d
| extend deltatime=datetime_diff('second', AuthFailedTime, ConnectTime)
| where deltatime >= 0 and deltatime < MAX_PID_JOIN_SEC
| make-series ConnectionErrors=count() default=0 on ConnectTime from starttime to endtime step Intervals by SqlErrorCode
| render timechart with ( xtitle=strcat("Time (aggregation interval ",IntervalStr,")") )
Chart: Connection Errors by IP
To get all login failures, the Kusto query first searches for "authentication failed" messages or all FATAL errors with any of the below PostgreSQL error codes:
Error Code | Condition Name |
---|---|
Class 28 — Invalid Authorization Specification | |
28000 | invalid_authorization_specification |
28P01 | invalid_password |
Class 3D — Invalid Catalog Name | |
3D000 | invalid_catalog_name |
Class 08 — Connection Exception | |
08P01 | protocol_violation |
Class 53 — Insufficient Resources | |
53300 | too_many_connections |
Class 42 — Syntax Error or Access Rule Violation | |
42501 | insufficient_privilege |
To get the IP for the failure message it is used extra manipulations. First it is retrieved all distinct PIDs for the failures and then is searched all the messages of type "connection received" for the corresponding PIDs. From the connection received table the IP is parsed using regular expression. At this stage there are 2 tables: one holding the failures and another the connections. Further on, these 2 tables are outer joined based on the PID. Since the PID can repeat in time, to have the proper pair that corresponds to the actual failure is computed the difference in seconds between the failure time and the connection time. The maximum allowed difference in seconds for the join is denoted by MAX_PID_JOIN_SEC=30. Using this approach, it is ensured the correct mapping is achieved.
The chart generated will represent a counter of Connection failures aggregated based on the Interval and will indicate number of errors for an IP in time.
//Connection errors counter by Ip
let dummystart=datetime('2000-01-01 00:00');
let dummyend=datetime('2000-01-01 00:01');
let MAX_PID_JOIN_SEC=30;
let ConnectErrors=AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "authentication failed" or sqlerrcode_s in ('28000', '28P01', '3D000', '53300', '42501','08P01')
| where errorLevel_s =='FATAL'
| extend AuthFailedTime=todatetime(timestamp_s);
let starttime=toscalar(
ConnectErrors
| summarize mintime=min(AuthFailedTime)
| project xmintime=bin(mintime, 1m)
| extend mintime=iff(isnotempty(xmintime), xmintime, dummystart)
| project mintime
);
let endtime=toscalar(
ConnectErrors
| summarize xmaxtime=max(AuthFailedTime)
| extend maxtime=iff(isnotempty(xmaxtime), xmaxtime, dummyend)
| project maxtime
);
let TimeRange = endtime - starttime;
let Intervals = iff(
TimeRange <= 12h, 1m,
iff(TimeRange <= 24h, 5m,
iff(TimeRange <= 48h, 10m,
iff(TimeRange <= 7d, 30m,1h)
)
)
);
let IntervalStr=iff (
Intervals==1m,"1 minute",
iff (Intervals==5m, "5 minutes",
iff (Intervals==10m, "10 minutes",
iff (Intervals==30m, "30 minutes", "1 hour")
)
)
);
let searchpids=ConnectErrors
| distinct processId_d;
let ConnectionReceived=AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "connection received"
| where processId_d in (searchpids)
| extend IP=extract(@'host=(.\S+)', 1, Message)
| extend ConnectTime=todatetime(timestamp_s);
ConnectErrors
| join kind=leftouter ConnectionReceived on $left.processId_d == $right.processId_d
| extend deltatime=datetime_diff('second', AuthFailedTime, ConnectTime)
| where deltatime >= 0 and deltatime < MAX_PID_JOIN_SEC
| make-series ConnectionErrors=count() default=0 on ConnectTime from starttime to endtime step Intervals by IP
| render timechart with ( xtitle=strcat("Time (aggregation interval ",IntervalStr,")") )
Summary: New connections by IP
The Kusto Query searches in the server logs for messages of type "connection received" that denotes the entry for each connection in the PostgreSQL log. Using a regular expression, the IP is extracted from the log entry: IP=extract(@'host=(.\S+)', 1, Message)
. Subsequently the query will summarize all new connections by IP for the entire time range selected in Azure UI. The summary will represent a counter of Connections for each IP showing as well the first and last connection encountered across the entire time range.
//Summary of all new connections by IP
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "connection received"
| extend RealTime=todatetime(timestamp_s)
| extend IP=extract(@'host=(.\S+)', 1, Message)
| summarize FirstConnection=min(RealTime),LastConnection=max(RealTime), Counter=count() by IP
| order by Counter desc
Summary: Azure Successful Connections
The Kusto Query searches in the server logs for messages of type "connection authorized" that denotes the entry for each successful connection in the PostgreSQL log. Replication connections are excluded. Using regular expression the user and the database to which the connection was established are retrieved extract_all(@'user=(.+)database=(.\S+)', Message)
. The user is filtered to be "azuresu", therefore are considered only Azure Connections. The time of the event denoted by "connection received", basically the time when authentication completed is denoted as ConnAuthTime.
To get the Ip and the time when connections were received are retrieved all distinct PIDs for the connection authorized table. Then is searched all the messages of type "connection received" for the corresponding PIDs. From the connection received table the IP is parsed using regular expression. The time when the connection was received is denoted as ConnRecvTime.
Further on, these 2 tables (Connection Authorized and Connection Received) are outer joined based on the PID. Since the PID can repeat in time, to have the proper pair that corresponds to the actual connection is computed the difference in seconds between the authentication and the connection time. The maximum allowed difference in seconds for the join is denoted by MAX_PID_JOIN_SEC=30 to ensure the correct mapping is achieved.
Lastly from the resultant table will be summarized all Azure connections for the entire time range selected in Azure UI. The data is grouped by Ip, user and database and it will represent a counter of connections showing as well the first and last connection time encountered across the entire time range for each grouping pair (Ip, user and database).
//Summary of Azure Only Successful Connections
let MAX_PID_JOIN_SEC=30;
let ConnectionAuthorized=AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "connection authorized"
| where Message !contains "replication connection"
| extend allmatches=extract_all(@'user=(.+)database=(.\S+)', Message)
| extend user=trim(@" ",tostring(allmatches[0][0]) )
| extend database=tostring(allmatches[0][1])
| where user == 'azuresu' // only azuresu connections
| project ConnAuthTime=todatetime(timestamp_s), processId_d,user, database, ConAuthMSG=Message;
let searchpids=ConnectionAuthorized
| distinct processId_d;
let ConnectionReceived=AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "connection received"
| where processId_d in (searchpids)
| extend IP=extract(@'host=(.\S+)', 1, Message)
| project ConnRecvTime=todatetime(timestamp_s), processId_d,IP, ConRecvMSG=Message;
ConnectionAuthorized
| join kind=leftouter ConnectionReceived on $left.processId_d==$right.processId_d
| project processId_d,IP,user,database,ConnRecvTime,ConnAuthTime,ConRecvMSG,ConAuthMSG
| extend ConnSetupTime=ConnAuthTime-ConnRecvTime
| extend deltatime=datetime_diff('second',ConnAuthTime,ConnRecvTime)
| where deltatime >=0 and deltatime < MAX_PID_JOIN_SEC
| order by ConnRecvTime desc
| project IP,user,database,ConnRecvTime,ConnAuthTime,ConnSetupTime,processId_d//,ConRecvMSG,ConAuthMSG
| summarize FirstConnection=min(ConnRecvTime),LastConnection=max(ConnRecvTime), Count=count() by IP, user, database
| order by Count desc
Summary: Successful Connections
The Kusto Query searches in the server logs for messages of type "connection authorized" that denotes the entry for each successful connection in the PostgreSQL log. Replication connections are excluded. Using regular expression the user and the database to which the connection was established are retrieved extract_all(@'user=(.+)database=(.\S+)', Message)
. Azure connections are excluded to have a picture only of the connections created by you or your apps. The time of the event denoted by "connection received", basically the time when authentication completed is denoted as ConnAuthTime.
To get the Ip and the time when connections were received are retrieved all distinct PIDs for the connection authorized table. Then is searched all the messages of type "connection received" for the corresponding PIDs. From the connection received table the IP is parsed using regular expression. The time when the connection was received is denoted as ConnRecvTime.
Further on, these 2 tables (Connection Authorized and Connection Received) are outer joined based on the PID. Since the PID can repeat in time, to have the proper pair that corresponds to the actual connection is computed the difference in seconds between the authentication and the connection time. The maximum allowed difference in seconds for the join is denoted by MAX_PID_JOIN_SEC=30 to ensure the correct mapping is achieved.
Lastly from the resultant table will be summarized all your non-replication connections for the entire time range selected in Azure UI. The data is grouped by Ip, user and database and it will represent a counter of connections showing as well the first and last connection time encountered across the entire time range for each grouping pair (Ip, user and database).
//Summary of Successful Connections
let MAX_PID_JOIN_SEC=30;
let ConnectionAuthorized=AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "connection authorized"
| where Message !contains "replication connection"
| extend allmatches=extract_all(@'user=(.+)database=(.\S+)', Message)
| extend user=trim(@" ",tostring(allmatches[0][0]) )
| extend database=tostring(allmatches[0][1])
| where user != 'azuresu' // exclude azuresu connections
| project ConnAuthTime=todatetime(timestamp_s), processId_d,user, database, ConAuthMSG=Message;
let searchpids=ConnectionAuthorized
| distinct processId_d;
let ConnectionReceived=AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "connection received"
| where processId_d in (searchpids)
| extend IP=extract(@'host=(.\S+)', 1, Message)
| project ConnRecvTime=todatetime(timestamp_s), processId_d,IP, ConRecvMSG=Message;
ConnectionAuthorized
| join kind=leftouter ConnectionReceived on $left.processId_d==$right.processId_d
| project processId_d,IP,user,database,ConnRecvTime,ConnAuthTime,ConRecvMSG,ConAuthMSG
| extend ConnSetupTime=ConnAuthTime-ConnRecvTime
| extend deltatime=datetime_diff('second',ConnAuthTime,ConnRecvTime)
| where deltatime >=0 and deltatime < MAX_PID_JOIN_SEC
| order by ConnRecvTime desc
| project IP,user,database,ConnRecvTime,ConnAuthTime,ConnSetupTime,processId_d//,ConRecvMSG,ConAuthMSG
| summarize FirstConnection=min(ConnRecvTime),LastConnection=max(ConnRecvTime), Count=count() by IP, user, database
| order by Count desc
Summary: Replication connections
The Kusto Query searches in the server logs for messages of type "replication connection authorized" that denotes the entry for each replication connection in the PostgreSQL log. In this way non-replication connections are excluded. Using regular expression the user and the application_name for which the connection was established are retrieved extract_all(@'user=(.+)application_name=(.\S+)', Message)
. Azure connections are excluded, to have a picture only of the connections created by you or your apps. The time of the event denoted by "connection received", basically the time when authentication completed is denoted as ConnAuthTime.
To get the Ip and the time when connections were received are retrieved all distinct PIDs for the replication connection authorized table. Then is searched all the messages of type "connection received" for the corresponding PIDs. From the connection received table the IP is parsed using regular expression. The time when the connection was received is denoted as ConnRecvTime.
Further on, these 2 tables (Replication Connection Authorized and Connection Received) are outer joined based on the PID. Since the PID can repeat in time, to have the proper pair that corresponds to the actual connection is computed the difference in seconds between the authentication and the connection time. The maximum allowed difference in seconds for the join is denoted by MAX_PID_JOIN_SEC=30 to ensure the correct mapping is achieved.
Lastly from the resultant table will be summarized all your replication connections for the entire time range selected in Azure UI. The data is grouped by Ip, user and application_name and it will represent a counter of connections showing as well the first and last replication connection time encountered across the entire time range for each grouping pair (Ip, user and application_name).
//Summary of Replication connections
let MAX_PID_JOIN_SEC=30;
let ReplicationConn=AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "replication connection authorized"
| extend allmatches=extract_all(@'user=(.+)application_name=(.\S+)', Message)
| extend user=tostring(allmatches[0][0])
| extend application_name=tostring(allmatches[0][1])
| project ConnAuthTime=todatetime(timestamp_s), user, application_name, processId_d, ConAuthMSG=Message;
let searchpids=ReplicationConn
| distinct processId_d;
let ConnectionReceived=AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "connection received"
| where processId_d in (searchpids)
| extend IP=extract(@'host=(.\S+)', 1, Message)
| project ConnRecvTime=todatetime(timestamp_s), processId_d,IP, ConRecvMSG=Message;
ReplicationConn
| join kind=leftouter ConnectionReceived on $left.processId_d==$right.processId_d
| extend ConnSetupTime=ConnAuthTime-ConnRecvTime
| extend deltatime=datetime_diff('second',ConnAuthTime,ConnRecvTime)
| where deltatime >=0 and deltatime < MAX_PID_JOIN_SEC
| summarize FirstConnection=min(ConnRecvTime),LastConnection=max(ConnRecvTime), Count=count() by IP, user, application_name
| order by Count desc
Summary: Disconnections
The Kusto Query searches for messages of type "disconnection" that denotes the entry for a disconnection in PostgreSQL log. Subsequently using regular expression the duration, user, database and Ip are retrieved extract_all(@'session time:(.+)user=(.+)database=(.+)host=(.\S+)', Message)
. Disconnections from Azure denoted by user "azuresu" are excluded. As Initially the duration is reported as a string it is converted in seconds.
Subsequently will be summarized all your disconnections for the entire time range selected in Azure UI. The data is grouped by Ip, user and database and it will represent a counter of disconnections showing as well the first and last disconnection time, minimum, maximum and average connection duration encountered across the entire time range for each grouping pair (Ip, user and database).
// Summary of disconnections
AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "disconnection"
| extend allmatches=extract_all(@'session time:(.+)user=(.+)database=(.+)host=(.\S+)', Message)
| extend duration=totime(allmatches[0][0])
| extend user=trim(@" ",tostring(allmatches[0][1]) )
| extend database=tostring(allmatches[0][2])
| extend IP=tostring(allmatches[0][3])
| where user != 'azuresu'
| extend ts = totimespan(duration)
| extend DurationInSeconds = ts / 1s
| extend DisconnectTime=todatetime(timestamp_s)
| summarize FirstDisconnection=min(DisconnectTime), LastDisconnection=max(DisconnectTime), minDurationSec=min(DurationInSeconds),
maxDurationSec=max(DurationInSeconds), AvgDurationSec=avg(DurationInSeconds),
NrOfDisconnections=count() by IP,user, database
| order by NrOfDisconnections desc
Summary: All Failed logins
To get all login failures, the Kusto query first searches for "authentication failed" messages or all FATAL errors with any of the below PostgreSQL error codes:
Error Code | Condition Name |
---|---|
Class 28 — Invalid Authorization Specification | |
28000 | invalid_authorization_specification |
28P01 | invalid_password |
Class 3D — Invalid Catalog Name | |
3D000 | invalid_catalog_name |
Class 08 — Connection Exception | |
08P01 | protocol_violation |
Class 53 — Insufficient Resources | |
53300 | too_many_connections |
Class 42 — Syntax Error or Access Rule Violation | |
42501 | insufficient_privilege |
To the error message itself it is appended the details of the failure if the corresponding field is not empty. The time of the failure is denoted as AuthFailedTime.
To get the Ip and the time when connections were received are retrieved all distinct PIDs for the authentication failure table. Then is searched all the messages of type "connection received" for the corresponding PIDs. From the connection received table the IP is parsed using regular expression and the time when the connection was received is denoted as ConnectTime.
Further on, these 2 tables (Authentication Failure and Connection Received) are outer joined based on the PID. Since the PID can repeat in time, to have the proper pair that corresponds to the actual connection failure is computed the difference in seconds between the failed authentication and the connection time. The maximum allowed difference in seconds for the join is denoted by MAX_PID_JOIN_SEC=30 to ensure the correct mapping is achieved.
Lastly from the resultant table will be summarized all authentication failures for the entire time range selected in Azure UI. The data is grouped by Ip, ErrorMessage and SqlErrorCode and it will represent a counter of logins failures showing as well the first and last failure time encountered across the entire time range for each grouping pair (Ip, ErrorMessage and SqlErrorCode).
//Summary of All Failed logins
let MAX_PID_JOIN_SEC=30;
let ConnectErrors=AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "authentication failed" or sqlerrcode_s in ('28000', '28P01', '3D000', '53300', '42501','08P01')
| where errorLevel_s =='FATAL'
| extend preErrorMsg=extract(@'[FATAL|LOG|ERROR]:(.+)', 1, Message)
| extend nextparseErrorMsg=extract(@':\s(.+)', 1, preErrorMsg)
| extend ErrorMsgWithoutDetail=iff(isempty(nextparseErrorMsg), preErrorMsg, nextparseErrorMsg)
| extend ErrorMessage=iff(isnotempty(detail_log_s), strcat(ErrorMsgWithoutDetail, '. ', detail_log_s), ErrorMsgWithoutDetail)
| project AuthFailedTime=todatetime(timestamp_s), processId_d, ErrorMessage, SqlErrorCode=sqlerrcode_s;
let searchpids=ConnectErrors
| distinct processId_d;
let ConnectionReceived=AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "connection received"
| where processId_d in (searchpids)
| extend IP=extract(@'host=(.\S+)', 1, Message)
| project ConnectTime=todatetime(timestamp_s), IP, processId_d;
ConnectErrors
| join kind=leftouter ConnectionReceived on $left.processId_d == $right.processId_d
| extend ConnSetupTime=AuthFailedTime - ConnectTime
| extend deltatime=datetime_diff('second', AuthFailedTime, ConnectTime)
| where deltatime >= 0 and deltatime < MAX_PID_JOIN_SEC
| summarize FirstErrorTime=min(ConnectTime), LastErrorTime=max(ConnectTime), TotalErrors=count() by IP, ErrorMessage, SqlErrorCode
| order by TotalErrors desc
Listing: All Successful connections
The Kusto Query searches in the server logs for messages of type "connection authorized" that denotes the entry for each successful connection in the PostgreSQL log. Replication connections are excluded. Using regular expression the user and the database to which the connection was established are retrieved extract_all(@'user=(.+)database=(.\S+)', Message)
. Azure connections are also excluded, to have a picture only of the connections created by you or your apps. The time of the event denoted by "connection authorized", basically the time when authentication completed is denoted as ConnAuthTime. The full message for the connection authorization is denoted by ConAuthMSG.
To get the Ip and the time when connections were received are retrieved all distinct PIDs for the connection authorized table. Then is searched all the messages of type "connection received" for the corresponding PIDs. From the connection received table the IP is parsed using regular expression. The time when the connection was received is denoted as ConnRecvTime, while the full message is denoted as ConRecvMSG.
Further on, these 2 tables (Connection Authorized and Connection Received) are outer joined based on the PID. Since the PID can repeat in time, to have the proper pair that corresponds to the actual connection is computed the difference in seconds between the authentication and the connection time. The maximum allowed difference in seconds for the join is denoted by MAX_PID_JOIN_SEC=30 to ensure the correct mapping is achieved.
The data from the resultant table will be sorted in descending order and will be projected for each successful connection the IP, user, database, connection received time (ConnRecvTime), connection authentication time (ConnAuthTime), time needed to establish the connection ConnSetupTime (which is the time difference between ConnAuthTime and ConnRecvTime) and the PID of the connection. ConAuthMSG and ConRecvMSG are commented, nevertheless if you require those messages, optionally you can uncomment by yourself.
//All Successful Connections in descending order
let MAX_PID_JOIN_SEC=30;
let ConnectionAuthorized=AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "connection authorized"
| where Message !contains "replication connection"
| extend allmatches=extract_all(@'user=(.+)database=(.\S+)', Message)
| extend user=trim(@" ",tostring(allmatches[0][0]) )
| extend database=tostring(allmatches[0][1])
| where user != 'azuresu' // exclude azuresu connections
| project ConnAuthTime=todatetime(timestamp_s), processId_d,user, database, ConAuthMSG=Message;
let searchpids=ConnectionAuthorized
| distinct processId_d;
let ConnectionReceived=AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "connection received"
| where processId_d in (searchpids)
| extend IP=extract(@'host=(.\S+)', 1, Message)
| project ConnRecvTime=todatetime(timestamp_s), processId_d,IP, ConRecvMSG=Message;
ConnectionAuthorized
| join kind=leftouter ConnectionReceived on $left.processId_d==$right.processId_d
| extend ConnSetupTime=ConnAuthTime-ConnRecvTime
| extend deltatime=datetime_diff('second',ConnAuthTime,ConnRecvTime)
| where deltatime >=0 and deltatime < MAX_PID_JOIN_SEC
| project IP,user,database,ConnRecvTime,ConnAuthTime,ConnSetupTime,PID=tostring(toint(processId_d))//,ConRecvMSG,ConAuthMSG
| order by ConnRecvTime desc
Listing: All replication connections
The Kusto Query searches in the server logs for messages of type "replication connection authorized" that denotes the entry for each replication connection in the PostgreSQL log. In this way non-replication connections are excluded. Using regular expression the user and the application_name for which the connection was established are retrieved extract_all(@'user=(.+)application_name=(.\S+)', Message)
. Azure connections are excluded, to have a picture only of the connections created by you or your apps. The time of the event denoted by "connection received", basically the time when authentication completed is denoted as ConnAuthTime. The full message for the connection authorization is denoted by ConAuthMSG.
To get the Ip and the time when connections were received are retrieved all distinct PIDs for the connection authorized table. Then is searched all the messages of type "connection received" for the corresponding PIDs. From the connection received table the IP is parsed using regular expression. The time when the connection was received is denoted as ConnRecvTime, while the full message is denoted as ConRecvMSG.
Further on, these 2 tables (Replication Connection Authorized and Connection Received) are outer joined based on the PID. Since the PID can repeat in time, to have the proper pair that corresponds to the actual connection is computed the difference in seconds between the authentication and the connection time. The maximum allowed difference in seconds for the join is denoted by MAX_PID_JOIN_SEC=30 to ensure the correct mapping is achieved.
The data from the resultant table will be sorted in descending order and will be projected for each replication connection the IP, user, application_name, connection received time (ConnRecvTime), replication connection authentication time (ConnAuthTime), time needed to establish the connection ConnSetupTime (which is the time difference between ConnAuthTime and ConnRecvTime) and the PID of the connection. ConAuthMSG and ConRecvMSG are commented, nevertheless if you require those messages, optionally you can uncomment by yourself.
//All Replication connections
let MAX_PID_JOIN_SEC=30;
let ReplicationConn=AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "replication connection authorized"
| extend allmatches=extract_all(@'user=(.+)application_name=(.\S+)', Message)
| extend user=tostring(allmatches[0][0])
| extend application_name=tostring(allmatches[0][1])
| project ConnAuthTime=todatetime(timestamp_s), user, application_name, processId_d, ConAuthMSG=Message;
let searchpids=ReplicationConn
| distinct processId_d;
let ConnectionReceived=AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "connection received"
| where processId_d in (searchpids)
| extend IP=extract(@'host=(.\S+)', 1, Message)
| project ConnRecvTime=todatetime(timestamp_s), processId_d,IP, ConRecvMSG=Message;
ReplicationConn
| join kind=leftouter ConnectionReceived on $left.processId_d==$right.processId_d
| extend ConnSetupTime=ConnAuthTime-ConnRecvTime
| extend deltatime=datetime_diff('second',ConnAuthTime,ConnRecvTime)
| where deltatime >=0 and deltatime < MAX_PID_JOIN_SEC
| order by ConnRecvTime desc
| project IP, user, application_name, ConnRecvTime,ConnAuthTime,ConnSetupTime, PID=tostring(toint(processId_d))//,ConRecvMSG,ConAuthMSG
Listing: All Disconnections
The Kusto Query searches for messages of type "disconnection" that denotes the entry for a disconnection in PostgreSQL log. Subsequently using regular expression the duration, user, database and Ip are retrieved extract_all(@'session time:(.+)user=(.+)database=(.+)host=(.\S+)', Message)
. Disconnections from Azure denoted by user "azuresu" are excluded. As Initially the duration is reported as a string it is converted in seconds.
Subsequently will be summarized all your disconnections for the entire time range selected in Azure UI. The data is grouped by Ip, user and database and it will represent a counter of disconnections showing as well the first and last disconnection time, minimum, maximum and average connection duration encountered across the entire time range for each grouping pair (Ip, user and database).
The data will be sorted in descending order and will be projected for each disconnection the IP, user, database, disconnection time DisconnectTime, duration string duration, duration converted in seconds DurationInSeconds and the PID of the respective connection.
// All disconnections in descending order
AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "disconnection"
| extend allmatches=extract_all(@'session time:(.+)user=(.+)database=(.+)host=(.\S+)', Message)
| extend duration=totime(allmatches[0][0])
| extend user=trim(@" ",tostring(allmatches[0][1]) )
| extend database=tostring(allmatches[0][2])
| extend IP=tostring(allmatches[0][3])
| where user != 'azuresu'
| extend ts = totimespan(duration)
| extend DurationInSeconds = ts / 1s
| project IP,user,database,DisconnectTime=todatetime(timestamp_s), duration,DurationInSeconds, PID=tostring(toint(processId_d))
| order by DisconnectTime desc
Listing: All Login failures
To get all login failures, the Kusto query first searches for "authentication failed" messages or all FATAL errors with any of the below PostgreSQL error codes:
Error Code | Condition Name |
---|---|
Class 28 — Invalid Authorization Specification | |
28000 | invalid_authorization_specification |
28P01 | invalid_password |
Class 3D — Invalid Catalog Name | |
3D000 | invalid_catalog_name |
Class 08 — Connection Exception | |
08P01 | protocol_violation |
Class 53 — Insufficient Resources | |
53300 | too_many_connections |
Class 42 — Syntax Error or Access Rule Violation | |
42501 | insufficient_privilege |
To the error message itself it is appended the details of the failure if the corresponding field is not empty. The time of the failure is denoted as AuthFailedTime, while the PostgreSQL error code as SqlErrorCode.
To get the Ip and the time when connections were received are retrieved all distinct PIDs for the authentication failure table. Then is searched all the messages of type "connection received" for the corresponding PIDs. From the connection received table the IP is parsed using regular expression and the time when the connection was received is denoted as ConnectTime.
Further on, these 2 tables (Authentication Failure and Connection Received) are outer joined based on the PID. Since the PID can repeat in time, to have the proper pair that corresponds to the actual connection failure is computed the difference in seconds between the failed authentication and the connection time. The maximum allowed difference in seconds for the join is denoted by MAX_PID_JOIN_SEC=30 to ensure the correct mapping is achieved.
The data from the resultant table will be sorted in descending order and will be projected for each login failure the Ip, ErrorMessage, SqlErrorCode , connection received time (ConnectTime), connection authentication failure time (AuthFailedTime), time needed to establish the connection ConnSetupTime (which is the time difference between AuthFailedTime and ConnectTime) and the PID of the respective connection.
//All Connection failures in descending order
let MAX_PID_JOIN_SEC=30;
let ConnectErrors=AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "authentication failed" or sqlerrcode_s in ('28000', '28P01', '3D000', '53300', '42501','08P01')
| where errorLevel_s =='FATAL'
| extend preErrorMsg=extract(@'[FATAL|LOG|ERROR]:(.+)', 1, Message)
| extend nextparseErrorMsg=extract(@':\s(.+)', 1, preErrorMsg)
| extend ErrorMsgWithoutDetail=iff(isempty(nextparseErrorMsg), preErrorMsg, nextparseErrorMsg)
| extend ErrorMessage=iff(isnotempty(detail_log_s), strcat(ErrorMsgWithoutDetail, '. ', detail_log_s), ErrorMsgWithoutDetail)
| project AuthFailedTime=todatetime(timestamp_s), processId_d, ErrorMessage, SqlErrorCode=sqlerrcode_s;
let searchpids=ConnectErrors
| distinct processId_d;
let ConnectionReceived=AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where Message contains "connection received"
| where processId_d in (searchpids)
| extend IP=extract(@'host=(.\S+)', 1, Message)
| project ConnectTime=todatetime(timestamp_s), IP, processId_d;
ConnectErrors
| join kind=leftouter ConnectionReceived on $left.processId_d == $right.processId_d
| extend ConnSetupTime=AuthFailedTime - ConnectTime
| extend deltatime=datetime_diff('second', AuthFailedTime, ConnectTime)
| where deltatime >= 0 and deltatime < MAX_PID_JOIN_SEC
| project IP,ErrorMessage,SqlErrorCode,ConnectTime,AuthFailedTime,ConnSetupTime, PID=tostring(toint(processId_d))
| order by ConnectTime desc
We are hoping that you find this blog article helpful. Additional information can be found in following documents:
- Azure Database for PostgreSQL - Flexible Server
- Server parameters in Azure Database for PostgreSQL - Flexible Server
- Logs in Azure Database for PostgreSQL - Flexible Server
We’re always eager to get your feedback, so please reach out via email to Ask Azure DB for PostgreSQL.
Posted at https://sl.advdat.com/32TEA1W