Analyzing Memory Dumps for Azure Network Issues

Advanced Memory Dump Analysis for Networking Issues in Azure App Services


Let’s face it, networking issues in Azure PaaS are like ghosts in the machine: invisible, unpredictable, and always showing up at 2 AM. Whether it’s a vanishing SQL connection, a DNS lookup that takes forever, or an HttpClient call that decides to vacation in CLOSE_WAIT, the debugging process can feel like spelunking in a dark cave with a broken flashlight.

This guide aims to change that.

By combining WinDbg + MEX + NetExt + SOS/SOSEX and Visual Studio 2022, we’re giving you the floodlight. This is your practical, battle-tested toolkit for turning a raw memory dump into a clear picture of what’s going wrong—and where to focus. You’ll find real-world examples, good vs. bad patterns, and commands that surface everything from socket exhaustion to SQL pool starvation.

Let’s dive in and make the invisible… visible.

Goal

To help Azure engineers troubleshoot network-related issues (timeouts, DNS failures, socket stalls) using memory dumps captured from App Services (Windows/Linux), with a focus on tools like WinDbg + MEX/SOS/NetExt and Visual Studio 2022.


Common Scenarios

  • Intermittent timeouts to backend APIs, SQL, or storage
  • Socket exhaustion or hanging outbound connections
  • Stuck DNS resolutions or malformed endpoints
  • Slow or blocked HttpClient/ServicePoint interactions
  • High latency due to proxy misconfigurations

Tools Required

  • WinDbg (Preview or Legacy)
  • Visual Studio 2022 (with .NET dump viewer)
  • MEX.dll, NetExt.dll, SOS.dll, SOSEX.dll (for WinDbg)
  • .dmp file from App Service crash or hang

Workflow Overview

1. Load the Dump

.symfix
.reload
.load C:\path\to\mex.dll
.loadby sos clr        (.NET Framework)
.loadby sos coreclr    (.NET Core / .NET 5+)
.load C:\path\to\NetExt.dll
!windex                (NetExt: index the managed heap)


2. Diagnose Network Context (NetExt + MEX)

🔍 View All Sockets

!wsocket                     ; List all open socket objects
!wsocket -c <connectionID>  ; Show socket connection info
!netext.connections         ; TCP + UDP state by process

✅ Good Output Example:

Proto  Local Address          Remote Address         State
TCP    10.0.0.4:52512         13.107.42.14:443       ESTABLISHED

❌ Bad Output Example:

Proto  Local Address          Remote Address         State
TCP    10.0.0.4:54421         10.1.1.5:443           CLOSE_WAIT
TCP    10.0.0.4:54423         10.1.1.5:443           CLOSE_WAIT
...
(Thousands of entries)

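Interpretation: CLOSE_WAIT means the remote server has closed its side of the connection (sent a FIN), but the local application has not yet closed its socket. Thousands of them usually point to sockets, streams, or responses that are never disposed.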
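When the connection table runs to thousands of rows, counting by state and remote endpoint shows at a glance which backend is piling up CLOSE_WAIT sockets. The Python sketch below assumes you have copied the table text out of the debugger in the four-column layout shown above; `summarize_states` is a hypothetical helper, not part of NetExt or MEX.

```python
from collections import Counter


def summarize_states(table_text: str) -> Counter:
    """Count (remote endpoint, state) pairs in a Proto/Local/Remote/State table."""
    counts: Counter = Counter()
    for line in table_text.splitlines():
        parts = line.split()
        # Keep only TCP rows with exactly four columns; this skips the header,
        # "..." placeholders, and UDP rows (which have no State column).
        if len(parts) != 4 or parts[0] != "TCP":
            continue
        _proto, _local, remote, state = parts
        counts[(remote, state)] += 1
    return counts


sample = """\
Proto  Local Address          Remote Address         State
TCP    10.0.0.4:54421         10.1.1.5:443           CLOSE_WAIT
TCP    10.0.0.4:54423         10.1.1.5:443           CLOSE_WAIT
TCP    10.0.0.4:52512         13.107.42.14:443       ESTABLISHED
"""

# Largest groups first, so a leaking backend floats to the top.
for (remote, state), n in summarize_states(sample).most_common():
    print(f"{n:>6}  {state:<12} {remote}")
```

Grouping by remote endpoint rather than local port matters here: the local ports are all different, but a leak almost always concentrates on one backend.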


3. Identify High Wait Threads (MEX)

!mex.t
!mex.runaway2

Switch to any suspicious thread:

~[id]s
!clrstack

✅ Good Thread Sample:

System.Threading.ThreadPoolWorkQueue.Dispatch()
System.Net.Http.HttpClient.SendAsync()
Task.Run()

❌ Bad Thread Sample:

System.Net.Sockets.Socket.Receive()
System.Threading.ManualResetEventSlim.Wait()

Interpretation: threads blocked in Socket.Receive() or ManualResetEventSlim.Wait() are likely waiting on a socket that has stopped responding.
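The rule of thumb above can be applied to many threads at once. This is a minimal sketch, assuming you have saved each thread's !clrstack output as text keyed by thread ID (for example by logging the session with .logopen); `suspicious_threads` and the marker list are illustrative, not exhaustive.

```python
from __future__ import annotations

# Frames that usually mean a thread is parked on I/O or a wait handle.
# The trailing "(" avoids matching non-blocking variants such as ReceiveAsync.
BLOCKING_MARKERS = (
    "System.Net.Sockets.Socket.Receive(",
    "System.Threading.ManualResetEventSlim.Wait(",
)


def suspicious_threads(stacks: dict[int, str]) -> list[int]:
    """Return the IDs of threads whose !clrstack text contains a blocking frame."""
    return sorted(
        tid
        for tid, text in stacks.items()
        if any(marker in text for marker in BLOCKING_MARKERS)
    )


stacks = {
    12: "System.Threading.ThreadPoolWorkQueue.Dispatch()\n"
        "System.Net.Http.HttpClient.SendAsync()",
    17: "System.Net.Sockets.Socket.Receive()\n"
        "System.Threading.ManualResetEventSlim.Wait()",
}
print(suspicious_threads(stacks))
```

The returned IDs are the ones worth switching to with ~[id]s for a closer look.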


4. Inspect Network-Related Objects

🔍 Search Sockets on Heap

!dumpheap -type System.Net.Sockets.Socket

Then:

!do <addr>

Look for fields:

  • _state
  • _endPointSnapshot
  • _connectTask

🔍 Dump ServicePoint Pool

!dumpheap -type System.Net.ServicePoint

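Run !do on each ServicePoint address to see its connection limit and current usage. (ServicePoint governs HttpWebRequest, and HttpClient on .NET Framework; on .NET Core, HttpClient pools connections in SocketsHttpHandler instead.)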

✅ Good Sample:

ConnectionLimit: 10
CurrentConnections: 2
IdleSince: <recent timestamp>

❌ Bad Sample:

ConnectionLimit: 2
CurrentConnections: 2
IdleSince: 2023-05-15T02:10:00

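Interpretation: the pool is exhausted. Every allowed connection is in use and none has gone idle since the old IdleSince timestamp, so connections are not being released or recycled.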
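The bad-sample check can be written down explicitly: a ServicePoint is suspect when CurrentConnections has reached ConnectionLimit and IdleSince is well before the moment the dump was taken. This sketch assumes the `Name: value` layout used in the samples above and a dump time read from WinDbg's .time output; the five-minute staleness window is an arbitrary, hypothetical threshold.

```python
from datetime import datetime, timedelta


def parse_fields(text: str) -> dict:
    """Turn 'Name: value' lines into a dict, splitting on the first colon only."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_saturated(fields: dict, dump_time: datetime,
                 stale_after: timedelta = timedelta(minutes=5)) -> bool:
    """True when every allowed connection is busy and none has gone idle recently."""
    at_limit = int(fields["CurrentConnections"]) >= int(fields["ConnectionLimit"])
    # Assumes IdleSince and dump_time are in the same time zone.
    idle_since = datetime.fromisoformat(fields["IdleSince"])
    return at_limit and dump_time - idle_since > stale_after


bad = parse_fields("ConnectionLimit: 2\nCurrentConnections: 2\nIdleSince: 2023-05-15T02:10:00")
# Hypothetical dump time, taken from the "Debug session time" line of .time.
print(is_saturated(bad, dump_time=datetime(2023, 5, 15, 2, 40)))
```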


5. Trace DNS Failures

!mex.feo -type System.Net.DnsEndPoint

Or look for this frame in thread stacks:

System.Net.Dns.InternalGetHostByName

Check Async DNS Resolution

!dumpheap -type System.Threading.Tasks.Task
!do <addr>  → check .m_action and .m_stateObject

✅ Good Example:

System.Net.DnsEndPoint("api.example.com", 443)
Resolved IP: 52.174.11.14

❌ Bad Example:

System.Net.DnsEndPoint("invalid..endpoint", 443)
System.Net.Sockets.SocketException (No such host is known)

Interpretation: a malformed host name (note the empty label between the two dots) or a stale or broken resolution path. Use !mex.cn to confirm the dump came from the affected instance. To check for resolver stalls, run:

!threads
!clrstack

Look for long stalls in:

System.Net.Dns.HostResolutionEndHelper()
System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
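A quick way to separate "malformed name" from "resolver problem" is to validate the host names you find in DnsEndPoint objects. This sketch applies the usual RFC 1123 host-name rules (labels of 1 to 63 letters, digits, or hyphens, not starting or ending with a hyphen, at most 253 characters overall). That is stricter than DNS itself, which also allows names such as _sip._tcp service records, so treat a hit as a lead rather than proof; `hostname_problems` is a hypothetical helper.

```python
from __future__ import annotations

import re

# One RFC 1123 label: 1-63 letters, digits, or hyphens, no leading/trailing hyphen.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def hostname_problems(host: str) -> list[str]:
    """Return the reasons host is not a valid host name (empty list if it is fine)."""
    name = host[:-1] if host.endswith(".") else host  # allow a single trailing root dot
    if not name:
        return ["empty host name"]
    problems = []
    if len(name) > 253:
        problems.append("longer than 253 characters")
    for position, label in enumerate(name.split(".")):
        if not label:
            problems.append(f"empty label at position {position}")
        elif not _LABEL.fullmatch(label):
            problems.append(f"invalid label {label!r}")
    return problems


for host in ("api.example.com", "invalid..endpoint"):
    print(host, hostname_problems(host) or "OK")
```

A name that passes this check but still fails with "No such host is known" points back at the resolution path rather than the endpoint string.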


6. Analyze SQL Connection Pool

!mex.sqlcn
!mex.sqlcn -detail
!SqlClientPerfCounters
!sqlcmd
!sqlports
!sqlcn

🔍 !SqlClientPerfCounters

Displays SQLClient performance metrics from the .NET counter manager.

✅ Good Output:

NumberOfActiveConnections: 4
NumberOfPooledConnections: 12
HardConnectsPerSecond: 0
HardDisconnectsPerSecond: 0
NumberOfNonPooledConnections: 0

❌ Bad Output:

NumberOfActiveConnections: 100
NumberOfPooledConnections: 100
HardConnectsPerSecond: 15
HardDisconnectsPerSecond: 10
NumberOfNonPooledConnections: 25

Interpretation:

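  • High non-pooled connections → pooling is being bypassed, most likely because of the connection string (for example Pooling=false).
  • High hard connects/disconnects → connection churn; possibly an app-side leak or connections not wrapped in using statements.
  • Active connections at the pool maximum (Max Pool Size, 100 by default) → pool starvation is likely; check for threads waiting on SQL and for blocking queries.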
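The three interpretation rules above translate directly into checks. In this sketch the counter names match the !SqlClientPerfCounters output shown earlier, 100 is the SqlClient default for Max Pool Size (pass your own value if the connection string overrides it, and note the comparison assumes a single connection string, i.e. one pool), and the churn threshold of 5 per second is a hypothetical value to tune for your app.

```python
def parse_counters(text: str) -> dict:
    """Turn 'CounterName: value' lines into a dict of integers."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            counters[name.strip()] = int(value)
    return counters


def sql_pool_findings(c: dict, max_pool_size: int = 100, churn_threshold: int = 5) -> list:
    """Apply the three interpretation rules to one set of SqlClient counters."""
    findings = []
    if c["NumberOfNonPooledConnections"] > 0:
        findings.append("non-pooled connections: check the connection string for Pooling=false")
    churn = c["HardConnectsPerSecond"] + c["HardDisconnectsPerSecond"]
    if churn > churn_threshold:  # churn_threshold is a hypothetical tuning value
        findings.append(f"connection churn of {churn}/s: look for leaks or missing using blocks")
    if c["NumberOfActiveConnections"] >= max_pool_size:
        findings.append("active connections at Max Pool Size: likely pool starvation")
    return findings


bad = parse_counters("""\
NumberOfActiveConnections: 100
NumberOfPooledConnections: 100
HardConnectsPerSecond: 15
HardDisconnectsPerSecond: 10
NumberOfNonPooledConnections: 25""")
for finding in sql_pool_findings(bad):
    print("-", finding)
```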

7. Diagnose Common Exceptions

!mex.dae
!mex.dae -d

Look for types like:

  • System.Net.Sockets.SocketException
  • System.Net.Http.HttpRequestException
  • System.Threading.Tasks.TaskCanceledException
  • System.Net.WebException

Example Exception Object

Exception Message: A connection attempt failed
ErrorCode: 10060 (Connection timed out)

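To walk every exception on the heap, including nested inner exceptions, use:

.foreach (ex {!dumpheap -type Exception -short}) {.echo "***"; !pe -nested ${ex}}

Because -type is a substring match, Exception also catches derived types such as SocketException and HttpRequestException.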

✅ Good Pattern:

  • Few transient timeouts, no inner exceptions, caught and retried.

❌ Bad Pattern:

  • Repeated timeouts with SocketException 10060
  • Inner exception: System.IO.IOException
  • Message: Unable to read data from the transport connection
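When !mex.dae or the .foreach loop turns up hundreds of SocketExceptions, tallying their Winsock error codes makes the "repeated 10060" pattern easy to spot. The table below covers a few common Winsock errors; the list of codes is assumed to have been collected by hand from the ErrorCode values in the dump, and the repeat threshold of 3 is hypothetical.

```python
from collections import Counter

# A few common Winsock error codes and their symbolic names.
WINSOCK_ERRORS = {
    10048: "WSAEADDRINUSE (address already in use)",
    10054: "WSAECONNRESET (connection reset by peer)",
    10055: "WSAENOBUFS (no buffer space available)",
    10060: "WSAETIMEDOUT (connection timed out)",
    10061: "WSAECONNREFUSED (connection refused)",
    11001: "WSAHOST_NOT_FOUND (host not found)",
}


def summarize_socket_errors(codes):
    """Return (name, count) pairs, most frequent first."""
    counts = Counter(codes)
    return [(WINSOCK_ERRORS.get(code, f"unknown Winsock error {code}"), n)
            for code, n in counts.most_common()]


def repeated_timeouts(codes, threshold=3):
    """The bad pattern above: the same 10060 timeout showing up again and again."""
    return Counter(codes)[10060] >= threshold


# Hypothetical ErrorCode values collected from SocketException objects in a dump.
codes = [10060, 10060, 10060, 11001]
for name, n in summarize_socket_errors(codes):
    print(f"{n:>4}  {name}")
print("repeated timeouts:", repeated_timeouts(codes))
```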

8. Investigate ThreadPool or Async Starvation

!mex.mthreadpool
!sosex.mwaits
!sosex.dlk

Use:

!dumpheap -type System.Threading.Tasks.Task
!dumpheap -type System.Runtime.CompilerServices.AsyncTaskMethodBuilder

Look for thousands of tasks that are still waiting to complete:

Status: WaitingForActivation

Also useful:

!threadpool
!threads

✅ Good:

MinThreads: 8  Idle: 6  Active: 2
QueueLength: 0

❌ Bad:

MinThreads: 8  Active: 8
QueueLength: 550

Interpretation: ThreadPool starvation or Task scheduling backlog. Use:

!clrstack
!tt

To trace back the workload source.
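The good and bad ThreadPool samples differ in two numbers: whether every thread up to MinThreads is busy, and whether work is queuing. Beyond MinThreads the pool adds threads gradually, so that combination is the classic starvation signature. This sketch assumes the `Key: value` layout of the samples above; real !threadpool output is laid out differently (for example Running and MinLimit instead of Active and MinThreads), so adjust the key names to your SOS version.

```python
import re


def parse_pool_stats(text: str) -> dict:
    """Collect every 'Key: number' pair, wherever it appears on a line."""
    return {key: int(value) for key, value in re.findall(r"(\w+):\s*(\d+)", text)}


def looks_starved(stats: dict) -> bool:
    """Busy up to MinThreads with work still queued: likely ThreadPool starvation."""
    all_busy = stats.get("Active", 0) >= stats.get("MinThreads", 0)
    return all_busy and stats.get("QueueLength", 0) > 0


good = parse_pool_stats("MinThreads: 8  Idle: 6  Active: 2\nQueueLength: 0")
bad = parse_pool_stats("MinThreads: 8  Active: 8\nQueueLength: 550")
print(looks_starved(good), looks_starved(bad))
```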


9. Visual Studio 2022 Path

  • Open .dmp → Debug with Managed Only
  • Use Parallel Stacks to inspect blocked threads
  • Search for HttpClient, Socket, SqlConnection, or Dns types in Memory View
  • Right-click object → View Details

✅ Summary

Use a combination of:

  • NetExt: Socket and connection state as captured in the dump
  • MEX: SQL + thread insights, high CPU, exception analysis
  • SOS/SOSEX: GC root, object inspection, deadlocks
  • Visual Studio: Rapid inspection of tasks, thread states, memory structures

Common Pitfalls & Mistakes

Mistake                      Why it matters              How to fix
❌ Wrong SOS version         Incompatible runtime view   Match the runtime: .loadby sos clr or .loadby sos coreclr
❌ No .symfix/.reload        Unresolved symbols          Always run .symfix; .reload /f
❌ Missing extensions        Limited visibility          Confirm all .dlls are loaded
❌ Inspecting wrong thread   False diagnosis             Use !mex.t or !runaway2 to focus
❌ Over-relying on VS        Misses deep state           Use VS + WinDbg combo for full insight

❓ Frequently Asked Questions (FAQ)

Q: Where do I get mex.dll and netext.dll?
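👉 NetExt is published on GitHub (rodneyviana/netext) and MEX on the Microsoft Download Center; your team may also keep copies in an internal repo. Unblock the DLLs if needed (Right-click > Properties > Unblock).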


Q: How do I confirm the dump is from the affected instance?
👉 Use !mex.cn, or run !peb and check environment variables such as COMPUTERNAME and WEBSITE_INSTANCE_ID; the loaded module paths (lm) can also confirm the site and runtime.


Q: Why are connections stuck in CLOSE_WAIT?
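👉 The server closed its side of the connection, but the client application never closed its socket. Check for Receive() results that are never handled and for sockets, streams, or responses that are never disposed.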


Q: Why do I see high SQL connections but low activity?
👉 Possible pool leaks or misuse of connection strings. Check NumberOfNonPooledConnections and look for frequent open/dispose cycles.


Q: Can I debug entirely in Visual Studio?
👉 You can start there, but for async tasks, DNS failures, or deadlocks, WinDbg is more powerful.

