Advanced Memory Dump Analysis for Networking Issues in Azure App Services
Let’s face it, networking issues in Azure PaaS are like ghosts in the machine: invisible, unpredictable, and always showing up at 2 AM. Whether it’s a vanishing SQL connection, a DNS lookup that takes forever, or an HttpClient call that decides to vacation in CLOSE_WAIT, the debugging process can feel like spelunking in a dark cave with a broken flashlight.
This guide aims to change that.
By combining WinDbg + MEX + NetExt + SOS/SOSEX and Visual Studio 2022, we’re giving you the floodlight. This is your practical, battle-tested toolkit for turning a raw memory dump into a clear picture of what’s going wrong—and where to focus. You’ll find real-world examples, good vs. bad patterns, and commands that surface everything from socket exhaustion to SQL pool starvation.
Let’s dive in and make the invisible… visible.
Goal
To help Azure engineers troubleshoot network-related issues (timeouts, DNS failures, socket stalls) using memory dumps captured from App Services (Windows/Linux), with a focus on tools like WinDbg + MEX/SOS/NetExt and Visual Studio 2022.
Common Scenarios
- Intermittent timeouts to backend APIs, SQL, or storage
- Socket exhaustion or hanging outbound connections
- Stuck DNS resolutions or malformed endpoints
- Slow or blocked HttpClient/ServicePoint interactions
- High latency due to proxy misconfigurations
Tools Required
- WinDbg (Preview or Legacy)
- Visual Studio 2022 (with .NET dump viewer)
- MEX.dll, NetExt.dll, SOS.dll, SOSEX.dll (for WinDbg)
- .dmp file from App Service crash or hang
Workflow Overview
1. Load the Dump
.symfix
.reload
.load C:\path\to\mex.dll
.loadby sos clr (for .NET Core, use .loadby sos coreclr instead)
.load C:\path\to\NetExt.dll
!windex
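If you run the same load sequence against many dumps, it can help to script it. The sketch below only builds a command line for cdb.exe (the console debugger that ships with the Debugging Tools for Windows), using its `-z` (open dump) and `-c` (run commands) options. The helper name `build_cdb_args`, the extension paths, and the choice of `.loadby sos clr` (which assumes a .NET Framework dump) are illustrative assumptions, not part of any of the tools.

```python
# Commands from the "Load the Dump" step. The extension paths are the same
# placeholders used above; replace them with your real locations.
LOAD_COMMANDS = [
    ".symfix",
    ".reload",
    r".load C:\path\to\mex.dll",
    ".loadby sos clr",  # assumption: .NET Framework dump
    r".load C:\path\to\NetExt.dll",
    "!windex",  # NetExt heap index
]


def build_cdb_args(dump_path, extra_commands=(), cdb_exe="cdb.exe"):
    """Return an argv list that opens a dump, runs the commands, and quits.

    Hypothetical helper: it only assembles arguments and runs nothing.
    """
    commands = list(LOAD_COMMANDS) + list(extra_commands) + ["q"]
    return [cdb_exe, "-z", str(dump_path), "-c", "; ".join(commands)]
```

The returned list can be passed to `subprocess.run(..., capture_output=True, text=True)` on a machine where cdb.exe is on the PATH, which gives you the debugger output as text for later triage.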
2. Diagnose Network Context (NetExt + MEX)
🔍 View All Sockets
!wsocket ; List all open socket objects
!wsocket -c <connectionID> ; Show socket connection info
!netext.connections ; TCP + UDP state by process
✅ Good Output Example:
Proto Local Address Remote Address State
TCP 10.0.0.4:52512 13.107.42.14:443 ESTABLISHED
❌ Bad Output Example:
Proto Local Address Remote Address State
TCP 10.0.0.4:54421 10.1.1.5:443 CLOSE_WAIT
TCP 10.0.0.4:54423 10.1.1.5:443 CLOSE_WAIT
...
(Thousands of entries)
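Interpretation: Connections stuck in CLOSE_WAIT mean the remote peer has closed its side, but the local application has not yet closed its own socket. Thousands of them usually point to connections that are never released.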
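When the table runs to thousands of rows, counting by hand is not practical. Here is a minimal Python sketch that tallies states and groups CLOSE_WAIT rows by remote endpoint. It assumes the output was saved as text in the four-column layout of the samples above (Proto, Local Address, Remote Address, State). The function name and the threshold are illustrative assumptions.

```python
from collections import Counter


def summarize_connections(table_text, threshold=100):
    """Count connection states and flag remotes with many CLOSE_WAIT rows.

    Assumes each data row has exactly four whitespace-separated columns,
    as in the sample output; header and filler lines are skipped.
    """
    states = Counter()
    close_wait_by_remote = Counter()
    for line in table_text.splitlines():
        parts = line.split()
        if len(parts) != 4 or parts[0] not in ("TCP", "UDP"):
            continue  # header, "..." filler, or unrelated text
        _proto, _local, remote, state = parts
        states[state] += 1
        if state == "CLOSE_WAIT":
            close_wait_by_remote[remote] += 1
    suspects = {r: n for r, n in close_wait_by_remote.items() if n >= threshold}
    return states, suspects
```

A single remote endpoint that dominates the `suspects` dictionary tells you which backend's connections are not being closed, which narrows the code search considerably.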
3. Identify High Wait Threads (MEX)
!mex.t
!mex.runaway2
Switch to any suspicious thread (replace <id> with the debugger thread number):
~<id>s
!clrstack
✅ Good Thread Sample:
System.Threading.ThreadPoolWorkQueue.Dispatch()
System.Net.Http.HttpClient.SendAsync()
Task.Run()
❌ Bad Thread Sample:
System.Net.Sockets.Socket.Receive()
System.Threading.ManualResetEventSlim.Wait()
Interpretation: Threads blocked in Socket.Receive() or ManualResetEventSlim.Wait() are waiting synchronously. When many threads show these frames, a hung socket is the likely cause.
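To apply that rule across every thread at once, you can scan saved !clrstack text for the blocking frames. The sketch below is one way to do it, assuming you have already collected each thread's stack as a string keyed by thread id. The frame list mirrors the bad sample above, and both function names are hypothetical.

```python
# Frames from the "bad" sample. The trailing "(" keeps Socket.Receive from
# also matching Socket.ReceiveAsync.
BLOCKING_FRAMES = (
    "System.Net.Sockets.Socket.Receive(",
    "System.Threading.ManualResetEventSlim.Wait(",
)


def find_blocking_frames(stack_text):
    """Return the blocking frame prefixes found in one thread's stack text."""
    return [frame for frame in BLOCKING_FRAMES if frame in stack_text]


def blocked_thread_ids(stacks):
    """stacks maps thread id -> stack text; return ids with a blocking frame."""
    return sorted(tid for tid, text in stacks.items() if find_blocking_frames(text))
```

A handful of blocked threads is normal. It becomes interesting when most worker threads show up in the result.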
4. Inspect Network-Related Objects
🔍 Search Sockets on Heap
!dumpheap -type System.Net.Sockets.Socket
Then:
!do <addr>
Look for fields:
- _state
- _endPointSnapshot
- _connectTask
🔍 Dump ServicePoint Pool
!dumpheap -type System.Net.ServicePoint
Run !do on each address to see its connection limit and current usage.
✅ Good Sample:
ConnectionLimit: 10
CurrentConnections: 2
IdleSince: <recent timestamp>
❌ Bad Sample:
ConnectionLimit: 2
CurrentConnections: 2
IdleSince: 2023-05-15T02:10:00
Interpretation: CurrentConnections has reached ConnectionLimit and IdleSince is stale, which indicates pool exhaustion: connections are not being recycled properly.
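The comparison is simple enough to automate when you have many ServicePoint objects to check. This sketch assumes the fields have been reduced to the simplified "Name: value" form used in the samples above (raw !do output is laid out differently and would need its own parser). The function name is hypothetical.

```python
import re

# Matches lines such as "ConnectionLimit: 2" in the simplified sample form.
_FIELD = re.compile(r"\s*(ConnectionLimit|CurrentConnections)\s*:\s*(\d+)\s*")


def is_pool_saturated(fields_text):
    """Return True/False for saturation, or None if a field is missing."""
    values = {}
    for line in fields_text.splitlines():
        match = _FIELD.fullmatch(line)
        if match:
            values[match.group(1)] = int(match.group(2))
    if len(values) < 2:
        return None  # not enough information to decide
    return values["CurrentConnections"] >= values["ConnectionLimit"]
```

Returning None rather than False for incomplete input keeps a parsing gap from being mistaken for a healthy pool.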
5. Trace DNS Failures
!mex.feo -type System.Net.DnsEndPoint
Or from stack trace:
System.Net.Dns.InternalGetHostByName
Check Async DNS Resolution
!dumpheap -type System.Threading.Tasks.Task
!do <addr> → check .m_action and .m_stateObject
✅ Good Example:
System.Net.DnsEndPoint("api.example.com", 443)
Resolved IP: 52.174.11.14
❌ Bad Example:
System.Net.DnsEndPoint("invalid..endpoint", 443)
System.Net.Sockets.SocketException (No such host is known)
Interpretation: Malformed DNS entry or stale/broken resolution path.
Also validate against the output of !mex.cn to ensure the dump comes from the affected instance.
Check resolver stalls:
!threads
!clrstack
Look for long stalls in:
System.Net.Dns.HostResolutionEndHelper()
System.Threading.Tasks.TaskCompletionSource`1.GetResult()
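Malformed host names like the "invalid..endpoint" example are easy to miss by eye when you pull dozens of DnsEndPoint values out of a dump. The sketch below applies the conventional host name rules (labels of letters, digits, and hyphens, 1 to 63 characters, not starting or ending with a hyphen, 253 characters overall). The function name is hypothetical, and the check is stricter than DNS itself, which tolerates underscores in some record names.

```python
import re

# One DNS label: alphanumeric at both ends, hyphens allowed inside, max 63.
_LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def hostname_problems(host):
    """Return a list of syntax problems; an empty list means it looks valid."""
    if not host:
        return ["empty host name"]
    problems = []
    if len(host) > 253:
        problems.append("longer than 253 characters")
    name = host[:-1] if host.endswith(".") else host  # allow a trailing root dot
    for label in name.split("."):
        if label == "":
            problems.append("empty label (consecutive or leading dot)")
        elif not _LABEL.fullmatch(label):
            problems.append(f"invalid label: {label!r}")
    return problems
```

A clean result only means the name is well formed. It can still fail to resolve, so stalls in the resolver frames above remain worth checking.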
6. Analyze SQL Connection Pool
!mex.sqlcn
!mex.sqlcn -detail
!SqlClientPerfCounters
!sqlcmd
!sqlports
!sqlcn
🔍 !SqlClientPerfCounters
Displays SQLClient performance metrics from the .NET counter manager.
✅ Good Output:
NumberOfActiveConnections: 4
NumberOfPooledConnections: 12
HardConnectsPerSecond: 0
HardDisconnectsPerSecond: 0
NumberOfNonPooledConnections: 0
❌ Bad Output:
NumberOfActiveConnections: 100
NumberOfPooledConnections: 100
HardConnectsPerSecond: 15
HardDisconnectsPerSecond: 10
NumberOfNonPooledConnections: 25
Interpretation:
- High non-pooled connections → bypassing pooling, likely due to incorrect connection string.
- High connects/disconnects → connection churn, may be an app-side leak or a missing using statement.
- Active = max → starvation possible, check for SQL wait handles or blocking queries.
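Those three rules translate directly into a small checker. The sketch below parses counters in the "Name: value" form of the samples above and returns short finding codes. The function names, the codes, and the churn threshold are illustrative assumptions. The default of 100 for the pool ceiling matches SqlClient's default Max Pool Size, but your connection string may set a different value.

```python
def parse_counters(text):
    """Parse "Name: value" lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value.strip())
    return counters


def sql_pool_findings(counters, max_pool_size=100, churn_per_sec=5):
    """Apply the three interpretation rules; thresholds are assumptions."""
    findings = []
    if counters.get("NumberOfNonPooledConnections", 0) > 0:
        findings.append("non_pooled")  # pooling bypassed, check connection string
    hard_connects = counters.get("HardConnectsPerSecond", 0)
    hard_disconnects = counters.get("HardDisconnectsPerSecond", 0)
    if max(hard_connects, hard_disconnects) >= churn_per_sec:
        findings.append("churn")  # connections opened and torn down repeatedly
    if counters.get("NumberOfActiveConnections", 0) >= max_pool_size:
        findings.append("pool_max")  # possible starvation
    return findings
```

Treat the result as a prompt for where to look next (connection string, disposal patterns, blocking queries) rather than a verdict.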
7. Diagnose Common Exceptions
!mex.dae
!mex.dae -d
Look for types like:
- System.Net.Sockets.SocketException
- System.Net.Http.HttpRequestException
- System.Threading.Tasks.TaskCanceledException
- System.Net.WebException
Example Exception Object
Exception Message: A connection attempt failed
ErrorCode: 10060 (Connection timed out)
To print every exception on the heap together with its nested inner exceptions, use:
.foreach (ex {!dumpheap -type System.Exception -short}){.echo "***";!pe -nested ${ex}}
✅ Good Pattern:
- Few transient timeouts, no inner exceptions, caught and retried.
❌ Bad Pattern:
- Repeated timeouts with SocketException 10060
- Inner exception: System.IO.IOException
- Message: Unable to read data from the transport connection
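SocketException error codes are Winsock codes, so a small lookup table makes them readable at a glance. The sketch below covers a few codes that come up often in outbound connection problems. The table is deliberately partial, and the function name is hypothetical.

```python
# A partial table of standard Winsock error codes (not exhaustive).
WINSOCK_ERRORS = {
    10048: ("WSAEADDRINUSE", "address already in use"),
    10053: ("WSAECONNABORTED", "software caused connection abort"),
    10054: ("WSAECONNRESET", "connection reset by peer"),
    10055: ("WSAENOBUFS", "no buffer space available"),
    10060: ("WSAETIMEDOUT", "connection timed out"),
    10061: ("WSAECONNREFUSED", "connection refused"),
    11001: ("WSAHOST_NOT_FOUND", "host not found"),
}


def describe_socket_error(code):
    """Return "NAME: description" for a known code, or a fallback string."""
    entry = WINSOCK_ERRORS.get(code)
    if entry is None:
        return f"unknown Winsock error {code}"
    name, description = entry
    return f"{name}: {description}"
```

For example, `describe_socket_error(10060)` returns the timeout entry, which matches the sample exception object shown earlier.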
8. Investigate ThreadPool or Async Starvation
!mex.mthreadpool
!sosex.mwaits
!sosex.dlk
Use:
!dumpheap -type System.Threading.Tasks.Task
!dumpheap -type System.Runtime.CompilerServices.AsyncTaskMethodBuilder
Look for thousands of pending or never-started tasks:
Status: WaitingForActivation
Also useful:
!threadpool
!threads
✅ Good:
MinThreads: 8 Idle: 6 Active: 2
QueueLength: 0
❌ Bad:
MinThreads: 8 Active: 8
QueueLength: 550
Interpretation: ThreadPool starvation or Task scheduling backlog. Use:
!clrstack
!tt
To trace back the workload source.
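The good/bad comparison above boils down to two conditions: every minimum thread is busy, and work is still queuing. Here is a minimal sketch of that check, assuming the numbers are in the simplified "Name: value" form used in the samples (real !threadpool output uses a different layout). The function names and the queue threshold are illustrative assumptions.

```python
import re


def parse_threadpool(text):
    """Collect every "Name: number" pair, even several per line."""
    return {name: int(value) for name, value in re.findall(r"(\w+):\s*(\d+)", text)}


def looks_starved(stats, queue_threshold=100):
    """True when active threads have reached the minimum and work is backing up."""
    active = stats.get("Active", 0)
    minimum = stats.get("MinThreads", 0)
    queued = stats.get("QueueLength", 0)
    return active >= minimum and queued >= queue_threshold
```

A True result is a reason to inspect what the active threads are doing, since the usual culprit is blocking calls occupying pool threads while queued work waits.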
9. Visual Studio 2022 Path
- Open the .dmp file → Debug with Managed Only
- Use Parallel Stacks to inspect blocked threads
- Search for HttpClient, Socket, SqlConnection, or Dns in Memory View
- Right-click object → View Details
✅ Summary
Use a combination of:
- NetExt: Socket and connection state as captured in the dump
- MEX: SQL + thread insights, high CPU, exception analysis
- SOS/SOSEX: GC root, object inspection, deadlocks
- Visual Studio: Rapid inspection of tasks, thread states, memory structures
Common Pitfalls & Mistakes
| Mistake | Why it matters | How to fix |
|---|---|---|
| ❌ Wrong SOS version | Incompatible runtime view | Match using .loadby sos clr/coreclr |
| ❌ No .symfix/.reload | Unresolved symbols | Always run .symfix; .reload /f |
| ❌ Missing extensions | Limited visibility | Confirm all .dlls are loaded |
| ❌ Inspecting wrong thread | False diagnosis | Use !mex.t or !mex.runaway2 to focus |
| ❌ Over-relying on VS | Misses deep state | Use VS + WinDbg combo for full insight |
❓ Frequently Asked Questions (FAQ)
Q: Where do I get mex.dll and netext.dll?
👉 Download from Microsoft GitHub or internal repo. Unblock if needed (Right-click > Properties > Unblock).
Q: How do I confirm the dump is from the affected instance?
👉 Use !mex.cn or inspect .cordll output and confirm environment variables or module paths.
Q: Why are connections stuck in CLOSE_WAIT?
👉 The remote side closed the connection, but the local application never closed its own socket. Check for unhandled Receive() results or missing Dispose() calls.
Q: Why do I see high SQL connections but low activity?
👉 Possible pool leaks or misuse of connection strings. Check NumberOfNonPooledConnections and look for frequent open/dispose cycles.
Q: Can I debug entirely in Visual Studio?
👉 You can start there, but for async tasks, DNS failures, or deadlocks, WinDbg is more powerful.