Description
Check List
- I checked my issue doesn't exist yet
- My issue is valid with mirror default sample and not specific to my user-mode driver implementation
- I can always reproduce the issue with the provided description below.
- I have updated Dokany to the latest version and have rebooted my computer afterwards.
- I tested one of the latest snapshots from AppVeyor CI
Describe the bug
We have a customer who is using our Dokan2-based network drive on a pool of about 70 Azure Virtual Desktop machines.
Every day 5-6 of them hang completely (the whole system stops responding), which stops about 60 people from working.
Azure support connected to a hanging system over the (virtual) serial port and created a kernel dump by forcing a BSOD (the only thing that was still possible).
They found a problem with an Outlook process that was trying to read from our drive and was blocking many other kernel threads.
To Reproduce
Steps to reproduce the behavior:
Unfortunately, we cannot reproduce the issue ourselves; it (currently) only happens in the customer's environment.
Expected behavior
System should not hang.
Logs
Please attach in separate files: mirror output, library logs and kernel logs.
In case of BSOD, please attach minidump or dump analyze output.
The memory dump is about 1 GB; I'll ask the customer for permission to upload it.
Environment:
- Windows version: Microsoft Windows 11 Enterprise multi-session
- Processor architecture: x64
- Dokany version: 2.3.0.1000
- Library type (Dokany/FUSE): Dokany
Additional context
The dump has been analyzed by Microsoft already, and they'll work together with us to solve the issue.
Their analysis can also be provided.
Mainly, I need some information about the internals of the driver so that I can analyze the dump myself.
For example, in the dump I don't see any DokanProcessAndPullEvents thread running, and I'm not sure whether this is normal (because it is currently working on a request?) or not.
Also, is there a list of the IRPs/requests that have been pulled to user land?
I know about the PendingIRP list, but I think that list contains all pending IRPs, regardless of whether they have already been pulled.
We see a lot of "No matching IRPs found for a reply" messages just before the issue occurs.
As far as I know, this means that a response to a request was delivered from user land to the driver after the IRP had already been canceled because of a timeout.
But that would mean that requests are still being pulled and executed by the user process, just too slowly, right?
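
To make that reading of the message concrete, here is a minimal toy model in Python of the sequence I have in mind. It is only a sketch of my assumption, not Dokany's code: `ToyDriver`, its methods, the log strings, and the 15-second timeout are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDriver:
    """Toy model of a pending-request list; NOT Dokany's real data structures."""
    timeout: float                                 # seconds; arbitrary value
    pending: dict = field(default_factory=dict)    # serial -> deadline
    log: list = field(default_factory=list)
    next_serial: int = 0

    def submit(self, now: float) -> int:
        """Queue a new request and return its serial number."""
        serial = self.next_serial
        self.next_serial += 1
        self.pending[serial] = now + self.timeout
        return serial

    def expire(self, now: float) -> None:
        """Cancel every request whose deadline has passed."""
        for serial in [s for s, d in self.pending.items() if d <= now]:
            del self.pending[serial]
            self.log.append(f"request {serial} timed out and was canceled")

    def reply(self, serial: int, now: float) -> bool:
        """Complete a request; a late reply no longer finds its entry."""
        self.expire(now)
        if self.pending.pop(serial, None) is None:
            self.log.append(f"no matching request for reply {serial}")
            return False
        return True


driver = ToyDriver(timeout=15.0)
fast = driver.submit(now=0.0)
slow = driver.submit(now=0.0)
fast_ok = driver.reply(fast, now=1.0)    # answered within the timeout
slow_ok = driver.reply(slow, now=20.0)   # answered after the timeout
print(fast_ok, slow_ok)
print(driver.log)
```

In this model the "no matching request" line can only appear if the user-mode side is still picking up and answering requests, which is why I read the real message as a sign of slow processing rather than a completely stuck process. Whether the driver actually behaves like this is exactly what I would like to have confirmed.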