This article is reproduced from serverfault.com.

System.EngineExecutionException when PInvoking native code with callbacks

Posted on 2020-11-27 20:43:33

I'm trying to figure out the root cause of an EngineExecutionException. I've narrowed it down to what I think is a minimal reproducible example.

I have two projects, one unmanaged C++ DLL and one managed C# Console app. The unmanaged code has two functions, one which stores a callback and another which invokes it:

#define WINEXPORT extern "C" __declspec(dllexport)

typedef bool (* callback_t)(unsigned cmd, void* data);
static callback_t callback;

WINEXPORT void set_callback(callback_t cb)
{
    callback = cb;
}

WINEXPORT void run(void)
{
    callback(123, nullptr);
}

On the C# side:

using System;
using System.Runtime.InteropServices;
using System.Threading.Tasks;

namespace ExecutionExceptionReproConsole
{
    class Program
    {
        private const string dllPath = "ExecutionExceptionReproNative.dll";

        [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
        [return: MarshalAs(UnmanagedType.I1)]
        private delegate bool callback_t(uint cmd, IntPtr data);

        [DllImport(dllPath, CallingConvention = CallingConvention.Cdecl)]
        private static extern void set_callback(callback_t callback);

        [DllImport(dllPath, CallingConvention = CallingConvention.Cdecl)]
        private static extern void run();

        static async Task Main(string[] args)
        {
            set_callback(Callback);
            while (!Console.KeyAvailable)
            {
                run();
                await Task.Delay(1);
            }
        }

        static bool Callback(uint cmd, IntPtr data)
        {
            return true;
        }
    }
}

When I run the Console app, it runs fine for three and a half minutes before crashing with System.EngineExecutionException on the run() call.

Call stack:

    [Managed to Native Transition]      Annotated Frame
>   ExecutionExceptionReproConsole.dll!ExecutionExceptionReproConsole.Program.Main(string[] args = {string[0x00000000]}) Line 26    C#  Symbols loaded.
    [Resuming Async Method]     Annotated Frame
    System.Private.CoreLib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state)   Unknown No symbols loaded.
    System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Threading.Tasks.VoidTaskResult>.AsyncStateMachineBox<ExecutionExceptionReproConsole.Program.<Main>d__4>.MoveNext(System.Threading.Thread threadPoolThread) Unknown No symbols loaded.
    System.Private.CoreLib.dll!System.Runtime.CompilerServices.TaskAwaiter.OutputWaitEtwEvents.AnonymousMethod__12_0(System.Action innerContinuation, System.Threading.Tasks.Task innerTask = Id = 0x000036d4, Status = RanToCompletion, Method = "{null}") Unknown No symbols loaded.
    System.Private.CoreLib.dll!System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Action action, bool allowInlining)   Unknown No symbols loaded.
    System.Private.CoreLib.dll!System.Threading.Tasks.Task.RunContinuations(object continuationObject)  Unknown No symbols loaded.
    System.Private.CoreLib.dll!System.Threading.Tasks.Task.TrySetResult()   Unknown No symbols loaded.
    System.Private.CoreLib.dll!System.Threading.Tasks.Task.DelayPromise.CompleteTimedOut()  Unknown No symbols loaded.
    System.Private.CoreLib.dll!System.Threading.TimerQueueTimer.CallCallback(bool isThreadPool) Unknown No symbols loaded.
    System.Private.CoreLib.dll!System.Threading.TimerQueueTimer.Fire(bool isThreadPool) Unknown No symbols loaded.
    System.Private.CoreLib.dll!System.Threading.TimerQueue.FireNextTimers() Unknown No symbols loaded.

What could be causing the crash?

Some other information:

  • Visual Studio version is 16.8.2.
  • I'm building for x64. The issue still happens with x86, but it takes about twice as long to throw.
  • I'm using .NET 5.0, but I can also reproduce the issue with .NET Core 3.1 and 2.1.
    • With .NET Core 2.1 in particular, it crashes much sooner, in about 20 seconds instead of three and a half minutes.
  • I notice the memory usage climbing steadily over the app's runtime, but not nearly enough for it to run out. It climbs at about 16 kB/s and ends up totaling 13 MB at the time of the crash (as reported from Diagnostic Tools).
  • I cannot reproduce the issue if I lower the Task.Delay time to 0, or if I run in a synchronous loop instead of async. I don't notice the memory usage increasing in these scenarios.
  • I cannot reproduce the issue if I comment out the callback invocation from run() in the C++ code.
  • I can reproduce the issue if I use C# 9.0 function pointers with LoadLibrary and GetProcAddress instead of DllImport and static extern ....
Questioner: Jeff
Answer by Stephen Cleary, 2020-11-28 10:08:54

As others have noted, this happens because the .NET garbage collector collects the delegate itself. This is a fairly common pitfall with P/Invoke.

Specifically, this code:

set_callback(Callback);

is actually syntactic sugar for this code:

set_callback(new callback_t(Callback));

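And as you can see, the callback_t instance is not saved anywhere. So, after set_callback returns, it is no longer rooted and is eligible for GC. Once it has been collected, the function pointer that the native code stored is dangling, and the next run() call invokes it anyway.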

The easiest solution is to save it in a rooted variable until it is no longer referenced by the C++ code:

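static async Task Main(string[] args)
{
    // Keep the delegate in a static field so it stays rooted
    // for as long as the native code holds the function pointer.
    _callback = Callback;
    set_callback(_callback);
    while (!Console.KeyAvailable)
    {
        run();
        // Forces a collection on every iteration. Useful for verifying
        // that the delegate now survives; the fix does not require it.
        GC.Collect();
        await Task.Delay(1);
    }
}

// Static field that roots the delegate for the lifetime of the process.
private static callback_t _callback;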

Note that making the loop synchronous or changing the Task.Delay to 0 only hides the problem: it removes the Task allocations that eventually trigger the GC that collects the delegate.
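Keeping the delegate rooted "until it is no longer referenced by the C++ code" implies the native side needs some way to give the pointer up. The following is only a sketch of how that might look, not part of the original code: clear_callback and run_checked are hypothetical names, and the WINEXPORT decoration from the question is left out so the snippet compiles on its own. Note that the null check only makes unregistering safe. It cannot detect a delegate that was already collected, because the stored pointer is still non-null in that case.

```cpp
// Same callback signature as the native DLL in the question.
typedef bool (*callback_t)(unsigned cmd, void* data);

static callback_t callback = nullptr;

// Stores the callback, as in the question.
void set_callback(callback_t cb)
{
    callback = cb;
}

// Hypothetical addition: the caller unregisters the callback
// before it lets go of the managed delegate.
void clear_callback(void)
{
    callback = nullptr;
}

// Hypothetical variant of run(): returns false instead of calling
// through a null pointer when no callback is registered.
bool run_checked(void)
{
    if (callback == nullptr)
        return false;
    callback(123, nullptr);
    return true;
}
```

On the C# side, the idea would be to call clear_callback (through a matching DllImport, also an assumption) before setting _callback to null, so the native code never holds a pointer to a delegate that is no longer rooted.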