DSoftStudio.Mediator achieves near-direct-call latency (~7 ns Send, ~0.6 ns overhead) through a combination of compile-time code generation and runtime-free dispatch. This document explains the key design decisions that make it the fastest .NET mediator tested.
Source generators analyze your handlers at build time and emit strongly-typed dispatch methods. When you call `mediator.Send(new Ping())`, the generated code resolves the exact handler type — no dictionary lookup, no `Type.GetType()`, no `MakeGenericType()`.
```
Build time:
  Source Generator → discovers PingHandler
                   → emits direct call: handler.Handle(request, ct)

Runtime:
  Send(new Ping()) → compiled dispatch → PingHandler.Handle()
```
This is fundamentally different from reflection-based mediators that resolve handlers at runtime via IServiceProvider.GetService(typeof(IRequestHandler<,>).MakeGenericType(...)).
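To make the contrast concrete, here is a hedged sketch; the handler and method names (`Pong`, `PingHandler`, `GeneratedDispatch`) are illustrative, not the library's actual emitted code:

```csharp
using Microsoft.Extensions.DependencyInjection;

// What a reflection-based mediator does on every Send (simplified):
static object RuntimeResolve(object request, IServiceProvider sp)
{
    var handlerType = typeof(IRequestHandler<,>)
        .MakeGenericType(request.GetType(), typeof(Pong)); // runtime generic construction
    return sp.GetRequiredService(handlerType);             // service lookup on every call
}

// Illustrative shape of what the source generator emits instead:
static ValueTask<Pong> GeneratedDispatch(
    Ping request, PingHandler handler, CancellationToken ct)
    => handler.Handle(request, ct); // exact handler known at build time: a direct call
```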
Pipeline behaviors are chained via interface dispatch, not delegate allocation. Each behavior receives the next handler as IRequestHandler<TRequest, TResponse> rather than a Func<> or RequestHandlerDelegate<>:
```csharp
// DSoftStudio.Mediator — interface dispatch (zero allocation)
public ValueTask<TResponse> Handle(
    TRequest request,
    IRequestHandler<TRequest, TResponse> next, // no allocation
    CancellationToken ct)
```

```csharp
// MediatR — delegate allocation (heap allocation per call)
public Task<TResponse> Handle(
    TRequest request,
    RequestHandlerDelegate<TResponse> next, // allocates delegate
    CancellationToken ct)
```
The difference: interface dispatch is a virtual method call (near-zero cost), while delegate creation allocates on the heap every time the pipeline executes.
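As an illustration, a logging behavior written against the interface-dispatch signature could look like this; the `IPipelineBehavior` interface name and the `Console` logging are assumptions for the sketch:

```csharp
public sealed class LoggingBehavior<TRequest, TResponse>
    : IPipelineBehavior<TRequest, TResponse>
{
    public async ValueTask<TResponse> Handle(
        TRequest request,
        IRequestHandler<TRequest, TResponse> next, // existing instance, nothing allocated
        CancellationToken ct)
    {
        Console.WriteLine($"Handling {typeof(TRequest).Name}");
        var response = await next.Handle(request, ct); // plain interface (virtual) call
        Console.WriteLine($"Handled {typeof(TRequest).Name}");
        return response;
    }
}
```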
Send(object) uses FrozenDictionary<Type, Func<...>> — a read-optimized, immutable dictionary built once at startup. Lookups are faster than Dictionary<> and allocation-free, while remaining fully AOT-compatible (no reflection at runtime).
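A minimal sketch of that dispatch table; the delegate shape and `Ping` handler are assumptions, and the generated code differs in detail:

```csharp
using System.Collections.Frozen;

public sealed record Ping(string Message);

public static class UntypedDispatch
{
    // Built once at startup: request type -> precompiled dispatch delegate.
    private static readonly FrozenDictionary<Type, Func<object, CancellationToken, ValueTask<object?>>> Map =
        new Dictionary<Type, Func<object, CancellationToken, ValueTask<object?>>>
        {
            [typeof(Ping)] = (req, ct) => HandlePing((Ping)req, ct),
        }.ToFrozenDictionary();

    public static ValueTask<object?> Send(object request, CancellationToken ct) =>
        Map.TryGetValue(request.GetType(), out var dispatch)
            ? dispatch(request, ct) // allocation-free, read-optimized lookup
            : throw new InvalidOperationException($"No handler for {request.GetType()}");

    private static ValueTask<object?> HandlePing(Ping request, CancellationToken ct) =>
        new($"Pong: {request.Message}"); // synchronous result, no Task allocation
}
```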
Handlers without constructor parameters are automatically registered as Singleton. This eliminates per-call IServiceProvider.GetService() resolution and object allocation:
| Registration | Per-call cost |
|---|---|
| Transient (MediatR default) | GetService() + new Handler() |
| Singleton (DSoft auto) | Cached reference — zero allocation |
The source generator detects whether a handler has constructor parameters and emits the appropriate registration.
All handlers return ValueTask<T> instead of Task<T>. When the result is available synchronously (common for in-memory operations, cache hits, validation failures), ValueTask<T> avoids the Task heap allocation entirely.
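For example, a cache-backed handler that avoids the allocation on the hot path; the `GetUser`/`User` types and the cache are illustrative:

```csharp
using System.Collections.Concurrent;

public sealed record GetUser(int Id) : IRequest<User?>;
public sealed record User(int Id, string Name);

public sealed class GetUserHandler : IRequestHandler<GetUser, User?>
{
    private readonly ConcurrentDictionary<int, User> _cache = new();

    public ValueTask<User?> Handle(GetUser request, CancellationToken ct)
    {
        // Cache hit: the result rides inside the ValueTask struct itself,
        // so no Task<User?> is allocated on the heap.
        if (_cache.TryGetValue(request.Id, out var user))
            return new ValueTask<User?>(user);

        // Only the genuinely asynchronous path pays for a Task/state machine.
        return new ValueTask<User?>(LoadFromDatabaseAsync(request.Id, ct));
    }

    private Task<User?> LoadFromDatabaseAsync(int id, CancellationToken ct) =>
        Task.FromResult<User?>(null); // stand-in for a real database query
}
```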
Notifications are dispatched by exact compile-time type, not by walking the inheritance hierarchy at runtime. This means no `Type.GetInterfaces()` reflection and no duplicate handler invocations for derived notification types.

`PrecompilePipelines()` resolves and chains all pipeline behaviors at startup, so the first `Send()` call is as fast as subsequent calls — no lazy initialization, no lock contention, no JIT surprise on the first request.
The source generators emit C# interceptors that replace ISender.Send, IPublisher.Publish, and IMediator.CreateStream call sites at compile time with direct pipeline invocations — eliminating virtual dispatch entirely.
The generated code adapts to the build’s OptimizationLevel:
| Build | Generated pattern | Overhead vs direct call | Mock-safe |
|---|---|---|---|
| Release | Branchless `castclass IServiceProviderAccessor` | ~0.6 ns | ❌ throws `InvalidCastException` on mocks |
| Debug | `is not IServiceProviderAccessor` + virtual fallback | ~3 ns | ✅ graceful fallback to virtual dispatch |
A single isinst + branch instruction prevents the JIT’s Guarded Devirtualization (GDV) from fully devirtualizing the interface call, adding ~3 ns on every invocation. The branchless castclass pattern in Release lets GDV optimize the dispatch to a method-table pointer comparison + direct field load — effectively zero overhead.
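The two interceptor shapes, sketched side by side; member names such as `IServiceProviderAccessor.ServiceProvider` and `GeneratedPipeline.Invoke` are assumptions, not the generator's actual output:

```csharp
// Release: branchless castclass. Throws InvalidCastException on mocks,
// but GDV can devirtualize the whole call to ~0.6 ns overhead.
static ValueTask<Pong> SendRelease(ISender sender, Ping request, CancellationToken ct)
    => GeneratedPipeline.Invoke(
        ((IServiceProviderAccessor)sender).ServiceProvider, request, ct);

// Debug: isinst + branch. Mock-safe, but the branch prevents full
// devirtualization and costs ~3 ns per invocation.
static ValueTask<Pong> SendDebug(ISender sender, Ping request, CancellationToken ct)
    => sender is IServiceProviderAccessor accessor
        ? GeneratedPipeline.Invoke(accessor.ServiceProvider, request, ct)
        : sender.Send(request, ct); // virtual fallback for test doubles
```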
In Debug builds (where tests typically run), the generated interceptors detect test doubles (Moq, NSubstitute, etc.) that don’t implement IServiceProviderAccessor and fall back to virtual dispatch.
Typed extensions (e.g. Send(Ping)) always use the defensive isinst + fallback pattern regardless of build configuration — they are the fallback path when interceptors are suppressed and must remain mock-safe in dotnet test -c Release CI pipelines. The ~1–2 ns overhead is negligible. See Design Notes for the full comparison table.
Tip: If you mock `ISender` in a project that references the generator and build in Release mode, the interceptor will throw `InvalidCastException`. Set `<DSoftMediatorSuppressInterceptors>true</DSoftMediatorSuppressInterceptors>` or reference only `DSoftStudio.Mediator.Abstractions` in your test project. See Source Generators for DSOFT004.
NotificationPublisherFlag Bypass

Most applications never register a custom `INotificationPublisher`. Without optimization, every generated `Publish` interceptor would call `GetService<INotificationPublisher>()` on every invocation — even when the result is always null. That DI probe alone costs ~3–4 ns per call.
`NotificationPublisherFlag` is a write-once global `Volatile` boolean that eliminates this probe:

- By default the flag is false. The generated interceptor reads `HasCustomPublisher` (~0.1 ns) and short-circuits directly to `NotificationCachedDispatcher.DispatchSequential` — zero DI lookup.
- When a custom `INotificationPublisher` is registered (e.g. `ParallelNotificationPublisher`, OpenTelemetry’s `InstrumentedNotificationPublisher`), the `Mediator` constructor calls `MarkRegistered()` once. All subsequent interceptors see the flag and take the `GetService` path as before.

```
Publish(notification)
        │
        ▼
NotificationPublisherFlag.HasCustomPublisher?  (~0.1 ns Volatile.Read)
        │
        ├─ false ──▶ NotificationCachedDispatcher.DispatchSequential()  ← fast path
        │
        └─ true ───▶ GetService<INotificationPublisher>()               ← custom path
                         └─▶ customPublisher.Publish(handlers, notification)
```
| Scenario | Before optimization | After optimization |
|---|---|---|
| No custom publisher (default) | `GetService` returns null (~3–4 ns) | `Volatile.Read` (~0.1 ns) |
| Custom publisher registered | `GetService` returns instance (~3–4 ns) | `Volatile.Read` + `GetService` (~3–4 ns) |
Net effect: ~4 ns saved per Publish call in the default (no custom publisher) path — bringing the Publish interceptor from ~2.2× to ~1.1× overhead vs direct dispatch.
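A minimal sketch of the flag itself; member names follow the text, but the exact implementation may differ:

```csharp
using System.Threading;

public static class NotificationPublisherFlag
{
    private static bool _hasCustomPublisher; // written at most once, at startup

    // ~0.1 ns read on the hot path of every generated Publish interceptor.
    public static bool HasCustomPublisher => Volatile.Read(ref _hasCustomPublisher);

    // Called once by the Mediator constructor when a custom
    // INotificationPublisher is found in the service collection.
    public static void MarkRegistered() => Volatile.Write(ref _hasCustomPublisher, true);
}
```

Because the flag only ever transitions false → true, a `Volatile.Read` is sufficient; no lock or interlocked operation is needed on the read path.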
| Technique | Impact |
|---|---|
| Compile-time dispatch | Eliminates reflection + dictionary lookup |
| Interface dispatch pipeline | Zero delegate allocation per call |
| FrozenDictionary | Fast AOT-safe runtime dispatch |
| Auto-Singleton handlers | Eliminates per-call object creation |
| ValueTask returns | Avoids Task allocation for sync paths |
| Exact-type notification dispatch | O(1) dispatch, no duplicate handlers |
| Pipeline precompilation | Eliminates cold-start penalty |
| Interceptor code generation | Release: ~0.6 ns overhead (GDV-optimized) |
| NotificationPublisherFlag bypass | Skips DI probe when no custom publisher (~4 ns saved) |

Combined, these techniques take a request from `Send()` to handler in roughly 7 ns.