Unexpected SQL Output from EF Core GroupBy-Sum LINQ Query

I’m working with the following LINQ statement:

var result = await _context.Responses
    .Where(r => r.Query != null)
    .GroupBy(r => r.Option)
    .Select(group => new
    {
        group.Key,
        Total = group.Sum(r => r.Query.Value)
    })
    .ToListAsync();

Here are my data models:

public class Response
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public Guid Id { get; set; }

    public Option? Option { get; set; } // Values stored are 0, 1, 2

    public Guid? QueryId { get; set; }
    public virtual Query? Query { get; set; }
}

public class Query
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public Guid Id { get; set; }

    public ushort Value { get; set; }
}

The generated SQL for this LINQ query is:

SELECT [r].[Option] AS [Key],
       (
           SELECT COALESCE(SUM(CAST([q1].[Value] AS int)), 0)
           FROM [Responses] AS [r0]
               LEFT JOIN [Queries] AS [q0]
                   ON [r0].[QueryId] = [q0].[Id]
               LEFT JOIN [Queries] AS [q1]
                   ON [r0].[QueryId] = [q1].[Id]
           WHERE ([q0].[Id] IS NOT NULL)
                 AND (
                         [r].[Option] = [r0].[Option]
                         OR (
                                ([r].[Option] IS NULL)
                                AND ([r0].[Option] IS NULL)
                            )
                     )
       ) AS [Total]
FROM [Responses] AS [r]
    LEFT JOIN [Queries] AS [q]
        ON [r].[QueryId] = [q].[Id]
WHERE [q].[Id] IS NOT NULL
GROUP BY [r].[Option]

I’m puzzled as to why a subquery is being used for the SUM operation. I anticipated a simpler SQL output, something like:

SELECT [r].[Option] AS [Key],
       COALESCE(SUM(CAST([q].[Value] AS int)), 0) AS [Total]
FROM [Responses] AS [r]
    LEFT JOIN [Queries] AS [q]
        ON [r].[QueryId] = [q].[Id]
WHERE [q].[Id] IS NOT NULL
GROUP BY [r].[Option]

Is there a method to eliminate the subquery from the resulting SQL?

Have you tried updating your entity framework core version to the latest one? Sometimes these kinds of quirks are optimized in newer releases. Just remember that upgrading might involve additional changes in your project, but it could be worth it if a more efficient SQL is generated.

Another approach you could consider is using raw SQL queries with the FromSqlRaw method in EF Core. This enables you to write the exact SQL statement you expect, avoiding the subquery altogether. Although this reduces the abstraction provided by LINQ, it can be beneficial for performance and understanding the query execution plan. However, ensure you manage query inputs carefully to prevent SQL injection risks. This might help until EF Core gets improved capabilities in handling such LINQ-to-SQL translation.

Hey there, have you looked into how EF core handles nullable fields in LINQ queries? The subquery might have to do with that. maybe experimenting with different query structures or using compiler directives to see how they affect outputs. Any experiences to share with other ORM frameworks?

You can check if removing the nullable types from your model impacts the SQL generation. Nullable fields sometimes lead to unexpected SQL. It might require adjusting your business logic but could simplify the query translation.