SAS SQL: Optimize counting with conditional grouping to avoid data retrieval

Max_Energetic · July 24, 2025, 3:45pm

I’m working with a huge database table and need to count records efficiently. My basic counting query works fine when grouping by existing columns:

select p.category,
       p.status,
       count(p.recordid) as total_records
from datatable as p
group by p.category, p.status

However, my real challenge involves grouping by a computed field using conditional logic. Here’s my current approach:

SELECT final.category, 
       final.status, 
       Count(recordid) AS record_total 
FROM   (
        SELECT p.category, 
               CASE 
                 WHEN p.vendor = 'toyota' 
                      AND Left(p.serialcode, 4) <> 'XXXX' THEN 'active' 
                 WHEN Left(p.serialcode, 5) = 'YYYYY' THEN 'pending' 
                 WHEN Left(p.serialcode, 3) = 'ZZZ' THEN 'complete' 
                 ELSE 'inactive' 
                 END AS status, 
               Left(p.recordid, 8) AS recordid 
        FROM   datatable AS p
        ) AS final 
GROUP  BY final.category, final.status

This query creates a subquery that processes all records first, then groups them. The problem is performance - it takes over 4 hours to process a year’s worth of data because it pulls everything into memory before counting.

Is there a way to do server-side counting without the derived table? I want to avoid pulling all records and then grouping them. The current method is extremely slow with millions of records.

Leo_Curious · July 30, 2025, 6:51am

Your performance problem is that SAS SQL evaluates the CASE statement for every single row before grouping. You can fix this by moving the logic into the GROUP BY clause with computed columns and indexing hints.

Here’s a better way to structure your query:

SELECT p.category,
       CASE 
         WHEN p.vendor = 'toyota' AND Left(p.serialcode, 4) <> 'XXXX' THEN 'active'
         WHEN Left(p.serialcode, 5) = 'YYYYY' THEN 'pending' 
         WHEN Left(p.serialcode, 3) = 'ZZZ' THEN 'complete'
         ELSE 'inactive'
       END AS status,
       COUNT(*) AS record_total
FROM datatable AS p
GROUP BY p.category, CALCULATED status;

The magic is CALCULATED status in the GROUP BY - this tells SAS to reference the computed column directly instead of creating a derived table. Also throw some indexes on category, vendor, and serialcode columns if you haven’t already. This should cut your processing time way down since it skips the intermediate result set.

Ryan_Nebula · July 29, 2025, 7:34am

interesting performance issue! try conditional aggregation instead of subqueries - sum(case when conditions then 1 else 0 end) should cut down on memory overhead. what’s your execution plan showing? full table scan? also, do you have indexes on serialcode and vendor columns?

ZoeString42 · July 28, 2025, 7:25pm

the subquery’s tanking your performance - it’s materializing every row upfront. switch to proc sql with a summary step instead. it handles conditional counting way better than regular sql. try proc sql; create table counts as select category, computed_status, count(*) from datatable group by category, computed_status; - should run faster depending on your sas version.