I’m working on a Teradata SQL query that needs to handle flexible criteria. My project gets different types of criteria (C, M, P) but not always all three. I want the query to use only the provided criteria without failing when one is missing.
Here’s what I’ve got so far:
WITH
cte_c AS (SELECT c FROM criteria_c WHERE proj_id = 1),
cte_m AS (SELECT m FROM criteria_m WHERE proj_id = 1),
cte_p AS (SELECT p FROM criteria_p WHERE proj_id = 1)
SELECT * FROM main_data
JOIN cte_c ON main_data.c = cte_c.c
JOIN cte_m ON main_data.m = cte_m.m
JOIN cte_p ON main_data.p = cte_p.p;
The problem is if any CTE is empty, I get no results. How can I make the joins work only for non-empty CTEs?
For example, if I have criteria for C and P but not M, I want to ignore the M join and still get results.
My main_data table has over a million rows, so I need to filter efficiently based on the given criteria.
Any ideas on how to make this work? Thanks for the help!
hey, interesting challenge! maybe try left joins with coalesce, like:
SELECT * FROM main_data
LEFT JOIN cte_c ON main_data.c = cte_c.c
LEFT JOIN cte_m ON main_data.m = cte_m.m
LEFT JOIN cte_p ON main_data.p = cte_p.p
WHERE COALESCE(cte_c.c, cte_m.m, cte_p.p) IS NOT NULL;
what perf issues u seen?
Your approach using CTEs is a good start, but we can optimize it further for flexibility and performance. Consider using dynamic SQL to construct the query based on the available criteria. Here’s a high-level approach:
Create a stored procedure that accepts the project ID as a parameter.
Within the procedure, check which criteria tables have data for the given project ID.
Dynamically build the SQL string, including only the necessary joins.
Execute the dynamic SQL using EXECUTE IMMEDIATE.
This method ensures you’re only joining the relevant tables, reducing unnecessary processing. It’s particularly efficient for large datasets like yours. Remember to properly parameterize your dynamic SQL to prevent SQL injection vulnerabilities.
For even better performance, consider creating covering indexes on the criteria tables and the main_data table for the columns used in the joins.
yo zack, have u tried using EXISTS in ur WHERE clause? like this:
SELECT * FROM main_data
WHERE EXISTS (SELECT 1 FROM cte_c WHERE main_data.c = cte_c.c)
OR EXISTS (SELECT 1 FROM cte_m WHERE main_data.m = cte_m.m)
OR EXISTS (SELECT 1 FROM cte_p WHERE main_data.p = cte_p.p);
this way it’ll only check the criteria that exist. might be faster too. lmk if it helps!