Hey folks, I’m struggling to turn my Python script into an SQL query. Here’s what I’m trying to do:
-- Sample table structure
CREATE TABLE tickets (
issue_id INT,
timestamp DATETIME,
state VARCHAR(10)
);
-- Example data
INSERT INTO tickets VALUES
(1001, '2024-03-01 10:00:00', 'New'),
(1001, '2024-03-02 14:30:00', 'In Progress'),
(1001, '2024-03-05 09:15:00', 'Resolved'),
(1002, '2024-03-03 11:45:00', 'New'),
(1002, '2024-03-04 16:20:00', 'Resolved');
My goal is to match ‘New’ states with the next ‘Resolved’ state for each issue_id. The tricky part is that an issue can go through this cycle multiple times.
I’ve tried using RANK() in SQL, but it only catches the first pair. Any ideas on how to capture all pairs? I’m using Snowflake, if that helps.
Thanks in advance for any suggestions!
ooh, interesting problem! have you considered using LAG() or LEAD() functions? they could help match ‘new’ with the next ‘resolved’ state. maybe combine it with a CTE to handle multiple cycles? just brainstorming here. what other approaches have you tried?
To capture all pairs of ‘New’ and ‘Resolved’ states for each issue_id, you could use a combination of window functions and a self-join. One approach is to assign a unique rank to each ‘New’ and ‘Resolved’ record using ROW_NUMBER() over partitions defined by issue_id, then join these results so that each ‘New’ record pairs with the subsequent ‘Resolved’ record, ensuring that the ‘Resolved’ timestamp follows the ‘New’ timestamp. This technique should handle cases where an issue goes through multiple cycles. Adjust the query for your specific Snowflake SQL dialect and data, and test thoroughly to confirm accuracy.
hey, you can try a self-join, pairing a ‘new’ record with the next ‘resolved’. you might use a NOT EXISTS condition to avoid an in-between status. might need tweaks based on your data. hope that helps!