Finding the nearest 'sleep' record after each 'eat' action in SQL

I’m working with a table called activity_log that tracks people’s actions. It has columns for id, person, and activity. I want to match each ‘eat’ entry with the next ‘sleep’ entry for the same person.

Here’s what I’ve tried:

SELECT 
    a.person, 
    a.id AS meal_id, 
    (SELECT MIN(id) FROM activity_log b 
     WHERE a.person = b.person AND b.id > a.id AND b.activity = 'rest') AS rest_id 
FROM activity_log a 
WHERE a.activity = 'meal'
ORDER BY a.id;

This works but uses a subquery. It might be slow with lots of data. Is there a faster way to do this? Maybe using a join or temp table? I can add indexes if needed.

I’m looking for a solution that’s more efficient and easier to expand if I need more info from the ‘rest’ entries. Any ideas?

yo silvia! another approach could be using LEAD() function. it’d look like:

SELECT person, id AS meal_id, LEAD(id) OVER (PARTITION BY person ORDER BY id) AS rest_id
FROM activity_log
WHERE activity IN (‘meal’, ‘rest’)
ORDER BY id;

this might be faster than subqueries. give it a shot and lemme know how it goes!

hey there! have you considered using a window function for this? something like ROW_NUMBER() OVER (PARTITION BY person ORDER BY id) might help match eat and sleep entries more efficiently. what database system are you using? some systems have specific optimizations for this kinda thing. have you tried any performance testing yet?

Have you considered using a self-join approach? It might offer better performance for larger datasets. Here’s a potential solution:

SELECT a.person, a.id AS meal_id, MIN(b.id) AS rest_id
FROM activity_log a
JOIN activity_log b ON a.person = b.person AND a.id < b.id
WHERE a.activity = ‘meal’ AND b.activity = ‘rest’
GROUP BY a.person, a.id
ORDER BY a.id;

This query joins the table with itself, matching each ‘meal’ entry to all subsequent ‘rest’ entries for the same person. The MIN function then selects the nearest ‘rest’ entry. Adding appropriate indexes on the person, id, and activity columns could further optimize this query. Consider testing this against your current method to compare performance.