SQL: Computing mean of previous entries, excluding current value?

Hey folks! I’m stuck on a SQL problem. I’ve got this table with columns for id, value, and date. What I’m trying to do is find the average of all the values for each id, but only for the dates before the most recent one. Here’s a quick example of what I mean:

CREATE TABLE sample_data (
    user_id INT,
    score INT,
    entry_date DATE
);

INSERT INTO sample_data VALUES
(101, 80, '2023-03-01'),
(101, 75, '2023-02-28'),
(101, 85, '2023-02-27'),
(102, 90, '2023-03-01'),
(102, 88, '2023-02-28'),
(102, 92, '2023-02-27');

For user 101, I want to get the average of 75 and 85, ignoring the 80 from the most recent date.

I tried using ROWS N PRECEDING but it’s including the current row in the calculation. Any ideas on how to exclude that last entry? Thanks in advance for your help!

Your approach is on the right track, but you might want to consider using a window function for better performance, especially with larger datasets. Here’s an alternative solution:

SELECT user_id, AVG(score) as avg_score
FROM (
    SELECT user_id, score, entry_date,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY entry_date DESC) as rn
    FROM sample_data
) sub
WHERE rn > 1
GROUP BY user_id

This method uses ROW_NUMBER() to rank the entries for each user by date, then excludes the most recent entry (where rn = 1) before calculating the average. It’s efficient and scales well with larger datasets.

hey there! have u considered using a lag function? it might be a neat solution:

SELECT user_id, AVG(prev_score) as avg_score
FROM (
SELECT *, LAG(score) OVER (PARTITION BY user_id ORDER BY entry_date) as prev_score
FROM sample_data
)
WHERE prev_score IS NOT NULL
GROUP BY user_id

what do you think? this could work for your problem, right?

hey CreativeChef89, have u tried using a subquery to get the max date for each user_id, then joining that back to exclude the latest entry? smthn like:

SELECT user_id, AVG(score) as avg_score
FROM sample_data s
WHERE entry_date < (SELECT MAX(entry_date) FROM sample_data WHERE user_id = s.user_id)
GROUP BY user_id

might work for ya!