I am working on partitioning existing records into smaller, distinct sets but am encountering challenges. I have achieved partial success with various grouping and window functions, yet the output still does not match my expectations. Below is an example of my current table and the output I want to achieve.
-- Current Table Data
Member event_date Categories
Alice 2024-02-10 X,Y,Z
Bob 2024-02-12 X,Y,Z
Alice 2024-02-14 X,Y,Z,X,Y,Z
Bob 2024-02-16 X,Y,Z,X,Y,Z,X,Y,Z
-- Desired Output
Member event_date Categories
Alice 2024-02-10 X,Y,Z
Bob 2024-02-12 X,Y,Z
Alice 2024-02-14 X,Y,Z
Alice 2024-02-14 X,Y,Z
Bob 2024-02-16 X,Y,Z
Bob 2024-02-16 X,Y,Z
Bob 2024-02-16 X,Y,Z
I would appreciate guidance on how to write a PostgreSQL query that correctly splits and groups the data as shown in the desired result.
One solution that worked for me started by converting the Categories column into an array using the string_to_array function. I then used the unnest function to split these values into multiple rows and applied a window function such as row_number() to assign a unique sequence number. By grouping on a derived index (for example, floor((row_number()-1)/3) if you expect three values per set), I was able to reassemble the grouped sets exactly as desired. This method gives more flexibility in handling data with varying category counts.
hey flyingeagle, u might try regexp_split_to_table with a lateral join. it lets u break the categories in a neat way and then reassemble them by member and event_date. it worked well for me when dealing with repetitive elements
hey, i tried using a recursive cte to split rows and then reassemble them by adding a row counter. it felt more intuitive than grouping in a single query. have any of you played around with recursive approaches and found interesting twists in your data splitting?