How to limit DB2 query results to retrieve records for single identifier value at a time

Need Help with DB2 Query Modification

I’m working on a data processing task where I need to pull information from multiple database tables and send it to another system. My current query works but returns too many records at once.

Current Setup:

  • Orders table: contains order_id, item_seq, process_date, description, notes
  • Items table: has order_id, category, ship_date
  • Status table: includes order_id, step_num, update_date

Right now I’m using this query structure:

select order_id, item_seq, process_date, description, notes, category, ship_date, step_num, update_date
  from orders
    left outer join items
      on (orders.order_id = items.order_id) and (orders.item_seq = items.item_seq)
    left outer join status  
      on (orders.order_id = status.order_id)
  where process_date = '2019-01-01'
Except
select order_id, item_seq, process_date, description, notes, category, ship_date, step_num, update_date
  from orders
    left outer join items
      on (orders.order_id = items.order_id) and (orders.item_seq = items.item_seq) 
    left outer join status
      on (orders.order_id = status.order_id)
  where process_date = '2019-01-01'
    and orders.notes = 'COMPLETE'

This returns hundreds of rows across many different order numbers. But I want to process one order at a time. How can I modify this to only get records for the first order_id, then after I process those and update their status, the next query run would get the next order_id?

For example, if my result set has orders 1001, 1002, 1003, I want just the 1001 records first time, then 1002 records next time, etc.

add order by order_id limit 1 to your query, but you’ll need to track which orders u’ve already processed. store the last processed order_id somewhere and use where order_id > last_processed_id in your next query.

I encountered a similar challenge when building an order processing system. A solution that proved effective was adding a processing_status column to your orders table, allowing values such as ‘PENDING’, ‘PROCESSING’, and ‘COMPLETE’. Whenever you retrieve an order, change its status to ‘PROCESSING’, and then to ‘COMPLETE’ upon finishing. Your query can then be simplified to: SELECT * FROM orders WHERE processing_status = 'PENDING' ORDER BY order_id FETCH FIRST 1 ROWS ONLY. This approach is advantageous over tracking last processed IDs, as it offers better handling of failures and concurrent processing. In case of a crash during processing, you can easily identify and retry any orders stuck in the ‘PROCESSING’ state.

Interesting problem! What if processing fails halfway through an order though? Got any rollback mechanism for partial processing? Also - are you running this multi-threaded where different processes might grab the same order at once?