I’m struggling with a performance issue in SQL Server. I need to update a column in a big table (150 million rows) based on data from a smaller table (300,000 rows). Here’s what I’m doing:
UPDATE bigTable
SET targetColumn = smallTable.sourceColumn
FROM smallTable
WHERE bigTable.key1 = smallTable.key1
AND bigTable.key2 = smallTable.key2;
I’ve added indexes to the smaller table to speed things up. When I tested it on a million rows, it took about 20 seconds. But running it on the full dataset has been going for over two days without finishing.
Is there a faster way to do this? Maybe breaking it into smaller chunks or using a different approach? I’m pretty new to SQL Server, so any tips would be really helpful.
One approach you might consider is staging the update through a temporary table (at 300,000 rows a temp table is preferable to a table variable, since temp tables get statistics). First, insert the matching key/value pairs into the temp table and index it, then update the big table from that staging table. Joining against a narrow, pre-filtered, indexed staging set is often cheaper than repeatedly probing the source table during a massive update.
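Here's a minimal sketch of that pattern, reusing the table and column names from your question (adjust index choice and data types to your actual schema):

-- Stage the matched key/value pairs once.
SELECT s.key1, s.key2, s.sourceColumn
INTO #staged
FROM smallTable AS s
INNER JOIN bigTable AS b
    ON b.key1 = s.key1
   AND b.key2 = s.key2;

-- Index the staging table to support the join back to bigTable.
CREATE CLUSTERED INDEX IX_staged ON #staged (key1, key2);

UPDATE b
SET b.targetColumn = st.sourceColumn
FROM bigTable AS b
INNER JOIN #staged AS st
    ON b.key1 = st.key1
   AND b.key2 = st.key2;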
Additionally, make sure both tables have indexes that match the join: a composite index on (key1, key2) on each side, ideally with sourceColumn as an included column on the small table so the update can read everything it needs from the index alone. You might also look into partitioning the big table if your data lends itself to that structure.
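For example (the index names here are mine; use whatever fits your conventions):

-- Covers the join keys and the value being copied, so reads on the
-- small table never need a lookup back to the base table.
CREATE NONCLUSTERED INDEX IX_smallTable_key1_key2
    ON smallTable (key1, key2)
    INCLUDE (sourceColumn);

-- Helps locate the rows to update on the big table.
CREATE NONCLUSTERED INDEX IX_bigTable_key1_key2
    ON bigTable (key1, key2);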
Lastly, consider running the update during off-peak hours, and if your client tool enforces a command timeout, raise it (SQL Server itself won't cancel a long-running query; timeouts come from the client). Sometimes these operations simply need more time to complete on datasets this large.
Have you considered using a MERGE statement instead? It might be more efficient for big dataset updates. Also, what's your hardware setup like? Sometimes throwing more resources (memory, faster storage) at the problem can help. Curious to hear if you've tried any other approaches yet.
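For reference, the MERGE form of your update would look roughly like this (same table and column names as in your question):

MERGE bigTable AS b
USING smallTable AS s
    ON b.key1 = s.key1
   AND b.key2 = s.key2
WHEN MATCHED THEN
    UPDATE SET b.targetColumn = s.sourceColumn;  -- semicolon is required after MERGE

Note that MERGE still touches all matching rows in one transaction, so on its own it won't fix the log growth; it mainly gives the optimizer a single, explicit join to plan around.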
Hey, try batching your updates: break the work into chunks of, say, 100k rows at a time so each transaction stays small and the transaction log can keep up. Also make sure your indexes match the join keys, and partition the big table if you can. Hope this helps, cheers!
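A rough sketch of the batching loop, assuming sourceColumn is never NULL (if it can be, the progress check below needs adjusting or already-NULL rows will be picked up forever):

-- Repeat until a batch touches no rows. TOP keeps each transaction
-- small; the WHERE clause guarantees every batch makes progress by
-- skipping rows that already hold the right value.
DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (100000) b
    SET b.targetColumn = s.sourceColumn
    FROM bigTable AS b
    INNER JOIN smallTable AS s
        ON b.key1 = s.key1
       AND b.key2 = s.key2
    WHERE b.targetColumn IS NULL
       OR b.targetColumn <> s.sourceColumn;

    SET @rows = @@ROWCOUNT;
END;

Under the SIMPLE recovery model this also lets the log truncate between batches instead of growing for the whole two-day run.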