Merging data from multiple SQL databases in R using dplyr: What's the trick?

Hey folks, I’m scratching my head over this one. I’m trying to combine tables from different SQL databases using R and dplyr, but I’m hitting a wall.

I’ve got dplyr 0.7.0, dbplyr 1.0.0, DBI 0.6-1, and odbc 1.0.1.9000 installed. Here’s what I’m attempting:

connection1 <- DBI::dbConnect(
  odbc::odbc(),
  Driver = "SQL Server",
  Server = "MyBox",
  Database = "FirstDB"
)

connection2 <- DBI::dbConnect(
  odbc::odbc(),
  Driver = "SQL Server",
  Server = "MyBox",
  Database = "SecondDB"
)

result <- tbl(connection1, "TableA") %>%
  dplyr::left_join(tbl(connection2, "TableB"), by = "LinkColumn")

But it’s not working. The error message is vague and unhelpful. When I use show_query, it looks like it’s trying to join the tables without considering they’re in separate databases.

I’ve also tried adding copy = TRUE to the join, like this:

result <- tbl(connection1, "TableA") %>%
  dplyr::left_join(tbl(connection2, "TableB"), by = "LinkColumn", copy = TRUE)

But no luck there either. Same error, same problem.

Any ideas on how to merge tables from different databases on the same server? I’m stumped!

Hmm, that’s a tricky one! Have u tried using dplyr::collect() before joining? Like this:

result <- tbl(connection1, "TableA") %>%
  collect() %>%
  left_join(tbl(connection2, "TableB") %>% collect(), by = "LinkColumn")

This might work cuz it pulls data into R’s memory first. What do you think? Does it solve ur problem or create new ones?

yo, have u tried using SQL Server’s linked servers feature? it lets u query across databases. you could set up a linked server for SecondDB, then do the join in a single SQL query. might be easier than messing with R. just a thought!

I encountered a similar issue while working with multiple SQL databases in R. The solution that worked for me was to create temporary tables. I first created a temporary table in one of the databases (for example, FirstDB), then inserted the data from SecondDB.TableB into this temporary table, and finally performed the join operation all within FirstDB. The code might look like this:

DBI::dbExecute(connection1, 'CREATE TABLE #TempTableB (... columns ...)')
DBI::dbExecute(connection1, 'INSERT INTO #TempTableB SELECT * FROM SecondDB.dbo.TableB')

result <- tbl(connection1, 'TableA') %>%
  left_join(tbl(connection1, '#TempTableB'), by = 'LinkColumn')

This method keeps all operations within a single database context and should effectively resolve the cross-database join issues.