I’m having trouble getting Data Fusion to connect to our on-premises SQL Server. We use a custom VPC instead of the default Google Cloud network.
This is for a healthcare app, so security matters a lot. We already have App Engine Flex working fine with pymssql, talking to the same database through our VPC. Now I want to try Data Fusion for the same thing.
Here’s what I did so far:
- Added the Data Fusion service account to IAM with the Cloud Data Fusion API Service Agent role
- Set the network config in Data Fusion:
  system.profile.properties.network = <my_vpc_name>
- Double checked that my SQL Server login and password work
- Made sure the VPC firewall rules allow traffic on ports 22 and 1433
But I keep getting this timeout error when trying to connect:
Connection timeout occurred. Please check your connection settings. Ensure SQL Server is running on the target host and accepts TCP/IP connections on the specified port. Verify that no firewall is blocking TCP connections to this port.
I just need to get a basic connection working so I can run some test queries. Any ideas what might be wrong?
Interesting that App Engine Flex works but Data Fusion doesn’t… Are you sure the Dataproc workers that Data Fusion spins up can actually reach your SQL Server? Check whether the service account has compute.instanceAdmin permissions. Also, what IP range is your SQL Server on, and does your VPC have the right routes for that range?
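To sanity-check the routing question, here is a rough sketch using Python’s standard ipaddress module: paste in your SQL Server’s IP and the destination ranges from your VPC’s route list, and it picks the most specific match (longest prefix). The IP, route names and CIDRs below are made-up placeholders. It ignores route priority and next hops, so treat it as a first check only, not a model of how VPC routing actually decides.

```python
import ipaddress
from typing import Dict, Optional

# Hypothetical values: replace with your SQL Server's IP and the destination
# ranges from your VPC's routes (including any learned via Cloud VPN/Router).
SQL_SERVER_IP = "10.20.5.15"
VPC_ROUTES = {
    "default-internet-gateway": "0.0.0.0/0",
    "subnet-dataproc": "10.128.0.0/20",
    "onprem-via-vpn": "10.20.0.0/16",
}


def best_route(ip: str, routes: Dict[str, str]) -> Optional[str]:
    """Return the name of the most specific route covering `ip` (longest prefix match).

    Ties on prefix length are broken by route name here; real VPC routing uses
    route priority instead, which this sketch ignores.
    """
    addr = ipaddress.ip_address(ip)
    matches = [
        (ipaddress.ip_network(cidr).prefixlen, name)
        for name, cidr in routes.items()
        if addr in ipaddress.ip_network(cidr)
    ]
    return max(matches)[1] if matches else None


# If this prints the default internet route, traffic to the private SQL Server
# address is not being sent toward your on-prem network.
print(best_route(SQL_SERVER_IP, VPC_ROUTES))  # onprem-via-vpn
```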
Had the same problem last month. Check that your SQL Server accepts remote connections; it’s often set to local only by default. Also make sure the TCP/IP protocol is enabled in SQL Server Configuration Manager, not just that the port is open.
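One way to tell these cases apart is a plain TCP probe run from inside the network, e.g. from a VM in the same subnet the Dataproc workers use. A timeout usually means packets are being dropped (routing or firewall), while "refused" usually means the host was reached but nothing is listening on that port, which points back at the TCP/IP setting in Configuration Manager. This is a sketch assuming Python 3 on that VM; note that a firewall configured to reject rather than drop can also show up as "refused".

```python
import socket


def probe_tcp(host: str, port: int = 1433, timeout: float = 5.0) -> str:
    """Attempt a plain TCP handshake to host:port and classify the result."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            # Something accepted the handshake; not proof that it's SQL Server.
            return "open"
    except socket.timeout:
        # No reply at all: packets probably dropped by routing or a firewall.
        return "timeout"
    except ConnectionRefusedError:
        # Host answered with a reset: reachable, but nothing listening there
        # (e.g. TCP/IP disabled, or SQL Server on a different port).
        return "refused"
    except OSError as exc:
        # Anything else, e.g. "No route to host" or a name resolution failure.
        return f"error: {exc}"
```

Usage would be something like `print(probe_tcp("10.20.5.15"))`, where the IP is a placeholder for your SQL Server’s address. An "open" result only shows the port is reachable; login and encryption problems would still surface later in the actual driver.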
The timeout indicates that Data Fusion cannot reach your SQL Server at the network level. Since App Engine Flex works through the same VPC, the problem likely lies in the Data Fusion network configuration rather than in the database itself. If your Data Fusion instance uses a private IP, make sure VPC peering between Data Fusion and your custom network is set up correctly, and that the peering exports custom routes so the route to your on-premises network (learned via Cloud VPN or Interconnect) is visible on the Data Fusion side. Setting system.profile.properties.network alone might not be enough if the Dataproc cluster cannot route traffic to your on-premises database. Verify that your VPC has routes to the SQL Server’s address range, and that the firewall rules, both in Google Cloud and on your on-premises network, allow traffic from the Dataproc subnet and the IP range allocated to Data Fusion, not just on the right ports. Also consider enabling Private Google Access on the subnet Data Fusion uses, since Dataproc workers with only internal IPs need it to reach Google APIs. Finally, a basic test pipeline against a Cloud SQL for SQL Server instance can help show whether the issue is general connectivity or specific to your on-premises setup.
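For the firewall part, here is a small sketch that checks whether the source ranges Data Fusion traffic can come from are fully covered by what a firewall allows. It assumes IPv4 only, and all CIDRs are hypothetical placeholders; substitute the real Dataproc subnet, the range shown in your Data Fusion instance details, and your on-prem firewall’s allow list. Adjacent allowed ranges are merged first, so two /23 rules can together cover a /22.

```python
import ipaddress
from typing import List


def uncovered_sources(required: List[str], allowed: List[str]) -> List[str]:
    """Return the required IPv4 source ranges not fully covered by the allowed ranges."""
    # Merge adjacent/overlapping allowed ranges into the smallest set of CIDRs.
    allowed_nets = list(
        ipaddress.collapse_addresses(ipaddress.IPv4Network(c) for c in allowed)
    )
    return [
        cidr
        for cidr in required
        if not any(ipaddress.IPv4Network(cidr).subnet_of(a) for a in allowed_nets)
    ]


# Hypothetical ranges: Dataproc subnet and the range allocated to Data Fusion.
required = ["10.128.0.0/20", "10.60.0.0/22"]
# Hypothetical on-prem firewall allow list for port 1433.
allowed = ["10.128.0.0/16"]
print(uncovered_sources(required, allowed))  # ['10.60.0.0/22']
```

Anything this reports is a source range whose traffic the firewall would drop, which would fit the timeout you are seeing.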