crunch-user mailing list archives

From Martijn Lenderink <martijnru...@gmail.com>
Subject JDBC parallel
Date Mon, 18 Mar 2013 13:50:01 GMT
Hello,

I have a working JDBC connection that pulls data from an MSSQL source.
It all works great, except that my cluster only opens one connection to the
MSSQL server.

I have multiple nodes running, but the data is pulled by only one node
and is then sent to the other nodes for processing.

I am using code similar to the following:
https://github.com/apache/incubator-crunch/blob/master/crunch-contrib/src/it/java/org/apache/crunch/contrib/io/jdbc/DataBaseSourceIT.java

The only difference is that I am using the DataDrivenDBInputFormat.

When I debug the source code, I can see the query being split into multiple
queries, but they are all executed on one machine.
Why aren't they executed in parallel, with multiple connections to the MSSQL
server?
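
To make the splitting described above concrete, here is a minimal sketch in
plain Java of how a single range query can be divided into several
range-bounded queries. It is only an illustration and not the Hadoop or Crunch
implementation: the class name RangeSplitSketch, the method splitConditions,
the table name my_table, the column name id and the bounds 1..100 are all
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeSplitSketch {

    // Builds one WHERE condition per split so that together they cover [min, max].
    // All names here are hypothetical; this only illustrates the idea of
    // range-based splitting on a numeric column.
    static List<String> splitConditions(String column, long min, long max, int numSplits) {
        if (numSplits < 1 || max < min) {
            throw new IllegalArgumentException("need numSplits >= 1 and max >= min");
        }
        List<String> conditions = new ArrayList<>();
        long span = max - min + 1;                  // number of values in [min, max]
        long step = Math.max(1, span / numSplits);  // width of each split, at least 1
        long lo = min;
        while (lo <= max) {
            boolean last = conditions.size() == numSplits - 1 || lo + step > max;
            if (last) {
                // The final split is closed on both ends so that max is included.
                conditions.add(column + " >= " + lo + " AND " + column + " <= " + max);
                break;
            }
            long hi = lo + step;
            // Intermediate splits are half-open: [lo, hi).
            conditions.add(column + " >= " + lo + " AND " + column + " < " + hi);
            lo = hi;
        }
        return conditions;
    }

    public static void main(String[] args) {
        // Hypothetical table and bounds, purely for illustration.
        for (String condition : splitConditions("id", 1, 100, 4)) {
            System.out.println("SELECT * FROM my_table WHERE " + condition);
        }
    }
}
```

Generating the conditions is only one half of the picture: each condition
still has to be handed to a separate task before the queries can run
concurrently over separate connections.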

Greetings,
Martijn Lenderink
