crunch-user mailing list archives

From Josh Wills
Subject Re: JDBC parallel
Date Mon, 18 Mar 2013 16:48:30 GMT
Hey Martijn,

I don't have any intuition on this one-- is this code that you could post
as a gist or something so I could play with it and see if I see anything
amiss? The trick will be figuring out if the problem is in Crunch, the
underlying DB library, or the config.


On Mon, Mar 18, 2013 at 6:50 AM, Martijn Lenderink wrote:

> Hello,
> I have a working JDBC connection to get data from an MSSQL source.
> It all works great, except that my cluster only opens one connection to the
> MSSQL server.
> I have multiple nodes running, but the data gets pulled from only one node
> and is then sent to the other nodes for processing.
> I am using code similar to the following:
> The only difference is that I am using the DataDrivenDBInputFormat.
> When I debug the source code, the query gets split into multiple queries,
> but they only get executed on one machine.
> Why isn't this executed in parallel with multiple connections to the MSSQL
> server?
> Greetings,
> Martijn Lenderink

Director of Data Science
Cloudera
Twitter: @josh_wills
