crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: JDBC parallel
Date Mon, 18 Mar 2013 16:48:30 GMT
Hey Martijn,

I don't have any intuition on this one-- is this code that you could post
as a gist or something so I could play with it and see if I see anything
amiss? The trick will be figuring out if the problem is in Crunch, the
underlying DB library, or the config.

J


On Mon, Mar 18, 2013 at 6:50 AM, Martijn Lenderink
<martijnrules@gmail.com> wrote:

> Hello,
>
> I have a working JDBC-connection to get data from an MSSQL source.
> It all works great, except that my cluster opens only one connection to the
> MSSQL server.
>
> I have multiple nodes running but the data gets pulled only from one node
> and then the data gets sent to the other nodes for processing.
>
> I am using code similar to the following:
>
> https://github.com/apache/incubator-crunch/blob/master/crunch-contrib/src/it/java/org/apache/crunch/contrib/io/jdbc/DataBaseSourceIT.java
>
> The only difference is that I am using the DataDrivenDBInputFormat.
>
> When I debug the source code, the query gets split into multiple queries,
> but they only get executed on one machine.
> Why isn't this executed in parallel with multiple connections to the MSSQL
> server?
>
> Greetings,
> Martijn Lenderink
>
>


-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
