hadoop-general mailing list archives

From Gautam Singaraju <gautam.singar...@gmail.com>
Subject Re: DBOutputFormat over SSH?
Date Mon, 26 Apr 2010 15:25:58 GMT
I should take a look at Vaidya for performance analysis, but yes,
inserting using LOAD DATA LOCAL INFILE has got to be the fastest. My
numbers show it's at least 9-10 times faster than the next mechanism. I
am loading into an empty table without many indexes; otherwise, I might
need to disable indexing before the load and re-enable it afterwards.
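
For illustration only, the load sequence described above could be sketched as follows. This is a minimal sketch in Python (an assumption; the thread does not name a client language), and it only builds the SQL text rather than talking to a server. The helper name `bulk_load_statements`, the table name `results`, and the file path are hypothetical. It assumes tab-separated, newline-terminated output files. Note that `ALTER TABLE ... DISABLE KEYS` only affects non-unique indexes on MyISAM tables, so the bracketing statements are optional.

```python
import re

# Accept only plain identifiers so the table name can be embedded safely.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def bulk_load_statements(table, local_path, disable_keys=False):
    """Return the SQL statements for one LOAD DATA LOCAL INFILE bulk load.

    Hypothetical helper: it builds statement text only and executes nothing.
    Assumes tab-separated fields and newline-terminated rows.
    """
    if not _IDENT.match(table):
        raise ValueError("unexpected table name: %r" % (table,))

    # Escape backslashes and single quotes for a MySQL string literal.
    quoted = local_path.replace("\\", "\\\\").replace("'", "\\'")

    load = (
        f"LOAD DATA LOCAL INFILE '{quoted}' INTO TABLE `{table}` "
        "FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'"
    )
    if not disable_keys:
        return [load]

    # Optionally bracket the load so non-unique indexes are rebuilt once.
    return [
        f"ALTER TABLE `{table}` DISABLE KEYS",
        load,
        f"ALTER TABLE `{table}` ENABLE KEYS",
    ]


if __name__ == "__main__":
    for stmt in bulk_load_statements("results", "/tmp/part-00000", True):
        print(stmt + ";")
```

Each returned string would be passed, in order, to whatever MySQL client is in use; the client and server must both allow `LOCAL` loads.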

I have yet to check out Hivo, and I've got a couple of questions. To use
the LOAD DATA INFILE functionality of MySQL, I had to copy the results to
local disk and load them into MySQL sequentially. Does Hivo perform a
parallel LOAD DATA LOCAL? Do the reducers perform this task upon close?
That would mean multiple connections to the DB and could be faster.
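
To make the sequential-versus-parallel question concrete, here is a minimal sketch, in Python, of loading several output files with a bounded number of concurrent workers. It does not describe how any existing tool behaves. The names `load_parts_in_parallel`, `load_one`, and `max_connections` are hypothetical; `load_one` stands in for whatever function opens its own DB connection and loads a single file. Whether this beats a sequential load depends on the server and the table, so it would need measuring.

```python
from concurrent.futures import ThreadPoolExecutor


def load_parts_in_parallel(part_paths, load_one, max_connections=4):
    """Call load_one(path) for every part file, a few at a time.

    Hypothetical helper: load_one is assumed to open its own database
    connection, load one file, and return a result such as a row count.
    Results come back in the same order as part_paths; an exception
    raised by any load_one call propagates to the caller.
    """
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        return list(pool.map(load_one, part_paths))


if __name__ == "__main__":
    # Stand-in loader that only reports what it would have loaded.
    def fake_load(path):
        return "loaded " + path

    parts = ["part-00000", "part-00001", "part-00002"]
    print(load_parts_in_parallel(parts, fake_load, max_connections=2))
```

Capping `max_connections` keeps the number of simultaneous connections to the remote database bounded, which matters when every connection has to share one encrypted link.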


On Fri, Apr 23, 2010 at 12:29 AM, Eric Sammer <esammer@cloudera.com> wrote:
> In general, you'll want to avoid tunneling permanent production code
> over ssh tunnels. They're flaky and do not recover from network
> interruption in any reasonable way. If you need to do this, a vpn is
> the correct approach. Linux easily will do ipsec p2p tunnels that are
> reasonably secure. If you really only have port 22 then I suppose
> that's your only option but I really would reevaluate the security
> policy.
> Either way, it's going to be slow due to the encryption overhead but
> if it's a small amount of data, that may be fine.
> On Fri, Apr 23, 2010 at 12:18 AM, Gautam Singaraju
> <gautam.singaraju@gmail.com> wrote:
>> All,
>> I have a use-case where I need to crunch a large amount of data and
>> push the results (comparatively a smaller set) to a mysql db at a
>> remote location. As per security concerns, only SSH ports are open. I
>> tried using Java Secure Channel [1] in combination with some custom
>> JDBC code from the reducers.
>> Can anyone comment on the performance of DBOutputFormat? Have there
>> been any efforts to tunnel this through SSH? This is going to be an
>> expensive operation; any suggestions would be welcome.
>> [1] http://www.jcraft.com/jsch/
>> ---
>> Gautam Singaraju
> --
> Eric Sammer
> phone: +1-917-287-2675
> twitter: esammer
> data: www.cloudera.com
