hadoop-general mailing list archives

From Gautam Singaraju <gautam.singar...@gmail.com>
Subject Re: DBOutputFormat over SSH?
Date Mon, 26 Apr 2010 15:25:58 GMT
I should take a look at Vaidya for performance analysis, but yes,
inserting with LOAD DATA LOCAL INFILE has got to be the fastest. My
numbers show it's at least 9-10 times faster than the next-fastest
mechanism. I am loading into an empty table without many indexes;
otherwise, I might need to disable indexing before the load and
re-enable it afterwards.
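
For reference, a minimal sketch of that load path over plain JDBC. The
class and method names, the tab-separated file layout, and the table
name are all hypothetical and not taken from my actual job. It assumes
local infile is enabled on both the server and the JDBC driver, and that
the table name comes from trusted code, since it is concatenated into
the SQL. Note that DISABLE KEYS only affects non-unique indexes on
MyISAM tables, so on other engines it may do nothing.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch only: names, file layout, and delimiters are assumptions.
class BulkLoadSketch {

    // Builds a LOAD DATA LOCAL INFILE statement for a tab-separated file.
    static String buildLoadStatement(String localPath, String table) {
        // Escape backslashes and single quotes so the path stays a valid SQL string literal.
        String escaped = localPath.replace("\\", "\\\\").replace("'", "\\'");
        return "LOAD DATA LOCAL INFILE '" + escaped + "' INTO TABLE " + table
             + " FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'";
    }

    // Runs the load with index maintenance suspended, then re-enables it
    // even if the load itself fails.
    static void bulkLoad(Connection conn, String localPath, String table) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute("ALTER TABLE " + table + " DISABLE KEYS");
            try {
                st.execute(buildLoadStatement(localPath, table));
            } finally {
                st.execute("ALTER TABLE " + table + " ENABLE KEYS");
            }
        }
    }
}
```

When the table is empty and has few indexes, as in my case, the two
ALTER TABLE calls can simply be dropped.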

I have yet to check out Hivo, and I've got a couple of questions. To
use MySQL's LOAD DATA LOCAL INFILE, I had to copy the results to local
disk and load them into MySQL sequentially. Does Hivo perform a
parallel LOAD DATA LOCAL? Do the reducers perform this task upon close?
That would mean multiple connections to the DB and could be faster.

Thanks!
---
Gautam



On Fri, Apr 23, 2010 at 12:29 AM, Eric Sammer <esammer@cloudera.com> wrote:
> In general, you'll want to avoid tunneling permanent production code
> over ssh tunnels. They're flaky and do not recover from network
> interruption in any reasonable way. If you need to do this, a vpn is
> the correct approach. Linux easily will do ipsec p2p tunnels that are
> reasonably secure. If you really only have port 22 then I suppose
> that's your only option but I really would reevaluate the security
> policy.
>
> Either way, it's going to be slow due to the encryption overhead but
> if it's a small amount of data, that may be fine.
>
> On Fri, Apr 23, 2010 at 12:18 AM, Gautam Singaraju
> <gautam.singaraju@gmail.com> wrote:
>> All,
>>
>> I have a use-case where I need to crunch a large amount of data and
>> push the results (a comparatively smaller set) to a MySQL DB at a
>> remote location. Due to security concerns, only SSH ports are open. I
>> tried using Java Secure Channel [1] in combination with some custom
>> JDBC code from the reducers.
>>
>> Can anyone comment on the performance of DBOutputFormat? Have there
>> been any efforts to tunnel this through SSH? This is going to be an
>> expensive operation; any suggestions would be welcome.
>>
>> [1] http://www.jcraft.com/jsch/
>> ---
>> Gautam Singaraju
>>
>
>
>
> --
> Eric Sammer
> phone: +1-917-287-2675
> twitter: esammer
> data: www.cloudera.com
>
