hadoop-general mailing list archives

From Sonal Goyal <sonalgoy...@gmail.com>
Subject Re: DBOutputFormat over SSH?
Date Wed, 28 Apr 2010 15:35:32 GMT
Gautam,

Answers inline.

Thanks and Regards,
Sonal
www.meghsoft.com


On Mon, Apr 26, 2010 at 8:55 PM, Gautam Singaraju <
gautam.singaraju@gmail.com> wrote:

> I should take a look at Vaidya for performance analysis, but yes,
> inserting using LOAD DATA LOCAL INFILE has got to be the fastest. My
> numbers show it's at least 9-10 times faster than the next mechanism. I
> am loading into an empty table without many indexes; otherwise, I might
> need to disable the indexes and re-enable them after the load.
>
> I have yet to check out Hivo, and I have a couple of questions. To use
> the LOAD DATA functionality of MySQL, I had to copy the results to local
> disk and load them into MySQL sequentially. Does Hivo perform a parallel
> LOAD DATA LOCAL? Do the reducers perform this task upon close? That would
> mean multiple connections to the DB and could be faster.
>
Yes, it does a parallel LOAD DATA LOCAL. HDFS files are loaded in parallel
into the MySQL database. For efficiency, the load runs as a map-only job.
The number of mappers is determined by the number of files.
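
To make the per-file load step concrete, here is a minimal sketch of what a
single map task might run once its input file is available on local disk. It
is an illustration only, not code from any existing tool: the class name
LoadDataSketch, the methods buildLoadSql and loadFile, and the tab-delimited
file layout are all assumptions. It also assumes the table name is trusted
(it is not escaped) and that both the MySQL server and the JDBC driver are
configured to permit LOCAL loads.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper: one call per map task, one connection per file.
class LoadDataSketch {

    // Builds the LOAD DATA LOCAL INFILE statement for one tab-delimited file.
    // Assumption: 'table' is a trusted identifier and is not escaped here.
    static String buildLoadSql(String localPath, String table) {
        // Escape backslashes and single quotes so the path is a valid SQL string literal.
        String escaped = localPath.replace("\\", "\\\\").replace("'", "\\'");
        return "LOAD DATA LOCAL INFILE '" + escaped + "' INTO TABLE " + table
                + " FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'";
    }

    // Opens a dedicated connection, runs the load, and closes everything.
    // Running this from several map tasks at once gives parallel loads.
    static void loadFile(String jdbcUrl, String user, String password,
                         String localPath, String table) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
             Statement stmt = conn.createStatement()) {
            stmt.execute(buildLoadSql(localPath, table));
        }
    }
}
```

Because each task opens its own connection, the number of concurrent loads
equals the number of running map tasks, so the database's connection limit is
worth checking before scaling up the number of files.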


> Thanks!
> ---
> Gautam
>
>
>
> On Fri, Apr 23, 2010 at 12:29 AM, Eric Sammer <esammer@cloudera.com>
> wrote:
> > In general, you'll want to avoid tunneling permanent production code
> > over ssh tunnels. They're flaky and do not recover from network
> > interruption in any reasonable way. If you need to do this, a vpn is
> > the correct approach. Linux easily will do ipsec p2p tunnels that are
> > reasonably secure. If you really only have port 22 then I suppose
> > that's your only option but I really would reevaluate the security
> > policy.
> >
> > Either way, it's going to be slow due to the encryption overhead but
> > if it's a small amount of data, that may be fine.
> >
> > On Fri, Apr 23, 2010 at 12:18 AM, Gautam Singaraju
> > <gautam.singaraju@gmail.com> wrote:
> >> All,
> >>
> >> I have a use case where I need to crunch a large amount of data and
> >> push the results (a comparatively smaller set) to a MySQL db at a
> >> remote location. Due to security concerns, only SSH ports are open. I
> >> tried using Java Secure Channel [1] in combination with some custom
> >> JDBC code from the reducers.
> >>
> >> Can anyone comment on the performance of DBOutputFormat? Have there
> >> been any efforts to tunnel this through SSH? This is going to be an
> >> expensive operation; any suggestions would be welcome.
> >>
> >> [1] http://www.jcraft.com/jsch/
> >> ---
> >> Gautam Singaraju
> >>
> >
> >
> >
> > --
> > Eric Sammer
> > phone: +1-917-287-2675
> > twitter: esammer
> > data: www.cloudera.com
> >
>
