hadoop-common-user mailing list archives

From Srinivas Surasani <vas...@gmail.com>
Subject Re: Parallel CSV loader
Date Wed, 25 Jan 2012 01:31:50 GMT
Edmon,

Parallel databases (Teradata, Netezza, ...)? I believe that if you use Sqoop
(with JDBC) for loading, you cannot achieve parallelism, because the table
runs into deadlocks when you specify more mappers. But you can use Sqoop plus
a parallel database connector (you can find them on the Cloudera site) to get
at the native loading utilities. For example, you can drive Teradata's
FastLoad utility through Sqoop and the Teradata connector.
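
To make the trade-off concrete, here is a rough sketch of the general pattern:
split the CSV into chunks (much like input splits handed to mappers), do the
parsing in parallel, but send the inserts through a single writer so the
workers never contend for locks on the target table. This is not Sqoop or any
connector's API. sqlite3 only stands in for the target database, and
`split_rows`, `parse_chunk`, `load_csv`, and the `items` table are
hypothetical names made up for the illustration.

```python
import csv
import io
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def split_rows(rows, num_workers):
    """Deal rows round-robin into num_workers chunks (hypothetical helper)."""
    return [rows[i::num_workers] for i in range(num_workers)]


def parse_chunk(chunk):
    """Turn raw CSV fields into typed tuples; one call per chunk, run in parallel."""
    return [(int(fields[0]), fields[1]) for fields in chunk]


def load_csv(csv_text, conn, num_workers=4):
    """Parse chunks in parallel, then insert them through one writer connection."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    chunks = split_rows(rows, num_workers)

    # Parallel step: each worker parses its own chunk.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        parsed = list(pool.map(parse_chunk, chunks))

    # Serialized step: a single writer inserts one batch per chunk,
    # all inside one transaction that commits on success.
    with conn:
        for batch in parsed:
            conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)", batch)
    return sum(len(batch) for batch in parsed)


# Assumed stand-in for the target database and its table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
loaded = load_csv("1,a\n2,b\n3,c\n4,d\n5,e\n", conn, num_workers=2)
```

A native bulk loader goes further than this by making the write step itself
parallel on the database side, which is the part a plain JDBC insert path
cannot give you.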

Srinivas --

On Tue, Jan 24, 2012 at 12:38 PM, Harsh J <harsh@cloudera.com> wrote:

> Agree. Apache Sqoop is what you're looking for:
> http://incubator.apache.org/sqoop/
>
> On Tue, Jan 24, 2012 at 10:51 PM, Prashant Kommireddi
> <prash1784@gmail.com> wrote:
> > I am assuming you want to move data between Hadoop and database.
> > Please take a look at Sqoop.
> >
> > Thanks,
> > Prashant
> >
> > Sent from my iPhone
> >
> > On Jan 24, 2012, at 9:19 AM, Edmon Begoli <ebegoli@gmail.com> wrote:
> >
> >> I am looking to use Hadoop for parallel loading of CSV file into a
> >> non-Hadoop, parallel database.
> >>
> >> Is there an existing utility that allows one to pick entries,
> >> row-by-row, synchronized and in parallel and load into a database?
> >>
> >> Thank you in advance,
> >> Edmon
>
>
>
> --
> Harsh J
> Customer Ops. Engineer, Cloudera
>
