hive-user mailing list archives

From Chalcy Raja <Chalcy.R...@careerbuilder.com>
Subject RE: sqoop, hive and lzo and cdh3u3 - not creating in index automatically
Date Tue, 19 Jun 2012 01:32:52 GMT
I did figure out how to create compressed data from the uncompressed data in a Hive table. I also created
a table in sequence file format.

Is there a way to tell whether a Hive table (the HDFS files underneath) is in sequence file format?
DESCRIBE EXTENDED on the table does not show the file format.
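A possible check that does not rely on the table metadata is to look at the first bytes of one of the table's data files: a Hadoop SequenceFile starts with the bytes "SEQ" and a one-byte format version, followed by the key and value class names, two compression flags, and (when compressed) the codec class name. The Python sketch below parses that much of the header from raw bytes, for example the first few hundred bytes of a file copied out of HDFS with hadoop fs -get. It assumes format version 5 or later and class names shorter than 128 bytes; the function names are hypothetical helpers, not a Hadoop API.

```python
from typing import Optional


def _read_text(buf: bytes, pos: int):
    """Read a string written by Hadoop's Text.writeString: a VInt length, then UTF-8 bytes.

    Sketch limitation: only the single-byte VInt form (lengths 0..127) is handled,
    which covers ordinary Java class names.
    """
    length = buf[pos]
    if length > 127:
        raise ValueError("multi-byte VInt length not handled by this sketch")
    start = pos + 1
    end = start + length
    if end > len(buf):
        raise ValueError("header truncated; pass more bytes")
    return buf[start:end].decode("utf-8"), end


def sequencefile_header(buf: bytes) -> Optional[dict]:
    """Return header fields if buf starts like a SequenceFile, else None."""
    if len(buf) < 4 or buf[:3] != b"SEQ":
        return None  # e.g. a plain text file
    version = buf[3]
    if version < 5:
        raise ValueError(f"SequenceFile version {version} not handled by this sketch")
    pos = 4
    key_class, pos = _read_text(buf, pos)
    value_class, pos = _read_text(buf, pos)
    # Both flags are Java writeBoolean values: one byte each, non-zero means true.
    compressed = buf[pos] != 0
    block_compressed = buf[pos + 1] != 0
    pos += 2
    codec = None
    if compressed:
        codec, pos = _read_text(buf, pos)
    return {
        "version": version,
        "key_class": key_class,
        "value_class": value_class,
        "compressed": compressed,
        "block_compressed": block_compressed,
        "codec": codec,
    }
```

A None result for a file in the table's directory suggests it is not a SequenceFile; for a SequenceFile, the codec field shows whether Snappy or another codec was used.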

Thanks,
Chalcy

-----Original Message-----
From: Chalcy Raja [mailto:Chalcy.Raja@careerbuilder.com] 
Sent: Monday, June 18, 2012 3:28 PM
To: user@hive.apache.org; 'bejoy_ks@yahoo.com'
Subject: RE: sqoop, hive and lzo and cdh3u3 - not creating in index automatically

Snappy with sequence files works well for us. We'll have to decide which one suits our needs.
 

Is there a way to convert existing HDFS data in text format to sequence files?
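For a non-partitioned table, one common approach is to let Hive rewrite the data: enable compressed output, then CREATE TABLE ... STORED AS SEQUENCEFILE AS SELECT from the text table. The Python helper below only assembles such a HiveQL script, which could be saved to a file and run with hive -f. The table names are placeholders, the Snappy codec class is an assumption about what is installed, and which of the two compression-type properties takes effect may depend on the Hive/Hadoop version, so the sketch sets both.

```python
import re

# Assumption: Snappy is installed; substitute another codec class if needed.
SNAPPY_CODEC = "org.apache.hadoop.io.compress.SnappyCodec"

# Plain Hive identifiers only, so table names cannot smuggle in extra statements.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def text_to_seqfile_hql(src_table: str, dst_table: str, codec: str = SNAPPY_CODEC) -> str:
    """Build a HiveQL script copying src_table into a new block-compressed
    SequenceFile table named dst_table (CREATE TABLE ... AS SELECT)."""
    for name in (src_table, dst_table):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"not a plain Hive identifier: {name!r}")
    return "\n".join([
        "SET hive.exec.compress.output=true;",
        # BLOCK compresses batches of records together rather than each record alone.
        "SET mapred.output.compression.type=BLOCK;",
        "SET io.seqfile.compression.type=BLOCK;",
        f"SET mapred.output.compression.codec={codec};",
        f"CREATE TABLE {dst_table} STORED AS SEQUENCEFILE AS SELECT * FROM {src_table};",
    ])
```

For example, text_to_seqfile_hql("raw_text", "raw_seq") yields a script whose last statement reads the hypothetical raw_text table and writes raw_seq as a Snappy-compressed SequenceFile table.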

Thanks for all your input,
Chalcy  

-----Original Message-----
From: Chalcy Raja [mailto:Chalcy.Raja@careerbuilder.com]
Sent: Monday, June 18, 2012 1:47 PM
To: user@hive.apache.org; 'bejoy_ks@yahoo.com'
Subject: RE: sqoop, hive and lzo and cdh3u3 - not creating in index automatically

It is there. I have io.compression.codecs in core-site.xml. There is no error or warning in
the Sqoop-to-Hive import output that indicates anything.

The only reason we want to go to lzo is because snappy is not splittable.  

Thanks,
Chalcy

-----Original Message-----
From: Bejoy KS [mailto:bejoy_ks@yahoo.com]
Sent: Monday, June 18, 2012 10:39 AM
To: user@hive.apache.org
Subject: Re: sqoop, hive and lzo and cdh3u3 - not creating in index automatically

Hi Chalcy

Regarding LZO indexing not working: is the LZO codec class listed in the 'io.compression.codecs' property
in core-site.xml?
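A quick way to confirm the codec registration on each node is to parse core-site.xml and look for the hadoop-lzo codec classes in the comma-separated io.compression.codecs value. The Python sketch below does that with the standard library; the function name is hypothetical, and it assumes the usual <configuration><property><name/><value/></property> layout.

```python
import xml.etree.ElementTree as ET
from typing import List, Union

# Codec classes provided by the hadoop-lzo project.
LZO_CODECS = (
    "com.hadoop.compression.lzo.LzoCodec",
    "com.hadoop.compression.lzo.LzopCodec",
)


def missing_lzo_codecs(core_site: Union[str, bytes]) -> List[str]:
    """Return the LZO codec classes absent from io.compression.codecs."""
    root = ET.fromstring(core_site)
    registered = set()
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == "io.compression.codecs":
            value = prop.findtext("value") or ""
            # Comma-separated list; entries may be wrapped across lines in the file.
            registered = {c.strip() for c in value.split(",") if c.strip()}
    return [c for c in LZO_CODECS if c not in registered]
```

Usage would be along the lines of reading the node's core-site.xml as bytes and passing it in; an empty list means both classes are registered.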

Snappy is not splittable on its own, but sequence files are splittable, so when the two are used together,
Snappy-compressed data gains splittability.
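Why the SequenceFile container restores splittability can be sketched with a toy model: the writer periodically emits a sync marker, and a reader handed an arbitrary byte range skips forward to the first marker at or after its start, then reads until the first marker past its end. A bare Snappy stream has no such markers, so a reader dropped into the middle of it has nowhere to resynchronize. The Python below is a simplified model (a short made-up sync marker, no real compression, hypothetical function names), not Hadoop's implementation.

```python
from typing import List, Tuple


def reader_start(data: bytes, sync: bytes, split_start: int) -> int:
    """Offset where the reader for a split begins: the first sync marker at or
    after split_start (len(data) if none remains). Split 0 starts at offset 0,
    which in a real file is where the header is read."""
    if split_start == 0:
        return 0
    idx = data.find(sync, split_start)
    return len(data) if idx == -1 else idx


def split_ranges(data: bytes, sync: bytes, split_size: int) -> List[Tuple[int, int]]:
    """Byte range each split actually processes: from its own reader_start up to
    the next split's reader_start. Every sync-delimited chunk is handled exactly
    once, even though nominal split boundaries fall in the middle of chunks."""
    starts = [reader_start(data, sync, s) for s in range(0, len(data), split_size)]
    ends = starts[1:] + [len(data)]
    # Splits containing no sync marker end up with an empty range and do no work.
    return [(s, e) for s, e in zip(starts, ends) if s < e]
```

With the markers in place, any split size yields non-overlapping ranges that together cover the whole file; without them, every split but the first would have no valid starting point.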

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Chalcy Raja <Chalcy.Raja@careerbuilder.com>
Date: Mon, 18 Jun 2012 14:31:36
To: user@hive.apache.org<user@hive.apache.org>; 'bejoy_ks@yahoo.com'<bejoy_ks@yahoo.com>
Reply-To: user@hive.apache.org
Subject: RE: sqoop, hive and lzo and cdh3u3 - not creating in index  automatically

Hi Bejoy,

The weird thing is I did not get any errors. The Sqoop import does not go on to the second phase,
where it creates the LZO index.
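When the index phase is silently skipped, it can help to list which imported .lzo files actually lack an index. hadoop-lzo writes the index next to the data file as <file>.lzo.index, so the Python sketch below compares paths taken from hadoop fs -ls output of the import directory. The ls parsing is an assumption about its column layout (path in the last column, no spaces in paths), and the function names are hypothetical. Files it reports could then be indexed after the fact with hadoop-lzo's com.hadoop.compression.lzo.LzoIndexer or com.hadoop.compression.lzo.DistributedLzoIndexer classes.

```python
from typing import Iterable, List


def paths_from_ls(ls_output: str) -> List[str]:
    """Extract file paths from `hadoop fs -ls` output.

    Assumption: each file line has at least 8 whitespace-separated fields with the
    path last; shorter lines such as "Found N items" are skipped.
    """
    paths = []
    for line in ls_output.splitlines():
        fields = line.split()
        if len(fields) >= 8:
            paths.append(fields[-1])
    return paths


def lzo_files_missing_index(paths: Iterable[str]) -> List[str]:
    """Return .lzo files that have no companion '<file>.lzo.index'."""
    present = set(paths)
    return sorted(p for p in present if p.endswith(".lzo") and p + ".index" not in present)
```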

We did deploy the native libraries, except the hadoop-lzo library, which we built on another machine
and copied over. We did the same thing on the test machine as well.

I'll also try Snappy with sequence files. Is Snappy with sequence files naturally splittable
on block boundaries (one mapper per block)?

Yes, it is cumbersome to build the LZO library, then create the file, and then create the index.

Thanks,
Chalcy

-----Original Message-----
From: Bejoy KS [mailto:bejoy_ks@yahoo.com]
Sent: Monday, June 18, 2012 10:04 AM
To: user@hive.apache.org
Subject: Re: sqoop, hive and lzo and cdh3u3 - not creating in index automatically

Hi Chalcy

Did you notice any warnings related to the LZO codec in your MapReduce task logs or in the Sqoop logs?


It could be because the LZO libs are not available on the TaskTracker nodes. These are native
libs and are tied to the OS, so if you have done an OS upgrade, you need to rebuild and deploy
these native libs as well (a simple copy of native libs built on an older OS may not work as desired).
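One way to probe this on a TaskTracker node is to try loading the native libraries directly: a library built on a different OS release often fails to load with a missing-symbol or version error. The Python sketch below uses ctypes to attempt to load each libgplcompression.* file (the JNI library built by hadoop-lzo) in a given native directory, and to locate the system liblzo2 that hadoop-lzo needs. The directory argument is an assumption about your layout (for example the platform subdirectory under Hadoop's lib/native), the function name is hypothetical, and a successful load here does not guarantee the JVM will load the library.

```python
import ctypes
import ctypes.util
import os
from typing import Dict


def check_native_lzo(native_dir: str) -> Dict[str, str]:
    """Report whether hadoop-lzo's native library files in native_dir can be loaded."""
    report = {}
    # System LZO library; find_library returns None when it cannot be located.
    report["liblzo2 (system)"] = ctypes.util.find_library("lzo2") or "not found"
    try:
        names = sorted(os.listdir(native_dir))
    except OSError as exc:
        report[native_dir] = f"cannot list directory: {exc}"
        return report
    for name in names:
        if name.startswith("libgplcompression") and ".so" in name:
            path = os.path.join(native_dir, name)
            try:
                # dlopen(); typically fails on an ABI/glibc mismatch or a missing dependency.
                ctypes.CDLL(path)
                report[name] = "loaded"
            except OSError as exc:
                report[name] = f"load failed: {exc}"
    if not any(n.startswith("libgplcompression") for n in names):
        report["libgplcompression"] = "not present"
    return report
```

Running it on every TaskTracker node and comparing the reports would show which nodes are missing the library or carrying a copy that does not load.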

As Edward suggested, Snappy + sequence files is a great combination.

Regards
Bejoy KS

-----Original Message-----
From: Edward Capriolo <edlinuxguru@gmail.com>
Date: Mon, 18 Jun 2012 09:32:01
To: <user@hive.apache.org>
Reply-To: user@hive.apache.org
Subject: Re: sqoop, hive and lzo and cdh3u3 - not creating in index automatically

Have you considered switching to sequence files using Snappy compression (or LZO)? IIRC the
process of generating LZO files and then generating an index on top of them is cumbersome,
whereas sequence files are directly splittable.

On Mon, Jun 18, 2012 at 9:16 AM, Chalcy Raja <Chalcy.Raja@careerbuilder.com> wrote:
> I am posting it here first and then maybe on the Sqoop user group as well.
>
> I am trying to use LZO compression.
>
> I tested on a standalone machine by installing cdh3u3 and doing a Sqoop-to-Hive
> import with LZO compression, and everything works great. The data is
> sqooped into HDFS, the LZO index file gets created, and the data is in the Hive table.
>
> I did all the necessary LZO steps on the main cluster, where the server
> already runs cdh3u3, upgraded previously from cdh3u0 to cdh3u1 to cdh3u2 to cdh3u3.
> I did the same Sqoop-to-Hive import with LZO compression. The Sqoop-to-Hive import
> works, but the LZO index is not getting created.
>
> I need an expert opinion. What could be the reason for this behavior?
> I compared all the versions of Hive, Sqoop, etc., and checked all the configuration.
> It looks like we are missing something.
>
> Thanks,
>
> Chalcy


