hadoop-common-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: problem w/ data load
Date Mon, 03 May 2010 18:29:54 GMT
On Mon, May 3, 2010 at 2:00 PM, Susanne Lehmann <susanne.lehmann@metamarketsgroup.com> wrote:

> Hi Tom,
>
> Yes. I store the file in HDFS with a .gz extension. Do I need to
> somehow "tell" Hive that it is a compressed file?
>
> Best,
> Susanne
>
> PS: Thanks for the tip with the list, I will use the other list for
> further questions if necessary. I wasn't sure which one to use.
>
> On Mon, May 3, 2010 at 9:52 AM, Tom White <tom@cloudera.com> wrote:
> > Hi Susanne,
> >
> > Hadoop uses the file extension to detect that a file is compressed. I
> > believe Hive does too. Did you store the compressed file in HDFS with
> > a .gz extension?
> >
> > Cheers,
> > Tom
> >
> > BTW It's best to send Hive questions like these to the hive-user@ list.
> >
> > On Sun, May 2, 2010 at 11:22 AM, Susanne Lehmann
> > <susanne.lehmann@metamarketsgroup.com> wrote:
> >> Hi,
> >>
> >> I want to load data from HDFS into Hive; the data is in compressed files.
> >> The data is stored in flat files, and the delimiter is ^A (ctrl-A).
> >> As long as I use uncompressed files, everything works fine. Since
> >> ctrl-A is the default delimiter, I don't even need to specify
> >> it. I do the following:
> >>
> >>
> >> hadoop dfs -put /test/file new
> >>
> >> hive>  DROP TABLE test_new;
> >> OK
> >> Time taken: 0.057 seconds
> >> hive>    CREATE TABLE test_new(
> >>    >        bla  int,
> >>    >        bla            string,
> >>    >        etc
> >>    >        bla      string);
> >> OK
> >> Time taken: 0.035 seconds
> >> hive> LOAD DATA INPATH "/test/file" INTO TABLE test_new;
> >> Loading data to table test_new
> >> OK
> >> Time taken: 0.063 seconds
> >>
> >> But if I do the same with the same file compressed, it no longer
> >> works. I tried many different table definitions with the
> >> delimiter specified, but none of them helped. The load itself succeeds,
> >> but the data is always NULL, so I conclude there is a delimiter problem.
> >>
> >>  Any help is greatly appreciated!
> >>
> >
>

If your file is a plain text file that has simply been gzipped, create your
table as normal:

create table XXXX stored as textfile;

If your file is a sequence file using block compression (gzip), create the
table as:

create table XXXX stored as sequencefile;
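
For the plain gzipped text case, here is a rough sketch of the whole round
trip, in case it helps. Everything named in it is made up for illustration:
the scratch directory, the sample rows, the HDFS path /test/sample.txt.gz,
and the table test_gz with its three columns. It builds a small ctrl-A
delimited file, gzips it so the name ends in .gz, checks that it decompresses
back to the same bytes, and only tries the upload and load if the hadoop and
hive commands are actually installed.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory and file names, used only for this sketch.
workdir=$(mktemp -d)
plain="$workdir/sample.txt"
gz="$plain.gz"

# Two rows, three columns, separated by ctrl-A (octal 001),
# which is Hive's default field delimiter.
printf '1\001alpha\001x\n2\001beta\001y\n' > "$plain"

# Plain gzip of the text file; keep the .gz suffix, since the thread
# says the extension is what is used to detect compression.
gzip -c "$plain" > "$gz"

# Sanity check: decompressing gives back the original bytes.
gzip -dc "$gz" | cmp -s - "$plain" && echo "round trip ok"

# Only attempt the upload and load when both CLIs are available.
# The HDFS path and table definition below are assumptions.
if command -v hadoop >/dev/null 2>&1 && command -v hive >/dev/null 2>&1; then
  hadoop fs -put "$gz" /test/sample.txt.gz
  hive -e "CREATE TABLE test_gz (id INT, name STRING, flag STRING) STORED AS TEXTFILE;
           LOAD DATA INPATH '/test/sample.txt.gz' INTO TABLE test_gz;"
fi
```

If a SELECT on the table still returns only NULLs after this, it is worth
confirming that the file in HDFS really ends in .gz and that it is a plain
gzip of the delimited text, not some other container format.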
