hadoop-common-user mailing list archives

From Susanne Lehmann <susanne.lehm...@metamarketsgroup.com>
Subject Re: problem w/ data load
Date Mon, 03 May 2010 02:06:55 GMT
I am using Hadoop on EC2 with pre-configured scripts, and it turns out
that the properties are already set correctly (I am using gzip):

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec</value>
</property>
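For reference: to our understanding, Hadoop's CompressionCodecFactory uses the codec list above to pick a codec from the file-name suffix (".gz" for GzipCodec, ".deflate" for DefaultCodec), not from the file contents. The Python sketch that follows is a hypothetical illustration of that lookup, not Hadoop's own code. The helper names `configured_codecs` and `codec_for_path` and the suffix table are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed suffix table: the default extensions we believe Hadoop's
# DefaultCodec and GzipCodec report. Hypothetical, for illustration only.
DEFAULT_EXTENSIONS = {
    "org.apache.hadoop.io.compress.DefaultCodec": ".deflate",
    "org.apache.hadoop.io.compress.GzipCodec": ".gz",
}


def configured_codecs(xml_text):
    """Return the codec class names listed in io.compression.codecs."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == "io.compression.codecs":
            value = prop.findtext("value") or ""
            return [c.strip() for c in value.split(",") if c.strip()]
    return []


def codec_for_path(path, codecs):
    """Hypothetical helper: pick a configured codec by file-name suffix.

    Returns None when no suffix matches, i.e. the file would be
    treated as uncompressed text.
    """
    for codec in codecs:
        ext = DEFAULT_EXTENSIONS.get(codec)
        if ext and path.endswith(ext):
            return codec
    return None
```

Under that assumption, a gzip file loaded as `/test/file` (no `.gz` suffix) would match no codec and be read as raw compressed bytes, which could explain rows that come back all NULL. Keeping a `.gz` suffix on the file in HDFS before loading it is worth trying.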

Do you have any other ideas?



On Sun, May 2, 2010 at 1:52 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> You can find sample config from
> http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ
> Look for io.compression.codecs
>
> On Sun, May 2, 2010 at 1:28 PM, Susanne Lehmann <
> susanne.lehmann@metamarketsgroup.com> wrote:
>
>> No, I didn't. Can you tell me exactly what I have to do?
>> Thank you so much for your help!
>>
>>
>>
>>
>> On Sun, May 2, 2010 at 1:11 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>> > Did you add the codec for the compressed files to io.compression.codecs in
>> > the Hadoop configuration files (core-site.xml)?
>> >
>> > On Sun, May 2, 2010 at 11:22 AM, Susanne Lehmann <
>> > susanne.lehmann@metamarketsgroup.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> I want to load data from HDFS into Hive; the data is in compressed files.
>> >> The data is stored in flat files, and the delimiter is ^A (ctrl-A).
>> >> As long as I use decompressed files, everything works fine. Since
>> >> ctrl-A is the default delimiter, I don't even need to specify it.
>> >> I do the following:
>> >>
>> >>
>> >> hadoop dfs -put /test/file new
>> >>
>> >> hive>  DROP TABLE test_new;
>> >> OK
>> >> Time taken: 0.057 seconds
>> >> hive>    CREATE TABLE test_new(
>> >>    >        bla  int,
>> >>    >        bla            string,
>> >>    >        etc
>> >>    >        bla      string);
>> >> OK
>> >> Time taken: 0.035 seconds
>> >> hive> LOAD DATA INPATH "/test/file" INTO TABLE test_new;
>> >> Loading data to table test_new
>> >> OK
>> >> Time taken: 0.063 seconds
>> >>
>> >> But if I do the same with a compressed copy of the same file, it no
>> >> longer works. I tried many different table definitions with the
>> >> delimiter specified explicitly, but none of them work. The load itself
>> >> succeeds, but the data is always NULL, so I conclude there is a
>> >> delimiter problem.
>> >>
>> >>  Any help is greatly appreciated!
>> >>
>> >
>>
>
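One way to test the conclusion in the original message (the load succeeds but every column is NULL) is to check a local copy of the compressed file directly: confirm it really is gzip (first two bytes 1f 8b) and that, once decompressed, each line splits on ^A into the expected number of columns. The following is a minimal Python sketch under those assumptions; `is_gzip` and `preview_fields` are hypothetical helper names, and the file is assumed to be UTF-8 text.

```python
import gzip

CTRL_A = "\x01"  # Hive's default field delimiter (^A)


def is_gzip(path):
    """Return True if the file starts with the gzip magic bytes 1f 8b."""
    with open(path, "rb") as f:
        return f.read(2) == b"\x1f\x8b"


def preview_fields(path, max_lines=5):
    """Decompress a gzip file and split its first lines on ^A.

    Returns one list of fields per line, so the column count can be
    compared with the Hive table definition.
    """
    rows = []
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for i, line in enumerate(f):
            if i >= max_lines:
                break
            rows.append(line.rstrip("\n").split(CTRL_A))
    return rows
```

If both checks pass locally, the delimiter is probably fine, and the likelier suspect is how Hadoop recognizes the file as compressed (for example, its name in HDFS).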
