hive-user mailing list archives

From yongqiang he <heyongqiang...@gmail.com>
Subject Re: Building Custom RCFiles
Date Fri, 18 Mar 2011 22:49:40 GMT
What's your table definition?

http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Create_Table

See ROW FORMAT


Thanks
Yongqiang
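
To make the delimiter question quoted below concrete, here is a minimal Python sketch of how a Map<String,String> would be laid out inside a single Text value when '=' separates keys from values and '&' separates entries. It only illustrates the physical layout. It is not Hive's SerDe code, and the helper names encode_map and decode_map are hypothetical.

```python
def encode_map(m, item_sep="&", key_sep="="):
    """Flatten a dict into one delimited string (hypothetical helper).

    Assumes neither keys nor values contain the delimiter characters;
    no escaping is attempted in this sketch.
    """
    return item_sep.join(f"{k}{key_sep}{v}" for k, v in m.items())


def decode_map(text, item_sep="&", key_sep="="):
    """Split a delimited string back into a dict (hypothetical helper)."""
    result = {}
    if not text:
        return result
    for item in text.split(item_sep):
        # partition splits on the first key separator only
        key, _, value = item.partition(key_sep)
        result[key] = value
    return result


if __name__ == "__main__":
    cell = encode_map({"user": "steve", "site": "us"})
    print(cell)
    print(decode_map(cell))
```

One thing that may be worth checking against the DDL page linked above: its grammar for ROW FORMAT DELIMITED lists COLLECTION ITEMS TERMINATED BY before MAP KEYS TERMINATED BY, so the order of the two clauses may matter.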
On Fri, Mar 18, 2011 at 3:33 PM, Severance, Steve <sseverance@ebay.com> wrote:
> One more question. I have everything working except a Map<String,String>.
>
> I understand that the whole Map will be physically stored as a single Text object in the RCFile.
>
> I have had considerable trouble setting up the delimiters for this Map.
>
> I want to have
>        MAP KEYS TERMINATED BY '='
>        COLLECTION ITEMS TERMINATED BY '&'
>
> Hive doesn't seem to accept that. I have also tried using the ASCII octal codes.
>
> What do I need to set up to make this Map work?
>
> Thanks.
>
> Steve
>
> -----Original Message-----
> From: yongqiang he [mailto:heyongqiangict@gmail.com]
> Sent: Thursday, March 17, 2011 5:09 PM
> To: user@hive.apache.org
> Subject: Re: Building Custom RCFiles
>
> Yes. It is the same as with normal Hive tables.
>
> thanks
> yongqiang
> On Thu, Mar 17, 2011 at 4:54 PM, Severance, Steve <sseverance@ebay.com> wrote:
>> Thanks Yongqiang.
>>
>> So for more complex types like map, do I just set up a
>>
>> ROW FORMAT DELIMITED KEYS TERMINATED BY '|' etc...
>>
>> Thanks.
>>
>> Steve
>>
>> -----Original Message-----
>> From: yongqiang he [mailto:heyongqiangict@gmail.com]
>> Sent: Thursday, March 17, 2011 4:35 PM
>> To: user@hive.apache.org
>> Subject: Re: Building Custom RCFiles
>>
>> A side note: in Hive, all columns are saved as Text internally
>> (even if the column's type is int, double, etc.). In some
>> experiments, strings were friendlier to compression, but it takes CPU
>> to decode them back to the original type.
>>
>> Thanks
>> Yongqiang
>> On Thu, Mar 17, 2011 at 4:04 PM, yongqiang he <heyongqiangict@gmail.com> wrote:
>>> You need to customize the serialize and deserialize functions of Hive's
>>> ColumnarSerde (maybe functions in LazySerde), depending on whether you
>>> want to read or write. The main thing is that you need to use your own
>>> type definitions (not LazyInt/LazyLong).
>>>
>>> If your type is int or long (not double/float), casting it to string
>>> only wastes some CPU, but can save you more space.
>>>
>>> Thanks
>>> Yongqiang
>>> On Thu, Mar 17, 2011 at 3:48 PM, Severance, Steve <sseverance@ebay.com> wrote:
>>>> Hi,
>>>>
>>>>
>>>>
>>>> I am working on building an MR job that generates RCFiles that will become
>>>> partitions of a Hive table. I have most of it working; however, only strings
>>>> (Text) are being deserialized inside of Hive. The Hive table is specified to
>>>> use a ColumnarSerDe, which I thought should allow the Writable types stored
>>>> in the RCFile to be deserialized properly.
>>>>
>>>>
>>>>
>>>> Currently all numeric types (IntWritable and LongWritable) come back as null.
>>>>
>>>>
>>>>
>>>> Has anyone else seen anything like this or have any ideas? I would rather
>>>> not convert all my data to strings to use RCFile.
>>>>
>>>>
>>>>
>>>> Thanks.
>>>>
>>>>
>>>>
>>>> Steve
>>>
>>
>
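
The nulls described in the original message are consistent with the point made in the replies that columns are stored as Text: a reader that expects decimal text will not understand the 4-byte big-endian form that IntWritable writes. The Python sketch below illustrates that mismatch under this assumption. It is not Hive's deserializer, and parse_text_int is a hypothetical stand-in for a text-based integer parser that yields null on bad input.

```python
import struct


def parse_text_int(raw: bytes):
    """Parse bytes as a decimal integer in UTF-8 text.

    Hypothetical stand-in for a text-based column reader:
    returns None (think: NULL) when the bytes are not a valid number.
    """
    try:
        return int(raw.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return None


# The value 42 written two ways.
as_text = b"42"                    # what a Text-based column would hold
as_binary = struct.pack(">i", 42)  # 4-byte big-endian, the layout IntWritable writes

if __name__ == "__main__":
    print(parse_text_int(as_text))    # parses as a number
    print(parse_text_int(as_binary))  # not decimal text, so no value
```

Under this reading, writing the numbers as their string form (as suggested in the replies) or supplying custom type definitions are the two ways to get non-null values back.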
