hadoop-hive-dev mailing list archives

From Ashutosh Chauhan <ashutosh.chau...@gmail.com>
Subject Re: column projections in RCFile
Date Mon, 28 Jun 2010 21:08:06 GMT
Thanks, Yongqiang, for the quick reply. It was helpful.

Ashutosh
On Mon, Jun 28, 2010 at 10:43, yongqiang he <heyongqiangict@gmail.com> wrote:
> This is expected.
> We did it that way to keep the implementation simple: because the row
> keeps its original width, Hive does not need to compute an array offset
> to get a column.
>
> On Mon, Jun 28, 2010 at 10:11 AM, Ashutosh Chauhan
> <ashutosh.chauhan@gmail.com> wrote:
>> Hi,
>>
>> I am trying to use RCFile outside of Hive, though I am
>> still using column serde and column struct to get the row. I found
>> that the way to tell RCFile which columns I am interested in is by
>> setting the READ_COLUMN_IDS_CONF_STR key in the jobconf. This worked except
>> for one thing: if there are originally 5 columns in the data and I ask
>> RCFile to project 3 of them, I get back a row of 5 columns,
>> with data in the 3 columns I asked for and nulls in the other 2. I expected
>> it to give me back a row with exactly 3 columns. As a concrete example,
>> assume the data is as follows:
>>
>> 123 | 456 | "hadoop" | 23090L | 5.3D |
>> and I ask to project columns 0, 2, and 4. I get back:
>> 123 | null | "hadoop" | null | 5.3D |
>> whereas I had expected to get:
>> 123 | "hadoop" | 5.3D |
>>
>> So my question is: is this the expected behavior, or am I doing something
>> wrong? If it is expected, is it by design that
>> "higher layers" (like Hive) reconstruct the row with
>> the nulls weeded out?
>>
>> Thanks,
>> Ashutosh
>>
>
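
For readers who want the compact row described in the question, the following is a minimal sketch of what a "higher layer" might do with the full-width row. It is not Hive or RCFile code: the function names `compact_row` and `compact_index` are hypothetical, and the row is modeled as a plain Python list using the values from the example above. It also illustrates the offset mapping that, per the reply, Hive avoids computing by keeping the original row width.

```python
def compact_row(full_row, projected_ids):
    """Return only the projected columns, in ascending column order.

    Columns are selected by position rather than by dropping nulls,
    because a projected column may legitimately contain a null value.
    """
    return [full_row[i] for i in sorted(set(projected_ids))]


def compact_index(projected_ids):
    """Map each original column index to its position in the compact row."""
    return {col: pos for pos, col in enumerate(sorted(set(projected_ids)))}


# Full-width row as returned by the reader in the example:
# unprojected columns 1 and 3 come back as nulls (None here).
full_row = [123, None, "hadoop", None, 5.3]
projected = [0, 2, 4]  # hypothetical stand-in for the configured column IDs

compact = compact_row(full_row, projected)
offsets = compact_index(projected)

# Looking up original column 4 in the compact row needs the extra mapping.
value_of_col_4 = compact[offsets[4]]
```

With the full-width row, original column 4 is simply `full_row[4]`; with the compact row, every access has to go through a mapping such as `offsets`. That trade-off is the one the reply refers to.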
