hadoop-hive-dev mailing list archives

From yongqiang he <heyongqiang...@gmail.com>
Subject Re: column projections in RCFile
Date Mon, 28 Jun 2010 17:43:18 GMT
This is expected.
We did it that way for ease of implementation: the row keeps its
original width, so Hive does not need to compute an array offset to
get a column.
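
For a caller outside Hive that wants the compact row, here is a minimal
sketch of the step a higher layer could do itself. It assumes the reader
has already returned the full-width row as a plain list with nulls in the
unprojected positions; the class and method names (ProjectedRowCompactor,
compact) are hypothetical and are not part of RCFile or Hive. The values
come from the example in the quoted message below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a Hive or RCFile class.
public class ProjectedRowCompactor {

    // Pick the values at the projected column ids, in the order given.
    // Selection is by index, not by dropping nulls, because a projected
    // column may itself legitimately contain a null.
    static List<Object> compact(List<Object> fullRow, int[] projectedIds) {
        List<Object> out = new ArrayList<>(projectedIds.length);
        for (int id : projectedIds) {
            out.add(fullRow.get(id));
        }
        return out;
    }

    public static void main(String[] args) {
        // Full-width row as returned after projecting columns 0, 2 and 4.
        List<Object> fullRow = Arrays.asList(123, null, "hadoop", null, 5.3d);
        int[] projected = {0, 2, 4};
        System.out.println(compact(fullRow, projected));
    }
}
```

Keeping the row at full width means column i is always found at index i,
which is the offset computation the reply says Hive avoids. A compact row
needs the extra id-to-position mapping shown here.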

On Mon, Jun 28, 2010 at 10:11 AM, Ashutosh Chauhan
<ashutosh.chauhan@gmail.com> wrote:
> Hi,
>
> I am trying to use RCFile outside the realm of Hive, though I am
> still using column serde and column struct to get the row. I found
> that the way to tell RCFile which columns I am interested in is to
> set the READ_COLUMN_IDS_CONF_STR key in the jobconf. This worked except
> for one thing. If there are originally 5 columns in the data and I ask
> RCFile to project 3 of them, I get back a row of 5 columns, with data
> in the 3 columns I asked for and nulls in the other 2. I expected
> it to give me back a row with exactly 3 columns. As a concrete example,
> assume the data is as follows:
>
> 123 | 456 | "hadoop" | 23090L | 5.3D |
> and I ask to project columns 0, 2 and 4. I get back:
> 123 | null | "hadoop" | null | 5.3D |
> whereas I had expected to get:
> 123 | "hadoop" | 5.3D |
>
> So, my question is: is this the expected behavior (or am I doing
> something wrong)? If it is, is this by design, with "higher layers"
> (like Hive) expected to reconstruct the row with the nulls weeded
> out?
>
> Thanks,
> Ashutosh
>
