hive-dev mailing list archives

From "He Yongqiang (JIRA)" <>
Subject [jira] Commented: (HIVE-756) performance improvement for RCFile and ColumnarSerDe in Hive
Date Mon, 17 Aug 2009 12:10:15 GMT


He Yongqiang commented on HIVE-756:

The changes to ColumnarStruct look good.
@hive-756.patch: line 100:
The new variable "prjColIDs" is not necessary. The cost should not differ much from using a
boolean array to do the same thing.
1. Column information is provided but empty: we ignore all columns.
2. Column information is not provided: we read all columns.
This way, if the caller (e.g. a non-Hive application) does not know the RCFile column information
settings, it can still read all columns.
Agreed. We can use "none" as the conf value to denote empty columns, and "" to denote all
columns. The code for setting and reading this value lives in HiveFileFormatUtils.
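
To make the proposed convention concrete, here is a minimal sketch of how such a conf value
could be turned into a boolean array of columns to read. It is only an illustration: the class
name ColumnSelection, the method parse, and the comma-separated column ID format are
assumptions, not the actual code in HiveFileFormatUtils or in the patch.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not the actual Hive code. */
class ColumnSelection {
    /** Conf value assumed to mean "read no columns". */
    static final String NONE = "none";

    /**
     * Returns one flag per column; true means the column should be read.
     * A null or empty conf value means "read all columns", so a caller that
     * never sets the property still gets every column.
     */
    static boolean[] parse(String confValue, int numColumns) {
        boolean[] read = new boolean[numColumns]; // all false by default
        if (confValue == null || confValue.isEmpty()) {
            Arrays.fill(read, true);              // not provided: read everything
            return read;
        }
        if (NONE.equals(confValue)) {
            return read;                          // provided but empty: skip everything
        }
        // Assumed format: comma-separated column IDs, e.g. "0,2".
        for (String id : confValue.split(",")) {
            int col = Integer.parseInt(id.trim());
            if (col >= 0 && col < numColumns) {
                read[col] = true;
            }
        }
        return read;
    }
}
```

With this sketch, parse("", 3) marks all three columns as read, parse("none", 3) marks none,
and parse("0,2", 3) marks only the first and third. A reader can then test read[i] per column,
so no separate list of projected column IDs is needed.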

> performance improvement for RCFile and ColumnarSerDe in Hive
> ------------------------------------------------------------
>                 Key: HIVE-756
>                 URL:
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: hive-756.patch
> There are some easy performance improvements in the columnar storage in Hive that I found during the Hackathon.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
