hive-issues mailing list archives

From "Teddy Choi (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-15987) Replace ColumnVector.isNull boolean[] impl. with BitSet
Date Mon, 20 Feb 2017 06:44:44 GMT

     [ https://issues.apache.org/jira/browse/HIVE-15987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-15987:
------------------------------
    Description: 
Most data operations in Hive use null operations. The current implementation of ColumnVector.isNull
uses a boolean array, which uses 8 bits per boolean. BitSet is a more compact representation,
as it uses 1 bit per boolean with a backing long array. Also, logical operations between
longs are much faster than ones between bytes, as they use fewer instructions per byte. So it
should make null operations 8x faster or more.

However, there are also several cases in which this change will be slower. For example, simple
reads will require more instructions per row. So the change should include benchmark tests to
show its performance impact.

  was:Most data operations in Hive use null operations. The current implementation of
ColumnVector.isNull uses a boolean array, which uses 8 bits per boolean. BitSet is a more
compact representation, as it uses 1 bit per boolean with a backing long array. Also, logical
operations between longs are much faster than ones between bytes, as they use fewer instructions
per byte. So it should make null operations 8x faster or more.


> Replace ColumnVector.isNull boolean[] impl. with BitSet
> -------------------------------------------------------
>
>                 Key: HIVE-15987
>                 URL: https://issues.apache.org/jira/browse/HIVE-15987
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Teddy Choi
>            Assignee: Teddy Choi
>
> Most data operations in Hive use null operations. The current implementation of ColumnVector.isNull
> uses a boolean array, which uses 8 bits per boolean. BitSet is a more compact representation,
> as it uses 1 bit per boolean with a backing long array. Also, logical operations between
> longs are much faster than ones between bytes, as they use fewer instructions per byte. So it
> should make null operations 8x faster or more.
> However, there are also several cases in which this change will be slower. For example, simple
> reads will require more instructions per row. So the change should include benchmark tests to
> show its performance impact.
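
The trade-off described above can be illustrated with a minimal Java sketch. It is not Hive code: the class NullMaskSketch, its helper methods, and the batch size of 1024 are hypothetical names and values chosen for illustration, and only java.util.BitSet from the standard library is used. It assumes both masks cover the same number of rows. It shows a bulk OR of two null masks in each representation and a single-row read from each, and it makes no performance measurement; that is left to the benchmarks the issue asks for.

```java
import java.util.BitSet;

final class NullMaskSketch {

    /** Copies a boolean[] null mask into a BitSet (bit i set means row i is null). */
    static BitSet toBitSet(boolean[] isNull) {
        BitSet bits = new BitSet(isNull.length);
        for (int i = 0; i < isNull.length; i++) {
            if (isNull[i]) {
                bits.set(i);
            }
        }
        return bits;
    }

    /** Combines two boolean[] masks one row at a time (assumes equal lengths). */
    static boolean[] orMasks(boolean[] left, boolean[] right) {
        boolean[] out = new boolean[left.length];
        for (int i = 0; i < left.length; i++) {
            out[i] = left[i] || right[i];
        }
        return out;
    }

    /** Combines two BitSet masks; BitSet.or works on the backing long words. */
    static BitSet orMasks(BitSet left, BitSet right) {
        BitSet out = (BitSet) left.clone();
        out.or(right);
        return out;
    }

    public static void main(String[] args) {
        int rows = 1024; // hypothetical batch size, chosen for illustration
        boolean[] left = new boolean[rows];
        boolean[] right = new boolean[rows];
        for (int i = 0; i < rows; i++) {
            left[i] = i % 3 == 0;
            right[i] = i % 5 == 0;
        }

        boolean[] mergedArray = orMasks(left, right);
        BitSet mergedBits = orMasks(toBitSet(left), toBitSet(right));

        // Both representations should describe the same set of null rows.
        System.out.println("masks agree: " + toBitSet(mergedArray).equals(mergedBits));

        // A simple read: the array needs one indexed load, while
        // BitSet.get has to locate the word, then shift and mask.
        System.out.println("row 7 null? " + mergedArray[7] + " / " + mergedBits.get(7));

        // toLongArray exposes how many 64-bit words back the mask.
        System.out.println("array elements: " + mergedArray.length
                + ", BitSet words: " + mergedBits.toLongArray().length);
    }
}
```

The bulk OR touches one long per 64 rows in the BitSet version, which is where the expected gain comes from, while the per-row read shows the extra work that a BitSet lookup adds.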



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
