hive-dev mailing list archives

From "Eric Hanson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7105) Enable ReduceRecordProcessor to generate VectorizedRowBatches
Date Wed, 21 May 2014 16:42:39 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004882#comment-14004882 ]

Eric Hanson commented on HIVE-7105:
-----------------------------------

I agree with Remus. If you want good performance from vectorization on the reduce
side, you'll need to think carefully about how to create full VectorizedRowBatches efficiently.
Single-row or small VectorizedRowBatches will not give performance gains. Also, if loading
rows into the batches on the reduce side is expensive, that cost could dominate total runtime.
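
To illustrate the point about full batches, here is a minimal sketch of the buffering policy only: rows are accumulated and handed downstream when the batch is full, with one final flush for the remainder at end of input. The class name RowBatcher, its methods, and the use of a plain List in place of Hive's columnar VectorizedRowBatch are all assumptions made for illustration; this is not code from the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not part of Hive): buffers incoming rows and forwards
 * them downstream only in full batches, plus one final partial batch.
 * A List stands in for a real columnar batch to keep the sketch self-contained.
 */
class RowBatcher<T> {
    private final int capacity;
    private final Consumer<List<T>> downstream;
    private List<T> current;

    RowBatcher(int capacity, Consumer<List<T>> downstream) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.downstream = downstream;
        this.current = new ArrayList<>(capacity);
    }

    /** Adds one row; the batch is forwarded only once it is full. */
    void add(T row) {
        current.add(row);
        if (current.size() == capacity) {
            flush();
        }
    }

    /** Forwards any buffered rows; call once at end of input. */
    void flush() {
        if (current.isEmpty()) {
            return;
        }
        downstream.accept(current);
        current = new ArrayList<>(capacity);
    }
}
```

With a capacity of 4 and 10 input rows, downstream would see batches of 4, 4 and 2 rows rather than 10 single-row calls. A real reduce-side implementation would also have to copy each row into column vectors, and that per-row copy is the loading cost mentioned above.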

> Enable ReduceRecordProcessor to generate VectorizedRowBatches
> -------------------------------------------------------------
>
>                 Key: HIVE-7105
>                 URL: https://issues.apache.org/jira/browse/HIVE-7105
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>            Reporter: Rajesh Balamohan
>            Assignee: Jitendra Nath Pandey
>         Attachments: HIVE-7105.1.patch
>
>
> Currently, ReduceRecordProcessor sends one key,value pair at a time to its operator pipeline.
> It would be beneficial to send VectorizedRowBatch to downstream operators.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
