hive-dev mailing list archives

From "Remus Rusanu (JIRA)" <>
Subject [jira] [Commented] (HIVE-7105) Enable ReduceRecordProcessor to generate VectorizedRowBatches
Date Wed, 21 May 2014 10:56:38 GMT


Remus Rusanu commented on HIVE-7105:

Extending vectorized processing to the reduce side is a complex undertaking. None of the
vector-mode operators are implemented on the reduce side. The thinking was that the bulk of the
CPU-intensive processing occurs on the map side, and our goal was to provide maximum feature
coverage (i.e. implement as many operators as needed to cover the most queries). However, at the
moment vectorization only works for the map side of the first stage. I'm not sure that we can yet
call the map-side effort stable/mature/complete enough to warrant a focus shift to the reduce side.

> Enable ReduceRecordProcessor to generate VectorizedRowBatches
> -------------------------------------------------------------
>                 Key: HIVE-7105
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>            Reporter: Rajesh Balamohan
>            Assignee: Jitendra Nath Pandey
> Currently, ReduceRecordProcessor sends one key,value pair at a time to its operator pipeline.
> It would be beneficial to send VectorizedRowBatch to downstream operators.
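
To illustrate the idea in the description above, here is a minimal sketch of buffering key,value pairs into fixed-size batches before handing them downstream. It is not Hive code: LongBatch, Batcher, batchSizes, and the capacity of 4 are hypothetical names and values chosen for illustration, and the real VectorizedRowBatch and ReduceRecordProcessor interfaces may look quite different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ReduceBatchingSketch {

    /** Hypothetical column-oriented batch; a stand-in, not Hive's VectorizedRowBatch. */
    static final class LongBatch {
        final long[] keys;
        final long[] values;
        int size; // number of valid rows currently held

        LongBatch(int capacity) {
            keys = new long[capacity];
            values = new long[capacity];
        }
    }

    /** Buffers (key, value) pairs and forwards the batch downstream once it is full. */
    static final class Batcher {
        private final LongBatch batch;
        private final Consumer<LongBatch> downstream;

        Batcher(int capacity, Consumer<LongBatch> downstream) {
            if (capacity <= 0) {
                throw new IllegalArgumentException("capacity must be positive");
            }
            this.batch = new LongBatch(capacity);
            this.downstream = downstream;
        }

        void add(long key, long value) {
            batch.keys[batch.size] = key;
            batch.values[batch.size] = value;
            batch.size++;
            if (batch.size == batch.keys.length) {
                flush();
            }
        }

        /** Sends any buffered rows downstream, then reuses the same arrays. */
        void flush() {
            if (batch.size > 0) {
                downstream.accept(batch);
                batch.size = 0;
            }
        }
    }

    /** Feeds `rows` pairs through a Batcher and records the size of each batch emitted. */
    static List<Integer> batchSizes(int capacity, int rows) {
        List<Integer> sizes = new ArrayList<>();
        Batcher batcher = new Batcher(capacity, b -> sizes.add(b.size));
        for (long i = 1; i <= rows; i++) {
            batcher.add(i % 3, i);
        }
        batcher.flush(); // emit the final, partially filled batch
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(batchSizes(4, 10)); // prints [4, 4, 2]
    }
}
```

With this shape the downstream operator is invoked once per batch rather than once per pair, and the backing arrays are reused between batches, so the consumer must finish with a batch before returning.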

This message was sent by Atlassian JIRA
