hive-dev mailing list archives

From "Chao (JIRA)" <>
Subject [jira] [Created] (HIVE-8810) Make HashTableSinkOperator works for Spark Branch [Spark Branch]
Date Mon, 10 Nov 2014 23:32:33 GMT
Chao created HIVE-8810:

             Summary: Make HashTableSinkOperator works for Spark Branch [Spark Branch]
                 Key: HIVE-8810
             Project: Hive
          Issue Type: Sub-task
          Components: Spark
    Affects Versions: spark-branch
            Reporter: Chao

In MR, all small tables for a particular map-join (MJ) operator share the same instance of
{{HashTableSinkOperator}}, while in the Spark branch each small table gets its own
{{HashTableSinkOperator}} instance. This difference causes some issues.

For instance, {{HashTableSinkOperator#processOp}} uses a tag to look up information
in various data structures, such as {{joinKeys}}, {{filterMaps}}, {{joinValues}}, etc. Those
data structures store the information as it was BEFORE the MJ operator was split from its parents.
But since we later use a separate {{HashTableSinkOperator}} for each small table, that information
is no longer valid, and thus this method will fail.
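
The tag-based lookup described above can be illustrated with a small, self-contained sketch. This is not Hive's code: the class {{TagLookupSketch}}, its methods, and the column names are hypothetical, and the sketch assumes (purely for illustration) that a per-table sink ends up presenting a tag that the pre-split metadata has no entry for. It only models how a tag-indexed structure built for the whole join can stop matching what a single sink sees.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TagLookupSketch {
    // Hypothetical stand-in for tag-indexed metadata such as joinKeys,
    // recorded for the whole join before the operator tree is split.
    static Map<Byte, List<String>> buildJoinKeys() {
        Map<Byte, List<String>> joinKeys = new HashMap<>();
        joinKeys.put((byte) 1, List.of("small1.id"));
        joinKeys.put((byte) 2, List.of("small2.id"));
        return joinKeys;
    }

    // Models a tag-based lookup; throws when the tag has no recorded entry.
    static List<String> lookupKeys(Map<Byte, List<String>> joinKeys, byte tag) {
        List<String> keys = joinKeys.get(tag);
        if (keys == null) {
            throw new IllegalStateException("no join keys recorded for tag " + tag);
        }
        return keys;
    }

    public static void main(String[] args) {
        Map<Byte, List<String>> joinKeys = buildJoinKeys();

        // Shared-sink case: the tag matches the position recorded before the split.
        System.out.println(lookupKeys(joinKeys, (byte) 1));

        // Assumed per-table case: the sink presents a tag the old metadata
        // does not cover, so the lookup fails.
        try {
            lookupKeys(joinKeys, (byte) 0);
        } catch (IllegalStateException e) {
            System.out.println("lookup failed: " + e.getMessage());
        }
    }
}
```

Whether the real failure shows up as a missing entry, a wrong entry, or an out-of-range index depends on the actual data structures; the sketch only shows the general shape of the mismatch.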

This JIRA is to track and solve these related issues.

This message was sent by Atlassian JIRA
