hadoop-hive-dev mailing list archives

From "Liyin Tang (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-1641) add map joined table to distributed cache
Date Wed, 06 Oct 2010 01:32:32 GMT

     [ https://issues.apache.org/jira/browse/HIVE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Liyin Tang updated HIVE-1641:

    Affects Version/s: 0.7.0
         Release Note: Split the MapJoin into 2 stages. In stage 1, generate a JDBM file
for each small table. In stage 2, load the JDBM files and perform the join in memory.
               Status: Patch Available  (was: In Progress)
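The two-stage split in the release note can be illustrated with a minimal sketch. It assumes a simple inner equi-join on a single key column. Python's `pickle` stands in for the JDBM file format, and a local temporary file stands in for the copy that the distributed cache would place on each node. The function names `build_hash_table_file` and `map_join` are hypothetical and are not Hive's implementation.

```python
import os
import pickle
import tempfile
from collections import defaultdict


def build_hash_table_file(small_rows, key_index, path):
    """Stage 1 (hypothetical): hash a small table on its join key and persist it.

    pickle is used here as a stand-in for the JDBM file mentioned in the patch.
    """
    table = defaultdict(list)
    for row in small_rows:
        table[row[key_index]].append(row)
    with open(path, "wb") as f:
        pickle.dump(dict(table), f)


def map_join(big_rows, key_index, path):
    """Stage 2 (hypothetical): load the prebuilt table once, then stream the big table.

    Each big-table row is joined in memory against every small-table row with the
    same key (inner join: rows without a match are dropped).
    """
    with open(path, "rb") as f:
        table = pickle.load(f)
    for row in big_rows:
        for match in table.get(row[key_index], []):
            yield row + match


if __name__ == "__main__":
    small = [(1, "a"), (2, "b")]
    big = [(1, "x"), (3, "y"), (2, "z"), (1, "w")]
    with tempfile.TemporaryDirectory() as tmp:
        # In a real cluster this file would be shipped to each node (e.g. via the
        # distributed cache); here it is just a local temp file.
        path = os.path.join(tmp, "small_table.bin")
        build_hash_table_file(small, 0, path)
        print(list(map_join(big, 0, path)))
```

The point of the split is that stage 2 only has to read one prebuilt file per small table, rather than every mapper re-reading and re-hashing the raw table, which is what makes shipping that file to the nodes attractive.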

> add map joined table to distributed cache
> -----------------------------------------
>                 Key: HIVE-1641
>                 URL: https://issues.apache.org/jira/browse/HIVE-1641
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.7.0
>            Reporter: Namit Jain
>            Assignee: Liyin Tang
>             Fix For: 0.7.0
>         Attachments: Hive-1641.patch
> Currently, the mappers directly read the map-joined table from HDFS, which makes it difficult to scale.
> We end up getting lots of timeouts once the number of mappers goes beyond a few thousand,
> because so many mappers read the table concurrently.
> It would be a good idea to put the map-joined file into the distributed cache and have the mappers read it from there.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
