hive-dev mailing list archives

From "Liyin Tang (JIRA)" <>
Subject [jira] Commented: (HIVE-1641) add map joined table to distributed cache
Date Thu, 07 Oct 2010 22:45:32 GMT


Liyin Tang commented on HIVE-1641:

I have submitted a new patch on JIRA.
The new patch adds the jdbm files to the distributed cache and loads them back from the cached

This patch has been tested on the test cluster for all the map join test cases ( join25.q --
All test results match the expected results.

> add map joined table to distributed cache
> -----------------------------------------
>                 Key: HIVE-1641
>                 URL:
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.7.0
>            Reporter: Namit Jain
>            Assignee: Liyin Tang
>             Fix For: 0.7.0
>         Attachments: Hive-1641.patch
> Currently, the mappers directly read the map-joined table from HDFS, which makes it difficult
> to scale.
> We end up getting lots of timeouts once the number of mappers is beyond a few thousand,
> due to concurrent mappers.
> It would be a good idea to put the mapped file into the distributed cache and read from there
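
The idea in the issue description can be illustrated with a minimal, self-contained sketch. It is not the HIVE-1641 patch and does not use Hadoop's distributed cache API. It is plain Java that assumes a shared file stands in for the map-joined table on HDFS and a local directory stands in for the node-local cache. The class name LocalCacheSketch, the methods localize and loadTable, the sharedReads counter, and the tab-separated file layout are all hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: many readers share one local copy of a small table
// instead of each reading the shared source directly.
class LocalCacheSketch {
    // Counts how often the shared source (HDFS in the issue) is actually read.
    static int sharedReads = 0;

    // Copy the shared file into the local cache directory once;
    // later calls find the existing copy and skip the shared read.
    static Path localize(Path shared, Path cacheDir) throws IOException {
        Path local = cacheDir.resolve(shared.getFileName());
        if (!Files.exists(local)) {
            sharedReads++;
            Files.copy(shared, local);
        }
        return local;
    }

    // A "mapper" builds its in-memory join table from the local copy.
    // Assumed layout: one "key<TAB>value" pair per line.
    static Map<String, String> loadTable(Path local) throws IOException {
        Map<String, String> table = new HashMap<>();
        for (String line : Files.readAllLines(local)) {
            String[] kv = line.split("\t", 2);
            if (kv.length == 2) {
                table.put(kv[0], kv[1]);
            }
        }
        return table;
    }
}
```

In this sketch, any number of simulated mappers on the same node can call localize followed by loadTable, and only the first call touches the shared file. That is the scaling property the description asks for, shown here under the stated assumptions only.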

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
