hive-dev mailing list archives

From "Liyin Tang (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-1641) add map joined table to distributed cache
Date Thu, 07 Oct 2010 22:43:34 GMT

     [ https://issues.apache.org/jira/browse/HIVE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HIVE-1641:
-----------------------------

    Attachment: Hive-1641.patch


This new patch adds the jdbm files to the distributed cache and loads them back from the
cached files.
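
For readers unfamiliar with the approach, a rough sketch of the idea follows. It is not
taken from Hive-1641.patch. It is plain Java with no Hadoop or jdbm classes: a local temp
file stands in for the node-local copy that the distributed cache would provide, and the
names CachedMapJoinSketch, writeSmallTable, loadSmallTable, and mapJoin are made up for
illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Plain-Java model of a cached map join. No Hadoop or jdbm classes are used:
 * a local temp file stands in for the node-local copy that a distributed
 * cache would hand to each mapper. All names here are hypothetical.
 */
public class CachedMapJoinSketch {

    /** Write the small table once, as tab-separated key/value lines. */
    public static Path writeSmallTable(Map<String, String> rows) throws IOException {
        Path file = Files.createTempFile("small-table", ".tsv");
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> e : rows.entrySet()) {
            lines.add(e.getKey() + "\t" + e.getValue());
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
        return file;
    }

    /** Each mapper loads its local copy into memory instead of reading shared storage. */
    public static Map<String, String> loadSmallTable(Path localCopy) throws IOException {
        Map<String, String> table = new HashMap<>();
        try (BufferedReader in = Files.newBufferedReader(localCopy, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] kv = line.split("\t", 2);
                if (kv.length == 2) {
                    table.put(kv[0], kv[1]);
                }
            }
        }
        return table;
    }

    /** Inner join of the big-side rows (key, value) against the in-memory small table. */
    public static List<String> mapJoin(List<String[]> bigRows, Map<String, String> small) {
        List<String> out = new ArrayList<>();
        for (String[] row : bigRows) {
            String match = small.get(row[0]);
            if (match != null) {
                out.add(row[0] + "," + row[1] + "," + match);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> small = new HashMap<>();
        small.put("1", "apple");
        small.put("2", "banana");
        Path cached = writeSmallTable(small);

        List<String[]> big = new ArrayList<>();
        big.add(new String[] {"1", "x"});
        big.add(new String[] {"3", "y"});
        big.add(new String[] {"2", "z"});

        // Key 3 has no match in the small table, so it is dropped.
        for (String joined : mapJoin(big, loadSmallTable(cached))) {
            System.out.println(joined);
        }
        Files.delete(cached);
    }
}
```

The point of the sketch is only the access pattern: the small table is written once, and
every mapper then reads a local copy rather than all of them hitting the same HDFS file.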

This patch has been tested on the test cluster against all the map join test cases
(join25.q through join39.q).
All test results match the expected results.

> add map joined table to distributed cache
> -----------------------------------------
>
>                 Key: HIVE-1641
>                 URL: https://issues.apache.org/jira/browse/HIVE-1641
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.7.0
>            Reporter: Namit Jain
>            Assignee: Liyin Tang
>             Fix For: 0.7.0
>
>
> Currently, the mappers read the map-joined table directly from HDFS, which makes it difficult
> to scale.
> We end up getting lots of timeouts once the number of mappers goes beyond a few thousand,
> due to the many concurrent mappers.
> It would be a good idea to put the mapped file into the distributed cache and read from there
> instead.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

