hive-dev mailing list archives

From "Dong Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-860) Persistent distributed cache
Date Mon, 15 Dec 2014 08:17:14 GMT

    [ https://issues.apache.org/jira/browse/HIVE-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246439#comment-14246439 ]

Dong Chen commented on HIVE-860:
--------------------------------

These failures are a little strange: the same cases pass in my local environment.

After analyzing the test results, most of the cases fail in {{LocalDistributedCacheManager.setup()}},
and further down in {{FSDownload.unpack()}} or {{FSDownload.changePermissions()}}. To debug this,
I uploaded a temporary patch that logs the {{mapreduce.job.cache.archives}} property from the conf.
Its value is the list of URIs of the cache jars, and they look correct in the test logs. However,
the tests still complain about an incorrect jar URI or a missing class file in the unpacked jar.
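
For anyone who wants to repeat this kind of check on a logged value, here is a minimal sketch. It assumes the property value is a comma-separated list of URIs and works on the plain string only, with no Hadoop classes involved. {{CacheArchiveCheck}} and {{suspiciousEntries}} are hypothetical names used for illustration; they are not part of Hive, Hadoop, or the patch.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hive or Hadoop.
class CacheArchiveCheck {

    /**
     * Returns the entries of a comma-separated URI list that either fail to
     * parse as a URI or whose path does not end in ".jar".
     */
    static List<String> suspiciousEntries(String cacheArchives) {
        List<String> bad = new ArrayList<>();
        if (cacheArchives == null || cacheArchives.trim().isEmpty()) {
            return bad;
        }
        for (String raw : cacheArchives.split(",")) {
            String entry = raw.trim();
            try {
                URI uri = new URI(entry);
                // getPath() excludes any "#fragment" link name on the entry.
                String path = uri.getPath();
                if (path == null || !path.endsWith(".jar")) {
                    bad.add(entry);
                }
            } catch (URISyntaxException e) {
                bad.add(entry);
            }
        }
        return bad;
    }
}
```

A check like this only confirms that the logged URIs are well formed. It cannot tell whether the jar behind a URI was replaced or corrupted by another test case.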

So I think this patch might be OK, since it works in local testing.
The jar files might be getting mixed up between test cases in Jenkins CI. I am not sure how
to check the environment in Jenkins, so this is only a guess...

Any thoughts?

> Persistent distributed cache
> ----------------------------
>
>                 Key: HIVE-860
>                 URL: https://issues.apache.org/jira/browse/HIVE-860
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 0.12.0
>            Reporter: Zheng Shao
>            Assignee: Dong Chen
>             Fix For: 0.15.0
>
>         Attachments: HIVE-860-debug.4.patch, HIVE-860.1.patch, HIVE-860.2.patch, HIVE-860.2.patch,
HIVE-860.3.patch, HIVE-860.4.patch, HIVE-860.4.patch, HIVE-860.4.patch, HIVE-860.4.patch,
HIVE-860.4.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch,
HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch
>
>
> DistributedCache is shared across multiple jobs if the HDFS file name is the same.
> We need to make sure Hive puts the same file into the same location every time and does
> not overwrite it if the file content is the same.
> We can achieve 2 different results:
> A1. Files added with the same name, timestamp, and md5 in the same session will have
> a single copy in distributed cache.
> A2. Files added with the same name, timestamp, and md5 will have a single copy in distributed
> cache.
> A2 has a bigger benefit in sharing but may raise a question on when Hive should clean
> it up in HDFS.
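
The difference between A1 and A2 in the description above can be sketched as two ways of building a cache key from the same (name, timestamp, md5) triple. This is only an illustration under assumptions: {{CacheKeySketch}}, {{sharedKey}}, {{sessionKey}}, and the key layout are hypothetical and are not taken from any HIVE-860 patch.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch for illustration only; not the HIVE-860 implementation.
class CacheKeySketch {

    /** Hex-encoded MD5 digest of the file content. */
    static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(content)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** A2: one copy per (name, timestamp, md5), shared across sessions. */
    static String sharedKey(String name, long timestamp, byte[] content) {
        return name + "_" + timestamp + "_" + md5Hex(content);
    }

    /** A1: one copy per (name, timestamp, md5) within a single session. */
    static String sessionKey(String sessionId, String name, long timestamp, byte[] content) {
        return sessionId + "/" + sharedKey(name, timestamp, content);
    }
}
```

With the A2-style key, two sessions adding an identical file resolve to the same location, which is where the cleanup question comes from: no single session owns the copy. With the A1-style key, each session gets its own copy and can remove it when the session ends.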



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
