hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-860) Persistent distributed cache
Date Wed, 30 Sep 2009 07:46:32 GMT
Persistent distributed cache
----------------------------

                 Key: HIVE-860
                 URL: https://issues.apache.org/jira/browse/HIVE-860
             Project: Hadoop Hive
          Issue Type: Improvement
            Reporter: Zheng Shao


DistributedCache is shared across multiple jobs if the HDFS file name is the same.

We need to make sure Hive puts the same file into the same location every time and does not
overwrite it if the file content is the same.

We can achieve 2 different results:
A1. Files added with the same name, timestamp, and md5 in the same session will have a single
copy in distributed cache.
A2. Files added with the same name, timestamp, and md5 will have a single copy in distributed
cache, regardless of session.

A2 has a bigger benefit in sharing but raises the question of when Hive should clean up the
cached files in HDFS.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

