incubator-blur-dev mailing list archives

From "Vikrant Navalgund (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BLUR-234) Create a softlink like capability in the HDFSDirectory
Date Tue, 15 Oct 2013 22:07:42 GMT

    [ https://issues.apache.org/jira/browse/BLUR-234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795720#comment-13795720 ]

Vikrant Navalgund commented on BLUR-234:
----------------------------------------

Hello,
I am looking into this.

Regards,
Vikrant

> Create a softlink like capability in the HDFSDirectory
> ------------------------------------------------------
>
>                 Key: BLUR-234
>                 URL: https://issues.apache.org/jira/browse/BLUR-234
>             Project: Apache Blur
>          Issue Type: Sub-task
>          Components: Blur
>    Affects Versions: 0.3.0
>            Reporter: Aaron McCurry
>             Fix For: 0.3.0
>
>
> The problem we are trying to solve here is minimizing file copying. When an external
> index produced by MapReduce (MR) is merged into a shard index, the index files are normally
> copied. In many cases the new external index(es) are very large, so copying all of the new
> data into the shard index can cause serious performance problems. Because this normally
> happens across the whole cluster at the same time, it is likely to turn into an IO storm.
> The current implementation in the IndexImporter deals with this problem by overriding
> a method in the HDFSDirectory so that the files are moved in HDFS instead of copied.
> This makes those merges very fast, but it is risky: if the shard index writer does not
> commit the changes, the files are not moved back to their original location. Instead
> they are deleted, resulting in data loss.
> So I'm proposing that we create a softlink system that allows links to be created
> instead of files being moved. That way, if the commit fails, the links are removed and
> the original data files remain in their original location.
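A minimal sketch contrasting the move-based import described in the issue with the proposed link-based one, in Java because Blur's HDFSDirectory is Java. It does not use the real Hadoop, Lucene, or Blur APIs: `FakeFs`, `MovingShardDirectory`, `LinkingShardDirectory` and all of their methods are hypothetical. HDFS is modeled as an in-memory set of paths, and the index writer's cleanup of uncommitted files is simplified to a plain delete on rollback.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for HDFS: the file system is just a set of paths.
class FakeFs {
    final Set<String> files = new HashSet<>();

    boolean exists(String path) { return files.contains(path); }

    void rename(String from, String to) {
        if (!files.remove(from)) throw new IllegalStateException("missing " + from);
        files.add(to);
    }

    void delete(String path) { files.remove(path); }
}

// Hypothetical move-based import: the external file is renamed into the shard directory.
// Rollback deletes the uncommitted file, and the original path no longer exists.
class MovingShardDirectory {
    private final FakeFs fs;
    private final String shardDir;
    private final List<String> uncommitted = new ArrayList<>();

    MovingShardDirectory(FakeFs fs, String shardDir) { this.fs = fs; this.shardDir = shardDir; }

    void importByMove(String name, String externalPath) {
        String dest = shardDir + "/" + name;
        fs.rename(externalPath, dest);
        uncommitted.add(dest);
    }

    void commit() { uncommitted.clear(); }

    void rollback() {
        for (String path : uncommitted) fs.delete(path); // the only copy of the data is lost
        uncommitted.clear();
    }
}

// Hypothetical link-based import: the shard records name -> external path
// and never moves or copies the data file itself.
class LinkingShardDirectory {
    private final FakeFs fs;
    private final String shardDir;
    private final Map<String, String> committedLinks = new HashMap<>();
    private final Map<String, String> pendingLinks = new HashMap<>();

    LinkingShardDirectory(FakeFs fs, String shardDir) { this.fs = fs; this.shardDir = shardDir; }

    void importByLink(String name, String externalPath) {
        if (!fs.exists(externalPath)) throw new IllegalArgumentException("missing " + externalPath);
        pendingLinks.put(name, externalPath);
    }

    // Readers resolve a shard file name to the path that actually holds its bytes.
    String resolve(String name) {
        String target = pendingLinks.getOrDefault(name, committedLinks.get(name));
        return target != null ? target : shardDir + "/" + name;
    }

    void commit() {
        committedLinks.putAll(pendingLinks);
        pendingLinks.clear();
    }

    // A failed commit only drops the pending link entries; the external files stay put.
    void rollback() { pendingLinks.clear(); }
}
```

With the move-based version, importing `/mr/out/_0.cfs` and then rolling back leaves no copy of the file anywhere. With the link-based version, rollback leaves `/mr/out/_0.cfs` untouched and `resolve` falls back to the shard's own path. How the link entries themselves would be persisted in HDFS is left open in this sketch.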



--
This message was sent by Atlassian JIRA
(v6.1#6144)
