hadoop-hdfs-issues mailing list archives

From "Weiwei Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11926) Ozone: Implement a common helper to return a range of KVs in levelDB
Date Wed, 07 Jun 2017 03:49:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16040118#comment-16040118 ]

Weiwei Yang commented on HDFS-11926:

Thanks [~anu] and [~xyao] for the comments. I just uploaded the v4 patch, which incorporates the
following changes based on your comments.

[~anu]'s comments
bq. If count is zero or negative, I think we should just throw an IllegalArgumentException.

Fixed. I also updated the Javadoc.
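For illustration, a minimal sketch of the count check, assuming a java.util.TreeMap as a stand-in for the sorted levelDB key space. The class and method names are hypothetical, not the ones in the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a TreeMap stands in for the sorted levelDB key space.
final class CountCheckSketch {

  /**
   * Returns up to {@code count} entries in key order.
   *
   * @throws IllegalArgumentException if count is zero or negative
   */
  static List<Map.Entry<String, String>> getRange(TreeMap<String, String> store, int count) {
    if (count <= 0) {
      // Reject the request up front instead of silently returning an empty list.
      throw new IllegalArgumentException(
          "Invalid count: " + count + ", count must be greater than 0");
    }
    List<Map.Entry<String, String>> result = new ArrayList<>();
    for (Map.Entry<String, String> e : store.entrySet()) {
      if (result.size() == count) {
        break;
      }
      result.add(e);
    }
    return result;
  }
}
```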

bq. We do a DbIter.seekToFirst; shouldn't we do that if and only if startKey == null? It looks like a frivolous operation if the startKey argument is not null.

Fixed. It now calls seekToFirst only when startKey is null. Thanks for pointing this out.
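A rough sketch of the conditional seek, again assuming a TreeMap stands in for levelDB: iterating the whole map plays the role of DBIterator#seekToFirst, and tailMap(startKey, true) plays the role of DBIterator#seek, which positions at the first key at or after the target. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: TreeMap models the sorted levelDB key space.
final class SeekSketch {

  /** Returns up to count keys, starting at startKey, or at the first key when startKey is null. */
  static List<String> keysFrom(TreeMap<String, String> store, String startKey, int count) {
    NavigableMap<String, String> view = (startKey == null)
        ? store                            // no start key: begin at the first entry ("seekToFirst")
        : store.tailMap(startKey, true);   // start key given: jump straight to it ("seek")
    List<String> keys = new ArrayList<>();
    for (String key : view.keySet()) {
      if (keys.size() == count) {
        break;
      }
      keys.add(key);
    }
    return keys;
  }
}
```

The point of the change is that the extra positioning step only happens on the null path; a non-null start key needs a single seek.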

bq. Even though I did suggest that we should throw if we cannot find the startKey, we also need a plan to handle the situation where someone is iterating over a bucket while concurrent deletes are going on ...

For this common helper, I think it is fine to throw an exception when startKey is not found.
As for the scenario you mentioned, I am not sure; don't we have a read/write lock in the KSM
metadata manager to avoid such races? I agree to open another JIRA to investigate this.
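A sketch of the "fail when startKey is missing" behaviour, under the same TreeMap assumption. The choice of IOException and the message text are assumptions; the patch may use a different exception type.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: TreeMap stands in for the sorted levelDB key space.
final class StartKeyCheckSketch {

  static List<String> keysFrom(TreeMap<String, String> store, String startKey, int count)
      throws IOException {
    if (startKey != null && !store.containsKey(startKey)) {
      // Fail fast: the caller asked to resume from a key that does not exist (any more).
      throw new IOException("Invalid start key, not found in current db: " + startKey);
    }
    NavigableMap<String, String> view =
        (startKey == null) ? store : store.tailMap(startKey, true);
    List<String> keys = new ArrayList<>();
    for (String key : view.keySet()) {
      if (keys.size() == count) {
        break;
      }
      keys.add(key);
    }
    return keys;
  }
}
```

With concurrent deletes, a client paging through a bucket could hit this exception if its last-seen key was removed between calls, which is the case the follow-up JIRA would need to cover.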

bq. One more minor suggestion: would it be possible to log the time taken to execute this

That's a very good suggestion. I have added a debug log message that prints the time taken
by this function.
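One possible shape for the timing log, sketched with java.util.logging so it runs without extra dependencies (the patch presumably uses the project's own logger; the wrapper class and method names are hypothetical). The elapsed time is only formatted when debug-level (FINE) logging is enabled.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: wraps an operation and logs how long it took at debug (FINE) level.
final class TimedCallSketch {
  private static final Logger LOG = Logger.getLogger(TimedCallSketch.class.getName());

  static <T> T timed(String opName, Supplier<T> op) {
    long start = System.nanoTime();
    try {
      return op.get();
    } finally {
      // Only build the message when debug logging is actually on.
      if (LOG.isLoggable(Level.FINE)) {
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        LOG.fine(opName + " took " + elapsedMs + " ms");
      }
    }
  }
}
```

The finally block means the time is logged even when the range lookup throws, for example on an invalid count or a missing start key.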

[~xyao]'s comments
bq. FilteredKeys.java

This class has been removed; {{KeyManagerImpl#listKey}} now calls the common helper to avoid code duplication.
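A sketch of how a listKey-style call might delegate to a prefix-aware common helper. The "/volume/bucket/key" key layout, the TreeMap stand-in and all names here are assumptions for illustration. Because keys are sorted, all keys sharing a prefix are contiguous, so the helper can seek to the prefix and stop at the first key that no longer matches instead of scanning the whole DB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: TreeMap stands in for levelDB; keys assumed to look like "/volume/bucket/key".
final class ListKeySketch {

  /** Hypothetical common helper: up to count keys at or after startKey that start with prefix. */
  static List<String> getRangeKeys(TreeMap<String, String> store, String startKey,
      String prefix, int count) {
    if (count <= 0) {
      throw new IllegalArgumentException("count must be greater than 0: " + count);
    }
    // Seek to whichever is later: the caller's start key or the beginning of the prefix range.
    String seekKey = startKey;
    if (prefix != null && (seekKey == null || seekKey.compareTo(prefix) < 0)) {
      seekKey = prefix;
    }
    NavigableMap<String, String> view =
        (seekKey == null) ? store : store.tailMap(seekKey, true);
    List<String> keys = new ArrayList<>();
    for (String key : view.keySet()) {
      if (keys.size() == count) {
        break;
      }
      if (prefix != null && !key.startsWith(prefix)) {
        // Keys are sorted, so once one stops matching the prefix, none after it will match.
        break;
      }
      keys.add(key);
    }
    return keys;
  }

  /** Hypothetical listKey: list keys in one bucket by delegating to the common helper. */
  static List<String> listKey(TreeMap<String, String> store, String volume, String bucket,
      String keyPrefix, int maxKeys) {
    String prefix = "/" + volume + "/" + bucket + "/" + (keyPrefix == null ? "" : keyPrefix);
    return getRangeKeys(store, null, prefix, maxKeys);
  }
}
```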

bq. Race between iterator and modification.

There may or may not be one. I am not sure; I could not find an answer in the levelDB docs (poor docs :().
Anyway, I have tested the snapshot approach and it doesn't seem to be expensive. I set up some
tests with different amounts of data in levelDB, from 10 to 10,000 to 10,000,000 entries, with
data sizes from a few KB to over 180 MB; the time to take a snapshot was trivial (around 1 ms).
I have not read the levelDB implementation, but if the iterator reads from the memtable in the
same way a snapshot is created, there is probably no big difference between the two approaches.
However, using a snapshot may be safer.
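To illustrate why taking a snapshot can be close to free regardless of data size, here is a toy, single-threaded model (not levelDB code; all names are hypothetical): every write is tagged with an increasing sequence number, a snapshot just records the latest sequence number, and reads through a snapshot ignore newer versions. LevelDB snapshots are likewise based on sequence numbers, which would be consistent with the ~1 ms measured above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model (an assumption, not levelDB code) of sequence-number-based snapshots.
final class ToySnapshotStore {
  // key -> (sequence number -> value); a null value marks a deletion (tombstone).
  private final TreeMap<String, TreeMap<Long, String>> versions = new TreeMap<>();
  private long lastSequence = 0;

  void put(String key, String value) {
    versions.computeIfAbsent(key, k -> new TreeMap<>()).put(++lastSequence, value);
  }

  void delete(String key) {
    // Record a tombstone instead of dropping older versions a snapshot may still need.
    versions.computeIfAbsent(key, k -> new TreeMap<>()).put(++lastSequence, null);
  }

  /** O(1): a snapshot is only the latest sequence number, independent of data size. */
  long getSnapshot() {
    return lastSequence;
  }

  /** Up to count keys visible at the given snapshot, in sorted order. */
  List<String> keysAt(long snapshot, int count) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, TreeMap<Long, String>> e : versions.entrySet()) {
      if (result.size() == count) {
        break;
      }
      // Newest version written at or before the snapshot; ignore anything newer.
      Map.Entry<Long, String> visible = e.getValue().floorEntry(snapshot);
      if (visible != null && visible.getValue() != null) {
        result.add(e.getKey());
      }
    }
    return result;
  }
}
```

For example, after putting a, b and c, taking a snapshot, then deleting b and adding d, keysAt(snapshot, 10) still returns [a, b, c], while a read at the latest sequence number returns [a, c, d]. An iterator bound to a snapshot therefore sees a stable view even if deletes happen while it is paging through a bucket.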


> Ozone: Implement a common helper to return a range of KVs in levelDB
> --------------------------------------------------------------------
>                 Key: HDFS-11926
>                 URL: https://issues.apache.org/jira/browse/HDFS-11926
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>            Priority: Blocker
>         Attachments: HDFS-11926-HDFS-7240.001.patch, HDFS-11926-HDFS-7240.002.patch,
>                      HDFS-11926-HDFS-7240.003.patch, HDFS-11926-HDFS-7240.004.patch
> There are quite a few *LIST* operations that need to get a range of keys or values from levelDB and filter entries by key prefix:
> # HDFS-11782 listKeys
> # HDFS-11779 listBuckets
> # HDFS-11773 listVolumes
> # HDFS-11679 listContainers
> We need to implement a common utility for them.

This message was sent by Atlassian JIRA
