hadoop-hdfs-issues mailing list archives

From "Nandakumar (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-12506) Ozone: ListBucket is too slow
Date Wed, 20 Sep 2017 16:20:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16173430#comment-16173430
] 

Nandakumar edited comment on HDFS-12506 at 9/20/17 4:19 PM:
------------------------------------------------------------

+1 for [~xyao]'s idea; I was thinking along the same lines.
One small change, though:
For volumes
/#v1
For buckets
/v1/#b1
Keys can stay stored as they are now.

With this layout we can list volumes without iterating over buckets, and list buckets without
iterating over keys.

Something like
{code}
/#v1
/#v2
/#v3
/v1/#b1
/v1/#b2
/v2/#b1
/v3/#b1
/v1/b1/k1
/v2/b2/k2
{code}
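
To make the proposed layout concrete, here is a minimal sketch of how listing would work against it. It assumes the DB iterates keys in byte order (so {{#}} sorts ahead of letters and digits) and that volume and bucket names never begin with {{#}}. The sorted Python list stands in for the KSM DB, and {{prefix_scan}}, {{list_volumes}} and {{list_buckets}} are hypothetical helper names, not Ozone APIs.

```python
import bisect

# Hypothetical stand-in for the KSM DB: the example keys above, held in byte order.
KEYS = sorted([
    "/#v1", "/#v2", "/#v3",
    "/v1/#b1", "/v1/#b2", "/v2/#b1", "/v3/#b1",
    "/v1/b1/k1", "/v2/b2/k2",
])


def prefix_scan(keys, prefix):
    """Seek to the first key >= prefix, then read until the prefix stops matching."""
    start = bisect.bisect_left(keys, prefix)
    matches = []
    for key in keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order guarantees no later key can match
        matches.append(key)
    return matches


def list_volumes(keys):
    # Volume entries all share the "/#" prefix, so no bucket or key is visited.
    return [k[len("/#"):] for k in prefix_scan(keys, "/#")]


def list_buckets(keys, volume):
    # Bucket entries of one volume share "/<volume>/#", so no key is visited.
    prefix = f"/{volume}/#"
    return [k[len(prefix):] for k in prefix_scan(keys, prefix)]


print(list_volumes(KEYS))
print(list_buckets(KEYS, "v1"))
```

Because the marker entries of one level sit next to each other in sorted order, each listing touches only the entries it returns, plus one extra read to detect the end of the range.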







> Ozone: ListBucket is too slow
> -----------------------------
>
>                 Key: HDFS-12506
>                 URL: https://issues.apache.org/jira/browse/HDFS-12506
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>            Reporter: Weiwei Yang
>            Priority: Blocker
>              Labels: ozoneMerge
>
> Generated 3 million keys in Ozone and ran the {{listBucket}} command to get the list of buckets under a volume:
> {code}
> bin/hdfs oz -listBucket http://15oz1.fyre.ibm.com:9864/vol-0-15143 -user wwei
> {code}
> This call took over *15 seconds* to finish. The problem is caused by the inflexible structure of the KSM DB. Right now {{ksm.db}} stores keys as follows:
> {code}
> /v1/b1
> /v1/b1/k1
> /v1/b1/k2
> /v1/b1/k3
> /v1/b2
> /v1/b2/k1
> /v1/b2/k2
> /v1/b2/k3
> /v1/b3
> /v1/b4
> {code}
> Keys are sorted in natural order, so to list the buckets under a volume, e.g. /v1, we need to seek to /v1 and then iterate and filter keys, which ends up scanning all keys under volume /v1. The problem with this design is that there is no efficient way to locate all buckets without scanning the keys.
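
The cost described in the issue can be illustrated with a small sketch. It assumes a seek-and-filter listing over a byte-ordered store; the sorted Python list stands in for {{ksm.db}}, and {{list_buckets_by_scan}} is a hypothetical name, not the actual KSM code.

```python
import bisect

# Keys copied from the issue description, held in sorted order as the DB would hold them.
KEYS = sorted([
    "/v1/b1", "/v1/b1/k1", "/v1/b1/k2", "/v1/b1/k3",
    "/v1/b2", "/v1/b2/k1", "/v1/b2/k2", "/v1/b2/k3",
    "/v1/b3", "/v1/b4",
])


def list_buckets_by_scan(keys, volume):
    """Return (buckets, entries_visited) for a seek-and-filter bucket listing."""
    prefix = f"/{volume}/"
    start = bisect.bisect_left(keys, prefix)  # seek to the volume
    buckets, visited = [], 0
    for key in keys[start:]:
        if not key.startswith(prefix):
            break
        visited += 1
        rest = key[len(prefix):]
        # A bucket entry has no further "/" after the volume prefix;
        # everything else is an object key that must be read and discarded.
        if "/" not in rest:
            buckets.append(rest)
    return buckets, visited


buckets, visited = list_buckets_by_scan(KEYS, "v1")
print(buckets, visited)
```

In this toy data set the scan reads all 10 entries to return 4 buckets. The number of entries visited grows with the number of keys in the volume rather than with the number of buckets, which matches the slowdown reported with 3 million keys.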



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

