hadoop-hdfs-issues mailing list archives

From "Lei (Eddy) Xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6673) Add Delimited format supports for PB OIV tool
Date Mon, 26 Jan 2015 23:26:35 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292617#comment-14292617 ]

Lei (Eddy) Xu commented on HDFS-6673:

Thank you so much for the continued input on this issue, [~andrew.wang], [~wheat9] and [~cmccabe]!

I just want to add a little more information regarding our design considerations.

bq. Convert the fsimage into LevelDB before running the oiv.

We agree that an ordered fsimage in LevelDB can be scanned much faster than the approach
used in the patch. However, our main concern with this approach, and the reason we gave up
on it, is that writing a large fsimage (> 1GB) into LevelDB alone is several times slower
(4-5x) than the end-to-end time of the latest patch. We believe the bottleneck is write
amplification in LevelDB rather than in-memory computation (e.g., serialization), since we
observed that the throughput of writing inodes to LevelDB keeps dropping _significantly_
once the db grows beyond 1GB. That is why we expect it to be much worse for even larger
fsimages.

As another data point, for the 3.3GB (33M inodes) fsimage we currently test with, we have
less than 300MB of metadata in LevelDB. If we assume that file distributions are similar
among fsimages, we will have a {{2-3GB}} LevelDB for a {{20GB}} fsimage ({{6-8GB}} LevelDB
for {{40GB}} (400M inodes)). The working set here is the {{6-8GB}} LevelDB, which is still
arguably reasonable for today's laptop memory. Moreover, today's laptops have quite fast
SSDs for decent random IO :)
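
For reference, the extrapolation above can be reproduced with a few lines of Python. This is only a sketch: it assumes strictly linear scaling from the single measured data point (3.3GB fsimage, under 300MB in LevelDB), and the function name is hypothetical. Plain linear scaling gives lower numbers than the ranges quoted above, which presumably leave some headroom.

```python
# Measured data point from the comment: a 3.3 GB fsimage (33M inodes)
# produced less than 300 MB of LevelDB metadata.
MEASURED_FSIMAGE_GB = 3.3
MEASURED_LEVELDB_MB = 300.0  # upper bound; the comment says "less than 300MB"


def estimate_leveldb_gb(fsimage_gb: float) -> float:
    """Linearly scale the measured LevelDB size to another fsimage size.

    Assumption (same as in the comment): file distributions are similar
    among fsimages, so metadata grows in proportion to fsimage size.
    """
    ratio_mb_per_gb = MEASURED_LEVELDB_MB / MEASURED_FSIMAGE_GB
    return fsimage_gb * ratio_mb_per_gb / 1000.0  # MB -> GB (decimal)


for size_gb in (20, 40):
    print(f"{size_gb} GB fsimage -> ~{estimate_leveldb_gb(size_gb):.1f} GB LevelDB")
```

Under these assumptions even the 40GB case stays below the {{6-8GB}} working set discussed above.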

I would be very interested to see performance results on such a {{400M}}-inode fsimage
if possible, which would definitely help me optimize this patch.

bq. Tweak the saver of the pb-based fsimage so that it stores the inodes in the order
of the full path. It can be done without changing the format of the current fsimage.

It would be much appreciated if this could be done.
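
To illustrate what emitting inodes in full-path order could look like, here is a minimal Python sketch. It is not the fsimage saver: it assumes the inodes are already available as an id-to-name map plus a parent-to-children map, and all names in it (`inodes_in_path_order`, `ROOT_ID`, the sample tree) are hypothetical. It walks the tree depth-first with siblings sorted by name, so every directory's subtree comes out contiguously. This is close to, but not identical to, a plain string sort of the full paths.

```python
from typing import Dict, Iterator, List, Tuple

ROOT_ID = 1  # hypothetical id of the root directory in this sketch


def inodes_in_path_order(
    names: Dict[int, str],
    children: Dict[int, List[int]],
    root_id: int = ROOT_ID,
) -> Iterator[Tuple[str, int]]:
    """Yield (full_path, inode_id) depth-first, siblings sorted by name."""
    stack = [("/", root_id)]
    while stack:
        path, inode_id = stack.pop()
        yield path, inode_id
        # Push children in reverse name order so the smallest name pops first.
        kids = sorted(children.get(inode_id, []),
                      key=lambda c: names[c], reverse=True)
        prefix = path if path.endswith("/") else path + "/"
        for child in kids:
            stack.append((prefix + names[child], child))


# Tiny made-up namespace: /tmp, /user/alice/data.txt
names = {1: "", 2: "user", 3: "tmp", 4: "alice", 5: "data.txt"}
children = {1: [2, 3], 2: [4], 4: [5]}

for full_path, inode_id in inodes_in_path_order(names, children):
    print(inode_id, full_path)
```

With inodes written in this order, a reader can reconstruct each full path during one sequential scan by tracking only the current chain of ancestor directories.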

> Add Delimited format supports for PB OIV tool
> ---------------------------------------------
>                 Key: HDFS-6673
>                 URL: https://issues.apache.org/jira/browse/HDFS-6673
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 2.4.0
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>            Priority: Minor
>         Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch, HDFS-6673.002.patch, HDFS-6673.003.patch, HDFS-6673.004.patch, HDFS-6673.005.patch, HDFS-6673.006.patch
>
> The new oiv tool, which is designed for Protobuf fsimage, lacks a few features supported in the old {{oiv}} tool.
> This task adds support for the _Delimited_ processor to the oiv tool.

This message was sent by Atlassian JIRA