hadoop-hdfs-issues mailing list archives

From "Lei (Eddy) Xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6673) Add Delimited format supports for PB OIV tool
Date Tue, 27 Jan 2015 23:03:35 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294375#comment-14294375 ]

Lei (Eddy) Xu commented on HDFS-6673:

[~andrew.wang], [~wheat9] and I had an offline call about this issue. We discovered two different
use cases:

# A user downloads the fsimage to a laptop and runs the PB OIV tool there.
# A user runs the PB OIV tool as a MapReduce task.

[~wheat9] raised the concern that when the working set (the directory file name mapping and the
inode-to-parent-inode mapping) is larger than memory, it is hard to predict the execution time of
OIV running as a MapReduce task. Such tasks usually run on DNs with relatively little memory and
HDDs, and random seeks in LevelDB might kill the performance. He suggested that rather than
letting the MR task run unexpectedly long, it would be better to let it fail fast.
We think it would be better to use the {{InMemoryMap}} here to store the metadata in memory for
MR tasks, so that if the working set is too large, the MR task runs out of memory and dies fast.
We can then suggest that the user run the task on a machine with more memory.
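
To make the working set concrete, here is a minimal sketch of the two mappings mentioned above and of how a full path is rebuilt from them. The class name, method names, and the root inode id are hypothetical and chosen only for illustration; this is not the tool's actual {{InMemoryMap}} code. It shows that every path reconstruction performs one lookup per ancestor, which is cheap in memory but turns into random reads when the maps live on an HDD.

```python
from typing import Dict

ROOT_ID = 1  # hypothetical inode id for the root directory "/"


class InMemoryMetadata:
    """Illustrative stand-in for the working set; not the tool's real class."""

    def __init__(self) -> None:
        # Directory inode id -> directory name (root has an empty name).
        self.dir_names: Dict[int, str] = {ROOT_ID: ""}
        # Child inode id -> parent directory inode id.
        self.parent_of: Dict[int, int] = {}

    def add_dir(self, inode_id: int, name: str) -> None:
        self.dir_names[inode_id] = name

    def add_child(self, parent_id: int, child_id: int) -> None:
        self.parent_of[child_id] = parent_id

    def full_path(self, inode_id: int, name: str) -> str:
        """Rebuild the absolute path of a non-root entry by walking up to the root."""
        parts = [name]
        current = inode_id
        while current != ROOT_ID:
            # One lookup in each map per ancestor directory.
            current = self.parent_of[current]
            parts.append(self.dir_names[current])
        return "/".join(reversed(parts))


meta = InMemoryMetadata()
meta.add_dir(2, "user")
meta.add_child(ROOT_ID, 2)
meta.add_dir(3, "alice")
meta.add_child(2, 3)
meta.add_child(3, 4)  # inode 4 is a file named "data.txt"
print(meta.full_path(4, "data.txt"))
```

If both dictionaries are held on the heap, an fsimage whose working set does not fit simply raises an out-of-memory error early, which is the fail-fast behavior discussed above.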

On the other hand, for case #1, the user can leverage the laptop's SSD to get decent performance
even for a large fsimage, without requiring a large amount of memory.

In summary, we suggest using the PB OIV tool as follows:
* For a very small fsimage (e.g., < 1GB), or a very large fsimage on a machine with HDDs and
limited RAM (e.g., a 40+GB fsimage vs. 8GB of RAM), use {{InMemoryMap}} by not specifying the
{{--tempdb}} parameter. Users are advised to run it on a machine with a large amount of RAM.
* Otherwise, use {{--tempdb}} to specify a path where LevelDB stores the metadata
off heap.
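
The recommendation above can be read as a small decision rule. The sketch below is only one interpretation: the function name, its parameters, and the "fsimage larger than RAM" test for "limited RAM" are assumptions made for illustration, and only the 1GB threshold comes from the comment itself.

```python
def choose_metadata_store(fsimage_gb: float, ram_gb: float, has_ssd: bool) -> str:
    """Return "in-memory" (omit --tempdb) or "tempdb" (pass --tempdb <path>)."""
    # Very small fsimage: the working set fits in memory comfortably.
    if fsimage_gb < 1.0:
        return "in-memory"
    # Very large fsimage on HDD with limited RAM (assumed here to mean the
    # fsimage exceeds RAM): prefer failing fast over slow LevelDB seeks.
    if fsimage_gb > ram_gb and not has_ssd:
        return "in-memory"
    # Everything else: keep the metadata off heap in LevelDB.
    return "tempdb"


print(choose_metadata_store(0.5, 8, has_ssd=False))
print(choose_metadata_store(40, 8, has_ssd=False))
print(choose_metadata_store(40, 8, has_ssd=True))
```

In the second case the in-memory run is expected to run out of memory quickly, at which point the user would move the job to a machine with more RAM.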

[~wheat9] and [~andrew.wang], does the above cover everything we discussed? [~wheat9],
can I get a +0 from you?

> Add Delimited format supports for PB OIV tool
> ---------------------------------------------
>                 Key: HDFS-6673
>                 URL: https://issues.apache.org/jira/browse/HDFS-6673
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 2.4.0
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>            Priority: Minor
>         Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch, HDFS-6673.002.patch, HDFS-6673.003.patch, HDFS-6673.004.patch, HDFS-6673.005.patch, HDFS-6673.006.patch
>
> The new oiv tool, which is designed for Protobuf fsimage, lacks a few features supported in the old {{oiv}} tool.
> This task adds support for the _Delimited_ processor to the oiv tool.

This message was sent by Atlassian JIRA
