hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6988) Add configurable limit for percentage-based eviction threshold
Date Thu, 02 Oct 2014 20:45:34 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157145#comment-14157145 ]

Colin Patrick McCabe commented on HDFS-6988:
--------------------------------------------

I'm trying to understand the process for configuring this.  First, there is the decision of how big to make the ramdisk.  This is something that a sysadmin (or management software) needs to do ahead of time, and it is clearly going to be expressed as a number of bytes.  Then there is setting {{dfs.datanode.ram.disk.low.watermark.percent}}, which determines how much of the ramdisk we will try to keep free.  Then there is {{dfs.datanode.ram.disk.low.watermark.replicas}}.  I'm not sure when you would set this one.
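
For my own understanding, here is roughly how I picture these settings turning into a byte threshold on the DataNode.  This is only a sketch; the default values and the max() combination below are my guesses, not necessarily what the patch actually does:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class WatermarkSketch {
  // Sketch only: how many bytes of the ramdisk to try to keep free.
  // The default values here are illustrative, not the patch's actual defaults.
  static long bytesToKeepFree(Configuration conf, long ramDiskCapacityBytes) {
    int percent = conf.getInt("dfs.datanode.ram.disk.low.watermark.percent", 10);
    long replicas = conf.getLong("dfs.datanode.ram.disk.low.watermark.replicas", 3);
    long blockSize = conf.getLongBytes("dfs.blocksize", 128L * 1024 * 1024);

    long fromPercent = ramDiskCapacityBytes * percent / 100;
    long fromReplicas = replicas * blockSize;
    // Guess: evict when free space drops below the larger of the two thresholds.
    return Math.max(fromPercent, fromReplicas);
  }
}
{code}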

I don't like the fact that {{dfs.datanode.ram.disk.low.watermark.percent}} is an int.  In
a year or two, we may find that 100 GB ramdisks are common.  Then the sysadmin gets a choice
between specifying 0% (0 bytes free) and 1% (try to keep 1 GB free).  Making this a float
would be better, I think...
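
Concretely, the change I'm asking for is just reading the property as a float instead of an int (the default shown here is only illustrative):

{code:java}
// conf is the DataNode's org.apache.hadoop.conf.Configuration.
// Today: an int, so on a 100 GB ramdisk the smallest non-zero setting keeps 1 GB free.
int percent = conf.getInt("dfs.datanode.ram.disk.low.watermark.percent", 10);

// Suggested: a float, so e.g. 0.25f would keep ~256 MB free on the same 100 GB ramdisk.
float percentF = conf.getFloat("dfs.datanode.ram.disk.low.watermark.percent", 10.0f);
{code}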

Why is {{dfs.datanode.ram.disk.low.watermark.replicas}} specified in terms of a number of replicas?  Block size is a per-replica property: I could easily have a client that writes 256 MB or 1 GB replicas while the DataNode is configured with {{dfs.blocksize}} at 64 MB.  It's pretty common for formats like ORCFile and Apache Parquet to use large blocks and seek around within them.  This property seems like it should be given in terms of bytes to avoid confusion.  It seems like we are translating it into a number of bytes before using it anyway, so why not give the user access to that number directly?
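
A quick example of the mismatch (the numbers here are mine, purely for illustration):

{code:java}
public class ReplicaWatermarkExample {
  public static void main(String[] args) {
    // DataNode side: dfs.blocksize = 64 MB and, say, watermark.replicas = 3,
    // so the DataNode would reserve roughly 3 * 64 MB = 192 MB.
    long reservedByDataNode = 3L * 64 * 1024 * 1024;

    // Client side: a writer using 1 GB blocks needs far more headroom than that
    // for even a single in-flight replica.
    long oneLargeReplica = 1024L * 1024 * 1024;

    System.out.println("DataNode reserves:      " + reservedByDataNode + " bytes");
    System.out.println("One 1 GB replica needs: " + oneLargeReplica + " bytes");
  }
}
{code}

A byte-valued setting would let the admin state the intended headroom directly, without going through {{dfs.blocksize}}.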

bq. I explained this earlier, a single number fails to work well for a range of disks and
makes configuration mandatory. What would you choose as the default value of this single setting.
Let's say we choose 1GB or higher. Then we are wasting at least 25% of space on a 4GB RAM
disk. Or we choose 512MB. Then we are not evicting fast enough to keep up with multiple writers
on a 50GB disk.

There seems to be a hidden assumption that the number of writers (or the speed at which they're
writing) will increase with the size of the ramdisk.  I don't see why that's true.  In theory,
I could have a system with a small ramdisk and a high write rate, or a system with a huge
ramdisk and a low write rate.  It seems that the amount of space I want to keep free should be proportional to the write rate, not to the total ramdisk size.
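
Back-of-the-envelope, the headroom you need looks more like (write rate) x (eviction lag) than a fraction of capacity.  The numbers below are purely illustrative:

{code:java}
public class HeadroomEstimate {
  public static void main(String[] args) {
    long writeRateBytesPerSec = 200L * 1024 * 1024; // e.g. 200 MB/s of incoming writes
    long evictionLagSeconds = 5;                    // e.g. 5 s to flush a replica to disk

    // Space that can accumulate before eviction catches up: ~1 GB in this example,
    // and it is the same ~1 GB whether the ramdisk is 4 GB or 50 GB.
    long headroomNeeded = writeRateBytesPerSec * evictionLagSeconds;
    System.out.println("Headroom needed: " + headroomNeeded + " bytes");
  }
}
{code}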

> Add configurable limit for percentage-based eviction threshold
> --------------------------------------------------------------
>
>                 Key: HDFS-6988
>                 URL: https://issues.apache.org/jira/browse/HDFS-6988
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>    Affects Versions: HDFS-6581
>            Reporter: Arpit Agarwal
>             Fix For: HDFS-6581
>
>         Attachments: HDFS-6988.01.patch, HDFS-6988.02.patch
>
>
> Per feedback from [~cmccabe] on HDFS-6930, we can make the eviction thresholds configurable.
> The hard-coded thresholds may not be appropriate for very large RAM disks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
