hadoop-common-issues mailing list archives

From "Anu Engineer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15204) Add Configuration API for parsing storage sizes
Date Thu, 01 Feb 2018 19:14:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349121#comment-16349121 ]

Anu Engineer commented on HADOOP-15204:

[~stevel@apache.org] I did look at getLongBytes. The issue is that it does not provide the
same code improvement as {{getTimeDuration}} and {{setTimeDuration}}. 

Let me support that assertion with some examples.

Suppose I need to read the standard configured size of containers, which is a configurable
parameter in Ozone.

Here is how the code would look with get/setStorageSize:
{code}
// Suppose we want to read this config value back in MB.
long valueInMB = getStorageSize(OZONE_CONTAINER_SIZE, "5 GB", MEGABYTES);
{code}
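For readers following along, here is a minimal, self-contained sketch of what such a parse-and-convert helper could look like. The class, enum, and method names below are illustrative only, not the actual HADOOP-15204 patch; only the shape of getStorageSize is taken from the line above.

```java
import java.util.Locale;

// Illustrative sketch of a getStorageSize-style helper; not the actual patch.
public class StorageSizeSketch {

    public enum StorageUnit {
        BYTES(1L), KILOBYTES(1L << 10), MEGABYTES(1L << 20),
        GIGABYTES(1L << 30), TERABYTES(1L << 40);

        private final long bytesPerUnit;
        StorageUnit(long bytesPerUnit) { this.bytesPerUnit = bytesPerUnit; }

        public long toBytes(double value) { return (long) (value * bytesPerUnit); }
        public double fromBytes(long bytes) { return bytes / (double) bytesPerUnit; }

        static StorageUnit fromSuffix(String suffix) {
            switch (suffix.toUpperCase(Locale.ROOT)) {
                case "B":  return BYTES;
                case "KB": return KILOBYTES;
                case "MB": return MEGABYTES;
                case "GB": return GIGABYTES;
                case "TB": return TERABYTES;
                default: throw new IllegalArgumentException("Unknown unit: " + suffix);
            }
        }
    }

    // Parse a configured value like "5 GB" and return it in the requested unit.
    public static double getStorageSize(String configured, StorageUnit target) {
        String[] parts = configured.trim().split("\\s+");
        double value = Double.parseDouble(parts[0]);
        StorageUnit source = StorageUnit.fromSuffix(parts[1]);
        return target.fromBytes(source.toBytes(value));
    }

    public static void main(String[] args) {
        // "5 GB" read back in megabytes -> 5120.0
        System.out.println(getStorageSize("5 GB", StorageUnit.MEGABYTES));
    }
}
```

The point of the sketch is that the unit is named once, at the call site, and all conversion arithmetic lives inside the enum.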
Now let us see how we would write this code using getLongBytes:
{code}
// There is no symmetric function called setLongBytes, so I have to fall back to setLong.
// Here is my first attempt:
setLong(OZONE_CONTAINER_SIZE, 1048576);
// This looks bad, since it is not clear whether OZONE_CONTAINER_SIZE is in MB, GB, or
// something else. So many keys end up tagged like OZONE_CONTAINER_SIZE_GB.

// Second attempt:
setLong(OZONE_CONTAINER_SIZE_GB, 1048576);
// But this is bad too: now I cannot set the container size in MB, since setLong takes a
// whole number and not a fraction.

// Third attempt -- convert all fields to bytes:
setLong(OZONE_CONTAINER_SIZE_BYTES, 5368709120L); // The default is 5 GB.
{code}
Before you think this is a made-up example: this is part of the changes we tried, which triggered
this JIRA.

Now let us go back to the get examples --

{code}
// getLongBytes forces us to write code in a way that does not match the rest of
// the code base (for example, getTimeDuration). Suppose I want to read the value
// in MB while ozone-default.xml is configured in GB -- the case this code solves:

long defaultValueInBytes = getDefaultValue(OZONE_CONTAINER_SIZE_DEFAULT); // in bytes
long valueInMB = BYTES.fromBytes(getLongBytes(OZONE_CONTAINER_SIZE, defaultValueInBytes)).toMBs();
{code}

Now imagine repeating this code many times all over the code base. Moreover, the
{{BYTES.fromBytes(xxx).toMBs()}} chain is itself a function from this patch, so we would need
some equivalent code anyway.
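As a rough approximation of that fluent style (hypothetical names; the actual patch may shape this differently, e.g. as methods on the unit enum itself), the conversion chain could look like:

```java
// Hypothetical value-object approximation of the fromBytes(x).toMBs() chain
// mentioned above; not the actual HADOOP-15204 code.
public final class StorageSizeValue {
    private final double bytes;

    private StorageSizeValue(double bytes) { this.bytes = bytes; }

    public static StorageSizeValue fromBytes(double bytes) {
        return new StorageSizeValue(bytes);
    }

    public double toMBs() { return bytes / (1L << 20); }
    public double toGBs() { return bytes / (1L << 30); }

    public static void main(String[] args) {
        // A 5 GB default stored as bytes, read back in GB -> 5.0
        System.out.println(StorageSizeValue.fromBytes(5368709120L).toGBs());
    }
}
```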


In other words, I submit that the following factors make getLongBytes a less desirable function
than getTimeDuration/getStorageSize:
 * Lack of a symmetric function - Without a set function, getLongBytes degenerates into a messy
set of multiplications and divisions each time we use a storage unit. With this patch, those
conversions are cleanly isolated in a single place.
 * Lack of a formal storage unit - Without a formal value like {{TimeUnit}}, the code is less
readable (see the setStorageSize example above, where the call site also tells you the unit we
are operating in).
 * Does not suit our usage pattern - Ozone code follows the patterns in HDFS, and getLongBytes
does not handle the default-value mapping well.

 * Units and conversion are needed - In Ozone there are several places where we convert these
numbers. Users can specify quotas in easy-to-read storage units, like 5 GB or 10 TB, and we
have dedicated code for handling that; we also use storage sizes to specify how large the
off-heap size should be, or, as in this example, container sizes.
 * getLongBytes by itself does not address the lack of storage units - What this patch does is
introduce a class that is very similar to {{TimeUnit}}, which makes the code more readable and
easier to maintain.
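The {{TimeUnit}} parallel is concrete: the JDK enum already gives time configs an explicit unit vocabulary at the call site, which is what {{getTimeDuration}} builds on and what a {{StorageUnit}} would replicate for sizes:

```java
import java.util.concurrent.TimeUnit;

public class TimeUnitAnalogy {
    public static void main(String[] args) {
        // The unit is explicit at the call site; conversions are centralized
        // in the enum rather than scattered as ad-hoc multiplications.
        long millis = TimeUnit.MINUTES.toMillis(5);             // 300000
        long seconds = TimeUnit.MILLISECONDS.toSeconds(millis); // 300
        System.out.println(millis + " ms = " + seconds + " s");
    }
}
```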



> Add Configuration API for parsing storage sizes
> -----------------------------------------------
>                 Key: HADOOP-15204
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15204
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 3.1.0
>            Reporter: Anu Engineer
>            Assignee: Anu Engineer
>            Priority: Minor
>             Fix For: 3.1.0
>         Attachments: HADOOP-15204.001.patch
> Hadoop has a lot of configurations that specify memory and disk size. This JIRA proposes
> to add an API like {{Configuration.getStorageSize}} which will allow users
> to specify units like KB, MB, GB etc. This JIRA is inspired by HDFS-8608 and Ozone.
> Adding {{getTimeDuration}} support was a great improvement for the Ozone code base; this JIRA
> hopes to do the same thing for configs that deal with disk and memory usage.

This message was sent by Atlassian JIRA
