hadoop-common-dev mailing list archives

From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3479) Implement configuration items useful for Hadoop resource manager (v1)
Date Wed, 04 Jun 2008 05:48:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602175#action_12602175 ]

Hemanth Yamijala commented on HADOOP-3479:
------------------------------------------

Some initial thoughts, combining ideas Vivek expressed in an email to core-dev.

We have the following questions to consider:

- Format for the new configuration options
- Where to define them. This could depend on the format.

Looking at the format first, I can think of two options. Given that there is a set of related properties
(e.g. user list, default priority, resource limit, etc.), we could use
a nested XML format, something like:
{code:xml}
<Grid>
  <Organization>
    <Name>org1</Name>
    <MaxCapacity>100</MaxCapacity>
    <Queues>
      <Queue>
        <Name>queue1</Name>
        <AllowedUsers>u1,u2,u3</AllowedUsers>
        <DisallowedUsers>u3,u4,u5</DisallowedUsers>
        <AllowedOverrides>False</AllowedOverrides>
      </Queue>
    </Queues>
  </Organization>
</Grid>
{code}

IMO, this is an intuitive model for capturing the configuration. We can use a DOM parser, like the one
Configuration.java currently uses, to construct Scheduler config objects.

The main drawbacks of this approach are:
- It is a new format to administer, which could be a pain for administrators.
- Administrators may expect parity with the other features provided by standard Hadoop configuration.
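To make the DOM option a little more concrete, here is a minimal sketch of turning the nested format into config objects. It assumes the element names from the example XML; the classes NestedSchedulerConfig, OrgConf and QueueConf are hypothetical and not an existing Hadoop API. It uses only the JDK's javax.xml.parsers and org.w3c.dom classes.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

// Hypothetical scheduler config objects built from the nested <Grid> format.
// Element names follow the example above; none of this is an existing Hadoop API.
class NestedSchedulerConfig {

    static class QueueConf {
        final String name;
        final List<String> allowedUsers;
        final List<String> disallowedUsers;
        final boolean allowedOverrides;

        QueueConf(String name, List<String> allowed, List<String> disallowed, boolean overrides) {
            this.name = name;
            this.allowedUsers = allowed;
            this.disallowedUsers = disallowed;
            this.allowedOverrides = overrides;
        }
    }

    static class OrgConf {
        final String name;
        final int maxCapacity;
        final List<QueueConf> queues;

        OrgConf(String name, int maxCapacity, List<QueueConf> queues) {
            this.name = name;
            this.maxCapacity = maxCapacity;
            this.queues = queues;
        }
    }

    // Direct child elements of parent with the given tag (skips whitespace text nodes).
    static List<Element> children(Element parent, String tag) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(tag)) {
                out.add((Element) n);
            }
        }
        return out;
    }

    // Trimmed text of the first direct child with the given tag, or null if absent.
    static String childText(Element parent, String tag) {
        List<Element> found = children(parent, tag);
        return found.isEmpty() ? null : found.get(0).getTextContent().trim();
    }

    // Splits a comma separated list such as "u1,u2,u3", skipping blanks.
    static List<String> csv(String value) {
        List<String> out = new ArrayList<>();
        if (value == null) return out;
        for (String part : value.split(",")) {
            if (!part.trim().isEmpty()) out.add(part.trim());
        }
        return out;
    }

    static List<OrgConf> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<OrgConf> orgs = new ArrayList<>();
            for (Element org : children(doc.getDocumentElement(), "Organization")) {
                List<QueueConf> queues = new ArrayList<>();
                for (Element queuesEl : children(org, "Queues")) {
                    for (Element q : children(queuesEl, "Queue")) {
                        queues.add(new QueueConf(childText(q, "Name"),
                            csv(childText(q, "AllowedUsers")),
                            csv(childText(q, "DisallowedUsers")),
                            // Boolean.parseBoolean ignores case, so "False" -> false.
                            Boolean.parseBoolean(childText(q, "AllowedOverrides"))));
                    }
                }
                orgs.add(new OrgConf(childText(org, "Name"),
                    Integer.parseInt(childText(org, "MaxCapacity")), queues));
            }
            return orgs;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse scheduler config", e);
        }
    }
}
{code}

The nesting in the file maps directly onto the object nesting, which is most of the appeal of this option: no key naming conventions are needed to recover the grouping.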

The other format retains the same structure as the current Hadoop configuration, which
is essentially a flat list of key/value pairs. Here's an example:
{code:xml}
<property>
    <name>hadoop.scheduler.orgs</name>
    <value>Org1,Org2</value>
    <description>Comma separated list of Org names</description>
</property>
<property>
    <name>hadoop.scheduler.Org1.max-capacity</name>
    <value>100</value>
</property>
<property>
    <name>hadoop.scheduler.Org1.queues</name>
    <value>q1,q2</value>
</property>
<property>
    <name>hadoop.scheduler.Org1.q1.allowedusers</name>
    <value>u1,u2,u3</value>
</property>
{code}

As shown above, the property keys indicate the grouping. All the properties
for an Org would be under hadoop.scheduler.<org-name>, and likewise for Queues. This implies that
property names must be built dynamically in code, and that we may need a way to list
all children of a given property, something that HADOOP-3407 could possibly solve.
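A rough sketch of what building keys dynamically and listing children could involve, assuming the hadoop.scheduler.<org>.<queue>.<item> layout from the example. A plain Map stands in for Hadoop's Configuration here, and FlatSchedulerKeys and its methods are hypothetical helpers, not the HADOOP-3407 API.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helpers for the flat "hadoop.scheduler.<org>.<queue>.<item>" scheme.
// A plain Map stands in for Hadoop's Configuration; these names are illustrative only.
class FlatSchedulerKeys {
    static final String PREFIX = "hadoop.scheduler.";

    // Per-org key, e.g. hadoop.scheduler.Org1.queues
    static String orgKey(String org, String item) {
        return PREFIX + org + "." + item;
    }

    // Per-queue key, e.g. hadoop.scheduler.Org1.q1.allowedusers
    static String queueKey(String org, String queue, String item) {
        return PREFIX + org + "." + queue + "." + item;
    }

    // Splits a comma separated value, skipping blanks; null yields an empty list.
    static List<String> csv(String value) {
        List<String> out = new ArrayList<>();
        if (value == null) return out;
        for (String part : value.split(",")) {
            if (!part.trim().isEmpty()) out.add(part.trim());
        }
        return out;
    }

    // Distinct next-level name segments under parent, in sorted order: the kind of
    // "list all children of a property" lookup the flat scheme needs.
    static List<String> children(Map<String, String> conf, String parent) {
        String p = parent + ".";
        TreeSet<String> names = new TreeSet<>();
        for (String key : conf.keySet()) {
            if (key.startsWith(p)) {
                String rest = key.substring(p.length());
                int dot = rest.indexOf('.');
                names.add(dot < 0 ? rest : rest.substring(0, dot));
            }
        }
        return new ArrayList<>(names);
    }

    // Resolves allowed users for every queue listed under hadoop.scheduler.<org>.queues.
    static Map<String, List<String>> allowedUsersByQueue(Map<String, String> conf, String org) {
        Map<String, List<String>> out = new TreeMap<>();
        for (String queue : csv(conf.get(orgKey(org, "queues")))) {
            out.put(queue, csv(conf.get(queueKey(org, queue, "allowedusers"))));
        }
        return out;
    }
}
{code}

With just the four properties from the example loaded, children(conf, "hadoop.scheduler.Org1") returns [max-capacity, q1, queues], mixing per-org settings with queue names. That mixing is one reason an explicit hadoop.scheduler.Org1.queues list is useful.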

This format is much less intuitive, and perhaps more error-prone to administer? Also, we may need
unnecessary restrictions or special handling on names. For example, a queue name could not contain
a '.', because the dot already separates the parts of a key.
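The name restriction could be enforced with a check along these lines. This is a minimal sketch assuming the hadoop.scheduler.<org>.<queue>.<item> layout from the example; QueueNameCheck and its methods are hypothetical.

{code:java}
// Hypothetical name check for the flat key scheme: a '.' in an org or queue
// name would make keys like hadoop.scheduler.<org>.<queue>.<item> ambiguous.
class QueueNameCheck {
    static final String PREFIX = "hadoop.scheduler.";

    // A name is usable in a key only if it is non-empty and has no '.'.
    static boolean isValidName(String name) {
        return name != null && !name.isEmpty() && name.indexOf('.') < 0;
    }

    // Splits a queue-level key back into {org, queue, item}; returns null when the
    // key does not have exactly three parts after the prefix, for example because
    // a name contained a '.'.
    static String[] parseQueueKey(String key) {
        if (!key.startsWith(PREFIX)) return null;
        String[] parts = key.substring(PREFIX.length()).split("\\.");
        return parts.length == 3 ? parts : null;
    }
}
{code}

With a queue named my.queue, the key hadoop.scheduler.Org1.my.queue.allowedusers splits into four parts, so a reader cannot tell where the queue name ends without extra escaping rules.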

However, this approach lets us keep the same format as the rest of Hadoop configuration and use
the same configuration features. It would also let us reuse the existing code for parsing and
reading configuration.

Regarding where to define this configuration, the first format would require a new file,
as the current Hadoop configs are a flat, single-level hierarchy. The second option allows
us to keep using the current Hadoop config files: hadoop-default.xml and hadoop-site.xml.

Having a separate file has the benefit that we can define policies for managing updates
to it (e.g. by re-reading it periodically). However, it would add admin overhead,
since there is now one more file to administer.
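One possible update policy for a separate file, re-reading it periodically but only when its modification time changes, could be sketched as follows. ReloadingConfigFile and its methods are hypothetical; a real version would parse the contents (for instance with a DOM parser) instead of keeping the raw text.

{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical periodic reload of a separate scheduler config file.
class ReloadingConfigFile {
    private final Path path;
    private FileTime lastLoaded;        // mtime seen at the last successful load
    private volatile String contents;   // raw contents; a real version would parse them

    ReloadingConfigFile(Path path) {
        this.path = path;
    }

    // Re-reads the file if its mtime differs from the last load; returns true if it reloaded.
    synchronized boolean reloadIfChanged() {
        try {
            FileTime mtime = Files.getLastModifiedTime(path);
            if (mtime.equals(lastLoaded)) {
                return false;
            }
            contents = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
            lastLoaded = mtime;
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    String contents() {
        return contents;
    }

    // Runs reloadIfChanged() every periodSeconds on a daemon thread.
    ScheduledExecutorService startPolling(long periodSeconds) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "scheduler-conf-reloader");
            t.setDaemon(true);
            return t;
        });
        exec.scheduleWithFixedDelay(() -> {
            try {
                reloadIfChanged();
            } catch (UncheckedIOException e) {
                // Keep the previously loaded config and retry on the next period.
            }
        }, 0, periodSeconds, TimeUnit.SECONDS);
        return exec;
    }
}
{code}

Comparing modification times keeps each poll cheap. A real policy would also have to decide what happens if the file is read while half-written, for example by asking admins to replace it atomically.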

Personally, I prefer the more intuitive first format (Option 1). Though it involves some learning,
this format may still be the easier one to learn.

Comments from others? Any other options?

> Implement configuration items useful for Hadoop resource manager (v1)
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-3479
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3479
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Hemanth Yamijala
>            Assignee: Hemanth Yamijala
>
> HADOOP-3421 lists requirements for a new resource manager for Hadoop. Implementing
> them will require support for new configuration items in Hadoop. This JIRA is to define
> such configuration and track its implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

