hadoop-hdfs-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6377) Unify xattr name and value limits into a single limit
Date Tue, 13 May 2014 16:51:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996616#comment-13996616 ]

Chris Nauroth commented on HDFS-6377:
-------------------------------------

Thanks for changing the config property names.

bq. Do you think we need to have a minimum configuration limit? Let's say the user configured the size
as 3; then this is always an invalid size, as the namespace itself occupies this space? [ I am not insisting,
just to discuss this point ]

Alternatively, the other fs-limits configs have the semantics that setting them to 0 disables
enforcement.  I suppose this might be helpful as an escape hatch if something causes really
unexpectedly long data, but the admin still wants to keep the service running.  (Like Uma,
I'm just discussing ideas, not insisting.)
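
To make the "0 disables enforcement" idea concrete, here is a minimal sketch. The class and method names are hypothetical and not taken from the patch; it only illustrates the semantics the other fs-limits settings are described as having.

```java
/** Hypothetical helper illustrating "a limit of 0 disables enforcement". */
final class XAttrLimit {

  /**
   * Returns true when the xattr should be rejected.
   * A maxSize of 0 is treated as "no limit", so nothing is ever rejected.
   */
  static boolean exceeds(int sizeInBytes, int maxSize) {
    if (maxSize == 0) {
      return false; // enforcement disabled: the admin's escape hatch
    }
    return sizeInBytes > maxSize;
  }

  public static void main(String[] args) {
    System.out.println(exceeds(100, 0));  // limit disabled
    System.out.println(exceeds(100, 64)); // over the limit
    System.out.println(exceeds(64, 64));  // exactly at the limit is allowed
  }
}
```

With this shape, a configured value of 3 would still be accepted as a setting; it would simply reject every xattr whose name plus value exceeds three bytes.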

{code}
  private void checkXAttrSize(XAttr xAttr) throws UnsupportedEncodingException {
    int size = xAttr.getName().getBytes("UTF-8").length;
    if (xAttr.getValue() != null) {
      size += xAttr.getValue().length;
    }
    if (size > nnConf.xattrMaxSize) {
      throw new HadoopIllegalArgumentException(
          "XAttr is too big, maximum size = " + nnConf.xattrMaxSize
              + ", but the size is = " + xAttr.getName().length());
    }
  }
{code}

I believe the log message will be incorrect in the presence of multi-byte characters.  The
limit is enforced on the number of bytes in UTF-8 encoding.  The log message reports
{{xAttr.getName().length()}}, which is the string length in UTF-16 code units and can differ
(it also leaves out the length of the value).  This could confuse users if we reject an xattr
and then log a size that appears to be under the configured limit.  Here is a quick Scala REPL
session demonstrating the problem:

{code}
scala> val s = "single-byte-chars"
s: java.lang.String = single-byte-chars

scala> s.getBytes("UTF-8").length
res2: Int = 17

scala> s.length
res3: Int = 17

scala> val s2 = "multi-byte-\u0641-chars"
s2: java.lang.String = multi-byte-?-chars

scala> s2.getBytes("UTF-8").length
res4: Int = 19

scala> s2.length
res5: Int = 18
{code}

Also, here is a minor code cleanup suggestion on the above.  Guava defines a constant {{Charsets#UTF_8}}.
We can pass this to {{String#getBytes(Charset)}} instead of using the overload that takes a {{String}}
parameter, which eliminates the need to deal with {{UnsupportedEncodingException}}.
I've always found that exception irritating.  Of course we have UTF-8!  :-)
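
Putting the two points together, the check could look roughly like the sketch below. This is a standalone illustration, not the patch itself: the class and method names are hypothetical, {{IllegalArgumentException}} stands in for {{HadoopIllegalArgumentException}}, and the JDK's {{StandardCharsets.UTF_8}} (Java 7+) stands in for Guava's {{Charsets.UTF_8}} so that the example has no external dependency.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for the size check under review; not the actual HDFS code. */
final class XAttrSizeCheck {

  /** Size compared against the limit: UTF-8 bytes of the name plus the value's bytes. */
  static int sizeInBytes(String name, byte[] value) {
    // getBytes(Charset) declares no checked exception, unlike getBytes(String).
    int size = name.getBytes(StandardCharsets.UTF_8).length;
    if (value != null) {
      size += value.length;
    }
    return size;
  }

  /** Rejects an oversized xattr and reports the same number that was compared. */
  static void check(String name, byte[] value, int maxSize) {
    int size = sizeInBytes(name, value);
    if (size > maxSize) {
      throw new IllegalArgumentException(
          "XAttr is too big, maximum size = " + maxSize
              + ", but the size is = " + size);
    }
  }

  public static void main(String[] args) {
    String name = "multi-byte-\u0641-chars";
    System.out.println(name.length());           // UTF-16 code units: 18
    System.out.println(sizeInBytes(name, null)); // UTF-8 bytes: 19
  }
}
```

Because the message is built from the same {{size}} variable used in the comparison, the reported number can never appear to be under the limit that was just exceeded.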


For {{dfs.namenode.fs-limits.max-directory-items}}, we log an error message if we encounter
an existing inode that violates the limit during startup/applying edits.  This can be a helpful
message if an admin down-tunes the setting and then wants to identify and clean up existing
data that's in violation.  Can we log a message for the xattr limit violations too?  If it's
easier, feel free to punt this part to a separate jira.  (I realize you're close to +1 on
this patch already.)
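
As a rough idea of what I mean, here is a sketch of a load-time audit that logs violations instead of rejecting them. Everything in it is an assumption for illustration: the class name, the map of xattr names to values, and the use of {{java.util.logging}} in place of whatever logger the namenode code actually uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

/** Hypothetical load-time audit: report existing xattrs over the limit, do not fail. */
final class XAttrLimitAudit {
  private static final Logger LOG = Logger.getLogger(XAttrLimitAudit.class.getName());

  /** Logs each existing xattr whose name plus value exceeds maxSize; returns their names. */
  static List<String> findViolations(Map<String, byte[]> xattrs, int maxSize) {
    List<String> violations = new ArrayList<>();
    for (Map.Entry<String, byte[]> e : xattrs.entrySet()) {
      int size = e.getKey().getBytes(StandardCharsets.UTF_8).length;
      if (e.getValue() != null) {
        size += e.getValue().length;
      }
      if (size > maxSize) {
        // Existing data is kept; the message helps an admin find and clean it up.
        LOG.warning("Existing XAttr " + e.getKey() + " has size " + size
            + ", which exceeds the configured maximum of " + maxSize);
        violations.add(e.getKey());
      }
    }
    return violations;
  }

  public static void main(String[] args) {
    Map<String, byte[]> xattrs = new LinkedHashMap<>();
    xattrs.put("user.small", new byte[1]);
    xattrs.put("user.large", new byte[64]);
    System.out.println(findViolations(xattrs, 32));
  }
}
```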

> Unify xattr name and value limits into a single limit
> -----------------------------------------------------
>
>                 Key: HDFS-6377
>                 URL: https://issues.apache.org/jira/browse/HDFS-6377
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: HDFS XAttrs (HDFS-2006)
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: hdfs-6377-1.patch
>
>
> Instead of having separate limits and config options for the size of an xattr's name
> and value, let's use a single limit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
