Date: Fri, 27 May 2016 00:52:13 +0000 (UTC)
From: "Joep Rottinghuis (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-5167) Escaping occurences of encodedValues

[ https://issues.apache.org/jira/browse/YARN-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303275#comment-15303275 ]

Joep Rottinghuis commented on YARN-5167:
----------------------------------------

As part of our discussion of cost of doing all these replace operations we said the keys shouldn't be
_that_ long, so we shouldn't worry about the cost too much unless we can show that it is a problem.
That thought led us to the realization that there are currently no limits on the size of keys, nor on the size of the value passed to us. By default HBase will allow a keyvalue size (configurable through hbase.client.keyvalue.maxsize) of 10MB.
We said we should probably limit the keys to be no larger than a thousand characters or so. The rowkey and column qualifiers together could get to a considerable size, and tags could be in the mix as well.
In order to avoid issues with region servers OOM'ing (on coprocessors), replication between HBase clusters in different DCs choking, and clients dying with memory issues, we should probably enforce a reasonable limit. We can make this configurable, but if we choose something like 2048 for the max rowkey and column qualifier size and 127 MB for the value, then we arrive at a total keyvalue size of at most 128MB.
[~vrushalic] will file a separate jira for this and then we don't have to worry about walking through strings that are too large in this jira.

> Escaping occurences of encodedValues
> ------------------------------------
>
> Key: YARN-5167
> URL: https://issues.apache.org/jira/browse/YARN-5167
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Joep Rottinghuis
> Assignee: Sangjin Lee
> Priority: Critical
>
> We had earlier decided to punt on this, but in discussing YARN-5109 we thought it would be best to just be safe rather than sorry later on.
> Encoded sequences can occur in the original string, especially in case of "foreign key" if we decide to have lookups.
> For example, space is encoded as %2$.
> Encoding "String with %2$ in it" would decode to "String with   in it".
> We thought we should first escape existing occurrences of encoded strings by prefixing a backslash (even if there is already a backslash that should be ok). Then we should replace all unencoded strings.
> On the way out, we should replace all occurrences of our encoded string with the original, except when it is prefixed by an escape character. Lastly we should strip off the one additional backslash in front of each remaining (escaped) sequence.
> Adding the following entry to TestSeparator#testEncodeDecode() demonstrates what this jira should accomplish:
> {code}
> testEncodeDecode("Double-escape %2$ and %3$ or \\%2$ or \\%3$, nor \\\\%2$ = no problem!", Separator.QUALIFIERS,
>     Separator.VALUES, Separator.SPACE, Separator.TAB);
> {code}
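
The escape-then-encode scheme described in the issue could be sketched in Java roughly as follows. This is only an illustration under stated assumptions, not the actual Separator code: the class EscapingSketch and its encode/decode methods are hypothetical names, each call handles a single separator, and only the space-to-%2$ mapping is taken from the description above.

```java
// Hypothetical sketch of the scheme from the issue description; not the real Separator class.
final class EscapingSketch {
  private static final char ESCAPE = '\\';

  // Encode: first escape pre-existing occurrences of the encoded form,
  // then replace every raw separator with its encoded form.
  static String encode(String s, String separator, String encoded) {
    requireNonEmpty(separator, encoded);
    String escaped = s.replace(encoded, ESCAPE + encoded);
    return escaped.replace(separator, encoded);
  }

  // Decode in one left-to-right pass: an encoded sequence preceded by a
  // backslash is a literal (drop that one backslash); any other encoded
  // sequence becomes the separator again.
  static String decode(String s, String separator, String encoded) {
    requireNonEmpty(separator, encoded);
    StringBuilder out = new StringBuilder(s.length());
    int i = 0;
    while (i < s.length()) {
      if (s.charAt(i) == ESCAPE && s.startsWith(encoded, i + 1)) {
        out.append(encoded);               // escaped literal: strip one backslash
        i += 1 + encoded.length();
      } else if (s.startsWith(encoded, i)) {
        out.append(separator);             // unescaped: restore the separator
        i += encoded.length();
      } else {
        out.append(s.charAt(i));           // ordinary character
        i++;
      }
    }
    return out.toString();
  }

  private static void requireNonEmpty(String separator, String encoded) {
    if (separator.isEmpty() || encoded.isEmpty()) {
      throw new IllegalArgumentException("separator and encoded form must be non-empty");
    }
  }

  public static void main(String[] args) {
    String original = "String with %2$ in it";
    String enc = encode(original, " ", "%2$");
    System.out.println(enc);
    System.out.println(decode(enc, " ", "%2$").equals(original));
  }
}
```

One caveat of this simple form: an original string in which a literal backslash directly precedes the separator (a backslash followed by a space, say) would not round-trip, because its encoded form looks the same as an escaped literal.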