accumulo-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4145) Eliminate constant unnecessary Text wrapping of immutable tableIDs
Date Fri, 19 Feb 2016 03:32:18 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153653#comment-15153653
] 

ASF GitHub Bot commented on ACCUMULO-4145:
------------------------------------------

GitHub user ctubbsii opened a pull request:

    https://github.com/apache/accumulo/pull/72

    ACCUMULO-4145 Eliminate Text wrapping of tableIDs

    * Best attempt to eliminate text wrapping of tableIDs on master branch.
    * Some ByteBuffer wrapping still occurs due to thrift using bytes to transfer tableId
instead of string.
    * Some Text wrapping is needed for:
     - preserving existing KeyExtent serialization
     - putting tableIDs in replication-related table entries
     - avoiding public API changes to deprecated KeyExtent.
    
    All unit tests pass, and -Psunny ITs, as well as checkstyle, findbugs, and modernizer
    This replaces #70

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ctubbsii/accumulo ACCUMULO-4145

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/accumulo/pull/72.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #72
    
----
commit 413244765adb7a60cef8de701de6983d4c99c8e2
Author: Christopher Tubbs <ctubbsii@apache.org>
Date:   2016-02-18T23:56:54Z

    ACCUMULO-4145 Eliminate Text wrapping of tableIDs

----


> Eliminate constant unnecessary Text wrapping of immutable tableIDs
> ------------------------------------------------------------------
>
>                 Key: ACCUMULO-4145
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4145
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Christopher Tubbs
>            Assignee: Christopher Tubbs
>              Labels: tedious
>             Fix For: 1.8.0
>
>
> I started looking at ACCUMULO-4138 to see how KeyExtent is used to do overlaps, and I
noticed a *lot* of {{new KeyExtent(new Text(String), ...)}} calls, where that first parameter
is the Text version of tableId.
> It appears, we even do this practice so much, that KeyExtent has a built-in WeakReference
caching of these tableIds, so the Java GC we can dedupe them and avoid creating too many.
> The thing is... a lot of this attempt to optimize appears to be the result of unnecessarily
wrapping all these immutable String tableIDs with a Text object, yet it doesn't really seem
to buy us anything. In most of our API and a lot of internal utilities, these are already
Strings with references elsewhere in the code... so even if we dedupe the Text objects, we're
not doing anything about these Strings. Worse, we're actually doing some protective copying
here and there as we pass around these Text objects, using TextUtil, etc. This is completely
unnecessary, because they were immutable before we copied them and wrapped them.
> In some cases, we actually call toString on the text objects to pass them around, and
they get flip-flopped a few times between String and Text, depending on what the internal
API accepts.
> At best, using the Text object helps us serialize KeyExtent... but that's questionably
beneficial since DataOutput can already serialize with: {{out.writeUTF(String)}}, so that's
not actually that helpful.
> The only other benefit I could possibly see is with compareTo for lexicographical comparison,
but I have a hard time believing Text can compare faster than {{java.lang.String}}, and there
shouldn't be any difference in the results of the comparison between the UTF-8 encoding of
Text and the modified UTF encoding which is native to Java Strings, certainly not for the
base-36 characters and fixed constants (for special tables) we use in tableIDs.
> We should just completely strip out all the Text versions of tableId, wherever possible.
I've already done this as an exercise in the 1.6 branch, but it might be potentially risky
as these are put in Java maps which don't have strict type checking ({{map.get(Object)}},
for instance) and are sometimes compared with {{.equals(Object)}}. For what it's worth, the
only public API this really touches is {{o.a.a.core.data.KeyExtent}}, which was only inadvertently
in the public API and has since been moved (IIRC). We can preserve the old methods for compatibility,
if necessary (deprecated, of course).
> Also, getting rid of these Text objects, and just sticking with the immutable String
objects, we'll be able to take advantage of future JVM improvements for String deduplication,
like http://java-performance.info/java-string-deduplication/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message