accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-2817) Add offset and limit arguments to byte array Encoder.decode method
Date Wed, 18 Feb 2015 16:06:11 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326091#comment-14326091 ]

Josh Elser commented on ACCUMULO-2817:
--------------------------------------

bq. Ideally the new decode(byte[] b, int offset, int len) method would be added to the Lexicoder
interface. However, adding a method to that interface would break any existing code outside
of Accumulo that extended the interface. Java 8 provides a way around this with default methods,
but we cannot use that currently. What about creating two tickets for the future? One to switch
Accumulo to Java 8 and one to add a method to the interface that depends on the Java 8 issue.

A default method would be nice, like you said, but we don't have a Java 8 roadmap yet, do we?
Maybe an {{AbstractLexicoder}} would be a good short-term solution?

Same goes for the TypedValueCombiner:

{code}
diff --git a/core/src/main/java/org/apache/accumulo/core/iterators/TypedValueCombiner.java b/core/src/main/java/org/apache/accumulo/core/iterators/TypedValueCombiner.java
index dbe2d4a..c2a74a2 100644
--- a/core/src/main/java/org/apache/accumulo/core/iterators/TypedValueCombiner.java
+++ b/core/src/main/java/org/apache/accumulo/core/iterators/TypedValueCombiner.java
@@ -111,6 +111,8 @@ public abstract class TypedValueCombiner<V> extends Combiner {
     byte[] encode(V v);
 
     V decode(byte[] b) throws ValueFormatException;
+
+    V decode(byte[] b, int offset, int len) throws ValueFormatException;
   }
{code}

{code}
@@ -215,8 +220,13 @@ public class SummingArrayCombiner extends TypedValueCombiner<List<Long>> {
 
     @Override
     public List<Long> decode(byte[] b) {
-      String[] longstrs = new String(b, UTF_8).split(",");
-      List<Long> la = new ArrayList<Long>(longstrs.length);
+      return decode(b, 0, b.length);
+    }
+
+    @Override
+    public List<Long> decode(byte[] b, int offset, int len) {
+      String[] longstrs = new String(b, offset, len, UTF_8).split(",");
+      List<Long> la = new ArrayList<Long>(len);
       for (String s : longstrs) {
         if (s.length() == 0)
           la.add(0l);
{code}

I believe you still want to pass {{longstrs.length}}, not {{len}}, to the ArrayList constructor: {{len}} is the number of bytes being decoded, not the number of elements.
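
To make the sizing point concrete, here is a minimal, standalone sketch of the ranged decode with the list sized by element count. The class name {{SummingArrayDecodeSketch}} is made up for illustration, and the {{Long.parseLong}} branch is an assumption, since the excerpt above cuts off before the non-empty case.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone class; it only mirrors the shape of the patched method.
class SummingArrayDecodeSketch {

  // Decodes a comma-separated list of longs from b[offset, offset + len).
  static List<Long> decode(byte[] b, int offset, int len) {
    String[] longstrs = new String(b, offset, len, StandardCharsets.UTF_8).split(",");
    // Size the list by the number of parsed elements, not by the byte count.
    List<Long> la = new ArrayList<Long>(longstrs.length);
    for (String s : longstrs) {
      if (s.length() == 0)
        la.add(0L); // an empty field is treated as zero, as in the patch
      else
        la.add(Long.parseLong(s)); // assumption: not shown in the excerpt
    }
    return la;
  }

  // The whole-array variant simply delegates to the ranged one.
  static List<Long> decode(byte[] b) {
    return decode(b, 0, b.length);
  }

  public static void main(String[] args) {
    // The encoded value sits in the middle of a larger array (bytes 4..8).
    byte[] composite = "row:1,,30:tail".getBytes(StandardCharsets.UTF_8);
    System.out.println(decode(composite, 4, 5)); // prints [1, 0, 30]
  }
}
```

Here the 5-byte range holds 3 elements, so sizing by {{len}} would only over-allocate, but the two numbers drift further apart as the values get longer.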

If we can find a way to collapse the repeated changes into one place, that'd be much better
IMO. Lots of unit tests are also much appreciated!
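
One possible shape for collapsing the duplication, as a rough sketch only: an abstract base class implements the whole-array {{decode}} once and leaves a single ranged method for subclasses. The names {{AbstractEncoder}}, {{decodeUnchecked}} and {{StringEncoderSketch}} are hypothetical, and the {{Encoder}} interface below is a simplified stand-in, not the actual Accumulo interface.

```java
import java.nio.charset.StandardCharsets;

// Simplified stand-in for the existing interface, which stays unchanged.
interface Encoder<V> {
  byte[] encode(V v);

  V decode(byte[] b);
}

// Hypothetical base class: the delegation and range check live here only.
abstract class AbstractEncoder<V> implements Encoder<V> {

  @Override
  public V decode(byte[] b) {
    return decode(b, 0, b.length);
  }

  // Validates the range once, then hands off to the subclass.
  public V decode(byte[] b, int offset, int len) {
    if (b == null)
      throw new NullPointerException("cannot decode a null array");
    if (offset < 0 || len < 0 || offset > b.length - len)
      throw new IllegalArgumentException("range out of bounds: offset=" + offset + ", len=" + len);
    return decodeUnchecked(b, offset, len);
  }

  // Subclasses implement only this; the range is already known to be valid.
  protected abstract V decodeUnchecked(byte[] b, int offset, int len);
}

// Example subclass (hypothetical) that decodes UTF-8 strings.
class StringEncoderSketch extends AbstractEncoder<String> {

  @Override
  public byte[] encode(String v) {
    return v.getBytes(StandardCharsets.UTF_8);
  }

  @Override
  protected String decodeUnchecked(byte[] b, int offset, int len) {
    return new String(b, offset, len, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] buf = "key|value".getBytes(StandardCharsets.UTF_8);
    // Decode the trailing 5 bytes in place, without copying them out first.
    System.out.println(new StringEncoderSketch().decode(buf, 4, 5)); // prints value
  }
}
```

Existing implementations of the interface keep compiling, and each encoder that moves to the base class only has to write the ranged decode.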

> Add offset and limit arguments to byte array Encoder.decode method
> ------------------------------------------------------------------
>
>                 Key: ACCUMULO-2817
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2817
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 1.7.0
>            Reporter: Josh Elser
>            Assignee: Matt Dailey
>              Labels: newbie
>             Fix For: 1.7.0
>
>         Attachments: ACCUMULO-2817.patch
>
>
> Similar to ACCUMULO-2445, but presently the encoder only works on complete byte arrays.
> This forces an extra copy of the data when it is located in an array that contains other information
> (e.g. a composite key).
> It would be nice to be able to provide offset and length arguments to {{Encoder.decode}}
> so that users can avoid the additional arraycopy.
> Changing to a ByteBuffer instead of byte array argument would also be acceptable, but
> more churn on the API that, unless it's happening globally, I would rather avoid. It would
> also incur the penalty for that extra Object, which while minimal alone, could be significant
> if decoding every value in a table, for example.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
