accumulo-user mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: Bug in either InMemoryMap or NativeMap
Date Fri, 19 Feb 2016 22:56:47 GMT
On Fri, Feb 19, 2016 at 5:14 PM, Dan Blum <dblum@bbn.com> wrote:

> Yes, please open an issue for this.
>
>
>
> In the meantime, as a workaround is it safe to assign an arbitrary
> increasing timestamp when calling Mutation.put()? That seems the simplest
> way to get the ColumnUpdates to be treated properly.
>

Seems like that would work, but then you may have to keep track of the next
timestamp across processes.
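
As a rough sketch of that workaround within a single process: the helper below is hypothetical (it is not an Accumulo class) and only hands out strictly increasing values. Each value would be passed as the timestamp argument of Mutation.put(family, qualifier, timestamp, value). It does nothing to coordinate between processes, which is the part you would still have to solve yourself.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: strictly increasing timestamps within one JVM. */
final class IncreasingTimestamps {
  private final AtomicLong last;

  IncreasingTimestamps(long start) {
    this.last = new AtomicLong(start);
  }

  /** Returns a value larger than any earlier result and not behind nowMillis. */
  long next(long nowMillis) {
    return last.updateAndGet(prev -> Math.max(prev + 1, nowMillis));
  }

  public static void main(String[] args) {
    IncreasingTimestamps ts = new IncreasingTimestamps(0L);
    long a = ts.next(System.currentTimeMillis());
    long b = ts.next(System.currentTimeMillis());
    // a and b would be used as explicit timestamps for two put() calls.
    System.out.println(b > a);
  }
}
```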

A possible alternative is to configure the table to use logical time and
write each update in its own mutation.  Logical time ensures every mutation
is assigned a unique timestamp.  The following program is an example of this.

    String table = getUniqueNames(1)[0];
    Connector c = getConnector();
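    // Logical time gives each mutation its own timestamp.  Creating the table
    // without the default iterators leaves out the VersioningIterator, so the
    // scan below shows both versions.
    c.tableOperations().create(table,
        new NewTableConfiguration().setTimeType(TimeType.LOGICAL).withoutDefaultIterators());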

    BatchWriterConfig config = new BatchWriterConfig();
    BatchWriter writer = c.createBatchWriter(table, config);

    Mutation m = new Mutation("row");
    m.put("cf1", "cq1", new Value("abc".getBytes()));
    writer.addMutation(m);
    m = new Mutation("row");
    m.put("cf1", "cq1", new Value("xyz".getBytes()));
    writer.addMutation(m);
    writer.close();

    Scanner scanner = c.createScanner(table, Authorizations.EMPTY);
    for (Entry<Key,Value> entry : scanner) {
      System.out.println(entry);
    }

This program prints

  row cf1:cq1 [] 2 false=xyz
  row cf1:cq1 [] 1 false=abc

Accumulo assigned the timestamps 1 and 2, so in this case it keeps track of
the next timestamp for you.
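
The newer entry prints first because Accumulo sorts keys with otherwise equal fields by timestamp in descending order. A minimal stand-alone sketch of that ordering follows; SimpleKey and the comparator are simplified stand-ins I made up for illustration, not Accumulo's Key class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class TimestampOrderSketch {
  /** Simplified stand-in for a key: row, column and timestamp only. */
  static final class SimpleKey {
    final String row;
    final String column;
    final long timestamp;

    SimpleKey(String row, String column, long timestamp) {
      this.row = row;
      this.column = column;
      this.timestamp = timestamp;
    }
  }

  /** Row and column ascending, timestamp descending (newest first). */
  static int compare(SimpleKey a, SimpleKey b) {
    int c = a.row.compareTo(b.row);
    if (c != 0) return c;
    c = a.column.compareTo(b.column);
    if (c != 0) return c;
    return Long.compare(b.timestamp, a.timestamp);
  }

  /** Inserts the two values from the example and returns them in scan order. */
  static List<String> scan() {
    TreeMap<SimpleKey, String> table = new TreeMap<>(TimestampOrderSketch::compare);
    table.put(new SimpleKey("row", "cf1:cq1", 1L), "abc");
    table.put(new SimpleKey("row", "cf1:cq1", 2L), "xyz");
    return new ArrayList<>(table.values());
  }

  public static void main(String[] args) {
    System.out.println(scan()); // [xyz, abc]
  }
}
```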

If you do not use logical time, then the two mutations would likely get the
same timestamp because they arrived in the same millisecond.
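
To illustrate why distinct timestamps sidestep the problem, here is a toy model of the behavior described in the report quoted below. It is not the real InMemoryMap or NativeMap code: the key layout (column, timestamp, counter) and all names are assumptions made for illustration, following the report's guess that entries with an identical key and counter replace each other.

```java
import java.util.TreeMap;

final class MemoryMapCollisionSketch {
  /**
   * Writes two updates for the same column into a sorted map and returns
   * how many entries survive. The key layout is a simplification.
   */
  static int survivingEntries(long ts1, int counter1, long ts2, int counter2) {
    TreeMap<String, String> map = new TreeMap<>();
    map.put(key("cf1:cq1", ts1, counter1), "abc");
    // Replaces "abc" when the second key equals the first one.
    map.put(key("cf1:cq1", ts2, counter2), "xyz");
    return map.size();
  }

  private static String key(String column, long timestamp, int counter) {
    return column + "|" + timestamp + "|" + counter;
  }

  public static void main(String[] args) {
    // Same timestamp, shared counter: the second update replaces the first.
    System.out.println(survivingEntries(5L, 7, 5L, 7)); // 1
    // Same timestamp, incrementing counter: both updates are kept.
    System.out.println(survivingEntries(5L, 7, 5L, 8)); // 2
    // Distinct timestamps, shared counter: both updates are kept.
    System.out.println(survivingEntries(1L, 7, 2L, 7)); // 2
  }
}
```

In this model, either an incrementing counter or distinct timestamps is enough to keep both entries.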


>
> *From:* Keith Turner [mailto:keith@deenlo.com]
> *Sent:* Friday, February 19, 2016 5:11 PM
> *To:* user@accumulo.apache.org
> *Cc:* Jonathan Lasko; Maxwell Jordan; kstudzin@bbn.com
> *Subject:* Re: Bug in either InMemoryMap or NativeMap
>
> On Fri, Feb 19, 2016 at 3:34 PM, Dan Blum <dblum@bbn.com> wrote:
>
> (Resend: I forgot to actually subscribe before sending originally.)
>
> I noticed a difference in behavior between our cluster and our tests running
> on MiniCluster: when multiple put() calls are made to a Mutation with the
> same CF, CQ, and CV and no explicit timestamp, on a live cluster only the
> last one is written, whereas in Mini all of them are.
>
> Of course in most cases it wouldn't matter, but if there is a Combiner set on
> the column (which is the case I am dealing with) then it does.
>
> I believe the difference in behavior is due to code in NativeMap._mutate and
> InMemoryMap.DefaultMap.mutate. In the former, if there are multiple
> ColumnUpdates in a Mutation they all get written with the same mutationCount
> value; I haven't looked at the C++ map code but I assume that this means
> that entries with the same CF/CQ/CV/timestamp will overwrite each other. In
> contrast, in DefaultMap multiple ColumnUpdates are stored with an
> incrementing kvCount, so the keys will necessarily be distinct.
>
>
>
> You made this issue easy to track down.
>
>
>
> This seems like a bug w/ the native map.  The DefaultMap code allocates a
> unique int for each key/value in the mutation.
>
>
>
> https://github.com/apache/accumulo/blob/rel/1.6.5/server/tserver/src/main/java/org/apache/accumulo/tserver/InMemoryMap.java#L476
>
>
> It seems like the native map code should increment like the DefaultMap
> code does.  Specifically it seems like the following code should increment
> mutationCount (coordinating with the code that calls it)
>
>
> https://github.com/apache/accumulo/blob/rel/1.6.5/server/tserver/src/main/java/org/apache/accumulo/tserver/NativeMap.java#L532
>
>
>
> Would you like to open an issue in Jira?
>
>
>
>
> My main question is: which of these is the intended behavior? We'll
> obviously need to change our code to work with NativeMap's current
> implementation regardless (since we don't want to use the Java maps on a
> live cluster), but it would be useful to know if that change is temporary or
> permanent.
>
> My secondary question is whether there is any trick to getting native maps
> to work in MiniCluster, which would be very helpful for our testing. I
> changed the configuration XML we use and I can see that it picks up the
> change - server.Accumulo logs "tserver.memory.maps.native.enabled = true,"
> but NativeMap never logs that it tries to load the library so the setting
> seems to be dropped somewhere.
>
>
>
