accumulo-user mailing list archives

From "Parise, Jonathan"
Subject RE: Watching for Changes with Write Ahead Log?
Date Fri, 02 Oct 2015 13:29:30 GMT
After spending some time on this I am planning a different approach. I am just going to have
the other system notify my system of what keys it changed. This way I can update the index.
When I laid out the complexity involved in the constraint approach, they were willing to change
their system behavior to assist mine.

Doing it through the constraint is just too much of a performance hit. This is because I need
to convert the Mutations back to POJOs and probably to my system’s JSON format before I can
index them in ElasticSearch. This turns into an O(M*N) algorithm where M is the number
of mutations and N is the number of column updates in each mutation. Also, it is difficult
because I need application state in order to decode the mutation values properly. The constraint
doesn’t have that state since it isn’t running in the same JVM (probably not even the
same machine) as the rest of the system. Getting that state would require additional overhead
or perhaps even a REST call back to the original server. Doing all of that inside a constraint
just isn’t feasible.
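
The O(M*N) cost described above can be illustrated with a minimal sketch. It does not use the Accumulo API: RowChange and ColumnChange are hypothetical stand-ins for a mutation and its column updates, and the flat key/value map is only an assumed placeholder for the system's real JSON format.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one column update inside a mutation.
final class ColumnChange {
    final String family;
    final String qualifier;
    final String value;

    ColumnChange(String family, String qualifier, String value) {
        this.family = family;
        this.qualifier = qualifier;
        this.value = value;
    }
}

// Hypothetical stand-in for a mutation: one row plus its column updates.
final class RowChange {
    final String row;
    final List<ColumnChange> updates;

    RowChange(String row, List<ColumnChange> updates) {
        this.row = row;
        this.updates = updates;
    }
}

final class MutationFlattener {
    private MutationFlattener() {}

    // Visits every column update of every mutation, so the work grows as
    // M (mutations) times N (column updates per mutation).
    static List<Map<String, String>> toDocuments(List<RowChange> changes) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (RowChange change : changes) {                 // M iterations
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("_row", change.row);
            for (ColumnChange update : change.updates) {   // N iterations each
                doc.put(update.family + ":" + update.qualifier, update.value);
            }
            docs.add(doc);
        }
        return docs;
    }
}
```

In the real system the inner loop would also have to decode each value using application state, which is the part that is not available inside the constraint.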

Thanks for all the helpful information. I now understand constraints much better than I did
a few days ago.

Thanks again,

Jon Parise

From: Adam Fuchs
Sent: Thursday, October 01, 2015 4:03 PM
Subject: Re: Watching for Changes with Write Ahead Log?

I would stay away from ThreadLocal -- the threads that run Constraints can be dynamically
generated in a resizable thread pool, and cleaning up after them could be challenging. Static
might work better if you can make it thread safe, maybe with a resource pool.
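
One way to read the "static with a resource pool" suggestion is sketched below. This is only an illustration under assumptions: IndexClient is a hypothetical placeholder for an expensive connection (for example, to a search index), the pool size of 2 is arbitrary, and nothing here depends on the Accumulo or ElasticSearch APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical placeholder for an expensive, reusable client connection.
final class IndexClient {
    static final AtomicInteger CREATED = new AtomicInteger();
    final int id = CREATED.incrementAndGet();

    void send(String document) { /* no-op in this sketch */ }

    void close() { /* a real client would release its connection here */ }
}

// Static, thread-safe pool shared by every thread that borrows a client.
final class IndexClientPool {
    private static final int SIZE = 2; // arbitrary size for the sketch
    private static final BlockingQueue<IndexClient> POOL = new ArrayBlockingQueue<>(SIZE);

    static {
        // Clients are created once, when the class is first loaded.
        for (int i = 0; i < SIZE; i++) {
            POOL.add(new IndexClient());
        }
    }

    private IndexClientPool() {}

    // Blocks until a client is free, so callers never share one client.
    static IndexClient borrow() {
        try {
            return POOL.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a client", e);
        }
    }

    // Returns a borrowed client; callers should do this in a finally block.
    static void giveBack(IndexClient client) {
        POOL.offer(client);
    }

    static int available() {
        return POOL.size();
    }

    // Closes every idle client and reports how many were closed.
    static int closeAll() {
        List<IndexClient> drained = new ArrayList<>();
        POOL.drainTo(drained);
        for (IndexClient client : drained) {
            client.close();
        }
        return drained.size();
    }
}
```

Because the pool does not care which thread borrows from it, threads that come and go in a resizable pool leave nothing behind to clean up. A JVM shutdown hook could call closeAll(), though whether that runs reliably when a tablet server stops is an assumption that would need checking.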


On Thu, Oct 1, 2015 at 2:39 PM, John Vines wrote:

As dirty as it is, that sounds like a case for a static, or maybe thread local, object

On Thu, Oct 1, 2015, 7:19 PM Parise, Jonathan wrote:
I have a few follow up questions in regard to constraints.

What is the lifecycle of a constraint? That is, are constraints somehow tied to Accumulo’s
lifecycle, or are they just instantiated each time a mutation occurs and then disposed?

Also, are there multiple instances of the same constraint class at any time, or do all mutations
on a table go through the exact same constraint?

My guess is that when a mutation comes in, a new constraint is made through reflection. Then
check() is called, the violation codes are parsed and the object is disposed/finalized.

The reason I ask is that what I want to do is update my ElasticSearch index each time I see
a mutation on the table. However, I don’t want to have to make a connection, send the data
and then tear down the connection each time. That’s a lot of unnecessary overhead, and with
all that overhead happening on every mutation, performance could be badly impacted.

Is there some way to cache something like a connection and reuse it between calls to the Constraint’s
check() method? How would such a thing be cleaned up if Accumulo is shut down?

Thanks again,

From: Parise, Jonathan
Sent: Wednesday, September 30, 2015 9:21 AM
Subject: RE: Watching for Changes with Write Ahead Log?

In this particular case, I need to update some of my application state when changes made by
another system occur.

I would need to do a few things to accomplish my goal.

1) Be notified or see that a table has changed

2) Check that against changes I know my system has made

3) If my system is not the originator of the change, update internal state to reflect
the change.
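
The three steps above can be sketched as a small filter. This is a minimal illustration, not an Accumulo feature: ExternalChangeFilter and its methods are hypothetical names, changes are identified by a plain string key, and how the change notification arrives is left open.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical helper that separates this system's own writes from
// changes made by another system.
final class ExternalChangeFilter {
    // Keys this system has written and not yet seen reported back.
    private final Set<String> ownPendingKeys = ConcurrentHashMap.newKeySet();
    // Callback that updates internal state (index, caches) for external changes.
    private final Consumer<String> onExternalChange;

    ExternalChangeFilter(Consumer<String> onExternalChange) {
        this.onExternalChange = onExternalChange;
    }

    // Step 2 support: record a key just before this system writes it.
    void recordOwnWrite(String key) {
        ownPendingKeys.add(key);
    }

    // Steps 1 and 3: called when a change to a key is observed.
    // Returns true if the change was treated as external.
    boolean onChange(String key) {
        if (ownPendingKeys.remove(key)) {
            return false; // this system originated the change
        }
        onExternalChange.accept(key);
        return true;
    }
}
```

Matching on the key alone cannot tell the two systems apart if both write the same key at about the same time, so a real implementation would probably need something stronger, such as a timestamp or an originator marker.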

Examples of state I may need to update include an ElasticSearch index and also an in memory

I’m going to read up on constraints again and see if I can use them for this purpose.



From: Adam Fuchs
Sent: Tuesday, September 29, 2015 5:46 PM
Subject: Re: Watching for Changes with Write Ahead Log?


You might think about putting a constraint on your table. I think the API for constraints
is flexible enough for your purpose, but I'm not exactly sure how you would want to manage
the results / side effects of your observations.


On Tue, Sep 29, 2015 at 5:41 PM, Parise, Jonathan wrote:

I’m working on a system where generally changes to Accumulo will come through that system.
However, in some cases, another system may change data without my system being aware of it.

What I would like to do is somehow listen for changes to the tables my system cares about.
I know there is a write ahead log that I could potentially listen to for changes, but I don’t
know how to use it. I looked around for some documentation about it, and I don’t see much.
I get the impression that it isn’t really intended for this type of use case.

Does anyone have any suggestions on how to watch a table for changes and then determine if
those changes were made by a different system?

Is there some documentation about how to use the write ahead log?


Jon Parise
