accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: WAL Log - Real-time updates
Date Wed, 22 Oct 2014 18:43:54 GMT
I tried to design the replication implementation to be relatively flexible in what the act
of "replication" actually looks like. In short, you write an implementation that is run with
some context about the new data that has arrived in the system.

https://github.com/apache/accumulo/blob/3107627b778e3093d95777a4313277305cd0aaa2/core/src/main/java/org/apache/accumulo/core/client/replication/ReplicaSystem.java
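To make the shape of such an implementation concrete, here is a minimal, self-contained Java sketch. It does NOT use the real ReplicaSystem signature linked above: SimpleReplicaSystem, ReplicationWork (a WAL path plus a hypothetical begin/end entry offset) and NotifyingReplicaSystem are all hypothetical stand-ins. The idea is the "replica that isn't an actual cluster" approach: instead of shipping data anywhere, replicate() only notifies listeners and reports the range as handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class NotifyingReplicaSketch {

    // Hypothetical, simplified work item; NOT Accumulo's real Status/ReplicationTarget types.
    static final class ReplicationWork {
        final String walPath;   // WAL file that contains new entries
        final long beginOffset; // first entry not yet handled (hypothetical)
        final long endOffset;   // entry up to which data is known to be written (hypothetical)

        ReplicationWork(String walPath, long beginOffset, long endOffset) {
            this.walPath = walPath;
            this.beginOffset = beginOffset;
            this.endOffset = endOffset;
        }
    }

    // Hypothetical stand-in for the ReplicaSystem extension point.
    interface SimpleReplicaSystem {
        void configure(String configuration);

        // Returns the offset up to which the work should be considered done.
        long replicate(ReplicationWork work);
    }

    // A "replica" that is not a real cluster: it ships nothing, it only notifies listeners.
    static final class NotifyingReplicaSystem implements SimpleReplicaSystem {
        private final List<Consumer<String>> listeners = new ArrayList<>();
        private String name = "unconfigured";

        void addListener(Consumer<String> listener) {
            listeners.add(listener);
        }

        @Override
        public void configure(String configuration) {
            this.name = configuration;
        }

        @Override
        public long replicate(ReplicationWork work) {
            long newEntries = work.endOffset - work.beginOffset;
            if (newEntries > 0) {
                String message = name + ": " + newEntries + " new entries in " + work.walPath;
                for (Consumer<String> listener : listeners) {
                    listener.accept(message);
                }
            }
            // Mark everything up to endOffset as handled so the same range is not offered again.
            return work.endOffset;
        }
    }

    public static void main(String[] args) {
        NotifyingReplicaSystem system = new NotifyingReplicaSystem();
        system.configure("ui-notifier");
        List<String> received = new ArrayList<>();
        system.addListener(received::add);
        long done = system.replicate(new ReplicationWork("/hypothetical/wal/file-1", 10, 15));
        System.out.println(received); // [ui-notifier: 5 new entries in /hypothetical/wal/file-1]
        System.out.println(done);     // 15
    }
}
```

In a real implementation the progress would have to be reported back through whatever status type the actual interface uses, so that the same WAL range is not handed to you again.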

Be aware, though, that this isn't a substitute for a trigger, and it may not actually meet your
needs for "realtime". By default, it would be on the order of minutes before your implementation
would be triggered. You could tune some configuration parameters down to tens of seconds, but you
would incur additional load by repeatedly scanning the Accumulo replication table.
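The latency/load trade-off is easy to put numbers on. The sketch below assumes one scan of the replication table per polling cycle and a fixed, hypothetical processing time per cycle; the intervals are illustrative only and are not Accumulo's actual defaults or property names.

```java
import java.time.Duration;

public class PollingTradeoff {

    // Worst case: data lands just after a scan, so it waits a full interval plus processing time.
    static Duration worstCaseDelay(Duration pollInterval, Duration processing) {
        return pollInterval.plus(processing);
    }

    // Scans of the replication table per hour, assuming exactly one scan per (positive) interval.
    static long scansPerHour(Duration pollInterval) {
        return Duration.ofHours(1).toMillis() / pollInterval.toMillis();
    }

    public static void main(String[] args) {
        Duration processing = Duration.ofSeconds(2); // hypothetical per-cycle processing cost
        Duration[] intervals = {Duration.ofMinutes(5), Duration.ofSeconds(30), Duration.ofSeconds(10)};
        for (Duration interval : intervals) {
            System.out.printf("interval=%ds worstCase=%ds scans/hour=%d%n",
                    interval.getSeconds(),
                    worstCaseDelay(interval, processing).getSeconds(),
                    scansPerHour(interval));
        }
    }
}
```

With these assumed numbers it prints worst-case delays of 302, 32 and 12 seconds against 12, 120 and 360 scans per hour: every cut in latency is paid for with proportionally more scanning.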

If you just want notification of *any* data being written to a table, I think you could do
this pretty easily. Inspecting the newly arrived data to make data-aware notifications
would be more difficult, but likely still feasible.
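The difference between the two kinds of notification can be sketched as follows. Entry is a hypothetical, already-decoded view of a written key-value (real WAL data would first have to be deserialized), and the table name and column families are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class NotificationKinds {

    // Hypothetical, simplified view of one written key-value; real WAL data must be decoded first.
    static final class Entry {
        final String row;
        final String family;
        final String value;

        Entry(String row, String family, String value) {
            this.row = row;
            this.family = family;
            this.value = value;
        }
    }

    // "Something was written": only needs to know that the batch is non-empty.
    static List<String> anyWriteNotification(String table, List<Entry> batch) {
        List<String> out = new ArrayList<>();
        if (!batch.isEmpty()) {
            out.add(table + " changed");
        }
        return out;
    }

    // Data-aware: inspect every entry and notify only for the ones the UI cares about.
    static List<String> dataAwareNotification(String table, List<Entry> batch, Predicate<Entry> interesting) {
        List<String> out = new ArrayList<>();
        for (Entry e : batch) {
            if (interesting.test(e)) {
                out.add(table + "/" + e.row + " " + e.family + "=" + e.value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> batch = List.of(
                new Entry("sensor1", "temp", "21.5"),
                new Entry("sensor1", "meta", "calibrated"),
                new Entry("sensor2", "temp", "19.0"));
        System.out.println(anyWriteNotification("events", batch));
        System.out.println(dataAwareNotification("events", batch, e -> e.family.equals("temp")));
    }
}
```

The first function only has to know that something arrived, which is why it is the easy case; the second has to decode and examine every entry, which is where the extra difficulty and cost come from.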

D P wrote:
> The Lily Indexer/SEP is really interesting. Thanks for both of your posts.
>
> On Wed, Oct 22, 2014 at 2:07 PM, Sean Busbey <busbey@cloudera.com> wrote:
>
>     The way this gets done in HBase, i.e. for the HBase Lily
>     Indexer[1], is to add a replication consumer that isn't an actual
>     cluster. IMHO, you'd be better off taking that kind of approach
>     rather than trying to consume the WALs off of HDFS. I haven't
>     attempted to use our replication interface for this yet, but in
>     principle it should work.
>
>     Note that either of these approaches is going to be very fragile
>     across Accumulo versions because they aren't interfaces intended
>     for public consumption.
>
>     [1]: http://ngdata.github.io/hbase-indexer/
>
>     On Wed, Oct 22, 2014 at 12:59 PM, D P <pacificobuzz@gmail.com> wrote:
>
>         I am working with Accumulo and looking for the best means of
>         knowing when something has been updated/inserted into my
>         Accumulo instance. For instance, every time data is inserted,
>         how can I know externally? If the write-ahead log file stores
>         this, is it best to just read the WAL from HDFS with a Storm
>         spout to know when something has been inserted into a table?
>
>         I am planning to do some real-time visualization with
>         Accumulo, and when data is inserted I want to be able to
>         notify my UI.
>
>         Thanks!
>
>     -- 
>     Sean
>
>
