From: Christopher
Date: Tue, 08 Dec 2015 22:59:34 +0000
Subject: Re: Trigger for Accumulo table
To: user@accumulo.apache.org
Reply-To: user@accumulo.apache.org
Mailing-List: contact user-help@accumulo.apache.org; run by ezmlm
References: <565FBC11.30802@gmail.com> <565FBCE0.8050904@gmail.com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=001a1144f130f2070905266aed58

--001a1144f130f2070905266aed58
Content-Type: text/plain; charset=UTF-8

In the future, it might be useful to provide a supported API hook here. It
certainly would've made implementing replication easier, but it could also
be useful as a notification system.
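The constraint-based notification idea raised earlier in this thread can be
sketched roughly as follows. This is a minimal, self-contained illustration,
not Accumulo code: RowUpdate, UpdateCheck, and NotifyingCheck are hypothetical
stand-ins that only loosely mirror the shape of
org.apache.accumulo.core.constraints.Constraint (inspect a mutation, return a
list of violation codes), and the bounded in-memory queue is an assumed
hand-off point to whatever external notifier is used. The real interface
should be checked against the Accumulo Javadoc before adapting any of this.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical stand-in for an Accumulo Mutation: only the row id is modeled. */
final class RowUpdate {
    private final byte[] row;

    RowUpdate(String row) {
        this.row = row.getBytes(StandardCharsets.UTF_8);
    }

    String rowAsString() {
        return new String(row, StandardCharsets.UTF_8);
    }
}

/** Hypothetical stand-in for the check hook that a table constraint exposes. */
interface UpdateCheck {
    /** Returns violation codes; an empty list means the update is accepted. */
    List<Short> check(RowUpdate update);
}

/** Accepts every update and records its row id for an external notifier to drain. */
final class NotifyingCheck implements UpdateCheck {
    private final BlockingQueue<String> pending;

    NotifyingCheck(int capacity) {
        this.pending = new ArrayBlockingQueue<>(capacity);
    }

    @Override
    public List<Short> check(RowUpdate update) {
        // offer() never blocks: when the queue is full the notification is
        // dropped, so a slow consumer cannot stall ingest.
        pending.offer(update.rowAsString());
        // No violations: this check exists only for its side effect.
        return Collections.emptyList();
    }

    /** Returns the next pending row id, or null if nothing is queued. */
    String poll() {
        return pending.poll();
    }
}

public class NotifyingCheckDemo {
    public static void main(String[] args) {
        NotifyingCheck check = new NotifyingCheck(2);
        check.check(new RowUpdate("row1"));
        check.check(new RowUpdate("row2"));
        check.check(new RowUpdate("row3")); // dropped: queue capacity is 2
        System.out.println(check.poll()); // prints "row1"
        System.out.println(check.poll()); // prints "row2"
        System.out.println(check.poll()); // prints "null"
    }
}
```

Because constraints are checked before data is written, a consumer draining
such a queue would probably need to re-read the row and tolerate
notifications for writes that never succeeded.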
On Tue, Dec 8, 2015 at 4:51 PM Keith Turner wrote:

> Constraints are checked before data is written. In the case of failures, a
> constraint may see data that's never successfully written.
>
> On Tue, Dec 8, 2015 at 4:18 PM, Christopher wrote:
>
>> Look at org.apache.accumulo.core.constraints.Constraint for a description
>> and org.apache.accumulo.core.constraints.DefaultKeySizeConstraint as an
>> example.
>>
>> In short, Mutations which are live-ingested into a tablet server are
>> validated against constraints you specify on the table. That means that all
>> Mutations written to a table go through this bit of user-provided code at
>> least once. You could use that fact to your advantage. However, this would
>> be highly experimental and might have some caveats to consider.
>>
>> You can configure a constraint on a table with
>> connector.tableOperations().addConstraint(...)
>>
>>
>> On Sun, Dec 6, 2015 at 10:49 PM Thai Ngo wrote:
>>
>>> Christopher,
>>>
>>> This is interesting! Could you please give me more details about this?
>>>
>>> Thanks,
>>> Thai
>>>
>>> On Thu, Dec 3, 2015 at 12:17 PM, Christopher wrote:
>>>
>>>> You could also implement a constraint to notify an external system when
>>>> a row is updated.
>>>>
>>>> On Wed, Dec 2, 2015, 22:54 Josh Elser wrote:
>>>>
>>>>> oops :)
>>>>>
>>>>> [1] http://fluo.io/
>>>>>
>>>>> Josh Elser wrote:
>>>>> > Hi Thai,
>>>>> >
>>>>> > There is no out-of-the-box feature provided with Accumulo that does what
>>>>> > you're asking for. Accumulo doesn't provide any functionality to push
>>>>> > notifications to other systems. You could potentially maintain other
>>>>> > tables/columns in which you maintain the last time a row was updated,
>>>>> > but the onus is on your "other services" to read the table to find out
>>>>> > when a change occurred (which is probably not scalable at "real time").
>>>>> >
>>>>> > There are other systems you could likely leverage to solve this,
>>>>> > depending on the durability and scalability that your application needs.
>>>>> >
>>>>> > For a system "close" to Accumulo, you could take a look at Fluo [1]
>>>>> > which is an implementation of Google's "Percolator" system. This is a
>>>>> > system based on throughput rather than low-latency, so it may not be a
>>>>> > good fit for your needs. There are probably other systems in the Apache
>>>>> > ecosystem (Kafka, Storm, Flink or Spark Streaming maybe?) that may be
>>>>> > helpful for your problem. I'm not enough of an expert on these to
>>>>> > recommend one (nor do I think I understand your entire architecture
>>>>> > well enough).
>>>>> >
>>>>> > Thai Ngo wrote:
>>>>> >> Hi list,
>>>>> >>
>>>>> >> I have a use case where existing rows in a table will be updated by an
>>>>> >> internal service. Data in a row of this table is composed of 2 parts:
>>>>> >> the 1st part is immutable, and the 2nd one will be updated (filled in)
>>>>> >> a little later.
>>>>> >>
>>>>> >> Currently, I need to know when and which rows are updated in the table
>>>>> >> so that other services can wisely start consuming the data. This
>>>>> >> matters most when I need to consume the data in near real time. So
>>>>> >> developing a notification function or, more simply, a trigger is what
>>>>> >> I really want to do now.
>>>>> >>
>>>>> >> I am curious to know if someone has done a similar job or whether
>>>>> >> there are features, APIs, or best practices available for Accumulo so
>>>>> >> far. I'm thinking of letting the internal service which updates the
>>>>> >> data notify us whenever it updates the data.
>>>>> >>
>>>>> >> What do you think?
>>>>> >>
>>>>> >> Thanks,
>>>>> >> Thai

--001a1144f130f2070905266aed58
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--001a1144f130f2070905266aed58--