accumulo-user mailing list archives

From Adam Fuchs <afu...@apache.org>
Subject Re: Trigger for Accumulo table
Date Tue, 08 Dec 2015 23:28:17 GMT
I totally agree, Christopher. I have also run into a few situations where
it would have been nice to have something like a mutation listener hook,
particularly when generating indexing and stats records.

Adam


On Tue, Dec 8, 2015 at 5:59 PM, Christopher <ctubbsii@apache.org> wrote:

> In the future, it might be useful to provide a supported API hook here. It
> certainly would've made implementing replication easier, but could also be
> useful as a notification system.
>
> On Tue, Dec 8, 2015 at 4:51 PM Keith Turner <keith@deenlo.com> wrote:
>
>> Constraints are checked before data is written. In the case of failures,
>> a constraint may see data that is never successfully written.
>>
>> On Tue, Dec 8, 2015 at 4:18 PM, Christopher <ctubbsii@apache.org> wrote:
>>
>>> Look at org.apache.accumulo.core.constraints.Constraint for a
>>> description and
>>> org.apache.accumulo.core.constraints.DefaultKeySizeConstraint as an example.
>>>
>>> In short, Mutations which are live-ingested into a tablet server are
>>> validated against constraints you specify on the table. That means that all
>>> Mutations written to a table go through this bit of user-provided code at
>>> least once. You could use that fact to your advantage. However, this would
>>> be highly experimental and might have some caveats to consider.
>>>
>>> You can configure a constraint on a table with
>>> connector.tableOperations().addConstraint(...)
>>>
>>>
>>> On Sun, Dec 6, 2015 at 10:49 PM Thai Ngo <baothaingo@gmail.com> wrote:
>>>
>>>> Christopher,
>>>>
>>>> This is interesting! Could you please give me more details about this?
>>>>
>>>> Thanks,
>>>> Thai
>>>>
>>>> On Thu, Dec 3, 2015 at 12:17 PM, Christopher <ctubbsii@apache.org> wrote:
>>>>
>>>>> You could also implement a constraint to notify an external system
>>>>> when a row is updated.
>>>>>
>>>>> On Wed, Dec 2, 2015, 22:54 Josh Elser <josh.elser@gmail.com> wrote:
>>>>>
>>>>>> oops :)
>>>>>>
>>>>>> [1] http://fluo.io/
>>>>>>
>>>>>> Josh Elser wrote:
>>>>>> > Hi Thai,
>>>>>> >
>>>>>> > There is no out-of-the-box feature provided with Accumulo that does what
>>>>>> > you're asking for. Accumulo doesn't provide any functionality to push
>>>>>> > notifications to other systems. You could potentially maintain other
>>>>>> > tables/columns in which you maintain the last time a row was updated,
>>>>>> > but the onus is on your "other services" to read the table to find out
>>>>>> > when a change occurred (which is probably not scalable at "real time").
>>>>>> >
>>>>>> > There are other systems you could likely leverage to solve this,
>>>>>> > depending on the durability and scalability that your application needs.
>>>>>> >
>>>>>> > For a system "close" to Accumulo, you could take a look at Fluo [1],
>>>>>> > which is an implementation of Google's "Percolator" system. This is a
>>>>>> > system designed for throughput rather than low latency, so it may not be a
>>>>>> > good fit for your needs. There are probably other systems in the Apache
>>>>>> > ecosystem (Kafka, Storm, Flink or Spark Streaming maybe?) that would be
>>>>>> > helpful to your problem. I'm not enough of an expert on these to
>>>>>> > recommend one (nor do I think I understand your entire architecture
>>>>>> > well enough).
>>>>>> >
>>>>>> > Thai Ngo wrote:
>>>>>> >> Hi list,
>>>>>> >>
>>>>>> >> I have a use case in which existing rows in a table will be updated by an
>>>>>> >> internal service. Data in a row of this table is composed of 2 parts:
>>>>>> >> the 1st part is immutable and the 2nd will be updated (filled in) a
>>>>>> >> little later.
>>>>>> >>
>>>>>> >> Currently, I need to know when rows are updated in the table, and which
>>>>>> >> ones, so that other services can start consuming the data. This matters
>>>>>> >> more when I need to consume the data in near real time. So developing a
>>>>>> >> notification function or, more simply, a trigger is what I really want
>>>>>> >> to do now.
>>>>>> >>
>>>>>> >> I am curious to know if someone has done a similar job, or whether there
>>>>>> >> are features, APIs, or best practices available for Accumulo so far. I'm
>>>>>> >> thinking of letting the internal service which updates the data notify
>>>>>> >> us whenever it updates the data.
>>>>>> >>
>>>>>> >> What do you think?
>>>>>> >>
>>>>>> >> Thanks,
>>>>>> >> Thai
>>>>>>
>>>>>
>>>>
>>
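
For readers of this thread, here is a rough sketch of the "use a constraint as a notifier" idea. It is written against a hypothetical stand-in interface (MutationCheck), not Accumulo's real org.apache.accumulo.core.constraints.Constraint, so that it runs with only the JDK. The class names, the queue capacity, and the choice to drop notifications when the queue is full are all assumptions made for illustration. The hook always accepts the mutation and only records the row ID, using a non-blocking offer() so that a slow consumer cannot stall the write path.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

final class NotifyingCheckSketch {

    /** Hypothetical stand-in for a per-mutation hook. NOT Accumulo's Constraint interface. */
    interface MutationCheck {
        /** Returns violation codes; an empty list accepts the mutation. */
        List<Short> check(byte[] row);
    }

    /** Accepts every mutation and queues its row ID for an external consumer. */
    static final class NotifyingCheck implements MutationCheck {
        private final BlockingQueue<String> pending;
        private final AtomicLong dropped = new AtomicLong();

        NotifyingCheck(int capacity) {
            this.pending = new ArrayBlockingQueue<>(capacity);
        }

        @Override
        public List<Short> check(byte[] row) {
            // offer() returns immediately; when the queue is full the
            // notification is dropped and counted instead of blocking the writer.
            if (!pending.offer(new String(row, StandardCharsets.UTF_8))) {
                dropped.incrementAndGet();
            }
            // Never reject: this hook only observes mutations.
            return Collections.emptyList();
        }

        /** Removes and returns every row ID queued so far. */
        public List<String> drain() {
            List<String> out = new ArrayList<>();
            pending.drainTo(out);
            return out;
        }

        public long droppedCount() {
            return dropped.get();
        }
    }

    public static void main(String[] args) {
        NotifyingCheck check = new NotifyingCheck(2);
        for (String row : new String[] {"row1", "row2", "row3"}) {
            check.check(row.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("rows to notify: " + check.drain());
        System.out.println("dropped: " + check.droppedCount());
    }
}
```

Because constraints are checked before data is written, a hook like this may report rows whose write later fails, so a consumer would probably want to treat each notification as a hint and re-read the row before acting on it.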
