incubator-s4-user mailing list archives

From Matthieu Morel <mmo...@apache.org>
Subject Re: Pipelines for text processing
Date Thu, 09 Aug 2012 17:20:03 GMT
Hi Breno,

Currently, S4 does not guarantee that events will not be lost in case of 
failures or of load shedding (due to queue overflow).

Regarding the sentence you pointed out, it refers to a specific 
implementation of a join operation; alternative implementations 
could certainly overcome this limitation.

Regards,

Matthieu


On 8/8/12 10:33 AM, Breno Faria wrote:
> Hi everyone,
>
> I am evaluating whether s4 would be a suitable backend for text
> processing in an enterprise search context and I've come across one
> sentence in the documentation which raised a concern that I'd like to
> address here.
>
> The sentence is the following: "The JoinPE will fail to join properly if
> multiple events arrive to one slot and some of the other slots are empty."
>
> One of the nice properties of s4's architecture is that there is no need
> to pass a document model around to each processor in a /pipeliny/
> fashion -- instead, processors can generate annotations which, possibly
> after further processing, are merged at a later point before
> being written to an index. I cannot afford to lose documents or
> annotations during processing, though. Is there a way to guarantee
> eventual consistency?
>
> Thanks!
>
> Breno
>

