incubator-s4-user mailing list archives

From Breno Faria <>
Subject Pipelines for text processing
Date Wed, 08 Aug 2012 08:33:01 GMT
Hi everyone,

I am evaluating whether S4 would be a suitable backend for text processing in an enterprise
search context, and I came across a sentence in the documentation that raised a concern I'd
like to address here.

The sentence is the following: "The JoinPE will fail to join properly if multiple events arrive
to one slot and some of the other slots are empty."

One of the nice properties of S4's architecture is that there is no need to pass a document
model through each processor in pipeline fashion. Instead, processors can generate annotations
which, possibly after further processing, are merged at a later point before being written to
an index. However, I cannot afford to lose documents or annotations during processing. Is there
a way to guarantee eventual consistency?
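To make the concern concrete, here is a minimal sketch of the kind of merge step I have in mind. It is plain Python, not S4's JoinPE or any S4 API, and the class and method names (`AnnotationJoiner`, `on_annotation`, `incomplete`) are hypothetical. It assumes each annotator reports under a fixed slot name and that every event carries a document ID. Annotations are buffered per document and per slot, so repeated events on one slot are accumulated rather than dropped, and a merged record is only emitted once every expected slot has reported.

```python
class AnnotationJoiner:
    """Hypothetical per-document join (NOT S4's JoinPE).

    Buffers annotations per document and per slot, and emits one merged
    record only after every expected slot has received at least one event.
    """

    def __init__(self, expected_slots):
        self.expected_slots = frozenset(expected_slots)
        # doc_id -> {slot name -> list of annotations received so far}
        self.pending = {}

    def on_annotation(self, doc_id, slot, annotation):
        """Record one annotation; return the merged document if complete, else None."""
        if slot not in self.expected_slots:
            raise ValueError(f"unknown slot {slot!r}")
        slots = self.pending.setdefault(doc_id, {})
        # Accumulate instead of overwriting, so several events on one slot are all kept.
        slots.setdefault(slot, []).append(annotation)
        if self.expected_slots.issubset(slots):
            # All slots have reported: hand the merged record on and free the buffer.
            return self.pending.pop(doc_id)
        return None

    def incomplete(self):
        """Documents still waiting on at least one slot, with the missing slot names."""
        return {
            doc_id: sorted(self.expected_slots.difference(slots))
            for doc_id, slots in self.pending.items()
        }


# Usage: two annotators ("entities", "language") reporting on one document.
joiner = AnnotationJoiner(["entities", "language"])
joiner.on_annotation("doc-1", "entities", "ACME Corp")
joiner.on_annotation("doc-1", "entities", "Berlin")  # second event on the same slot
print(joiner.incomplete())  # {'doc-1': ['language']}
print(joiner.on_annotation("doc-1", "language", "en"))
# {'entities': ['ACME Corp', 'Berlin'], 'language': ['en']}
```

This only addresses the in-process join. A slot counts as filled after its first event, so annotators that emit several events per document would also need some completion marker. The buffer also lives in memory, so surviving node failures would additionally require checkpointing or replaying the input, which is exactly the guarantee I'm asking about.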

