jackrabbit-oak-dev mailing list archives

From Ian Boston <...@tfd.co.uk>
Subject Re: Asynchronous indexing consistency
Date Wed, 29 May 2013 21:18:34 GMT
I've been watching this thread, and the situation sounds quite similar to
issues I was seeing in pre-production load testing a bit over a year ago. I
started with Solr and switched to Elasticsearch.

Elasticsearch uses a write-ahead log on every instance in its cluster to
address this issue. I have not heard of, or had, problems with instances
being randomly reprovisioned/lost/trashed/gone AWOL for a time, except
where over 60% of the instances were lost in the same incident.

The WAL enables acceptance of update operations far sooner than would
otherwise be possible, and the lack of binary replication via segments
greatly reduces the need for hard commits, which in turn greatly
reduces the latency from the update-triggering event to the item being
available in the index. Because of all that, provided tokenising
large content bodies is excluded from the main indexing pathway, I
found that I could use a journal to record update events prior to
acceptance by Elasticsearch, and I was able to place that acceptance
inside a transaction boundary.
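To make the idea concrete, here is a minimal sketch of that pattern (all names hypothetical, not Elasticsearch's actual API): an update is appended to a cheap journal inside the caller's transaction boundary and acknowledged immediately, while the expensive tokenising and indexing happen later when a background worker drains the journal. On restart, unapplied entries would simply be replayed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of WAL-style index update acceptance: accept fast,
// index later. A real implementation would fsync the journal to disk.
class IndexWriteAheadLog {
    private final Deque<String> pending = new ArrayDeque<>();
    private final List<String> index = new ArrayList<>();

    // Phase 1: accept the update inside the transaction boundary.
    // Only a cheap journal append happens here; tokenising large
    // content bodies is deferred to the background apply step.
    synchronized void accept(String updateEvent) {
        pending.addLast(updateEvent); // would be a durable append on disk
    }

    // Phase 2: a background worker drains the journal into the real index.
    synchronized void applyPending() {
        while (!pending.isEmpty()) {
            index.add(pending.removeFirst()); // tokenise + index here
        }
    }

    synchronized int pendingCount() { return pending.size(); }
    synchronized List<String> indexed() { return new ArrayList<>(index); }
}
```

The point of the split is that the caller's latency is bounded by the journal append, not by indexing, which is what allows acceptance "far sooner than would otherwise be possible".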

I am not certain whether you can use the same technique in Solr, as IIUC it
replicates segment fragments after insertion, although I think one of
the commercial Lucene companies might have done something along these
lines. IIRC there is some form of WAL in the latest Solr 4 release.

One last point: if you are considering using an observation journal, please
make it distributed. IMHO the shared central journal in JR2 was always a
bottleneck in any cluster, and although a custom version using timestamps
partially eliminated cluster-wide synchronisation, it never broke free of
the resolution of time synchronisation.
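A distributed journal along those lines might look like the following sketch (hypothetical names, not JR2 or Oak code): each cluster node appends only to its own journal, so there is no shared write bottleneck, and observers merge the per-node journals by timestamp. Note that, as said above, the merged ordering is only as precise as the clocks are synchronised.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One journal entry: a timestamp plus an opaque event description.
class JournalEntry {
    final long timestamp;
    final String event;
    JournalEntry(long timestamp, String event) {
        this.timestamp = timestamp;
        this.event = event;
    }
}

// Hypothetical distributed observation journal: per-node append-only
// logs, merged on read, rather than one shared central journal.
class DistributedJournal {
    private final Map<String, List<JournalEntry>> perNode = new HashMap<>();

    // Each node writes only to its own list: no cluster-wide lock.
    void append(String nodeId, long timestamp, String event) {
        perNode.computeIfAbsent(nodeId, k -> new ArrayList<>())
               .add(new JournalEntry(timestamp, event));
    }

    // Observers merge all per-node journals by timestamp. Cross-node
    // ordering is only as good as the clock synchronisation.
    List<String> mergedEvents() {
        List<JournalEntry> all = new ArrayList<>();
        for (List<JournalEntry> entries : perNode.values()) {
            all.addAll(entries);
        }
        all.sort(Comparator.comparingLong(e -> e.timestamp));
        List<String> events = new ArrayList<>();
        for (JournalEntry e : all) {
            events.add(e.event);
        }
        return events;
    }
}
```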

HTH, if not ignore.

On Thursday, May 30, 2013, Thomas Mueller wrote:

> Hi,
>
> > For example during large batch imports or content
> > migrations it might be useful to be able to speed things up by
> > disabling things like full text indexing.
>
> OK, it's good to have a concrete use case.
>
> For a migration, the easiest solution might be to re-create the index at
> the end. For a large import to an already large repository that might not
> be good enough; but I wonder how common that would be.
>
> > Or it could be that an
> > external index server like Solr is down for maintenance or other
> > reasons.
>
> That's also a good use case.
>
> > Such cases would obviously lead to some loss of
> > functionality, but probably wouldn't be too troublesome if the
> > relevant indexers were able to automatically pick up from where they
> > left.
>
> Yes, that's true. Especially because you can't typically predict how long
> the outage would be (not because of Solr - it might be a hardware failure
> or so).
>
> >> It sounds like reading with old revisions.
> >
> > Not really; let me rephrase. What I'm suggesting is something like this:
>
> Yes, I understand that you have used the "copy" operation to pin an old
> revision. That's nice because it doesn't need a new API; it can be
> implemented using the already existing copy operation.
>
> What I meant by "reading old revisions" is that we need a way to read old
> revisions. You have suggested to use the copy operation to do that, which
> is fine; another solution is to not garbage collect a certain revision;
> this would require a new API (a way to define which revision to keep), and
> the node state API would need to be extended to support using revisions
> explicitly.
>
> I think using the copy operation would be preferable, if we find a simple
> way to support it in the MongoMK. Possibly the MongoMK could internally
> implement the copy operation (when copying a large tree) as keeping a
> pointer to a revision. That way we don't need a new API. It would slightly
> complicate the MongoMK, especially if we need a way to change the copy.
> Just supporting a fast read-only copy (a snapshot) should be feasible I
> think.
>
> Regards,
> Thomas
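The "copy as a pointer to a revision" idea from the quoted message can be sketched roughly like this (hypothetical names; this is not the MongoMK implementation): committing creates numbered revisions, a "copy" of the whole tree merely pins the current revision number, and garbage collection skips pinned revisions, yielding a fast read-only snapshot without any new public API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: "copying" a large tree implemented as a pinned
// pointer to an existing revision, which GC then refuses to collect.
class RevisionStore {
    private final Map<Long, String> revisions = new HashMap<>(); // rev -> tree state
    private final Set<Long> pinned = new HashSet<>();
    private long head = 0;

    // Each commit produces a new revision of the (whole) tree state.
    long commit(String state) {
        revisions.put(++head, state);
        return head;
    }

    // "Copy" of the whole tree: just remember the head revision number.
    long snapshot() {
        pinned.add(head);
        return head;
    }

    // Reads against a snapshot are reads against its pinned revision.
    String read(long revision) {
        return revisions.get(revision);
    }

    // GC removes unpinned historical revisions, keeping head and snapshots.
    void gc() {
        revisions.keySet().removeIf(r -> r != head && !pinned.contains(r));
    }
}
```

The cost of the snapshot is constant regardless of tree size, which is what makes "a fast read-only copy" feasible; allowing the copy to be modified afterwards is where the extra MongoMK complexity mentioned above would come in.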
