lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Storing external transaction log-ids in lucene...
Date Tue, 15 Aug 2017 22:12:18 GMT
You're welcome; I'm glad it worked well!

Mike McCandless

http://blog.mikemccandless.com

On Fri, Aug 11, 2017 at 2:31 AM, Ravikumar Govindarajan <
ravikumar.govindarajan@gmail.com> wrote:

> Many thanks. This is a real cool feature & saves us a lot of time !!!
>
> --
> Ravi
>
> On Thu, Aug 10, 2017 at 9:40 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
> > That's exactly right!  That is the purpose of the sequence numbers returned
> > by IndexWriter mutations.  You can know exactly which ops made it into your
> > commit and which didn't.
> >
> > TrackingIndexWriter is replaced by the sequence numbers.
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> > On Thu, Aug 10, 2017 at 9:37 AM, Ravikumar Govindarajan <
> > ravikumar.govindarajan@gmail.com> wrote:
> >
> > > Many thanks for the help...
> > >
> > > Just one more clarification..
> > >
> > > I also see that a sequence number is returned from IW on all mutations in
> > > the latest code, including prepareCommit() [apologies, as we are still on
> > > an old version, Lucene 4.6].
> > >
> > > So the approach will be to correlate the seq. no. returned from individual
> > > operations of IW with the external queue offset. The background commit
> > > thread can then call prepareCommit() & it will know exactly what got
> > > through. Have I understood it correctly?
> > >
> > > Btw, we have a TrackingIndexWriter that returns a sequence number too, in
> > > 4.6. Is the new implementation something different from this?
> > >
> > > --
> > > Ravi
> > >
> > > On Thu, Aug 10, 2017 at 6:09 PM, Michael McCandless <
> > > lucene@mikemccandless.com> wrote:
> > >
> > > > IW.setCommitData (now .setLiveCommitData in 7.0) is the right way to
> > > > store this.
> > > >
> > > > Mike McCandless
> > > >
> > > > http://blog.mikemccandless.com
> > > >
> > > > On Thu, Aug 10, 2017 at 6:57 AM, Ravikumar Govindarajan <
> > > > ravikumar.govindarajan@gmail.com> wrote:
> > > >
> > > > > Every mutation (Add/Update/Delete) has a transaction-id (incremental
> > > > > long) assigned by our messaging queue (Kafka).
> > > > >
> > > > > To index these mutations, an indexer thread pulls data from the queue,
> > > > > adds & commits to IndexWriter, then updates the latest transaction-id
> > > > > in an external system (ZooKeeper). During rollback/server-restarts,
> > > > > the threads read the previous value from ZK & resume...
> > > > >
> > > > > We have now moved to an NRT implementation, where the commit thread
> > > > > runs in the background, & I am unable to find the latest
> > > > > transaction-id that got committed.
> > > > >
> > > > > I initially thought of adding transaction-id as a first-class field of
> > > > > the index itself & then writing a custom codec to harvest it, but this
> > > > > fails when a mutation-set consists only of deletes [as there is no way
> > > > > to pass these ids via IW delete APIs].
> > > > >
> > > > > Is there a way to store this as part of segment-info? Can
> > > > > setCommitData() of IW help me here in any way?
> > > > >
> > > > > --
> > > > > Ravi
> > > > >
> > > >
> > >
> >
>
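
The bookkeeping discussed in this thread (correlating the sequence number
returned by each IndexWriter operation with the external queue offset, then
using the sequence number returned by the commit to learn which offsets got
through) might be sketched roughly as below. This is only an illustration under
stated assumptions, not code from the thread: the class CommitOffsetTracker and
its methods record() and onCommit() are hypothetical names, no Lucene calls
appear in it, and the seqNo arguments stand in for the long values that
IndexWriter returns in versions that have sequence numbers. It also assumes a
single indexer thread that consumes the queue in order, so that offsets grow
together with sequence numbers, and a single commit thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical helper (not part of Lucene): remembers which queue offset
// belongs to which IndexWriter sequence number.
final class CommitOffsetTracker {
    // Sequence number -> queue offset, kept sorted by sequence number.
    private final ConcurrentSkipListMap<Long, Long> offsetBySeqNo =
            new ConcurrentSkipListMap<>();

    // Offset of the newest operation known to be committed; -1 means none yet.
    private volatile long lastCommittedOffset = -1L;

    // Indexer thread: call after each add/update/delete with the sequence
    // number that operation returned and the queue offset it came from.
    void record(long seqNo, long queueOffset) {
        offsetBySeqNo.put(seqNo, queueOffset);
    }

    // Commit thread: call with the sequence number returned by the commit.
    // Every recorded operation with seqNo <= commitSeqNo is treated as
    // committed and dropped from the map; the offset of the newest such
    // operation is returned (or the previous value if nothing new qualified).
    long onCommit(long commitSeqNo) {
        Map.Entry<Long, Long> first;
        while ((first = offsetBySeqNo.firstEntry()) != null
                && first.getKey() <= commitSeqNo) {
            offsetBySeqNo.remove(first.getKey());
            lastCommittedOffset = first.getValue();
        }
        return lastCommittedOffset;
    }
}
```

The value returned by onCommit() is what the commit thread would persist as the
resume point once the commit has completed, for example in ZooKeeper as in the
original setup. If an operation's record() call has not happened yet when
onCommit() runs, the reported offset is simply lower than the true one, which
leads to some re-processing after a restart rather than to lost mutations.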
