Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@lucene.apache.org
Received-SPF: neutral (athena.apache.org: local policy)
MIME-Version: 1.0
In-Reply-To: <AANLkTilnJ7aHqUrUn936mAv8N2J_iODdZJ3p-3s3yNan@mail.gmail.com>
References: <82E4F109-B052-4AA7-950A-9B5BCEC928E2@apache.org>
	 <k2v9ac0c6aa1004060240m67e2d0ffy696a4301fa3fa804@mail.gmail.com>
	 <g2xb0f7deba1004081905r3acd3398ib56505c181c069d7@mail.gmail.com>
	 <u2s786fde51004250432gd50bec64m9b2f6ee6dd495987@mail.gmail.com>
	 <z2z9ac0c6aa1005050854i2394ba48ka5b2112a36b44f7a@mail.gmail.com>
	 <j2v786fde51005051017n37743f88s3c645f581c93f49@mail.gmail.com>
	 <r2vb0f7deba1005082339tdc57575auc0d67f344214b5d8@mail.gmail.com>
	 <AANLkTimACic2rK1zQR-Wt_lzkAq_12mgEWhFoF8dIm52@mail.gmail.com>
	 <m2vb0f7deba1005090031w9e1a47afxfc862c025905ffa9@mail.gmail.com>
	 <AANLkTilnJ7aHqUrUn936mAv8N2J_iODdZJ3p-3s3yNan@mail.gmail.com>
Date: Mon, 10 May 2010 03:43:42 -0400
Message-ID: <m2t9ac0c6aa1005100043s71957a89qa194781310c75c42@mail.gmail.com>
Subject: Re: Incremental Field Updates
From: Michael McCandless <lucene@mikemccandless.com>
To: dev@lucene.apache.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

I think another example would be the catch-all field.

EG say my app concatenates the title, abstract and body of a document
into the catch-all field.

But now I want to change just the title.

I think in theory (assuming we can work out an intuitive user-level
API exposure of this...), on changing the title, we could
incrementally re-index the catch-all field, to remove the old title
and add the new one.  Right?

But: would this approach work on the positions too?

EG say term X appears 4 times in the doc -- positions 1, 7, 22, 53.
I've now replaced a part of this field, affecting only this term's
occurrence at position 53, so that maybe that occurrence is now at 57.
Are we able to handle this?

One possible low-level API (I think Doug originally suggested this)
might be to allow .remove() calls on the enums (as many Java
collections support on their iterators) -- eg as you are stepping
through the DocsEnum, you could call .remove() to completely
disassociate that term from that doc, or in DocsAndPositionsEnum you
could call .removePosition to remove a specific position.
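A minimal sketch of the proposed enum-level removal, assuming a toy in-memory postings structure. The class MutablePostings and its methods are hypothetical and are not Lucene's DocsEnum API; java.util iterators stand in for the proposed .remove()/.removePosition() calls. It replays the catch-all example: term X at positions 1, 7, 22, 53 loses the occurrence at 53 and gains one at 57.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical in-memory postings for one term: docId -> sorted positions. */
final class MutablePostings {
    final TreeMap<Integer, TreeSet<Integer>> docs = new TreeMap<>();

    void add(int doc, int... positions) {
        TreeSet<Integer> set = docs.computeIfAbsent(doc, d -> new TreeSet<>());
        for (int p : positions) set.add(p);
    }

    /** Mimics calling .remove() while stepping through a DocsEnum: drops the whole doc. */
    void removeDoc(int target) {
        Iterator<Map.Entry<Integer, TreeSet<Integer>>> it = docs.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getKey() == target) {
                it.remove(); // same contract as java.util.Iterator.remove()
                return;
            }
        }
    }

    /** Mimics a hypothetical removePosition(): drops a single occurrence. */
    void removePosition(int doc, int position) {
        TreeSet<Integer> positions = docs.get(doc);
        if (positions == null) return;
        positions.remove(position);
        // Removing the last occurrence disassociates the term from the doc entirely.
        if (positions.isEmpty()) docs.remove(doc);
    }
}
```

Calling add(0, 1, 7, 22, 53), then removePosition(0, 53) and add(0, 57), leaves doc 0 with positions [1, 7, 22, 57].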

We would still need a higher level user API... maybe you'd provide a
"text to remove" and a "text to add" for each field, pre-init'd with
position/offset for the analyzer (if the field is analyzed)?  I guess
we'd make this a new attr on Field.  Ie, Field normally contains just
"text/tokens to add to the index", but we could also include a
"text/tokens to remove".  Or, each Field instance is marked as "to be
added" vs "to be removed", and you have to add 2 Field instances to
subtract then add.

Mike

On Sun, May 9, 2010 at 7:38 AM, Shai Erera <serera@gmail.com> wrote:
>> When I update a field, I
>> want to update all of it, not just part of it. No?
>
> Well ... might be. But the most common case for us is fields to which we
> want to add or remove data. For example ACLs - you don't want to replace the
> entire field w/ the document, but simply to add/remove access for certain
> people/groups. Same goes for "social" fields, like tags, ratings, bookmarks
> etc. - the granularity of the update is to associate/dissociate a particular
> value w/ the field + doc, and not update the entire field.
>
> Shai
>
> On Sun, May 9, 2010 at 10:31 AM, Babak Farhang <farhang@gmail.com> wrote:
>>
>> > No, actually, you can update index-only fields also. It all depends on
>> > the operation that you ask to do on the fields.
>>
>> I love the level of control this provides, but again, I was talking at
>> the user level.
>>
>> > If you want to e.g. remove an entire field w/ such update operation,
>> > then it becomes more expensive
>>
>> That's the typical usage scenario, I imagine. When I update a field, I
>> want to update all of it, not just part of it. No?
>>
>> (The lower level semantics of twiddling with the postings is poorly
>> understood by the typical user..)
>>
>> > We didn't
>> > face such a scenario though, and I think it's probably a rare one?
>>
>> As rare as any time you want to update an indexed-only field. Not a
>> serious limitation (but perhaps worth noting?)
>> Perhaps at the API level, you provide an updateIndexedOnlyField that
>> takes the old value as well as the new value for the field.
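A sketch of what an updateIndexedOnlyField-style call could compute from the old and new values, assuming a lowercase/whitespace tokenizer as a stand-in for a real Analyzer. FieldDelta is a hypothetical name; the sketch works at the term level only and ignores positions and term frequencies.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical term-level delta between an old and a new field value. */
final class FieldDelta {
    final Set<String> toRemove; // terms to dissociate ('-')
    final Set<String> toAdd;    // terms to associate ('+')

    private FieldDelta(Set<String> toRemove, Set<String> toAdd) {
        this.toRemove = toRemove;
        this.toAdd = toAdd;
    }

    /** Stand-in for an Analyzer: lowercase, then split on whitespace. */
    static Set<String> terms(String text) {
        Set<String> out = new LinkedHashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    static FieldDelta of(String oldValue, String newValue) {
        Set<String> oldTerms = terms(oldValue);
        Set<String> newTerms = terms(newValue);
        Set<String> remove = new LinkedHashSet<>(oldTerms);
        remove.removeAll(newTerms); // present before, absent now
        Set<String> add = new LinkedHashSet<>(newTerms);
        add.removeAll(oldTerms);    // absent before, present now
        return new FieldDelta(remove, add);
    }
}
```

For "Lucene in Action" replaced by "Lucene in Depth", only "action" is removed and only "depth" is added; unchanged terms produce no posting updates.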
>>
>> Anyway, I think your approach would be a great addition to the
>> toolkit. Would love to see it even in rough cut form :))
>>
>> -Babak
>>
>> On Sun, May 9, 2010 at 12:49 AM, Shai Erera <serera@gmail.com> wrote:
>> > No, actually, you can update index-only fields also. It all depends on
>> > the operation that you ask to do on the fields. For example, if the
>> > query to execute is something like "update all documents w/ tags:ibm ->
>> > remove terms t1, t2, t3 and add terms t4, t5", then the result of such a
>> > request would dissociate t1-3 from those docs that answer tags:ibm and
>> > associate them w/ t4 and t5. Specifically, if docs 1, 4, 10 answer
>> > tags:ibm, then the following posting updates will be done:
>> > t1: -1, -4, -10
>> > t2: -1, -4, -10
>> > t3: -1, -4, -10
>> > t4: 1, 4, 10
>> > t5: 1, 4, 10
>> > (in addition to whatever other updates are already associated with
>> > those postings).
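The posting updates listed above can be generated mechanically from the query result. A sketch, with UpdatePostingsBuilder and its Update class as hypothetical names; the sign is kept as a separate flag rather than a negative int so that doc id 0 stays representable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class UpdatePostingsBuilder {
    /** One entry of an update posting: a doc id plus an add/remove flag. */
    static final class Update {
        final int doc;
        final boolean add;

        Update(int doc, boolean add) {
            this.doc = doc;
            this.add = add;
        }

        @Override
        public String toString() {
            return (add ? "+" : "-") + doc;
        }
    }

    /** For every matching doc, emit '-' entries for removeTerms and '+' entries for addTerms. */
    static Map<String, List<Update>> build(int[] matchingDocs, List<String> removeTerms, List<String> addTerms) {
        Map<String, List<Update>> updates = new TreeMap<>();
        for (String term : removeTerms) {
            List<Update> list = updates.computeIfAbsent(term, t -> new ArrayList<>());
            for (int doc : matchingDocs) list.add(new Update(doc, false));
        }
        for (String term : addTerms) {
            List<Update> list = updates.computeIfAbsent(term, t -> new ArrayList<>());
            for (int doc : matchingDocs) list.add(new Update(doc, true));
        }
        return updates;
    }
}
```

With matching docs {1, 4, 10}, removal terms t1-t3 and addition terms t4, t5, the result maps t1 to [-1, -4, -10] and t5 to [+1, +4, +10].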
>> >
>> > At search time, if you search for "t1 OR t2", then the regular t1 and
>> > t2 postings will be merged on-the-fly w/ the updated ones to remove
>> > docs 1, 4, 10.
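A set-based sketch of the "t1 OR t2" case, assuming made-up regular postings for t1 and t2 and applying removals before additions within one update. A real reader would stream both lists instead of materializing sets; OrWithUpdates is a hypothetical name.

```java
import java.util.Set;
import java.util.TreeSet;

final class OrWithUpdates {
    /** Live docs for one term: regular postings minus removed docs plus added docs. */
    static Set<Integer> live(int[] regular, int[] removed, int[] added) {
        Set<Integer> docs = new TreeSet<>();
        for (int d : regular) docs.add(d);
        for (int d : removed) docs.remove(d); // Set.remove(Object): d is boxed
        for (int d : added) docs.add(d);
        return docs;
    }

    /** Disjunction of two already-updated postings. */
    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> out = new TreeSet<>(a);
        out.addAll(b);
        return out;
    }
}
```

If t1 regularly lists {1, 2, 4, 7, 10} and t2 lists {3, 4, 10, 12}, removing docs 1, 4, 10 from both leaves {2, 7} and {3, 12}, so "t1 OR t2" returns [2, 3, 7, 12].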
>> >
>> > If you want to e.g. remove an entire field w/ such update operation,
>> > then it becomes more expensive, but in general you'd need to iterate
>> > over the field's terms and dissociate the documents from all the terms.
>> > We didn't face such a scenario though, and I think it's probably a rare
>> > one?
>> >
>> > Shai
>> >
>> > On Sun, May 9, 2010 at 9:39 AM, Babak Farhang <farhang@gmail.com> wrote:
>> >>
>> >> Shai,
>> >>
>> >> I think this is an interesting approach. I can see how I could
>> >> [incrementally] update a stored, indexed field this way, but I don't
>> >> understand how I could update an indexed-only field. Here's why: for a
>> >> stored (and indexed) field, I can always determine what terms to
>> >> remove ('-') from the postings, but for an indexed-only field I'd have
>> >> no [practical] way to know..
>> >>
>> >> So under this approach, I'm thinking at a user level, a Lucene field
>> >> would be updateable only if it's at least stored.
>> >>
>> >> Is that right?
>> >>
>> >> -Babak
>> >>
>> >> On Wed, May 5, 2010 at 11:17 AM, Shai Erera <serera@gmail.com> wrote:
>> >> > Yes Mike - I don't know yet if two MPs will be used, one for the
>> >> > stacked segments and one for the general segments (which will include
>> >> > the stacked ones as well) .. feels like one MP should be enough, but
>> >> > this can be decided on as we progress.
>> >> >
>> >> > This approach allows you to update every term in every already
>> >> > indexed field, as well as add terms to already indexed fields ... and
>> >> > add totally new fields, with lots of text in them. So e.g. there are
>> >> > two neat use cases that come to mind:
>> >> > 1) Allow users to rate search results, favor them etc.
>> >> > 2) Or even comment on them.
>> >> > I think Google offers the 2nd. Both translate into updating the
>> >> > search result's already indexed document w/ the new rating, comment
>> >> > etc. w/o needing to reindex the document.
>> >> >
>> >> > I will try to find perf numbers. It's been a long time, hope all
>> >> > the documents are still where they were :). Indeed, for terms like
>> >> > ACLs, it means that each query had to merge dozens of postings lists.
>> >> > For that I implemented an alternative solution, which uses a
>> >> > payload-like structure that registers for each document the list of
>> >> > ACLs that are associated with it (not as strings, it was more
>> >> > efficient). Then, if the query included dozens of such terms, I
>> >> > created a filter-like object which, for every document matched by the
>> >> > query, checked if it matches the ACLs list of the document. This is
>> >> > usually slower, because the ACLs themselves don't drive the query,
>> >> > which means more matches will be found. That was a tradeoff which one
>> >> > could configure based on the number of such terms in the query, the
>> >> > number of updated terms etc.
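A sketch of the filter-like alternative, assuming ACLs are encoded as sorted int ids per document and that a document passes if it shares at least one ACL with the user. The exact matching rule is not stated above, and AclFilter is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class AclFilter {
    private final int[][] aclsByDoc; // per-doc ACL ids, each row sorted ascending

    AclFilter(int[][] aclsByDoc) {
        this.aclsByDoc = aclsByDoc;
    }

    /** True if the doc carries at least one of the user's (sorted) ACL ids. */
    boolean accepts(int doc, int[] userAcls) {
        int[] docAcls = aclsByDoc[doc];
        int i = 0, j = 0;
        while (i < docAcls.length && j < userAcls.length) {
            if (docAcls[i] == userAcls[j]) return true;
            if (docAcls[i] < userAcls[j]) i++; else j++;
        }
        return false;
    }

    /** Post-filters docs matched by the query; the ACLs do not drive the query. */
    List<Integer> filter(int[] queryMatches, int[] userAcls) {
        List<Integer> out = new ArrayList<>();
        for (int doc : queryMatches) {
            if (accepts(doc, userAcls)) out.add(doc);
        }
        return out;
    }
}
```

Because every query match is checked, cost grows with the number of matches rather than with the number of ACL terms, which is the tradeoff described above.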
>> >> >
>> >> > But in essence you're right - if the solution is generic to cover any
>> >> > term, we cannot use such a payload-based feature. We might need to
>> >> > merge the stacked segments more frequently. This is pending perf
>> >> > testing though.
>> >> >
>> >> > Shai
>> >> >
>> >> > On Wed, May 5, 2010 at 6:54 PM, Michael McCandless
>> >> > <lucene@mikemccandless.com> wrote:
>> >> >>
>> >> >> Catching up here :)
>> >> >>
>> >> >> This is great stuff Shai -- I like the notion of "negative"
>> >> >> postings, that "subtract" docs from previous generations as you
>> >> >> iterate them.
>> >> >>
>> >> >> And I like the term "stacked segments". This fits very well with
>> >> >> Lucene's write-once approach, ie, a writer can at any time stack a
>> >> >> new segment (writes to new files) "over" an old segment, like the
>> >> >> layers in photoshop. A reader merges all stacks on-the-fly when
>> >> >> reading.
>> >> >>
>> >> >> And the merge policy now picks from 2 dimensions right? Ie it may
>> >> >> want to simply consolidate stacks on an old segment but otherwise not
>> >> >> merge that segment (eg for very large segments that have accumulated
>> >> >> updates), and normal merging would of course consolidate all stacks
>> >> >> for all segments merged.
>> >> >>
>> >> >> Wouldn't this approach conceivably allow you to alter single terms
>> >> >> within a single field (we'd have to figure out how we'd expose the
>> >> >> API for this)? EG if I appended some text to an already-indexed
>> >> >> field? (In addition to adding a new field to an already indexed doc,
>> >> >> or replacing an indexed field on a previously indexed doc).
>> >> >>
>> >> >> Did you have any hard perf numbers? Merge sorting N streams is
>> >> >> surprisingly costly... we may need/want to have a reader pre-merge
>> >> >> (using up RAM) any "long tail" of stacked segments as long as they
>> >> >> are small enough...
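To illustrate why merging many stacks costs more, here is a standard k-way merge with a binary heap: for T postings spread over N streams it does O(T log N) work, versus O(T) for a single stream, so pre-merging a long tail of small stacks reduces N. This is a generic sketch, not Lucene code; KWayMerge is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

final class KWayMerge {
    /** Merges N ascending int arrays; every poll/add on the heap costs O(log N). */
    static List<Integer> merge(int[][] streams) {
        // Heap entries: {value, streamIndex, offsetInStream}, ordered by value.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int s = 0; s < streams.length; s++) {
            if (streams[s].length > 0) heap.add(new int[] {streams[s][0], s, 0});
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);
            int s = top[1];
            int next = top[2] + 1;
            if (next < streams[s].length) heap.add(new int[] {streams[s][next], s, next});
        }
        return out;
    }
}
```

Merging {1, 4, 9}, {2, 3}, {} and {5} yields [1, 2, 3, 4, 5, 9].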
>> >> >>
>> >> >> This sounds great!!
>> >> >>
>> >> >> Mike
>> >> >>
>> >> >> On Sun, Apr 25, 2010 at 7:32 AM, Shai Erera <serera@gmail.com>
>> >> >> wrote:
>> >> >> > Hi,
>> >> >> >
>> >> >> > WARNING: following email is a bit long, but I think it is worth
>> >> >> > reading :)
>> >> >> >
>> >> >> > I would like to describe my implementation of incremental field
>> >> >> > updates in Juru (the former search library I've worked on in IBM).
>> >> >> > I will later discuss how I think it can be implemented in Lucene.
>> >> >> >
>> >> >> > The motivation/requirement came from a doc management system which
>> >> >> > used Juru as its search component. The system included document
>> >> >> > libraries where users could create files and upload documents. A
>> >> >> > user could belong to any number of libraries and a complex ACL
>> >> >> > model was used (down to the level of the file). ACLs and Folders
>> >> >> > were modeled as categories in the index (boolean-like terms). Files
>> >> >> > and folders could be moved around and access to a
>> >> >> > library/folder/file could be granted/revoked at any given time.
>> >> >> > Therefore, such updates usually affected hundreds (and thousands)
>> >> >> > of documents. Overall, the index managed millions of documents,
>> >> >> > tens of thousands of libraries and hundreds of thousands of ACLs
>> >> >> > (large organization :)). To get a rough understanding of the
>> >> >> > number of such updates - every several minutes, tens of thousands
>> >> >> > of documents were updated due to such changes only (in addition to
>> >> >> > the regular content updates).
>> >> >> >
>> >> >> > We were asked to support requests in the following form: "update
>> >> >> > all docs that match <criteria> --> do <operation>" where:
>> >> >> > * <criteria> was "a single doc", "docs belonging to a category" and
>> >> >> > "docs belonging to a set of categories".
>> >> >> > * <operation> was "add categories NEW" (NEW might not even exist in
>> >> >> > the index yet, or may already be associated w/ the document),
>> >> >> > "remove categories OLD" (regardless of whether the docs were
>> >> >> > already associated w/ OLD or not) and "remove all OLD and add all
>> >> >> > NEW".
>> >> >> >
>> >> >> > The solution I've implemented to support these requests turned out
>> >> >> > to actually allow you to update every term (!) in the index:
>> >> >> > suppose that you have a table, which recorded tuples like <docid,
>> >> >> > term, +/->. The record <1, "ibm", '+'> means that doc 1 is
>> >> >> > associated w/ the term "ibm", and the record <1, "hp", '-'> means
>> >> >> > that the same document is not associated w/ the word "hp". Then,
>> >> >> > you could very easily ask for all documents that are associated w/
>> >> >> > "hp", and the result would not include doc 1. Note that docid=1 is
>> >> >> > not the app Doc_ID, but the internal id the document received.
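The tuple table can be modeled directly, assuming that a later record for the same <docid, term> pair overrides an earlier one. AssociationTable is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class AssociationTable {
    // term -> (internal docid -> true for '+', false for '-')
    private final Map<String, TreeMap<Integer, Boolean>> byTerm = new HashMap<>();

    /** Records <docid, term, +/->; a later record for the same pair overrides an earlier one. */
    void record(int doc, String term, char sign) {
        if (sign != '+' && sign != '-') throw new IllegalArgumentException("sign must be + or -");
        byTerm.computeIfAbsent(term, t -> new TreeMap<>()).put(doc, sign == '+');
    }

    /** Docs whose most recent record for this term is '+', in ascending docid order. */
    List<Integer> docsFor(String term) {
        List<Integer> out = new ArrayList<>();
        TreeMap<Integer, Boolean> docs = byTerm.get(term);
        if (docs == null) return out;
        for (Map.Entry<Integer, Boolean> e : docs.entrySet()) {
            if (e.getValue()) out.add(e.getKey());
        }
        return out;
    }
}
```

After recording <1, "ibm", '+'>, <1, "hp", '-'> and <2, "hp", '+'>, asking for "hp" returns [2] and excludes doc 1.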
>> >> >> >
>> >> >> > I've kept two types of postings in the index: regular and updates.
>> >> >> > Taking the above examples, "ibm" regular posting looked like
>> >> >> > <"ibm", 1, 3, 1, 2, 5 ...> (dgaps) and the updates posting looked
>> >> >> > like <"ibm", +2, -3, +6, +10 ...> (absolute docid value, w/ a +/-
>> >> >> > sign). Similarly for "hp".
>> >> >> >
>> >> >> > During search time, when a query with the word "ibm" was
>> >> >> > submitted, I created a virtual posting which reads from both the
>> >> >> > regular and the updates postings, and merges them on the fly
>> >> >> > according to the +/- signs. Since both postings are sorted in
>> >> >> > ascending order, the merge is very efficient, and query time is
>> >> >> > hardly affected.
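A sketch of the virtual posting for "ibm" using the numbers above, assuming the first d-gap is taken relative to 0 and the updates list holds at most one entry per doc. VirtualPosting is a hypothetical name, and the sign is carried in a parallel boolean array. Both inputs are ascending, so a single two-pointer pass suffices.

```java
import java.util.ArrayList;
import java.util.List;

final class VirtualPosting {
    /** Turns d-gaps (first entry relative to 0) into absolute doc ids. */
    static int[] decodeGaps(int[] gaps) {
        int[] docs = new int[gaps.length];
        int prev = 0;
        for (int i = 0; i < gaps.length; i++) {
            prev += gaps[i];
            docs[i] = prev;
        }
        return docs;
    }

    /**
     * Two-pointer merge of ascending regular docs with ascending signed updates
     * (assumed: at most one update per doc). '+' adds the doc, '-' suppresses it.
     */
    static List<Integer> merge(int[] regular, int[] updDocs, boolean[] updAdd) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < regular.length || j < updDocs.length) {
            if (j == updDocs.length || (i < regular.length && regular[i] < updDocs[j])) {
                out.add(regular[i++]);                 // doc has no update
            } else if (i == regular.length || updDocs[j] < regular[i]) {
                if (updAdd[j]) out.add(updDocs[j]);    // doc known only to the updates
                j++;
            } else {                                   // same doc in both: the update decides
                if (updAdd[j]) out.add(regular[i]);
                i++;
                j++;
            }
        }
        return out;
    }
}
```

With regular gaps <1, 3, 1, 2, 5> (docs 1, 4, 5, 7, 12) and updates +2, -3, +6, +10, the merged posting is 1, 2, 4, 5, 6, 7, 10, 12; the -3 has no effect because doc 3 was never in the regular list.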
>> >> >> >
>> >> >> > Those postings are merged from time to time in a process that is
>> >> >> > similar to how Lucene works today, which keeps the update postings
>> >> >> > relatively small and manageable.
>> >> >> >
>> >> >> > Now here comes the fun part - how I think it can be implemented in
>> >> >> > Lucene !
>> >> >> >
>> >> >> > To be honest, this sat on my TODO list for a very long time and
>> >> >> > only a couple of months ago I figured out how to implement it in
>> >> >> > Lucene. The main difficulty I had was around the difference between
>> >> >> > the write-once policy in Juru and Lucene - in Lucene, once a segment
>> >> >> > is written, it cannot be changed. BUT, I've only recently realized
>> >> >> > that this isn't exactly true, because deleted docs do change
>> >> >> > existing segments. The deletes are kept in a file separate from the
>> >> >> > segment (.del) and have their own generation. Deletes, as I
>> >> >> > understood then, and Grant helped me term them better, can be
>> >> >> > defined as "Stacked Segments" - they add data to a segment, which
>> >> >> > from time to time are integrated into the segment (unlike Photoshop
>> >> >> > Layers, but my understanding of Photoshop is limited). And the
>> >> >> > Lucene engine knows how to combine the two, giving precedence to
>> >> >> > the deletes.
>> >> >> >
>> >> >> > By introducing an "Updates Stacked Segment", we can encode postings
>> >> >> > w/ the '+'/'-' signs, and when TermDocs/Positions is requested, we
>> >> >> > can create a variation which merges the two lists. When segments
>> >> >> > are merged, the updates will be merged into the regular postings
>> >> >> > (just like deletes) and thus will be gone. In addition, this plays
>> >> >> > very nicely with readers that are currently reading the index, and
>> >> >> > we can have generations for the updates - really like deletes !
>> >> >> >
>> >> >> > I think that Lucene's architecture allows for such a solution very
>> >> >> > cleanly and nicely (and I believe flex makes it even easier). We
>> >> >> > can (later, after you've digested the idea) discuss whether this
>> >> >> > should be built into the current IW, or an extension like
>> >> >> > UpdateableIW. The API I've been thinking about should really be
>> >> >> > like deletes, allowing one to update docs based on Term or Query. I
>> >> >> > defer the API discussion for later.
>> >> >> >
>> >> >> > As for stored fields, this was a real challenge to support in
>> >> >> > Juru, but I think that w/ "Stacked Segments" and Lucene's
>> >> >> > architecture, this should be much easier - adding stacked stored
>> >> >> > fields ...
>> >> >> >
>> >> >> > As you've noticed, the update postings are not DGap encoded, and
>> >> >> > the sign needs to be preserved. While I haven't implemented it in
>> >> >> > Juru, I think that perhaps this can be improved by keeping the '-'
>> >> >> > and '+' lists separated. We will need to register somewhere which
>> >> >> > came before which because order matters a lot here (and I'm not
>> >> >> > talking about concurrency - simple update instructions order). I
>> >> >> > have some idea how this can be achieved, but I refrain from
>> >> >> > describing it now, to not make this email even longer :).
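One possible way to keep separate '+' and '-' lists without losing instruction order, assuming each entry carries a global sequence number and the later instruction wins for a doc present in both lists. This is not the scheme alluded to above, which is not described; SeparatedUpdates is a hypothetical name.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class SeparatedUpdates {
    // doc -> sequence number of the latest instruction of that kind
    private final TreeMap<Integer, Long> plus = new TreeMap<>();
    private final TreeMap<Integer, Long> minus = new TreeMap<>();
    private long seq = 0;

    void add(int doc) {
        plus.put(doc, ++seq);
    }

    void remove(int doc) {
        minus.put(doc, ++seq);
    }

    /** Applies both lists to the regular docs; for a doc in both lists the later instruction wins. */
    Set<Integer> apply(Set<Integer> regular) {
        Set<Integer> out = new TreeSet<>(regular);
        for (Map.Entry<Integer, Long> e : minus.entrySet()) {
            Long p = plus.get(e.getKey());
            if (p == null || p < e.getValue()) out.remove(e.getKey());
        }
        for (Map.Entry<Integer, Long> e : plus.entrySet()) {
            Long m = minus.get(e.getKey());
            if (m == null || m < e.getValue()) out.add(e.getKey());
        }
        return out;
    }
}
```

For regular docs {1, 2, 3} and the instruction sequence add 4, remove 2, remove 4, add 2, remove 3, the result is {1, 2}: doc 2 survives because its add came last, while doc 4 stays out because its remove came last.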
>> >> >> >
>> >> >> > I've mentioned that this approach can be applied to any term and
>> >> >> > not just categories under some circumstances. Basically, as soon
>> >> >> > as you update a term, its DF is no longer true, unless you are able
>> >> >> > to take the updates into account. We can defer the discussion on
>> >> >> > that, but clearly for many fields, incrementally updating them
>> >> >> > should not affect precision, as they're not used for that type of
>> >> >> > scoring ... Maybe, by keeping separate '+' and '-' lists we can
>> >> >> > compute statistics precisely. And I haven't given much thought yet
>> >> >> > to how this and Mike's flex scoring will be integrated.
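If the '+' and '-' lists are kept separate and are already net (a doc appears in at most one of them), the corrected DF can be computed as below. Note that it still needs membership checks against the regular postings. AdjustedDocFreq is a hypothetical name.

```java
import java.util.Set;

final class AdjustedDocFreq {
    /** Assumes plus and minus are net: no doc appears in both. */
    static int docFreq(Set<Integer> regular, Set<Integer> plus, Set<Integer> minus) {
        int df = regular.size();
        for (int d : plus) {
            if (!regular.contains(d)) df++; // newly associated with the term
        }
        for (int d : minus) {
            if (regular.contains(d)) df--;  // no longer associated with the term
        }
        return df;
    }
}
```

With regular docs {1, 4, 5, 7, 12}, '+' {2, 6, 10} and '-' {3}, the adjusted DF is 8; the '-' on doc 3 does not count because doc 3 was never in the regular postings.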
>> >> >> >
>> >> >> > BTW, a word on Parallel Indexing - the two are completely
>> >> >> > orthogonal. Once PI is introduced, one can index all the updateable
>> >> >> > fields in a dedicated slice, for perhaps improving search
>> >> >> > performance for slices that are not updateable (not involving code
>> >> >> > which attempts to read and merge update and regular lists on the
>> >> >> > fly). Also, incremental field updates support all of PI's
>> >> >> > scenarios, even though some will be done more efficiently w/ PI.
>> >> >> > But this too is a matter for a separate discussion :).
>> >> >> >
>> >> >> > That's it ! I believe I've given you all the details I have about
>> >> >> > the approach and high level proposed solution for Lucene. Perhaps
>> >> >> > some details slipped my mind, but if you ask the right questions,
>> >> >> > I'm sure I'll be able to answer them :). I would like to emphasize
>> >> >> > that since this was already implemented (in Juru) - this is more
>> >> >> > than just an "I think this approach can work" proposal ...
>> >> >> >
>> >> >> > I would appreciate your comments on this. I would like to start
>> >> >> > implementing it soon, and so as a first step, please share your
>> >> >> > comments on the overall approach. I'll then write a more detailed
>> >> >> > description of how I plan to implement it in Lucene (been spending
>> >> >> > some time on that), and we can have more detailed (and fun)
>> >> >> > discussions on the low level details.
>> >> >> >
>> >> >> > Shai
>> >> >> >
>> >> >> > On Fri, Apr 9, 2010 at 5:05 AM, Babak Farhang <farhang@gmail.com>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> Good point. I meant the model at the document level: i.e. what
>> >> >> >> milestones does a document go through in its life cycle. Today:
>> >> >> >>
>> >> >> >> created --> deleted
>> >> >> >>
>> >> >> >> With incremental updates:
>> >> >> >>
>> >> >> >> created --> update1 --> update2 --> deleted
>> >> >> >>
>> >> >> >> I think what I'm trying to say is that this second threaded
>> >> >> >> sequence of state changes seems intuitively more fragile under
>> >> >> >> concurrent scenarios. So for example, in a lock-free design, the
>> >> >> >> system would also have to anticipate the following sequence of
>> >> >> >> events:
>> >> >> >>
>> >> >> >> created --> update1 --> deleted --> update2
>> >> >> >>
>> >> >> >> and consider update2 a null op. I'm imagining there are other
>> >> >> >> cases that I can't think of..
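A minimal model of the lifecycle above, assuming an update to a deleted (or never created) document is simply dropped as a null op. DocLifecycle is a hypothetical name, and the sketch ignores concurrency entirely; it only pins down the intended outcome of the problematic sequence.

```java
import java.util.HashMap;
import java.util.Map;

final class DocLifecycle {
    // Live docs only: id -> number of updates applied so far.
    private final Map<String, Integer> updatesApplied = new HashMap<>();

    void create(String id) {
        updatesApplied.putIfAbsent(id, 0);
    }

    /** Applies an update; returns false (a null op) if the doc is deleted or unknown. */
    boolean update(String id) {
        Integer n = updatesApplied.get(id);
        if (n == null) return false;
        updatesApplied.put(id, n + 1);
        return true;
    }

    void delete(String id) {
        updatesApplied.remove(id);
    }

    boolean isLive(String id) {
        return updatesApplied.containsKey(id);
    }
}
```

Replaying created, update1, deleted, update2 gives true for update1 and false for update2, and the document stays deleted.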
>> >> >> >>
>> >> >> >> -Babak
>> >> >> >>
>> >> >> >>
>> >> >> >> On Tue, Apr 6, 2010 at 3:40 AM, Michael McCandless
>> >> >> >> <lucene@mikemccandless.com> wrote:
>> >> >> >> > write once, plus the option to the app to keep multiple commit
>> >> >> >> > points around (by customizing the deletion policy).
>> >> >> >> >
>> >> >> >> > Actually order of operations / commits very much matters in
>> >> >> >> > Lucene today.
>> >> >> >> >
>> >> >> >> > Deletions are not idempotent: if you add a doc w/ term X, delete
>> >> >> >> > by term X, add a new doc with term X... that's very different
>> >> >> >> > than if you moved the delete op to the end. Ie the deletion only
>> >> >> >> > applies to the docs added before it.
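A simplified replay of the delete-by-term example, assuming each document carries just one term and a delete-by-term removes only documents added earlier in the sequence. DeleteOrderDemo is a hypothetical name, not Lucene code.

```java
import java.util.ArrayList;
import java.util.List;

final class DeleteOrderDemo {
    /** Replays "add:<term>" and "delete:<term>" ops in order; returns the number of live docs. */
    static int liveDocs(String... ops) {
        List<String> docs = new ArrayList<>(); // each doc is represented by its single term
        for (String op : ops) {
            String[] parts = op.split(":", 2);
            if (parts[0].equals("add")) {
                docs.add(parts[1]);
            } else if (parts[0].equals("delete")) {
                // Delete-by-term only sees docs added before this op.
                docs.removeIf(t -> t.equals(parts[1]));
            } else {
                throw new IllegalArgumentException(op);
            }
        }
        return docs.size();
    }
}
```

liveDocs("add:X", "delete:X", "add:X") leaves 1 live doc, while moving the delete to the end, liveDocs("add:X", "add:X", "delete:X"), leaves 0.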
>> >> >> >> >
>> >> >> >> > Mike
>> >> >> >> >
>> >> >> >> > On Mon, Apr 5, 2010 at 12:45 AM, Babak Farhang
>> >> >> >> > <farhang@gmail.com>
>> >> >> >> > wrote:
>> >> >> >> >> Sure. Because of the write once principle. But at some cost
>> >> >> >> >> (duplicated data). I was just agreeing that it would not be a
>> >> >> >> >> good idea to bake in version-ing by keeping the layers around
>> >> >> >> >> forever in a merged index; I wasn't keying in on transactions
>> >> >> >> >> per se.
>> >> >> >> >>
>> >> >> >> >> Speaking of transactions: I'm not sure if we should worry about
>> >> >> >> >> this much yet, but with "updates" the order of the transaction
>> >> >> >> >> commits seems important. I think commit order is less important
>> >> >> >> >> today in Lucene because its model supports only 2 types of
>> >> >> >> >> events: document creation--which only happens once, and
>> >> >> >> >> document deletion, which is idempotent. What do you think? Will
>> >> >> >> >> commits have to be ordered if we introduce updates? Or does the
>> >> >> >> >> onus of maintaining order fall on the application?
>> >> >> >> >>
>> >> >> >> >> -Babak
>> >> >> >> >>
>> >> >> >> >> On Sat, Apr 3, 2010 at 3:28 AM, Michael McCandless
>> >> >> >> >> <lucene@mikemccandless.com> wrote:
>> >> >> >> >>> On Sat, Apr 3, 2010 at 1:25 AM, Babak Farhang
>> >> >> >> >>> <farhang@gmail.com>
>> >> >> >> >>> wrote:
>> >> >> >> >>>>> I think they get merged in by the merger, ideally in the
>> >> >> >> >>>>> background.
>> >> >> >> >>>>
>> >> >> >> >>>> That sounds sensible. (In other words, we won't concern
>> >> >> >> >>>> ourselves with roll backs--something possible while a "layer"
>> >> >> >> >>>> is still around.)
>> >> >> >> >>>
>> >> >> >> >>> Actually roll backs would still be very possible even if
>> >> >> >> >>> layers are merged.
>> >> >> >> >>>
>> >> >> >> >>> Ie, one could keep multiple commits around, and the older
>> >> >> >> >>> commits would still be referring to the old postings + layers,
>> >> >> >> >>> keeping them alive.
>> >> >> >> >>>
>> >> >> >> >>> Lucene would still be transactional with such an approach.
>> >> >> >> >>>
>> >> >> >> >>> Mike
>> >> >> >> >>>
>> >> >> >> >>>
>> >> >> >> >>>
>> >> >> >> >>>
>> >> >> >> >>> ---------------------------------------------------------------------
>> >> >> >> >>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> >> >> >> >>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>> >> >> >> >>>
>> >> >> >> >>>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >>
>> >> >>
>> >> >> ---------------------------------------------------------------------
>> >> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> >> >> For additional commands, e-mail: dev-help@lucene.apache.org
>> >> >>
>> >> >
>> >> >
>> >>
>> >>
>> >
>> >
>>
>>
>
>
