Return-Path: Delivered-To: apmail-lucene-dev-archive@www.apache.org Received: (qmail 51289 invoked from network); 10 May 2010 07:44:11 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 10 May 2010 07:44:11 -0000 Received: (qmail 72939 invoked by uid 500); 10 May 2010 07:44:10 -0000 Delivered-To: apmail-lucene-dev-archive@lucene.apache.org Received: (qmail 72788 invoked by uid 500); 10 May 2010 07:44:09 -0000 Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@lucene.apache.org Delivered-To: mailing list dev@lucene.apache.org Received: (qmail 72781 invoked by uid 99); 10 May 2010 07:44:09 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 10 May 2010 07:44:09 +0000 X-ASF-Spam-Status: No, hits=-0.6 required=10.0 tests=AWL,RCVD_IN_DNSWL_NONE,SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (athena.apache.org: local policy) Received: from [209.85.210.179] (HELO mail-yx0-f179.google.com) (209.85.210.179) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 10 May 2010 07:44:04 +0000 Received: by yxe9 with SMTP id 9so2097687yxe.29 for ; Mon, 10 May 2010 00:43:42 -0700 (PDT) MIME-Version: 1.0 Received: by 10.150.120.35 with SMTP id s35mr8382762ybc.224.1273477422471; Mon, 10 May 2010 00:43:42 -0700 (PDT) Received: by 10.151.11.20 with HTTP; Mon, 10 May 2010 00:43:42 -0700 (PDT) In-Reply-To: References: <82E4F109-B052-4AA7-950A-9B5BCEC928E2@apache.org> Date: Mon, 10 May 2010 03:43:42 -0400 Message-ID: Subject: Re: Incremental Field Updates From: Michael McCandless To: dev@lucene.apache.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable I think another example would be the catch-all field. EG say my app concatenates the title, abstract and body of a document into the catch-all field. But now I want to change just the title. I think in theory (assuming we can work out an intuitive user-level API exposure of this...), on changing the title, we could incrementally re-index the catch-all field, to remove the old title and add the new one. Right? But: would this approach work on the positions too? EG say term X appears 4 times in the doc -- positions 1, 7, 22, 53. I've now replaced a part of this field, affecting only this term's occurrence at position 53, so that maybe that occurrence is now at 57. Are we able to handle this? One possible low-level API (I think Doug originally suggested this) might be to allow .remove() calls on the enums (like many Java collection support from their iterators) -- eg as you are stepping through the DocsEnum, you could call .remove() to complete disassociate that term from that doc, or in DocsAndPositionsEnum you could call .removePosition to remove a specific position. We would still need a higher level user API... maybe you'd provide a "text to remove" and a "text to add" for each field, pre-init'd with position/offset for the analyzer (if the field is analyzed)? I guess we'd make this a new attr on Field. Ie, Field normally contains just "text/tokens to add to the index", but we could also include a "text/tokens to remove". Or, each Field instance is marked as "to be added" vs "to be removed", and you have to add 2 Field instances to subtract then add. Mike On Sun, May 9, 2010 at 7:38 AM, Shai Erera wrote: >> When I update a field, I >> want to update all of it, not just part of it. No? > > Well ... might be. But the most common case for us are fields to which we > want to add data or remove. For example ACLs - you don't want to replace = the > entire field w/ the document, but simply to add/remove access for certain > people/groups. Same goes for "social" fields, like tags, ratings, bookmar= ks > etc. - the granularity of the update is to associate/dissociate a particu= lar > value w/ the field + doc, and not update the entire field. > > Shai > > On Sun, May 9, 2010 at 10:31 AM, Babak Farhang wrote: >> >> > No, actually, you can update index-only fields also. It all depends on >> > the >> > operation that you ask to do on the fields. >> >> I love the level of control this provides, but again, I was talking at >> the user level. >> >> > If you want to e.g. remove an entire field w/ such update operation, >> > then it >> > becomes more expensive >> >> That's the typical usage scenario, I imagine. When I update a field, I >> want to update all of it, not just part of it. No? >> >> (The lower level semantics of twiddling with the postings is poorly >> understood by the typical user..) >> >> > We didn't >> > face such a scenario though, and I think it's probably a rare one? >> >> As rare as any time you want to update an indexed-only field. =A0Not a >> serious limitation (but perhaps worth noting?) >> Perhaps at the API level, you provide an updateIndexedOnlyField that >> takes the old value as well as the new value for the field. >> >> Anyway, I think your approach would be a great addition to the >> toolkit. Would love to see it even in rough cut form :)) >> >> -Babak >> >> On Sun, May 9, 2010 at 12:49 AM, Shai Erera wrote: >> > No, actually, you can update index-only fields also. It all depends on >> > the >> > operation that you ask to do on the fields. For example, if the query = to >> > execute is something like "update all documents w/ tags:ibm -> remove >> > terms >> > t1, t2, t3 and add terms t4, t5", then the result of such request woul= d >> > dissociate t1-3 from those docs that answer tags:ibm and associate the= m >> > w/ >> > t4 and t5. Specifically, if docs 1, 4, 10 answer tags:ibm, then the >> > following posting updates will be done: >> > t1: -1, -4, -10 >> > t2: -1, -4, -10 >> > t3: -1, -4, -10 >> > t4, 1, 4, 10 >> > t5, 1, 4, 10 >> > (in addition to whatever other updates that are associated with those >> > postings). >> > >> > At search time, if you search for "t1 OR t2", then the regular t1 and = t2 >> > postings will be merged on-the-fly w/ the updated ones to remove docs = 1, >> > 4, >> > 10. >> > >> > If you want to e.g. remove an entire field w/ such update operation, >> > then it >> > becomes more expensive, but in general you'd need to iterate over the >> > field's terms and dissociate the documents from all the terms. We didn= 't >> > face such a scenario though, and I think it's probably a rare one? >> > >> > Shai >> > >> > On Sun, May 9, 2010 at 9:39 AM, Babak Farhang wrot= e: >> >> >> >> Shai, >> >> >> >> I think this is an interesting approach. I can see how I could >> >> [incrementally] update a stored, indexed field this way, but I don't >> >> understand how I could update an indexed-only field. Here's why: for = a >> >> stored (and indexed) field, I can always determine what terms to >> >> remove ('-') from the postings, but for an indexed-only field I'd hav= e >> >> no [practical] way to know.. >> >> >> >> So under this approach, =A0I'm thinking at a user level, a Lucene fie= ld >> >> would be updateable only if it's at least stored. >> >> >> >> Is that right? >> >> >> >> -Babak >> >> >> >> On Wed, May 5, 2010 at 11:17 AM, Shai Erera wrote: >> >> > Yes Mike - I don't know yet if two MPs will be used, one for the >> >> > stacked >> >> > segments and one for the general segments (which will include the >> >> > stacked >> >> > ones as well) .. feels like one MP should be enough, but this can b= e >> >> > decided >> >> > on as we progress. >> >> > >> >> > This approach allows you to update every term in every already >> >> > indexed >> >> > field, as well as add terms to already indexed fields ... and add >> >> > totally >> >> > new fields, with lots of text in them. So e.g. there are two neat u= se >> >> > cases >> >> > that come to mind: >> >> > 1) Allow users to rate search results, favor them etc. >> >> > 2) Or even comment them, >> >> > I think Google offers the 2nd. Both translate into updating the >> >> > search >> >> > result's already indexed document w/ the new rating, comment etc. w= /o >> >> > needing to reindex the document. >> >> > >> >> > I will try to find perf results numbers. It's been long time ago, >> >> > hope >> >> > all >> >> > the documents are still where they were :). Indeed, for terms like >> >> > ACLs, >> >> > it >> >> > means that each query had to merge dozens of postings lists. For th= at >> >> > I >> >> > implemented an alternative solution, which uses a payload-like >> >> > structure >> >> > that registers for each document the list of ACLs that are associat= ed >> >> > with >> >> > it (not as strings, it was more efficient). Then, if the query >> >> > included >> >> > dozens of such terms, I created a filter-like object which for ever= y >> >> > matching document by the query checked if it matches the ACLs list = of >> >> > the >> >> > document. This is usually slower, because the ACLs themselves don't >> >> > drive >> >> > the query, which means more matches will be found. That was a >> >> > tradeoff >> >> > which >> >> > one could configure based on the number of such terms in the query, >> >> > the >> >> > number of updated terms etc. >> >> > >> >> > But in essence you're right - if the solution is generic to cover a= ny >> >> > term, >> >> > we cannot use such payload-based feature. We might need to merge th= e >> >> > stacked >> >> > segments more frequently. This is pending perf testing though. >> >> > >> >> > Shai >> >> > >> >> > On Wed, May 5, 2010 at 6:54 PM, Michael McCandless >> >> > wrote: >> >> >> >> >> >> Catching up here :) >> >> >> >> >> >> This is great stuff Shai -- I like the notion of "negative" >> >> >> postings, >> >> >> that "subtract" docs from previous generations as you iterate them= . >> >> >> >> >> >> And I like the term "stacked segments". =A0This fits very well wit= h >> >> >> Lucene's write-once approach, ie, a writer can at any time stack a >> >> >> new >> >> >> segment (writes to new files) "over" an old segment, like the laye= rs >> >> >> in photoshop. =A0A reader merges all stacks on-the-fly when readin= g. >> >> >> >> >> >> And the merge policy now picks from 2 dimensions right? =A0Ie it m= ay >> >> >> want to simply consolidate stacks on an old segment but otherwise >> >> >> not >> >> >> merge that segment (eg for very large segments that have accumulat= ed >> >> >> updates), and normal merging would of course consolidate all stack= s >> >> >> for all segments merged. >> >> >> >> >> >> Wouldn't this approach conceivably allow you to alter single terms >> >> >> within a single field (we'd have to figure out how we'd expose the >> >> >> API >> >> >> for this)? =A0EG if I appended some text to an already-indexed fie= ld? >> >> >> (In addition to adding a new field to an already indexed doc, or >> >> >> replacing an indexed field on a previously indexed doc). >> >> >> >> >> >> Did you have any hard perf numbers? =A0Merge sorting N streams is >> >> >> surprisingly costly... we may need/want to have a reader pre-merge >> >> >> (using up RAM) any "long tail" of stacked segments as long as they >> >> >> are >> >> >> small enough... >> >> >> >> >> >> This sounds great!! >> >> >> >> >> >> Mike >> >> >> >> >> >> On Sun, Apr 25, 2010 at 7:32 AM, Shai Erera >> >> >> wrote: >> >> >> > Hi, >> >> >> > >> >> >> > WARNING: following email is a bit long, but I think is worth the >> >> >> > reading >> >> >> > :) >> >> >> > >> >> >> > I would like to describe my implementation of incremental field >> >> >> > updates >> >> >> > in Juru (the former search library I've worked on in IBM). I wil= l >> >> >> > later >> >> >> > discuss how I think it can be implemented in Lucene. >> >> >> > >> >> >> > The motivation/requirement came from a doc management system whi= ch >> >> >> > used >> >> >> > Juru as its search component. The system included document >> >> >> > libraries >> >> >> > where users could create files and upload documents. A user coul= d >> >> >> > belong >> >> >> > to any number of libraries and complex ACLs model was used (down >> >> >> > to >> >> >> > the >> >> >> > level of the file). ACLs and Folders were modeled as categories = in >> >> >> > the >> >> >> > index (boolean-like terms). Files and folders could be moved >> >> >> > around >> >> >> > and >> >> >> > access to a library/folder/file could be granted/revoked at any >> >> >> > given >> >> >> > time. Therefore, such updates usually affected hundreds (and >> >> >> > thousands) >> >> >> > of documents. Overall, the index managed millions of documents, >> >> >> > tens >> >> >> > of >> >> >> > thousands of libraries and hundreds of thousands of ACLs (large >> >> >> > organization :)). To get a rough understanding on the number of >> >> >> > such >> >> >> > updates - every several minutes, tens of thousands of documents >> >> >> > were >> >> >> > updated due to such changes only (in addition to the regular >> >> >> > content >> >> >> > updates). >> >> >> > >> >> >> > We were asked to support requests in the following form: "update >> >> >> > all >> >> >> > docs >> >> >> > that match --> do " where: >> >> >> > * was "a single doc", "docs belonging to a category" >> >> >> > and >> >> >> > "docs >> >> >> > belonging to a set of categories". >> >> >> > * was "add categories NEW" (NEW might not even exist >> >> >> > in >> >> >> > the >> >> >> > index yet, or already associated w/ the document), "remove >> >> >> > categories >> >> >> > OLD" >> >> >> > (irregardless if the docs were already associated w/ OLD or not) >> >> >> > and >> >> >> > "remove all OLD and add all NEW". >> >> >> > >> >> >> > The solution I've implemented to support these requests turned o= ut >> >> >> > to >> >> >> > actually allow you to update every term (!) in the index: suppos= e >> >> >> > that >> >> >> > you have a table, which recorded tuples like . >> >> >> > The >> >> >> > record <1, "ibm", '+'> means that doc 1 is associated w/ the ter= m >> >> >> > "ibm", >> >> >> > and the record <1, "hp", '-'> means that the same document is no= t >> >> >> > associated w/ the word "hp". Then, you could very easily ask for >> >> >> > all >> >> >> > documents that are assoicated w/ "hp", and the result would not >> >> >> > include >> >> >> > doc 1. Note that docid=3D1 is not the app Doc_ID, but the intern= al >> >> >> > id >> >> >> > the >> >> >> > document received. >> >> >> > >> >> >> > I've kept two types of postings in the index: regular and update= s. >> >> >> > Taking the above examples, "ibm" regular posting looked like >> >> >> > <"ibm", >> >> >> > 1, >> >> >> > 3, 1, 2, 5 ...> (dgaps) and the updates posting looked like >> >> >> > <"ibm", >> >> >> > +2, >> >> >> > -3, +6, +10 ...> (absolute docid value, w/ a +/- sign). Similarl= y >> >> >> > for >> >> >> > "hp". >> >> >> > >> >> >> > During search time, when a query with the word "ibm" was >> >> >> > submitted, I >> >> >> > create a virtual posting which reads from both the regular and t= he >> >> >> > updates, and merges them on the fly according to the +/- signs. >> >> >> > Since >> >> >> > both postings are sorted in ascending order, the merge is very >> >> >> > efficient, and query time is hardly affected. >> >> >> > >> >> >> > Those postings are merged from time to time in a process that is >> >> >> > similar >> >> >> > to how Lucene works today, which keeps the update postings >> >> >> > relatively >> >> >> > small and manageable. >> >> >> > >> >> >> > Now here comes the fun part - how I think it can be implemented = in >> >> >> > Lucene ! >> >> >> > >> >> >> > To be honest, this sat on my TODO list for a very long time and >> >> >> > only >> >> >> > a >> >> >> > couple of months ago I figured out how to implement it in Lucene= . >> >> >> > The >> >> >> > main difficulty I had was around the difference between the >> >> >> > write-once >> >> >> > policy in Juru and Lucene - in Lucene, once a segment is written= , >> >> >> > it >> >> >> > cannot be changed. BUT, I've only recently realized that this >> >> >> > isn't >> >> >> > exactly true, because deleted docs do change existing segments. >> >> >> > The >> >> >> > deletes are kept in a separate file to the segment (.del) and ha= ve >> >> >> > their >> >> >> > own generation. Deletes, as I understood then, and Grant helped = me >> >> >> > term >> >> >> > them better, can be defined as "Stacked Segments" - they add dat= a >> >> >> > to >> >> >> > a >> >> >> > segment, which from time to time are integrated into the segment >> >> >> > (unlike >> >> >> > Photoshop Layers, but my understanding of Photoshop is limited). >> >> >> > And >> >> >> > the >> >> >> > Lucene engine knows how to combine the two, giving precedence to >> >> >> > the >> >> >> > deletes. >> >> >> > >> >> >> > By introducing an "Updates Stacked Segment", we can encode >> >> >> > postings >> >> >> > w/ >> >> >> > the '+'/'-' signs, and when TermDocs/Positions is requested, we >> >> >> > can >> >> >> > create a variation which merges the two lists. When segments are >> >> >> > merged, >> >> >> > the updates will be merged into the regular postings (just like >> >> >> > deletes) >> >> >> > and thus will be gone. In addition, this plays very nicely with >> >> >> > readers >> >> >> > that are currently reading the index, as well as we can have >> >> >> > generations >> >> >> > for the updates - really like deletes ! >> >> >> > >> >> >> > I think that Lucene's architecture allows for such a solution ve= ry >> >> >> > cleanly and nicely (and I believe flex makes it even easier). We >> >> >> > can >> >> >> > (later, after you've digested the idea) discuss whether this >> >> >> > should >> >> >> > be >> >> >> > built into the current IW, or an extension like UpdateableIW. Th= e >> >> >> > API >> >> >> > I've been thinking about should really be like deletes, allowing >> >> >> > to >> >> >> > update docs based on Term or Query. I defer the API discussion f= or >> >> >> > later >> >> >> > for now. >> >> >> > >> >> >> > As for stored fields, this was a real challenge to support in >> >> >> > Juru, >> >> >> > but >> >> >> > I think that w/ "Stacked Segments" and Lucene's architecture, th= is >> >> >> > should >> >> >> > be much easier - adding stacked stored fields ... >> >> >> > >> >> >> > As you've noticed, the update postings are not DGap encoded, and >> >> >> > sign >> >> >> > needs to be preserved. While I haven't implemented it in Juru, I >> >> >> > think >> >> >> > that perhaps this can be improved by keeping the '-' and '+' lis= ts >> >> >> > separated. We will need to register somewhere which came before >> >> >> > which >> >> >> > because order matters a lot here (and I'm not talking about >> >> >> > concurrency >> >> >> > - simple update instructions order). I have some idea how this c= an >> >> >> > be >> >> >> > achieved, but I refrain from describing it now, to not make this >> >> >> > email >> >> >> > even longer :). >> >> >> > >> >> >> > I've mentioned that this approach can be applied to any term and >> >> >> > not >> >> >> > just categories under some circumstances. Basically, as soon as >> >> >> > you >> >> >> > update a term, its DF is no longer true, unless you are able to >> >> >> > take >> >> >> > the >> >> >> > updates into account. We can defer the discussion on that, but >> >> >> > clearly >> >> >> > for many fields, incrementally update them should not affect >> >> >> > precision, >> >> >> > as they're not used for that type of scoring ... Maybe, by keepi= ng >> >> >> > separate '+' and '-' lists we can compute statistics precisely. >> >> >> > And I >> >> >> > haven't given much thought yet to how this and Mike's flex scori= ng >> >> >> > will >> >> >> > be integrated. >> >> >> > >> >> >> > BTW, a word on Parallel Indexing - the two are completely >> >> >> > orthogonal. >> >> >> > Once PI is introduced, one can index all the updateable fields i= n >> >> >> > a >> >> >> > dedicated slice, for perhaps improving search performance for >> >> >> > slices >> >> >> > that are not updateable (not involving code which attempts to re= ad >> >> >> > and >> >> >> > merge update and regular lists on the fly). Also, incremental >> >> >> > field >> >> >> > updates support all of PI's scenarios, even though some will be >> >> >> > done >> >> >> > more efficiently w/ PI. But this too is a matter for a separate >> >> >> > discussion :). >> >> >> > >> >> >> > That's it ! I believe I've given you all the details I have abou= t >> >> >> > the >> >> >> > approach and high level proposed solution for Lucene. Perhaps so= me >> >> >> > details slipped my mind, but if you ask the right questions, I'm >> >> >> > sure >> >> >> > I'll be able to answer them :). I would like to emphasize that >> >> >> > since >> >> >> > this was already implemented (in Juru) - this is more than just = a >> >> >> > "I >> >> >> > think this approach can work" proposal ... >> >> >> > >> >> >> > I would appreciate your comments on this. I would like to start >> >> >> > implementing it soon, and so as a first step, please share your >> >> >> > comments >> >> >> > on the overall approach. I'll then write a more detailed >> >> >> > description >> >> >> > on >> >> >> > how I think to impl it in Lucene (been spending some time on >> >> >> > that), >> >> >> > and >> >> >> > we can have more detailed (and fun) discussions on the low level >> >> >> > details. >> >> >> > >> >> >> > Shai >> >> >> > >> >> >> > On Fri, Apr 9, 2010 at 5:05 AM, Babak Farhang >> >> >> > wrote: >> >> >> >> >> >> >> >> Good point. I meant the model at the document level: i.e. what >> >> >> >> milestones does a document go through in its life cycle. Today: >> >> >> >> >> >> >> >> created --> deleted >> >> >> >> >> >> >> >> With incremental updates: >> >> >> >> >> >> >> >> created --> update1 --> update2 --> deleted >> >> >> >> >> >> >> >> I think what I'm trying to say is that this second threaded >> >> >> >> sequence >> >> >> >> of state changes seems intuitively more fragile under concurren= t >> >> >> >> scenarios. =A0So for example, in a lock-free design, the system >> >> >> >> would >> >> >> >> also have to anticipate the following sequence of events: >> >> >> >> >> >> >> >> created --> update1 --> deleted --> update2 >> >> >> >> >> >> >> >> and consider update2 a null op. =A0I'm imagining there are othe= r >> >> >> >> cases >> >> >> >> that I can't think of.. >> >> >> >> >> >> >> >> -Babak >> >> >> >> >> >> >> >> >> >> >> >> On Tue, Apr 6, 2010 at 3:40 AM, Michael McCandless >> >> >> >> wrote: >> >> >> >> > write once, plus the option to the app to keep multiple commi= t >> >> >> >> > points >> >> >> >> > around (by customizing the deletion policy). >> >> >> >> > >> >> >> >> > Actually order of operations / commits very much matters in >> >> >> >> > Lucene >> >> >> >> > today. >> >> >> >> > >> >> >> >> > Deletions are not idempotent: if you add a doc w/ term X, >> >> >> >> > delete >> >> >> >> > by >> >> >> >> > term X, add a new doc with term X... that's very different th= an >> >> >> >> > if >> >> >> >> > you >> >> >> >> > moved the delete op to the end. =A0Ie the deletion only appli= es >> >> >> >> > to >> >> >> >> > the >> >> >> >> > docs added before it. >> >> >> >> > >> >> >> >> > Mike >> >> >> >> > >> >> >> >> > On Mon, Apr 5, 2010 at 12:45 AM, Babak Farhang >> >> >> >> > >> >> >> >> > wrote: >> >> >> >> >> Sure. Because of the write once principle. =A0But at some co= st >> >> >> >> >> (duplicated data). I was just agreeing that it would not be = a >> >> >> >> >> good >> >> >> >> >> idea to bake in version-ing by keeping the layers around >> >> >> >> >> forever >> >> >> >> >> in >> >> >> >> >> a >> >> >> >> >> merged index; I wasn't keying in on transactions per se. >> >> >> >> >> >> >> >> >> >> Speaking of transactions: I'm not sure if we should worry >> >> >> >> >> about >> >> >> >> >> this >> >> >> >> >> much yet, but with "updates" the order of the transaction >> >> >> >> >> commits >> >> >> >> >> seems important. I think commit order is less important toda= y >> >> >> >> >> in >> >> >> >> >> Lucene because its model supports only 2 types of events: >> >> >> >> >> document >> >> >> >> >> creation--which only happens once, and document deletion, >> >> >> >> >> which >> >> >> >> >> is >> >> >> >> >> idempotent. =A0What do you think? Will commits have to be >> >> >> >> >> ordered >> >> >> >> >> if >> >> >> >> >> we >> >> >> >> >> introduce updates? =A0Or does the onus of maintaining order = fall >> >> >> >> >> on >> >> >> >> >> the >> >> >> >> >> application? >> >> >> >> >> >> >> >> >> >> -Babak >> >> >> >> >> >> >> >> >> >> On Sat, Apr 3, 2010 at 3:28 AM, Michael McCandless >> >> >> >> >> wrote: >> >> >> >> >>> On Sat, Apr 3, 2010 at 1:25 AM, Babak Farhang >> >> >> >> >>> >> >> >> >> >>> wrote: >> >> >> >> >>>>> I think they get merged in by the merger, ideally in the >> >> >> >> >>>>> background. >> >> >> >> >>>> >> >> >> >> >>>> That sounds sensible. (In other words, we wont concern >> >> >> >> >>>> ourselves >> >> >> >> >>>> with >> >> >> >> >>>> roll backs--something possible while a "layer" is still >> >> >> >> >>>> around.) >> >> >> >> >>> >> >> >> >> >>> Actually roll backs would still be very possible even if >> >> >> >> >>> layers >> >> >> >> >>> are >> >> >> >> >>> merged. >> >> >> >> >>> >> >> >> >> >>> Ie, one could keep multiple commits around, and the older >> >> >> >> >>> commits >> >> >> >> >>> would still be referring to the old postings + layers, >> >> >> >> >>> keeping >> >> >> >> >>> them >> >> >> >> >>> alive. >> >> >> >> >>> >> >> >> >> >>> Lucene would still be transactional with such an approach. >> >> >> >> >>> >> >> >> >> >>> Mike >> >> >> >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> >> >> >>> -----------------------------------------------------------= ---------- >> >> >> >> >>> To unsubscribe, e-mail: >> >> >> >> >>> java-dev-unsubscribe@lucene.apache.org >> >> >> >> >>> For additional commands, e-mail: >> >> >> >> >>> java-dev-help@lucene.apache.org >> >> >> >> >>> >> >> >> >> >>> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> ------------------------------------------------------------= --------- >> >> >> >> >> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.o= rg >> >> >> >> >> For additional commands, e-mail: >> >> >> >> >> java-dev-help@lucene.apache.org >> >> >> >> >> >> >> >> >> >> >> >> >> >> > >> >> >> >> > >> >> >> >> > >> >> >> >> > -------------------------------------------------------------= -------- >> >> >> >> > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.or= g >> >> >> >> > For additional commands, e-mail: >> >> >> >> > java-dev-help@lucene.apache.org >> >> >> >> > >> >> >> >> > >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> ---------------------------------------------------------------= ------ >> >> >> >> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org >> >> >> >> For additional commands, e-mail: java-dev-help@lucene.apache.or= g >> >> >> >> >> >> >> > >> >> >> > >> >> >> >> >> >> >> >> >> ------------------------------------------------------------------= --- >> >> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org >> >> >> For additional commands, e-mail: dev-help@lucene.apache.org >> >> >> >> >> > >> >> > >> >> >> >> --------------------------------------------------------------------- >> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org >> >> For additional commands, e-mail: dev-help@lucene.apache.org >> >> >> > >> > >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org >> For additional commands, e-mail: dev-help@lucene.apache.org >> > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For additional commands, e-mail: dev-help@lucene.apache.org