lucene-dev mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: Incremental Field Updates
Date Tue, 11 May 2010 04:26:32 GMT
>
> but because of the cost of preparing the inputs (i.e. text
> extraction) to Lucene.
>

You're right! There is also the cost of fetching the document, in systems
where the content lives on other servers/systems. Reindexing is usually
the cheapest step (depending on your analysis chain).

Shai

On Tue, May 11, 2010 at 7:22 AM, Babak Farhang <farhang@gmail.com> wrote:

> >> My take on it is that if someone wants to update the catch-all field, then
> >> reindexing the document may not be such a bad idea anyway. The purpose of
> >> those incremental updates is to cope w/ high frequency of updates, which
> >> usually happen on metadata fields, and not title.
> >
> > I agree.
>
> I too agree with the general gist of this argument.
>
> As an aside, just to add another dimension to this discussion (perhaps
> now the net is cast too wide), Lucene users often want incremental
> updates not because of the cost of reindexing the document inside
> Lucene, but because of the cost of preparing the inputs (i.e. text
> extraction) to Lucene.
>
>
> On Mon, May 10, 2010 at 2:40 AM, Michael McCandless
> <lucene@mikemccandless.com> wrote:
> > On Mon, May 10, 2010 at 4:05 AM, Shai Erera <serera@gmail.com> wrote:
> >> That's an interesting scenario Mike.
> >>
> >> Previously, I only handled boolean-like terms, as the scenarios we were
> >> asked to support involved just those types of terms. Obviously, when the
> >> approach allows for more, more scenarios pop to mind :).
> >
> > OK.
> >
> >> I think we may still be able to resolve that case, but it becomes much more
> >> complicated. My design approach of adding the +/- affected the entire
> >> posting element, whereas the scenario you describe affects the positions of
> >> the posting element. This calls for a more complicated design and solution.
> >
> > Right.
> >
> >> My take on it is that if someone wants to update the catch-all field, then
> >> reindexing the document may not be such a bad idea anyway. The purpose of
> >> those incremental updates is to cope w/ high frequency of updates, which
> >> usually happen on metadata fields, and not title.
> >
> > I agree.
> >
> >> But since one could add the 'tags' to the catch-all field as well, it brings
> >> us to the same point - how do I remove the positions of term X that relate
> >> to the tag X and not the potentially original term X that existed in the
> >> document?
> >>
> >> This is a very advanced case (and interesting). I don't want to hold up the
> >> discussion on it, but want to make sure we do not deviate from getting the
> >> simpler cases in first. Depending on the API, this might be very easy
> >> to solve, but might also complicate matters. Maybe, for an
> >> incr-field-updates-v1, we can do without it?
> >
> > Definitely, let's take this (incrementally updating the positions as
> > well) out of scope for the first cut, when we actually start building
> > things.  One simple way to do this might be to only allow incremental
> > update on fields that have omitTFAP=true.
> >
> > When brainstorming/designing a new feature, I like to cast a wide net
> > during the discussion/thinking (what we are doing now), but then when
> > it comes to what to actually build for phase one we'll pull it way back
> > in and aim for baby steps / progress not perfection.  We are able to
> > do much more imagining than actually writing code :)
> >
> > The wide net during brainstorming gives us a better view/context of
> > the road ahead, eg to validate that the baby step is in the right
> > direction, so that it doesn't preclude other things we might imagine
> > later.
> >
> > In this case, it does sound like the approach should work (in theory)
> > fine w/ positions, too.
> >
> > Mike
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: dev-help@lucene.apache.org
> >
> >
>
>
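
For readers following the thread, the "+/-" idea discussed above can be
pictured with a small toy model. This is only a sketch under stated
assumptions, not Lucene code and not the design that was eventually built:
the class PostingDeltaSketch and its methods index, addTerm, removeTerm and
docsFor are hypothetical names invented for illustration. It assumes a field
without term frequencies or positions (the omitTFAP=true case), so an update
either adds or removes a whole posting element for a document.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy model only: none of these names are Lucene APIs. */
final class PostingDeltaSketch {
    // Base postings as written at index time: term -> sorted doc IDs.
    private final Map<String, SortedSet<Integer>> base = new HashMap<>();
    // Update layer: term -> (doc ID -> TRUE for a "+" update, FALSE for "-").
    private final Map<String, Map<Integer, Boolean>> deltas = new HashMap<>();

    /** Records that the original document contained the term. */
    void index(int docId, String term) {
        base.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
    }

    /** "+" update: the document gains the term without being reindexed. */
    void addTerm(int docId, String term) {
        deltas.computeIfAbsent(term, t -> new HashMap<>()).put(docId, Boolean.TRUE);
    }

    /** "-" update: the document loses the term without being reindexed. */
    void removeTerm(int docId, String term) {
        deltas.computeIfAbsent(term, t -> new HashMap<>()).put(docId, Boolean.FALSE);
    }

    /** Resolves base postings plus updates into the current doc IDs for a term. */
    SortedSet<Integer> docsFor(String term) {
        SortedSet<Integer> result = new TreeSet<>();
        SortedSet<Integer> indexed = base.get(term);
        if (indexed != null) {
            result.addAll(indexed);
        }
        Map<Integer, Boolean> updates = deltas.get(term);
        if (updates != null) {
            for (Map.Entry<Integer, Boolean> e : updates.entrySet()) {
                if (e.getValue()) {
                    result.add(e.getKey());
                } else {
                    result.remove(e.getKey());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PostingDeltaSketch sketch = new PostingDeltaSketch();
        sketch.index(1, "unread");
        sketch.index(2, "unread");
        // A high-frequency metadata change: doc 1 flips from unread to read.
        sketch.removeTerm(1, "unread");
        sketch.addTerm(1, "read");
        System.out.println("unread: " + sketch.docsFor("unread"));
        System.out.println("read: " + sketch.docsFor("read"));
    }
}
```

Because each update here covers a document's whole posting element, the model
has no way to say "remove only the positions that came from tag X", which is
the catch-all field case the thread leaves out of scope for a first cut.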
