lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4272) another idea for updatable fields
Date Mon, 30 Jul 2012 20:39:36 GMT


Robert Muir commented on LUCENE-4272:

We'd also need to open up the term vector (TV) APIs so we can get TVs for a doc in the
current segment, for the case where the app adds a doc and later (before flush) replaces
some fields.

Realistically I'd like to support that anyway for the norms case so that codecs can index
term impacts (LUCENE-4198), as this is going to involve length normalization in addition
to TF. But currently the postings writer has no way to "see" this.

So it would be nice if we could solve that too; then we wouldn't need norms/dvs in the
vectors (they are already per-doc).
This would make for a faster way of updating docvalues fields: for that specific case I
think more can be done, but it would be an improvement and fit well.

> another idea for updatable fields
> ---------------------------------
>                 Key: LUCENE-4272
>                 URL:
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Robert Muir
> I've been reviewing the ideas for updatable fields and have an alternative
> proposal that I think would address my biggest concern:
> * not slowing down searching
> When I look at what Solr and Elasticsearch do here, by basically reindexing from stored
> fields, I think they solve a lot of the problem: users don't have to "rebuild" their document
> from scratch just to update one tiny piece.
> But I think we can do this more efficiently: by avoiding reindexing of the unaffected
> The basic idea is that we would require term vectors for this approach (as they already
> store a serialized indexed version of the doc), and so we could just take the other pieces
> from the existing vectors for the doc.
> I think we would have to extend vectors to also store the norm (so we don't recompute
> that), and payloads, but it seems feasible at a glance.
> I don't think we should discard the idea because vectors are slow/big today; this seems
> like something we could fix.
> Personally I like the idea of not slowing down search performance to solve the problem.
> I think we should really start from that angle and work towards making the indexing side more
> efficient, not vice-versa.
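
To make the proposal above concrete, here is a rough toy model in Python. It is only a sketch of the idea, not Lucene code: `analyze`, `StoredVectors`, and the use of field length as the "norm" are all hypothetical stand-ins. It shows an update re-analyzing only the changed field, while the term vectors and norms of the other fields are carried over from what was already stored for the doc.

```python
from collections import defaultdict


def analyze(text):
    """Toy analyzer (hypothetical): lowercase, split on whitespace,
    and return a term -> list-of-positions mapping."""
    vector = defaultdict(list)
    for position, token in enumerate(text.lower().split()):
        vector[token].append(position)
    return dict(vector)


class StoredVectors:
    """Hypothetical per-document store: doc id -> field -> (term vector, norm)."""

    def __init__(self):
        self.docs = {}
        self.analyze_calls = 0  # counts how many fields were actually (re)analyzed

    def _index_field(self, text):
        self.analyze_calls += 1
        vector = analyze(text)
        # Assumption: the token count stands in for the stored norm.
        norm = sum(len(positions) for positions in vector.values())
        return vector, norm

    def add(self, doc_id, fields):
        """Index every field of a new document."""
        self.docs[doc_id] = {
            name: self._index_field(text) for name, text in fields.items()
        }

    def update(self, doc_id, changed_fields):
        """Re-index only the changed fields; reuse stored vectors and norms
        for every unaffected field instead of re-analyzing it."""
        updated = dict(self.docs[doc_id])
        for name, text in changed_fields.items():
            updated[name] = self._index_field(text)
        self.docs[doc_id] = updated


store = StoredVectors()
store.add(1, {
    "title": "Updatable fields",
    "body": "term vectors store an indexed form of the doc",
})
body_before = store.docs[1]["body"]

# Only the title changes, so only the title is analyzed again.
store.update(1, {"title": "Another idea"})
```

In this toy, the body's vector and norm after the update are the very same objects that were stored at add time, which is the point of the proposal: the cost of an update scales with the changed fields rather than the whole document.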
