lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4272) another idea for updatable fields
Date Mon, 30 Jul 2012 19:53:37 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425159#comment-13425159 ]

Michael McCandless commented on LUCENE-4272:
--------------------------------------------

This is an interesting idea!  And it makes sense to factor this down from ElasticSearch/Solr.

So we have the codec approach (LUCENE-3837), the stacked-segments approach (LUCENE-4258),
and this new approach (copy over already-inverted fields).

We could quite efficiently add the already-inverted doc (term vectors) to the in-memory postings.
Then there'd be zero impact on search performance, and no (well, small) index format
changes.
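
To make the idea concrete, here is a minimal toy sketch in plain Java. It is not Lucene
code: every class and method name below (InvertedFieldCopySketch, TermVector, addInverted,
replaceFields) is hypothetical, and it ignores norms and payloads. It only shows the shape
of the operation: the replaced field is re-inverted, while the untouched fields are copied
from the old doc's term vector straight into the in-memory postings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model only: none of these types are Lucene classes.
final class InvertedFieldCopySketch {

    // Per-document "term vector": field -> (term -> positions), i.e. the already-inverted doc.
    static final class TermVector {
        final Map<String, Map<String, List<Integer>>> fields = new TreeMap<>();
    }

    // In-memory postings buffer: "field:term" -> (docID -> positions).
    final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();
    final List<TermVector> vectors = new ArrayList<>();
    final Set<Integer> deleted = new HashSet<>();

    // Whitespace-tokenize and invert one field; this is the only place analysis happens.
    static Map<String, List<Integer>> invert(String text) {
        Map<String, List<Integer>> terms = new TreeMap<>();
        if (text.trim().isEmpty()) {
            return terms;
        }
        String[] tokens = text.trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            terms.computeIfAbsent(tokens[pos], t -> new ArrayList<>()).add(pos);
        }
        return terms;
    }

    // Append an already-inverted doc to the postings; no tokenization happens here.
    int addInverted(TermVector tv) {
        int docID = vectors.size();
        vectors.add(tv);
        tv.fields.forEach((field, terms) -> terms.forEach((term, positions) ->
            postings.computeIfAbsent(field + ":" + term, k -> new TreeMap<>()).put(docID, positions)));
        return docID;
    }

    // Normal indexing path: invert every field, then add.
    int addDocument(Map<String, String> doc) {
        TermVector tv = new TermVector();
        doc.forEach((field, text) -> tv.fields.put(field, invert(text)));
        return addInverted(tv);
    }

    // Replace path: copy untouched fields from the old vector, re-invert only the new ones,
    // mark the old doc deleted, and add the merged doc under a new docID.
    int replaceFields(int oldDocID, Map<String, String> newFields) {
        TermVector merged = new TermVector();
        merged.fields.putAll(vectors.get(oldDocID).fields);
        newFields.forEach((field, text) -> merged.fields.put(field, invert(text)));
        deleted.add(oldDocID);
        return addInverted(merged);
    }

    // Search reads ordinary postings; it never needs to know a doc was built from a copy.
    List<Integer> search(String field, String term) {
        List<Integer> hits = new ArrayList<>();
        Map<Integer, List<Integer>> docs = postings.get(field + ":" + term);
        if (docs != null) {
            for (int docID : docs.keySet()) {
                if (!deleted.contains(docID)) {
                    hits.add(docID);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        InvertedFieldCopySketch index = new InvertedFieldCopySketch();
        Map<String, String> doc = new HashMap<>();
        doc.put("body", "a massive body that never changes");
        doc.put("tag", "draft");
        int oldDoc = index.addDocument(doc);

        Map<String, String> update = new HashMap<>();
        update.put("tag", "final");
        int newDoc = index.replaceFields(oldDoc, update);

        System.out.println(newDoc + " matches body:massive -> " + index.search("body", "massive"));
        System.out.println("tag:draft -> " + index.search("tag", "draft"));
    }
}
```

In this sketch the search side is unchanged by the update, which is the point of the
approach; the cost is that the whole old vector is copied even when only a tiny field
changes.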

The only downside is the use case of replacing tiny fields on otherwise massive docs: in this
case the other approaches would be faster at indexing (but still slower at searching).  I
agree not slowing down search is a big plus for this approach.

We'd also need to open up the TV APIs so we can get TVs for a doc in the current segment,
for the case where the app adds a doc and later (before flush) replaces some fields.  And we
need to pool readers in IW so the updates can resolve the Term to docIDs on demand.  Hmm, and
we'd need to be able to do so for the in-memory segment too (I think we should not support
replaceFields by Query for starters).
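
A rough sketch of what that Term resolution could look like, again as a plain-Java toy
rather than IndexWriter's actual reader pool. All names here (TermResolverSketch,
SegmentView, MapSegment, addPooled, setInMemory, resolve) are made up for illustration.
The assumption is that each flushed segment has one pooled, reusable read view, and the
not-yet-flushed in-memory segment exposes the same lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; this is not IndexWriter's real reader pool.
final class TermResolverSketch {

    // Minimal read view of one segment: which local docIDs contain a given term.
    interface SegmentView {
        String name();
        List<Integer> docsFor(String term);
    }

    // Map-backed segment, standing in for either a flushed segment or the in-memory buffer.
    static final class MapSegment implements SegmentView {
        private final String name;
        private final Map<String, List<Integer>> postings = new HashMap<>();

        MapSegment(String name) {
            this.name = name;
        }

        MapSegment add(String term, int docID) {
            postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docID);
            return this;
        }

        @Override
        public String name() {
            return name;
        }

        @Override
        public List<Integer> docsFor(String term) {
            return postings.getOrDefault(term, Collections.<Integer>emptyList());
        }
    }

    // Pooled views for flushed segments, kept open and reused across updates.
    private final Map<String, SegmentView> pooled = new LinkedHashMap<>();
    // View over docs that were added but not yet flushed; may be null.
    private SegmentView inMemory;

    void addPooled(SegmentView segment) {
        pooled.put(segment.name(), segment);
    }

    void setInMemory(SegmentView segment) {
        this.inMemory = segment;
    }

    // Resolve a term to "segment/docID" hits: flushed segments first, then the in-memory one.
    List<String> resolve(String term) {
        List<String> hits = new ArrayList<>();
        for (SegmentView segment : pooled.values()) {
            for (int docID : segment.docsFor(term)) {
                hits.add(segment.name() + "/" + docID);
            }
        }
        if (inMemory != null) {
            for (int docID : inMemory.docsFor(term)) {
                hits.add(inMemory.name() + "/" + docID);
            }
        }
        return hits;
    }
}
```

The last loop is the case mentioned above: a doc that was added and then has fields
replaced before any flush only exists in the in-memory segment, so the lookup has to
cover it as well.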
                
> another idea for updatable fields
> ---------------------------------
>
>                 Key: LUCENE-4272
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4272
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Robert Muir
>
> I've been reviewing the ideas for updatable fields and have an alternative
> proposal that I think would address my biggest concern:
> * not slowing down searching
> When I look at what Solr and Elasticsearch do here, by basically reindexing from stored
> fields, I think they solve a lot of the problem: users don't have to "rebuild" their document
> from scratch just to update one tiny piece.
> But I think we can do this more efficiently: by avoiding reindexing of the unaffected
> fields.
> The basic idea is that we would require term vectors for this approach (as they already
> store a serialized indexed version of the doc), and so we could just take the other pieces
> from the existing vectors for the doc.
> I think we would have to extend vectors to also store the norm (so we don't recompute
> that), and payloads, but it seems feasible at a glance.
> I don't think we should discard the idea because vectors are slow/big today; this seems
> like something we could fix.
> Personally I like the idea of not slowing down search performance to solve the problem.
> I think we should really start from that angle and work towards making the indexing side more
> efficient, not vice versa.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        


