lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: Split mutable logical document into two Lucene documents
Date Thu, 08 Dec 2011 17:57:38 GMT
Nested documents might conceivably help; see
https://issues.apache.org/jira/browse/LUCENE-2454. I don't know
anything about that feature, though, so I might be way off target.


--
Ian.


On Wed, Dec 7, 2011 at 8:46 PM, Brandon Mintern <mintern@easyesi.com> wrote:
> We have a document tagging system where documents are composed of two
> types of data:
>
> Rarely changed (hereafter: "immutable") data - document text and
> metadata that we upload and almost never change. The text can be
> hundreds of pages.
>
> User-created (hereafter: "mutable") data - document properties that
> are set by users of our system. In total, a document's properties are
> generally several dozen bytes at most. Even viewing a document changes
> the data (e.g. the document's "viewed" property).
>
>
> At present, all data is part of a single Lucene document. The problem
> is that when any piece of mutable data is updated (this happens
> relatively frequently), we have to reindex the entire document. We'd
> like to have two separate indexed Lucene documents per logical
> document, one containing the immutable data and the other containing
> the much smaller and more transient mutable data. When the mutable
> data changes, we can delete that document's mutable Lucene document
> and index a new one very quickly.
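
The split described above can be modeled in plain Java to show why the
update becomes cheap. This is only a sketch under assumptions: the class
SplitDocStore, its methods, and its counters are hypothetical, and it
uses in-memory maps rather than a real index. In Lucene itself,
IndexWriter.updateDocument(Term, Document) performs the same
delete-then-add step, keyed on a term such as the logical document ID.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the two-document layout; not Lucene API.
class SplitDocStore {
    // logical ID -> large, rarely changed text and metadata
    private final Map<String, String> immutableDocs = new HashMap<>();
    // logical ID -> small set of user-created properties
    private final Map<String, Map<String, String>> mutableDocs = new HashMap<>();

    // Counters that stand in for indexing cost on each side.
    int immutableWrites = 0;
    int mutableWrites = 0;

    // Index the large part once, at upload time.
    void indexImmutable(String id, String text) {
        immutableDocs.put(id, text);
        immutableWrites++;
    }

    // Delete the old mutable record and add the new one.
    // The immutable record for the same logical ID is never touched.
    void replaceMutable(String id, Map<String, String> props) {
        mutableDocs.remove(id);
        mutableDocs.put(id, new HashMap<>(props));
        mutableWrites++;
    }

    Map<String, String> mutableOf(String id) {
        return mutableDocs.getOrDefault(id, Collections.emptyMap());
    }
}
```

Marking a document as viewed then calls only replaceMutable, so the
hundreds of pages of text are written exactly once.
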
>
> There are two major difficulties when actually performing a search, though:
>
> 1. We provide complex queries that retrieve logical documents based
> on information in either of their Lucene documents. It seems
> non-trivial to fetch a logical document with a BooleanQuery whose
> Occur.MUST clauses refer to fields in both of the Lucene
> documents.
>
> 2. We need to sort results (logical document IDs) based on fields in
> either of their Lucene documents.
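
To make the two difficulties concrete, here is a rough sketch of a
client-side join, assuming each side has already been searched
separately and has returned a set of logical document IDs. The class
LogicalJoin and its parameters are hypothetical, and nothing here is
Lucene API. A MUST clause on both sides becomes a set intersection, and
the sort uses a field value looked up by logical ID, whichever Lucene
document it came from.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical client-side join over logical document IDs; not Lucene API.
class LogicalJoin {
    // immutableHits: IDs whose immutable document matched its clauses
    // mutableHits:   IDs whose mutable document matched its clauses
    // sortValues:    logical ID -> value of the sort field, from either side
    static List<String> search(Set<String> immutableHits,
                               Set<String> mutableHits,
                               Map<String, String> sortValues) {
        // MUST on both sides: keep only IDs present in both hit sets.
        List<String> ids = new ArrayList<>(immutableHits);
        ids.retainAll(mutableHits);

        // Sort by the field value; a missing value sorts first as "".
        // Ties are broken by ID so the order is deterministic.
        ids.sort(Comparator
                .comparing((String id) -> sortValues.getOrDefault(id, ""))
                .thenComparing(Comparator.<String>naturalOrder()));
        return ids;
    }
}
```

The catch is that both hit sets have to be collected in full before
they can be intersected and sorted, which may be costly for queries
that match many documents on either side.
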
>
> Has anyone done anything like this before? Is there functionality I'm
> overlooking that could make this easier?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
