lucene-java-user mailing list archives

From Jong Kim <jong.luc...@gmail.com>
Subject Re-indexing a particular field only without re-indexing the entire enclosing document in the index
Date Mon, 23 Apr 2012 15:31:58 GMT
Hi,

I'm sure this is a very common use case and that probably hundreds of people
have asked the same question in the past, but I haven't been able to find
an exact answer to my question.

I have a system where each document in the Lucene index comprises at
least one field containing a very large number of terms (for example, the
entire text of potentially very large text files) and another metadata
field that is much smaller. The first field is rarely modified and hence
remains mostly static, while the second field is modified very
frequently.

Currently, I'm re-indexing the entire Lucene document whenever the value of
the second field changes on the source side. Needless to say, this yields
a very inefficient system, because a significant amount of system resources
is wasted re-indexing content that has not changed.
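
To make the cost concrete, here is a minimal sketch of the whole-document
update path. It is a toy in-memory inverted index, not the Lucene API: the
class WholeDocReindexSketch and its methods are hypothetical names, and the
whitespace tokenizer merely stands in for a real analyzer. It counts how
many tokens get re-analyzed when only the small field changes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy in-memory inverted index; hypothetical names, NOT the Lucene API.
final class WholeDocReindexSketch {
    // term -> ids of the documents containing it (both fields share one map for brevity)
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private long tokensAnalyzed = 0;

    // Replace a document by dropping its old postings and re-tokenizing every field.
    void updateDocument(int docId, String largeContent, String smallMetadata) {
        for (Set<Integer> ids : postings.values()) {
            ids.remove(Integer.valueOf(docId));
        }
        indexField(docId, largeContent);   // unchanged, but analyzed again anyway
        indexField(docId, smallMetadata);  // the only field that actually changed
    }

    // Whitespace tokenization as a stand-in for a real analyzer.
    private void indexField(int docId, String text) {
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(token, t -> new HashSet<>()).add(docId);
            tokensAnalyzed++;
        }
    }

    long tokensAnalyzed() {
        return tokensAnalyzed;
    }

    boolean matches(String term, int docId) {
        Set<Integer> ids = postings.get(term);
        return ids != null && ids.contains(docId);
    }

    public static void main(String[] args) {
        StringBuilder content = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            content.append("word").append(i).append(' ');
        }
        WholeDocReindexSketch index = new WholeDocReindexSketch();
        index.updateDocument(1, content.toString(), "draft");
        index.updateDocument(1, content.toString(), "final"); // metadata-only change
        System.out.println("tokens analyzed: " + index.tokensAnalyzed());
    }
}
```

In this model, every metadata change costs as much analysis work as indexing
the large field from scratch, which is the waste described above.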

Is there any good way to solve this design problem? An obvious
alternative design would be to split the index into two, maintaining the
static (and large) data in one index and the dynamic part in the
other. However, this approach is not acceptable because of our data
pattern: a match on the first index yields a very large result set,
and filtering it against the second index is very inefficient due to the
high ratio of disjoint data. In other words, while the alternative approach
significantly reduces the indexing-time overhead, the resulting search is
unacceptably expensive.
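
The search-side cost of the split design can be sketched the same way. This
is a deliberately naive model that assumes the hits from each index arrive
as sets of shared document ids; SplitIndexJoinSketch, JoinResult and filter
are hypothetical names, not Lucene classes. It filters the large content
result against the metadata result and counts the candidates examined.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Naive model of joining hits from two separate indexes; NOT the Lucene API.
final class SplitIndexJoinSketch {
    static final class JoinResult {
        final List<Integer> hits;      // ids present in both result sets
        final int candidatesChecked;   // content hits examined to find them

        JoinResult(List<Integer> hits, int candidatesChecked) {
            this.hits = hits;
            this.candidatesChecked = candidatesChecked;
        }
    }

    // Walk every hit from the content index and keep those the metadata index also matched.
    static JoinResult filter(Set<Integer> contentHits, Set<Integer> metadataHits) {
        List<Integer> hits = new ArrayList<>();
        int checked = 0;
        for (Integer docId : contentHits) {
            checked++;
            if (metadataHits.contains(docId)) {
                hits.add(docId);
            }
        }
        return new JoinResult(hits, checked);
    }

    public static void main(String[] args) {
        Set<Integer> contentHits = new TreeSet<>();
        for (int i = 0; i < 100000; i++) {
            contentHits.add(i);            // very large match on the static index
        }
        Set<Integer> metadataHits = new TreeSet<>();
        metadataHits.add(5);
        metadataHits.add(50000);
        metadataHits.add(200000);          // mostly disjoint from the content hits
        JoinResult result = filter(contentHits, metadataHits);
        System.out.println(result.hits.size() + " hits after checking "
                + result.candidatesChecked + " candidates");
    }
}
```

In this model the work grows with the size of the large result set rather
than with the number of final hits, which is why the join is expensive when
the two result sets are mostly disjoint.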

Any design help would be highly appreciated.

Thanks
/Jong
