lucene-java-user mailing list archives

From Ilya Zavorin <izavo...@caci.com>
Subject can I make incremental index/search more efficient?
Date Tue, 21 Feb 2012 21:09:29 GMT
I have a fairly straightforward task: I have a collection of N documents and a set of "hot"
words. I need to find all occurrences of these words in all the docs.



The original use case was that I would get all the docs at once. In this case, I:

1. Create a single index for all the docs

2. Loop over all hot words. For each word, I find all hits in all the docs

3. I collect and rearrange the hit info so that all hits are grouped by indexed doc
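
For concreteness, the three batch steps above can be sketched roughly as follows. This is a plain-Java stand-in and not Lucene code: the class `BatchHotWords`, its methods, and the naive tokenizer are all hypothetical, and the in-memory map only plays the role of the single index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the batch workflow; no Lucene classes are used.
public class BatchHotWords {

    /** Step 1: build one inverted index (word -> docId -> token positions) over all docs. */
    public static Map<String, Map<Integer, List<Integer>>> buildIndex(List<String> docs) {
        Map<String, Map<Integer, List<Integer>>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            // Naive tokenizer (an assumption): lowercase, split on non-word characters.
            String[] tokens = docs.get(docId).toLowerCase()
                    .replaceAll("^\\W+", "").split("\\W+");
            for (int pos = 0; pos < tokens.length; pos++) {
                if (tokens[pos].isEmpty()) continue; // only happens for an empty doc
                index.computeIfAbsent(tokens[pos], w -> new TreeMap<>())
                     .computeIfAbsent(docId, d -> new ArrayList<>())
                     .add(pos);
            }
        }
        return index;
    }

    /** Steps 2 and 3: look up each hot word, then regroup as docId -> word -> positions. */
    public static Map<Integer, Map<String, List<Integer>>> hitsByDoc(
            Map<String, Map<Integer, List<Integer>>> index, List<String> hotWords) {
        Map<Integer, Map<String, List<Integer>>> byDoc = new TreeMap<>();
        for (String word : hotWords) {
            Map<Integer, List<Integer>> postings = index.get(word.toLowerCase());
            if (postings == null) continue; // this hot word occurs in no doc
            for (Map.Entry<Integer, List<Integer>> e : postings.entrySet()) {
                byDoc.computeIfAbsent(e.getKey(), d -> new TreeMap<>())
                     .put(word, e.getValue());
            }
        }
        return byDoc;
    }
}
```

Calling `hitsByDoc(buildIndex(docs), hotWords)` gives, for each doc that has at least one hit, the positions of every hot word found in it.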



However, it looks like there might be a different use case: the user might want to add one
document at a time to the collection and see the search results immediately. So for this case
I am now doing the following:

1. Loop over docs i = 1 : N. For each doc:

1.1 If i == 1 then create index else update index

1.2 Loop over all hot words. For each word, find all hits in all the docs that have been indexed
so far, i.e. docs 1 through i

1.3 Collect and rearrange
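
The incremental loop above could be sketched as below, again as a plain-Java stand-in rather than Lucene code. `IncrementalRescan` and `addAndSearch` are hypothetical names, and the list of token arrays only imitates the growing index. The point is to show that every call searches all docs added so far, not just the new one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the incremental workflow; no Lucene classes are used.
public class IncrementalRescan {
    private final List<String[]> indexedDocs = new ArrayList<>(); // imitates the growing index
    private final List<String> hotWords;

    public IncrementalRescan(List<String> hotWords) {
        this.hotWords = hotWords;
    }

    /**
     * Step 1.1: add the new doc to the "index".
     * Steps 1.2 and 1.3: search ALL docs indexed so far and regroup hits
     * as docId -> word -> token positions.
     */
    public Map<Integer, Map<String, List<Integer>>> addAndSearch(String doc) {
        // Naive tokenizer (an assumption): lowercase, split on non-word characters.
        indexedDocs.add(doc.toLowerCase().replaceAll("^\\W+", "").split("\\W+"));

        Map<Integer, Map<String, List<Integer>>> byDoc = new TreeMap<>();
        for (String word : hotWords) {
            String needle = word.toLowerCase();
            // Redundant part: docs 0..i-1 were already searched in earlier calls.
            for (int docId = 0; docId < indexedDocs.size(); docId++) {
                String[] tokens = indexedDocs.get(docId);
                for (int pos = 0; pos < tokens.length; pos++) {
                    if (tokens[pos].equals(needle)) {
                        byDoc.computeIfAbsent(docId, d -> new TreeMap<>())
                             .computeIfAbsent(word, w -> new ArrayList<>())
                             .add(pos);
                    }
                }
            }
        }
        return byDoc;
    }
}
```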



Of course, this is not particularly efficient, especially because I am forced to do a lot
of redundant work by searching through docs 1:i instead of just doc i at each iteration. This is
because, if I understand it correctly, I can't specify "search only the part of the index that
corresponds to doc X". Or can I?
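
To put a rough number on the redundancy: if each search after adding doc i covers docs 1 through i, then N additions cover 1 + 2 + ... + N = N(N+1)/2 docs per hot word, against N if only the new doc were searched each time. The small sketch below just does that arithmetic. The class `RescanCost` is hypothetical, and counting "docs covered" is an idealized measure, since a real inverted-index lookup does not scan documents one by one.

```java
// Hypothetical helper that only counts docs covered per hot word; it performs no search.
public class RescanCost {

    /** Docs covered when docs 1..i are searched again after each of n additions. */
    public static long rescanAll(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += i; // iteration i searches docs 1 through i
        }
        return total; // same value as n * (n + 1) / 2
    }

    /** Docs covered if only the newly added doc were searched at each iteration. */
    public static long newDocOnly(int n) {
        return n;
    }

    public static void main(String[] args) {
        int n = 1000;
        // Prints "500500 vs 1000"
        System.out.println(rescanAll(n) + " vs " + newDocOnly(n));
    }
}
```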



Is there any way to make this incremental index/search more efficient? For instance, is it
at all possible to restrict where in the index a search for hits is performed? Or any other
optimization?



Thanks much



Ilya Zavorin