Hi,
I'm looking for advice on a problem I've run into while using Lucene.
I'm integrating Lucene as an alternative to the other indexing and
search methods that already exist in our product, so the integration
should ideally live up to the existing requirements.
The content indexed as Lucene documents is structured as a tree (just
like files in a filesystem), and the feature I am working on is
restricting a search to a certain part of this tree.
To implement this I used a PrefixQuery on the path of the folder to
search below. Since the PrefixQuery creates a boolean query with one
clause for each matching term, this fails when there are more than
1024 subfolders (the default maximum clause count) below the selected
folder.
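
To make the clause growth concrete, here is a toy model in plain Java.
It is not Lucene code: the sorted set only stands in for the term
dictionary of the path field, and the class name, the constant and the
sample paths are made up for illustration. It counts how many terms a
prefix would expand to and compares that with the 1024 limit.

```java
import java.util.SortedSet;
import java.util.TreeSet;

class PrefixExpansionSketch {
    // Assumed limit, taken from the 1024 figure mentioned above.
    static final int MAX_CLAUSES = 1024;

    /** Counts the terms in a sorted term set that start with the given prefix. */
    static int matchingTerms(SortedSet<String> terms, String prefix) {
        int count = 0;
        // All terms sharing a prefix are contiguous in sorted order,
        // so start at the prefix and stop at the first non-match.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical index: 2000 subfolders below /foo, one elsewhere.
        SortedSet<String> terms = new TreeSet<>();
        for (int i = 0; i < 2000; i++) {
            terms.add("/foo/sub" + i);
        }
        terms.add("/other/x");

        int clauses = matchingTerms(terms, "/foo/");
        System.out.println(clauses + " clauses, limit exceeded: "
                + (clauses > MAX_CLAUSES));
    }
}
```

The point of the sketch is only that the number of clauses equals the
number of distinct paths below the folder, so it grows with the size of
the subtree rather than with the query.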
One way of getting around this would be if maxClauseCount could be set
for a PrefixQuery, but picking a value would be hard. To support very
large installations, a value of a million or so would have to be used,
and that would probably not perform very well.
The only alternative I can think of would be to store a
whitespace-separated list of all ancestors along with each document:
/foo /foo/bar /foo/bar/baz
But this has two drawbacks: the index storage space used, and the cost
of indexing (finding all ancestors).
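
As a minimal sketch of the ancestor list above, assuming each document
already knows its own absolute, '/'-separated path as a string with no
trailing slash and no whitespace in folder names, the list can be
derived from the path itself. The class and method names are
hypothetical, and the list here includes the full path as its last
entry, as in the example line.

```java
import java.util.ArrayList;
import java.util.List;

class AncestorPaths {
    /**
     * Returns every ancestor prefix of an absolute path, shortest
     * first, ending with the path itself. The root "/" yields an
     * empty list.
     */
    static List<String> ancestors(String path) {
        List<String> result = new ArrayList<>();
        // Start searching at index 1 to skip the leading '/'.
        int slash = path.indexOf('/', 1);
        while (slash != -1) {
            result.add(path.substring(0, slash));
            slash = path.indexOf('/', slash + 1);
        }
        if (path.length() > 1) {
            result.add(path);
        }
        return result;
    }

    /** Joins the ancestors into one whitespace-separated field value. */
    static String ancestorField(String path) {
        return String.join(" ", ancestors(path));
    }

    public static void main(String[] args) {
        System.out.println(ancestorField("/foo/bar/baz"));
    }
}
```

If such a field were tokenized on whitespace at indexing time, a folder
restriction could presumably be expressed as a single exact match on
the folder path instead of a prefix expansion, at the cost of the extra
storage mentioned above.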
So my question boils down to: is there a more efficient way to
restrict a search to one part of the tree?
Thanks in advance,
Dennis Thrysøe