lucene-java-user mailing list archives

From "Erick Erickson" <>
Subject Re: Empty fields ...
Date Tue, 18 Jul 2006 17:08:53 GMT
Quoting the guys "it depends" <G>...

At root, a filter is a bitset. So size-wise, you are using 1 bit/doc (plus
some small overhead). Both the storage required and the time to construct
are dependent on the characteristics of your corpus. I guess the only way
you can answer that for your particular situation is to test with your
corpus. I can say that I was surprised at how very fast constructing a
filter was in my situation. Which has no relevance to your situation <G>....
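
To make the "1 bit/doc" point concrete, here is a rough sketch using plain
java.util.BitSet rather than any Lucene class. The names buildFilter and
approxBytes, and the predicate standing in for "does this doc match", are
made up for illustration; your real construction cost depends on how you
find the matching docs in your corpus.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

class BitSetFilterSketch {
    // Bit i is set when document i passes the (hypothetical) predicate.
    public static BitSet buildFilter(int maxDoc, IntPredicate accepts) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc++) {
            if (accepts.test(doc)) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Rough storage estimate: one bit per document, rounded up to whole bytes.
    // The real BitSet adds a small amount of overhead on top of this.
    public static long approxBytes(int maxDoc) {
        return (maxDoc + 7L) / 8L;
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        // Stand-in predicate: pretend every even-numbered doc matches.
        BitSet evenDocs = buildFilter(maxDoc, doc -> doc % 2 == 0);
        System.out.println("docs passing: " + evenDocs.cardinality());
        System.out.println("approx bytes: " + approxBytes(maxDoc));
    }
}
```

So a million-document index costs on the order of 125,000 bytes per filter,
whatever fraction of the docs actually pass.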

More of "it depends" is the fluidity of your index. If you construct it once
and don't modify it, you could consider storing your filters permanently,
either in files, as "special documents" in your index, or perhaps even in a
separate meta-data index. You can store meta-data documents just by giving
them fields that appear in none of your other documents. Deletions, additions
and re-optimizations will change the internal Lucene doc IDs, so you've got
to be careful to keep stored filters synchronized with the index.
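
If you do go the file route, something like the following is one way to
guard against stale doc IDs. This is only a sketch with plain Java I/O: the
class name, the file layout, and the indexVersion stamp are all assumptions
of mine, and where that version number comes from in your setup is up to
you. The idea is simply to refuse a stored filter that was built against a
different state of the index.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.BitSet;

class StoredFilterSketch {
    // Write the (hypothetical) index version first, then the raw bits.
    public static void save(Path file, long indexVersion, BitSet bits) throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            byte[] raw = bits.toByteArray();
            out.writeLong(indexVersion);
            out.writeInt(raw.length);
            out.write(raw);
        }
    }

    // Returns null when the file was written for a different index version,
    // which signals the caller to rebuild the filter instead of trusting it.
    public static BitSet load(Path file, long currentIndexVersion) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            if (in.readLong() != currentIndexVersion) {
                return null;
            }
            byte[] raw = new byte[in.readInt()];
            in.readFully(raw);
            return BitSet.valueOf(raw);
        }
    }
}
```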

You could consider constructing all of your filters in one batch when you
open your searcher. Whether you want to do this depends on how often your
index changes and you have to reopen your searcher.

What I'd really recommend is that you start by constructing your filters on
the fly, without even a caching wrapper, and get some timings, mostly for
your peace of mind. I'd also do some timings when combining filters, just
for yucks. There's no reason not to use a caching wrapper if you expect to
reuse these filters. The first search that uses each filter will pay the
construction delay, but you can warm up your filters by issuing some canned
queries upon startup.
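
For the timing experiment, a sketch along these lines may help. It is plain
Java and not Lucene's own caching wrapper: the class, the everyNth stand-in
for real filter construction, and the string cache keys are all invented for
illustration. It shows the three things worth timing: a cold build, a cached
lookup, and an AND of two filters.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class CachingFilterSketch {
    // One cache per open searcher; throw it away when the index is reopened,
    // because the doc IDs behind the bits may have changed.
    private final Map<String, BitSet> cache = new ConcurrentHashMap<>();
    private final AtomicInteger builds = new AtomicInteger();

    // Build the filter on first request, then hand back the cached bits.
    public BitSet get(String name, Supplier<BitSet> builder) {
        return cache.computeIfAbsent(name, k -> {
            builds.incrementAndGet();
            return builder.get();
        });
    }

    public int buildCount() {
        return builds.get();
    }

    // Combine two filters without mutating either cached instance.
    public static BitSet and(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    // Stand-in for real filter construction: accept every n-th document.
    public static BitSet everyNth(int maxDoc, int n) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc += n) {
            bits.set(doc);
        }
        return bits;
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        CachingFilterSketch filters = new CachingFilterSketch();

        long t0 = System.nanoTime();
        BitSet evens = filters.get("every2", () -> everyNth(maxDoc, 2)); // cold: builds
        long t1 = System.nanoTime();
        filters.get("every2", () -> everyNth(maxDoc, 2));                // warm: cached
        long t2 = System.nanoTime();
        BitSet both = and(evens, filters.get("every3", () -> everyNth(maxDoc, 3)));
        long t3 = System.nanoTime();

        System.out.println("cold build ns:  " + (t1 - t0));
        System.out.println("cached get ns:  " + (t2 - t1));
        System.out.println("build + AND ns: " + (t3 - t2));
        System.out.println("docs in both:   " + both.cardinality());
    }
}
```

Warming up at startup then amounts to calling get() for each filter you
expect to need before the first real user arrives.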

Only if constructing filters on the fly and using a caching wrapper both
prove unsatisfactory would I move on to any kind of permanent storage.
Premature optimization and all that....

So, I don't have a good answer since I don't have a detailed knowledge of
your problem, but it should be relatively easy for you to get a sense of
whether this is a reasonable approach or not.

Hope this helps
