lucene-general mailing list archives

From Ted Dunning <>
Subject Re: Index Ratio
Date Thu, 25 Jun 2009 02:43:37 GMT
I was actually suggesting that you build synthetic documents so that you
know *exactly* that these documents exist and have known values for every
field.  Your test is good, but not comprehensive, since it doesn't test every
field, and one of the best ways to get a small index is to index only a few
fields with small values.
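
A rough sketch of the idea, in Python rather than against the Lucene API. The
field names, the `is_tracer` marker field, and the `ToyIndex` class are all
made up for illustration; `ToyIndex` is only a stand-in for whatever real
index you are testing. The point is that each tracer has a known value in
every field, so you can check field by field that it made it into the index.

```python
from collections import defaultdict

# Hypothetical schema; substitute the fields your real documents carry.
FIELDS = ["title", "author", "body"]


def make_tracer(source_file, fields=FIELDS):
    """Build one synthetic document with a known, unique value in every field."""
    doc = {f: f"tracer_{f}_{source_file}" for f in fields}
    doc["source_file"] = source_file
    doc["is_tracer"] = "true"  # marker field used to include/exclude tracers
    return doc


class ToyIndex:
    """Stand-in for a real index: maps (field, value) to document ids."""

    def __init__(self):
        self.docs = []
        self.postings = defaultdict(set)

    def add(self, doc):
        doc_id = len(self.docs)
        self.docs.append(doc)
        for field, value in doc.items():
            self.postings[(field, value)].add(doc_id)

    def search(self, field, value, include_tracers=True):
        hits = self.postings.get((field, value), set())
        if not include_tracers:
            # Drop every document that carries the tracer marker.
            hits = hits - self.postings.get(("is_tracer", "true"), set())
        return sorted(hits)


def missing_fields(index, tracer):
    """Return the tracer fields that cannot be found by exact lookup."""
    return [f for f, v in tracer.items() if not index.search(f, v)]


index = ToyIndex()
index.add({"title": "real doc", "author": "pof", "body": "hello",
           "source_file": "a.mbox"})
tracer = make_tracer("a.mbox")
index.add(tracer)

# An empty list means every field of the tracer was indexed and is retrievable.
print(missing_fields(index, tracer))
```

With one tracer per input file, a non-empty result from `missing_fields`
tells you which field was dropped for that file, and passing
`include_tracers=False` keeps the synthetic documents out of normal results.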

On Wed, Jun 24, 2009 at 7:39 PM, pof <> wrote:

> > (it is also very helpful to have some test documents with extraordinary
> > values in key fields so that you can verify indexing and retrieval.  These
> > are called tracer bullets in some quarters and it is handy to have at least
> > one such tracer per input file.  You can also add corpus meta-data this way
> > (n documents for file f).  If you put a special field on these documents you
> > can include or exclude them from your retrievals with essentially no cost)
>
> I have done this to a small extent (Search for a few unique terms like a one
> off email address etc.) but I will give it more of a go.
