lucene-java-user mailing list archives

From "Gregory Tarr" <Gregory.t...@detica.com>
Subject Index doubling in size when adding extra terms
Date Wed, 15 Jul 2009 10:48:41 GMT
I have added a new field to each document in my index containing
substrings of another field to speed up initial-wildcard searches.

Each document has a field "text", which might contain "the quick brown
fox jumped over the lazy dogs".
The new field, "text_substrings", would then contain "the quick uick ick
brown rown own fox jumped umped mped ped over ver the lazy azy dogs ogs".
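
For readers who want to reproduce the field contents, here is a minimal sketch of how such a value could be generated. It is not the code used for the index in question: the class name SuffixExpander, the method expand, and the minimum suffix length of 3 are assumptions inferred from the example above, where the shortest substrings have three characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the names and the minimum length are illustrative only.
class SuffixExpander {

    // Emits every whitespace-separated word followed by its shorter suffixes,
    // stopping once a suffix would have fewer than minLength characters.
    static String expand(String text, int minLength) {
        List<String> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only when the input is blank
            }
            out.add(word); // the full word is always kept
            for (int start = 1; word.length() - start >= minLength; start++) {
                out.add(word.substring(start));
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(
            expand("the quick brown fox jumped over the lazy dogs", 3));
    }
}
```

With a minimum length of 3 the nine words of the sample sentence become nineteen tokens, so the new field carries roughly twice as many tokens as "text" for this input.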

This allows me to convert an initial-wildcard query such as "*own" into
a term query for "own".
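
The rewrite step could look roughly like the sketch below. It is plain Java with no Lucene dependency; the class name LeadingWildcardRewriter, the method toSuffixTerm, and the minimum length of 3 are assumptions made for illustration (the length is inferred from the shortest substrings in the example field). In a Lucene application the returned text would presumably be wrapped as new TermQuery(new Term("text_substrings", term)).

```java
import java.util.Optional;

// Hypothetical helper: decides whether a pattern can be served by the
// substring field and, if so, which exact term to look up there.
class LeadingWildcardRewriter {

    static final String SUFFIX_FIELD = "text_substrings";
    static final int MIN_SUFFIX_LENGTH = 3; // assumed from the example data

    // Returns the term to search for in SUFFIX_FIELD, or empty when the
    // pattern is not a plain leading-wildcard query the field can answer.
    static Optional<String> toSuffixTerm(String pattern) {
        if (!pattern.startsWith("*")) {
            return Optional.empty(); // no leading wildcard: use the normal field
        }
        String rest = pattern.substring(1);
        if (rest.length() < MIN_SUFFIX_LENGTH) {
            return Optional.empty(); // suffixes this short were never indexed
        }
        if (rest.indexOf('*') >= 0 || rest.indexOf('?') >= 0) {
            return Optional.empty(); // further wildcards still need a wildcard query
        }
        return Optional.of(rest);
    }

    public static void main(String[] args) {
        System.out.println(toSuffixTerm("*own"));
    }
}
```

Patterns the helper rejects would still have to run as ordinary wildcard queries against "text".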

However, adding this field has exactly doubled the size of the index.
Given that the term list is a small fraction of the index (?), I find
this strange. I suspect the documents might be getting stored twice.

Is there any way to stop this from happening?

Thanks

Greg Tarr



