lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: ways to minimize index size?
Date Wed, 14 Mar 2007 13:12:47 GMT
Store as little as possible, index as little as possible <G>.....

How big is your index, and how much do you expect it to grow?
I ask this because it's probably not worth your time to try to
reduce the index size below some threshold... I found that
reducing my index from 8G to 4G (through not stemming) gave
me about a 10% performance improvement, so at some point
it's just not worth the effort. Also, if you posted the index size,
it would give folks a chance to say "there's not much you can
gain by reducing things more". As it is, I don't have a clue
whether your index is 100M or 100T. The former is in the
"don't waste your time" class, and the latter is...er...
different....

I wouldn't bother compressing for 1%....

Question for "the guys" so I can check an assumption....
Is there any difference between these two?
Field(Name, Value, Store, index)
Field(Name, Value, Store, index, Field.TermVector.NO)


Best
Erick

On 3/14/07, jm <jmuguruza@gmail.com> wrote:
>
> Hi,
>
> I want to make my index as small as possible. I noticed
> field.setOmitNorms(true), and I read on the list that the difference is
> 1 byte per field per doc, not huge but hey... Is the only effect that
> the score is different? I hardly care about the score, so that would be OK.
>
> And can I add docs without norms to an index that already has docs
> with norms?
>
> Any other way to minimize the size of the index? All of my fields but one are
> Field.Store.NO, Field.Index.TOKENIZED and Field.TermVector.NO, one is
> Field.Store.YES, Field.Index.UN_TOKENIZED and Field.TermVector.NO. I
> tried compressing that one and size is reduced around 1% (it's a small
> field), but I guess compression means worse performance so I am not
> sure about applying that.
>
> thanks
>
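
To put the norms question quoted above in numbers, here is a rough
back-of-the-envelope sketch in plain Java (no Lucene calls). It assumes
the "1 byte per field per doc" figure from the thread holds for every
indexed field that keeps norms; the class name, method name, document
count, and field count are all made up for illustration.

```java
public class NormsOverhead {

    // Estimated bytes spent on norms, assuming one byte per document
    // for each indexed field that keeps norms (figure taken from the thread).
    static long normsBytes(long numDocs, int fieldsWithNorms) {
        return numDocs * fieldsWithNorms;
    }

    public static void main(String[] args) {
        long docs = 10_000_000L; // hypothetical document count
        int fields = 8;          // hypothetical number of fields with norms

        long bytes = normsBytes(docs, fields);
        double megabytes = bytes / (1024.0 * 1024.0);

        System.out.printf("norms: %d bytes (about %.1f MB)%n", bytes, megabytes);
    }
}
```

Comparing that figure against the total index size is a quick way to
judge whether omitting norms is worth it for a given index.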
