lucene-java-user mailing list archives

From Yuliya Palchaninava ...@solute.de>
Subject Re: Lucene 2.9 and 3.0: Optimized index is thrice as large as the not optimized index
Date Mon, 11 Jan 2010 17:55:36 GMT
Thanks again.

Disabling norms, where it was possible without influencing the search quality,
has solved the problem:
- The unoptimized index has become smaller.
- The optimized index is practically the same size as the unoptimized one.
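
As a rough check of the norms arithmetic in Mike's reply quoted below (1 byte
per document for every field with norms enabled), here is a minimal sketch
using the document and field counts from this thread. The class and method
names are made up for illustration and are not part of Lucene:

```java
public class NormsSizeEstimate {
    /**
     * Estimated norms storage in bytes, assuming 1 byte per document
     * for every field that has norms enabled (dense, not sparse),
     * as described in the reply quoted below.
     */
    public static long normsBytes(long numDocs, int numFieldsWithNorms) {
        // multiplyExact throws ArithmeticException on overflow instead of wrapping
        return Math.multiplyExact(numDocs, (long) numFieldsWithNorms);
    }

    public static void main(String[] args) {
        long numDocs = 20845906L; // "Nr of documents" from this thread
        int numFields = 559;      // "Nr of fields" from this thread
        long bytes = normsBytes(numDocs, numFields);
        System.out.printf("norms: %d bytes (~%.1f GB)%n", bytes, bytes / 1e9);
    }
}
```

With these numbers the estimate is 11,652,861,454 bytes, i.e. about 11 GB,
which roughly matches the gap between the 4.1G and 15G index sizes.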

Yuliya

> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Friday, 8 January 2010 14:38
> To: java-user@lucene.apache.org
> Subject: Re: Lucene 2.9 and 3.0: Optimized index is thrice as
> large as the not optimized index
> 
> Lucene stores 1 byte (disk and RAM, when searching that 
> field) per document for any field that has norms enabled, 
> even for documents that do not contain that field.
> 
> In your case, that's ~20 MB per field (once optimize is done), times
> 559 fields = ~11 GB of storage.
> 
> You should index these fields with 
> Field.Index.ANALYZED_NO_NORMS to turn off norms.  But, this 
> means field/doc boosting, and the normal length boosting 
> Lucene normally does (shorter documents get a better score), 
> will be silently disabled.  Also: you must fully re-index 
> from scratch, otherwise the norms will turn themselves back 
> on when segments merge together.
> 
> Mike
> 
> On Fri, Jan 8, 2010 at 7:55 AM, Yuliya Palchaninava 
> <yp@solute.de> wrote:
> > Thanks Michael.
> >
> > You are probably right.
> >
> > Not optimized size is 4.1G, optimized index is about 15G.
> >
> > Yes, our documents do have many different indexed fields, and
> > norms are enabled.
> > Nr of fields: 559
> > Nr of documents: 20845906
> > Nr of terms: 25615389
> >
> > Could you please give me a more detailed explanation of how the
> > storage of norms affects the size of an index?
> > What exactly do you mean by "norms are not stored sparsely"?
> >
> > Thanks,
> > Yuliya
> >
> >> -----Original Message-----
> >> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> >> Sent: Thursday, 7 January 2010 18:00
> >> To: java-user@lucene.apache.org
> >> Subject: Re: Lucene 2.9 and 3.0: Optimized index is thrice as large
> >> as the not optimized index
> >>
> >> Do your documents have many different indexed fields?  If you do, and
> >> norms are enabled, that could be the cause (norms are not stored
> >> sparsely).
> >>
> >> But: what actual sizes are we talking about?
> >>
> >> Mike
> >>
> >> On Thu, Jan 7, 2010 at 11:50 AM, Yuliya Palchaninava <yp@solute.de>
> >> wrote:
> >> > Otis,
> >> >
> >> > thanks for the answer.
> >> >
> >> > Unfortunately, the index *directory* remains larger *after* the
> >> > optimization. In our case the optimization completed successfully
> >> > and, as you say, there is only one segment in the directory.
> >> >
> >> > Some other ideas?
> >> >
> >> > Thanks,
> >> > Yuliya
> >> >
> >> >> -----Original Message-----
> >> >> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> >> >> Sent: Thursday, 7 January 2010 17:35
> >> >> To: java-user@lucene.apache.org
> >> >> Subject: Re: Lucene 2.9 and 3.0: Optimized index is thrice as large
> >> >> as the not optimized index
> >> >>
> >> >> Yuliya,
> >> >>
> >> >> The index *directory* will be larger *while* you are optimizing.
> >> >> After the optimization is completed successfully, the index directory
> >> >> will be smaller.  It is possible that your index directory is
> >> >> large(r) because you have some left-over segments (e.g. from some
> >> >> earlier failed/interrupted optimizations) that are not really a part
> >> >> of the index.  After optimizing, you should have only 1 segment, so
> >> >> if you see more than 1 segment, look at the ones with older
> >> >> timestamps.  Those can be (re)moved.
> >> >>
> >> >>  Otis
> >> >> --
> >> >> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
> >> >>
> >> >>
> >> >>
> >> >> ----- Original Message ----
> >> >> > From: Yuliya Palchaninava <yp@solute.de>
> >> >> > To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
> >> >> > Sent: Thu, January 7, 2010 11:23:08 AM
> >> >> > Subject: Lucene 2.9 and 3.0: Optimized index is thrice as large as the
> >> >> > not optimized index
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > According to the API documentation: "In general, once the optimize
> >> >> > completes, the total size of the index will be less than the size of
> >> >> > the starting index. It could be quite a bit smaller (if there were
> >> >> > many pending deletes) or just slightly smaller". In our case the index
> >> >> > becomes not smaller but larger, namely three times as large.
> >> >> >
> >> >> > The unoptimized index doesn't contain compressed fields, which could
> >> >> > otherwise have caused the index to grow during optimization. So we
> >> >> > cannot explain what happens.
> >> >> >
> >> >> > Does someone have an explanation for the index growth due to the
> >> >> > optimization?
> >> >> >
> >> >> > Thanks,
> >> >> > Yuliya
> >> >> >
> >> >> >
> >> >> >
> >> >> > ---------------------------------------------------------------------
> >> >> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> >> > For additional commands, e-mail: java-user-help@lucene.apache.org