lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephane James Vaucher <vauch...@cirano.qc.ca>
Subject Re: indexing size
Date Wed, 01 Sep 2004 05:29:59 GMT
On Wed, 1 Sep 2004, Niraj Alok wrote
> I was also thinking on the same lines.
> Actually the original code was written by some one else who has left and so
> I have to own this.
>
> At almost all the places, it is Field.Text and at some few places its
> Field.UnIndexed.
> I looked at the javadocs and found that there is Field.UnStored also.
>
> The problem is I am not too sure which one to change to what. It would be
> really enlightening if you could point the differences
> between those three and what would I need to change in my search code.
>
> If I make some of them Field.Unstored, I can see from the javadocs that
> it will be indexed and tokenized but not stored. If it is not stored,
> how can I use it while searching? Basically what is meant by indexed and
> stored, indexed and not stored and not indexed and stored?

If all you need is to seach a field, you do not need to store it. If it is
not stored it can still be tokenised and analysed by lucene. It will then
be only stored as a set of token, but not as whole. You can thus use it
for fields that you never need to retrieve from the index.

For example:
the quick brown fox jumped over the lazy dog.

will be store in lucene only as tokens, not as a whole, so using a
whitespace analyser using a stopword list {the}:

You will have these tokens in lucene:
quick
brown
fox
jumped
over
dog

You will NOT be able to retrieve the original text, but you will be able
to search it.

HTH,
sv

>
> Regards,
> Niraj
> ----- Original Message -----
> From: "petite_abeille" <petite_abeille@mac.com>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Tuesday, August 31, 2004 8:57 PM
> Subject: Re: indexing size
>
>
> >
> > On Aug 31, 2004, at 17:17, Otis Gospodnetic wrote:
> >
> > > You also have a large number of
> > > fields, and it looks like a lot (all?) of them are stored and indexed.
> > > That's what that large .fdt file indicated.  That file is > 206 MB in
> > > size.
> >
> > Try using Field.UnStored() to avoid storing all those data in your
> > indices as it's usually not necessary.
> >
> > PA.
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >
> >
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message