lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephane James Vaucher <vauch...@cirano.qc.ca>
Subject Re: indexing size
Date Wed, 01 Sep 2004 07:18:28 GMT
Hi Niraj,

I'd rather respond to the list as others may be interested in your
questions, and since I don't consider myself a guru, I appreciate being
corrected.

For a title, I'd say yes, use the Field Text(String name, String value)
constructor. Not the others that use a reader as they do not store the
value.

You want for it to be:
1) tokenised (so to have its fragments saved for searching, not only the
totality of the text)
2) indexed so to make it searchable
3) store as to make the field retrievable from the index

hth,
sv
p.s. my name is Stephane, it's been a while since I've been in Oz
that I haven't been called James

On Wed, 1 Sep 2004, Niraj Alok wrote:

> Hi James,
>
> Since this would be a minor issue hence I am not posting it on the lucene.
>
> Lets say I have one field as "title" which has a value of "George Bush".
> I would need to search on that title and also retrieve its value. So you are
> saying that I should have it as Field.Text?
>
> Also, if I need to just search on that "title" but want to retrieve the
> value of another field "content", then title should be unstored while
> content should be stored?
>
> Regards,
> Niraj
> ----- Original Message -----
> From: "Stephane James Vaucher" <vauchers@cirano.qc.ca>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Wednesday, September 01, 2004 10:59 AM
> Subject: Re: indexing size
>
>
> > On Wed, 1 Sep 2004, Niraj Alok wrote
> > > I was also thinking on the same lines.
> > > Actually the original code was written by some one else who has left and
> so
> > > I have to own this.
> > >
> > > At almost all the places, it is Field.Text and at some few places its
> > > Field.UnIndexed.
> > > I looked at the javadocs and found that there is Field.UnStored also.
> > >
> > > The problem is I am not too sure which one to change to what. It would
> be
> > > really enlightening if you could point the differences
> > > between those three and what would I need to change in my search code.
> > >
> > > If I make some of them Field.Unstored, I can see from the javadocs that
> > > it will be indexed and tokenized but not stored. If it is not stored,
> > > how can I use it while searching? Basically what is meant by indexed and
> > > stored, indexed and not stored and not indexed and stored?
> >
> > If all you need is to seach a field, you do not need to store it. If it is
> > not stored it can still be tokenised and analysed by lucene. It will then
> > be only stored as a set of token, but not as whole. You can thus use it
> > for fields that you never need to retrieve from the index.
> >
> > For example:
> > the quick brown fox jumped over the lazy dog.
> >
> > will be store in lucene only as tokens, not as a whole, so using a
> > whitespace analyser using a stopword list {the}:
> >
> > You will have these tokens in lucene:
> > quick
> > brown
> > fox
> > jumped
> > over
> > dog
> >
> > You will NOT be able to retrieve the original text, but you will be able
> > to search it.
> >
> > HTH,
> > sv
> >
> > >
> > > Regards,
> > > Niraj
> > > ----- Original Message -----
> > > From: "petite_abeille" <petite_abeille@mac.com>
> > > To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> > > Sent: Tuesday, August 31, 2004 8:57 PM
> > > Subject: Re: indexing size
> > >
> > >
> > > >
> > > > On Aug 31, 2004, at 17:17, Otis Gospodnetic wrote:
> > > >
> > > > > You also have a large number of
> > > > > fields, and it looks like a lot (all?) of them are stored and
> indexed.
> > > > > That's what that large .fdt file indicated.  That file is > 206
MB
> in
> > > > > size.
> > > >
> > > > Try using Field.UnStored() to avoid storing all those data in your
> > > > indices as it's usually not necessary.
> > > >
> > > > PA.
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > > > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> > > >
> > > >
> > >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >
> >
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message