Date: Wed, 1 Sep 2004 03:18:28 -0400 (EDT)
From: Stephane James Vaucher
To: Lucene Users List
Subject: Re: indexing size

Hi Niraj,

I'd rather respond to the list, as others may be interested in your
questions, and since I don't consider myself a guru, I appreciate being
corrected.

For a title, I'd say yes, use Field.Text(String name, String value). Not
the variants that take a Reader, as they do not store the value.

You want it to be:
1) tokenised (so that its individual terms are saved for searching, not
   only the text as a whole)
2) indexed, to make it searchable
3) stored, to make the field retrievable from the index

hth,
sv

p.s. my name is Stephane; I haven't been called James since I was in Oz,
and that was a while ago

On Wed, 1 Sep 2004, Niraj Alok wrote:

> Hi James,
>
> Since this is a minor issue, I am not posting it to the Lucene list.
>
> Let's say I have one field, "title", which has a value of "George Bush".
> I would need to search on that title and also retrieve its value. So you are
> saying that I should have it as Field.Text?
>
> Also, if I need to just search on that "title" but want to retrieve the
> value of another field "content", then title should be unstored while
> content should be stored?
>
> Regards,
> Niraj
> ----- Original Message -----
> From: "Stephane James Vaucher"
> To: "Lucene Users List"
> Sent: Wednesday, September 01, 2004 10:59 AM
> Subject: Re: indexing size
>
>
> > On Wed, 1 Sep 2004, Niraj Alok wrote
> > > I was also thinking along the same lines.
> > > Actually the original code was written by someone else who has left,
> > > and so I have to own this.
> > >
> > > In almost all places it is Field.Text, and in a few places it's
> > > Field.UnIndexed.
> > > I looked at the javadocs and found that there is Field.UnStored also.
> > >
> > > The problem is I am not too sure which one to change to what. It would
> > > be really enlightening if you could point out the differences
> > > between those three and what I would need to change in my search code.
> > >
> > > If I make some of them Field.UnStored, I can see from the javadocs that
> > > it will be indexed and tokenized but not stored. If it is not stored,
> > > how can I use it while searching? Basically what is meant by indexed and
> > > stored, indexed and not stored and not indexed and stored?
> >
> > If all you need is to search a field, you do not need to store it.
> > If it is not stored, it can still be tokenised and analysed by Lucene.
> > It will then be stored only as a set of tokens, not as a whole. You can
> > thus use it for fields that you never need to retrieve from the index.
> >
> > For example:
> > the quick brown fox jumped over the lazy dog.
> >
> > will be stored in Lucene only as tokens, not as a whole, so using a
> > whitespace analyser with a stopword list {the}:
> >
> > You will have these tokens in Lucene:
> > quick
> > brown
> > fox
> > jumped
> > over
> > lazy
> > dog
> >
> > You will NOT be able to retrieve the original text, but you will be able
> > to search it.
> >
> > HTH,
> > sv
> >
> > >
> > > Regards,
> > > Niraj
> > > ----- Original Message -----
> > > From: "petite_abeille"
> > > To: "Lucene Users List"
> > > Sent: Tuesday, August 31, 2004 8:57 PM
> > > Subject: Re: indexing size
> > >
> > >
> > > > On Aug 31, 2004, at 17:17, Otis Gospodnetic wrote:
> > > >
> > > > > You also have a large number of
> > > > > fields, and it looks like a lot (all?) of them are stored and
> > > > > indexed. That's what that large .fdt file indicated. That file is
> > > > > 206 MB in size.
> > > >
> > > > Try using Field.UnStored() to avoid storing all that data in your
> > > > indices, as it's usually not necessary.
> > > >
> > > > PA.

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
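
For anyone who prefers to see the stored/unstored distinction in code, here
is a minimal sketch. It is a toy model in plain Java, not the Lucene API:
the class FieldSketch, the IndexedField type and the tokenize helper are all
made up for illustration, and the tokeniser simply splits on whitespace and
drops the stop word "the". The sentence is the one from the thread, minus
the trailing full stop so that "dog" comes out as a clean token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model (NOT the Lucene API) of what an index keeps for one field. */
class FieldSketch {

    /** Splits on whitespace and drops stop words; no lowercasing or stemming. */
    static List<String> tokenize(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (!raw.isEmpty() && !stopWords.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    /** Hypothetical stand-in for an indexed field: tokens plus an optional stored copy. */
    static final class IndexedField {
        final List<String> tokens;  // what searching works against
        final String storedValue;   // null when the field is not stored

        IndexedField(String value, boolean stored, Set<String> stopWords) {
            this.tokens = tokenize(value, stopWords);
            this.storedValue = stored ? value : null;
        }

        /** True if the exact term is among the indexed tokens. */
        boolean matches(String term) {
            return tokens.contains(term);
        }
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the"));
        String text = "the quick brown fox jumped over the lazy dog";

        IndexedField unstored = new IndexedField(text, false, stop);
        IndexedField stored = new IndexedField(text, true, stop);

        System.out.println(unstored.tokens);         // tokens kept for searching
        System.out.println(unstored.matches("fox")); // searchable even though unstored
        System.out.println(unstored.storedValue);    // nothing to retrieve
        System.out.println(stored.storedValue);      // original text is retrievable
    }
}
```

The unstored field still matches a search for "fox", but its storedValue is
null, which is the toy equivalent of not being able to get the original text
back from a hit. The stored field keeps both the tokens and the original
value.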