Return-Path: X-Original-To: apmail-lucene-java-user-archive@www.apache.org Delivered-To: apmail-lucene-java-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 352E1647E for ; Sun, 22 May 2011 13:36:23 +0000 (UTC) Received: (qmail 40170 invoked by uid 500); 22 May 2011 13:36:21 -0000 Delivered-To: apmail-lucene-java-user-archive@lucene.apache.org Received: (qmail 40122 invoked by uid 500); 22 May 2011 13:36:20 -0000 Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: java-user@lucene.apache.org Delivered-To: mailing list java-user@lucene.apache.org Received: (qmail 40114 invoked by uid 99); 22 May 2011 13:36:20 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 22 May 2011 13:36:20 +0000 X-ASF-Spam-Status: No, hits=-0.0 required=5.0 tests=RCVD_IN_DNSWL_LOW,SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (athena.apache.org: local policy) Received: from [74.125.82.48] (HELO mail-ww0-f48.google.com) (74.125.82.48) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 22 May 2011 13:36:15 +0000 Received: by wwi18 with SMTP id 18so3636914wwi.5 for ; Sun, 22 May 2011 06:35:54 -0700 (PDT) MIME-Version: 1.0 Received: by 10.227.201.9 with SMTP id ey9mr1348307wbb.41.1306071354025; Sun, 22 May 2011 06:35:54 -0700 (PDT) Received: by 10.227.24.11 with HTTP; Sun, 22 May 2011 06:35:54 -0700 (PDT) In-Reply-To: <003201cc1883$11e7d450$35b77cf0$@com> References: <003101cc1876$6dd9f570$498de050$@com> <003201cc1883$11e7d450$35b77cf0$@com> Date: Sun, 22 May 2011 09:35:54 -0400 Message-ID: Subject: Re: How to create document objects in our case From: Michael McCandless To: java-user@lucene.apache.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable You're welcome! Mike http://blog.mikemccandless.com On Sun, May 22, 2011 at 9:20 AM, zhoucheng2008 wr= ote: > Great, thanks Mike. > > -----Original Message----- > From: Michael McCandless [mailto:lucene@mikemccandless.com] > Sent: Sunday, May 22, 2011 8:09 PM > To: java-user@lucene.apache.org > Subject: Re: How to create document objects in our case > > Norms is how Lucene records what the apriori boost is for each > docXfield. =A0This boost is the product of per-field boost, per-doc > boost (both of which your app would set when it creates the doc), as > well as the "length normalization" Lucene's default similarity applies > (shorter docs have higher boost). > > It's a quantized float, encoded as 8 bits. > > If your fields tend not to vary in length much, and you don't use > boosting yourself, and you are worried about RAM, you should omit the > norms. =A0(Field has a method to omit norms). > > Not sure when nested docs will be available but I hope soon... even > so, once it's committed, it won't be released until the next release > (though you can use it on the trunk/3.x tip as well, once it's > committed). > > Mike > > http://blog.mikemccandless.com > > On Sun, May 22, 2011 at 7:49 AM, zhoucheng2008 > wrote: >> Mike, thanks for reply. >> >> Can you please elaborate a little bit more on " If you don't need norms >> (don't boost, lengths don't vary much or you >> don't care to have field length impact scoring) you can omit norms"? >> >> When do you expect the handling of nested document will be applicable? >> >> Cheng >> >> >> -----Original Message----- >> From: Michael McCandless [mailto:lucene@mikemccandless.com] >> Sent: Sunday, May 22, 2011 6:58 PM >> To: java-user@lucene.apache.org >> Subject: Re: How to create document objects in our case >> >> 30 fields is fine, but if they are all indexed you should watch out >> for memory usage. =A0Ie, norms require 1 byte per doc per indexed field. >> =A0If you don't need norms (don't boost, lengths don't vary much or you >> don't care to have field length impact scoring) you can omit norms. >> >> The relationship b/w Group and Subgroup is not something Lucene can do >> today -- all docs are "independent" in a Lucene index. >> >> So.. you can either "denormalize", meaning duplicate all Group fields >> onto each subgroup, or you can check out >> https://issues.apache.org/jira/browse/LUCENE-2454 which I think adds >> exactly what you need. =A0It's not yet committed, but we are finally >> starting to make progress towards that, eg >> https://issues.apache.org/jira/browse/LUCENE-3112 >> >> Mike >> >> http://blog.mikemccandless.com >> >> On Fri, May 20, 2011 at 8:27 PM, Cheng Zhou > wrote: >>> Hi, >>> >>> I have a large number of XML files to be indexed by Lucene. All the fil= es >>> share similar structure as below: >>> >>> >>> =A0 >>> =A0 >>> =A0 >>> =A0 ...... >>> >>> >>> Things to be noted are: >>> >>> The root element of Group has 30 or so attributes, and it usually has > over >>> 2000 Subgroup elements, which in turn also have more than 20 attributes= . >>> >>> I want to create one Document object which holds the contents of the > Group >>> element, and one Document object which holds all the Subgroup elements. >>> >>> Here are my challenges however: >>> >>> 1. How many fields are advised for a Document to be indexed by Lucene? >> Will >>> over 30 fields (for the Group element) be too many? >>> >>> 2. How to create a Document object and fields for holding all the > Subgroup >>> elements? Is this a good way to think of? >>> >>> 3. How can I link the Document object of the Group element to the > Document >>> object of all the Subgroup elements? >>> >>> Please note that I intend to use such two Document objects to achieve t= he >>> group while I don't know whether it is a good solution or not. I am ope= n >> to >>> using more than two Documents to do the job, but I don't know how to >> connect >>> all the objects in Lucene. >>> >>> Many thanks! >>> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org >> For additional commands, e-mail: java-user-help@lucene.apache.org >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org >> For additional commands, e-mail: java-user-help@lucene.apache.org >> >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org > For additional commands, e-mail: java-user-help@lucene.apache.org > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org > For additional commands, e-mail: java-user-help@lucene.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org For additional commands, e-mail: java-user-help@lucene.apache.org