lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: How to create document objects in our case
Date Sun, 22 May 2011 12:08:53 GMT
Norms is how Lucene records what the apriori boost is for each
docXfield.  This boost is the product of per-field boost, per-doc
boost (both of which your app would set when it creates the doc), as
well as the "length normalization" Lucene's default similarity applies
(shorter docs have higher boost).

It's a quantized float, encoded as 8 bits.

If your fields tend not to vary in length much, and you don't use
boosting yourself, and you are worried about RAM, you should omit the
norms.  (Field has a method to omit norms).

Not sure when nested docs will be available but I hope soon... even
so, once it's committed, it won't be released until the next release
(though you can use it on the trunk/3.x tip as well, once it's
committed).

Mike

http://blog.mikemccandless.com

On Sun, May 22, 2011 at 7:49 AM, zhoucheng2008 <zhoucheng2008@gmail.com> wrote:
> Mike, thanks for reply.
>
> Can you please elaborate a little bit more on " If you don't need norms
> (don't boost, lengths don't vary much or you
> don't care to have field length impact scoring) you can omit norms"?
>
> When do you expect the handling of nested document will be applicable?
>
> Cheng
>
>
> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Sunday, May 22, 2011 6:58 PM
> To: java-user@lucene.apache.org
> Subject: Re: How to create document objects in our case
>
> 30 fields is fine, but if they are all indexed you should watch out
> for memory usage.  Ie, norms require 1 byte per doc per indexed field.
>  If you don't need norms (don't boost, lengths don't vary much or you
> don't care to have field length impact scoring) you can omit norms.
>
> The relationship b/w Group and Subgroup is not something Lucene can do
> today -- all docs are "independent" in a Lucene index.
>
> So.. you can either "denormalize", meaning duplicate all Group fields
> onto each subgroup, or you can check out
> https://issues.apache.org/jira/browse/LUCENE-2454 which I think adds
> exactly what you need.  It's not yet committed, but we are finally
> starting to make progress towards that, eg
> https://issues.apache.org/jira/browse/LUCENE-3112
>
> Mike
>
> http://blog.mikemccandless.com
>
> On Fri, May 20, 2011 at 8:27 PM, Cheng Zhou <zhoucheng2008@gmail.com> wrote:
>> Hi,
>>
>> I have a large number of XML files to be indexed by Lucene. All the files
>> share similar structure as below:
>>
>> <Group id="abc" member="cde" blah blah ....>
>>   <Subgroup id="abc1" member ="fgh" blah blah ...>
>>   <Subgroup id="abc2" member ="fgh" blah blah ...>
>>   <Subgroup id="abc3" member ="fgh" blah blah ...>
>>   ......
>> </Group>
>>
>> Things to be noted are:
>>
>> The root element of Group has 30 or so attributes, and it usually has over
>> 2000 Subgroup elements, which in turn also have more than 20 attributes.
>>
>> I want to create one Document object which holds the contents of the Group
>> element, and one Document object which holds all the Subgroup elements.
>>
>> Here are my challenges however:
>>
>> 1. How many fields are advised for a Document to be indexed by Lucene?
> Will
>> over 30 fields (for the Group element) be too many?
>>
>> 2. How to create a Document object and fields for holding all the Subgroup
>> elements? Is this a good way to think of?
>>
>> 3. How can I link the Document object of the Group element to the Document
>> object of all the Subgroup elements?
>>
>> Please note that I intend to use such two Document objects to achieve the
>> group while I don't know whether it is a good solution or not. I am open
> to
>> using more than two Documents to do the job, but I don't know how to
> connect
>> all the objects in Lucene.
>>
>> Many thanks!
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message