lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: How to create document objects in our case
Date Sun, 22 May 2011 10:58:09 GMT
30 fields is fine, but if they are all indexed you should watch out
for memory usage.  Ie, norms require 1 byte per doc per indexed field.
 If you don't need norms (don't boost, lengths don't vary much or you
don't care to have field length impact scoring) you can omit norms.

The relationship b/w Group and Subgroup is not something Lucene can do
today -- all docs are "independent" in a Lucene index.

So.. you can either "denormalize", meaning duplicate all Group fields
onto each subgroup, or you can check out
https://issues.apache.org/jira/browse/LUCENE-2454 which I think adds
exactly what you need.  It's not yet committed, but we are finally
starting to make progress towards that, eg
https://issues.apache.org/jira/browse/LUCENE-3112

Mike

http://blog.mikemccandless.com

On Fri, May 20, 2011 at 8:27 PM, Cheng Zhou <zhoucheng2008@gmail.com> wrote:
> Hi,
>
> I have a large number of XML files to be indexed by Lucene. All the files
> share similar structure as below:
>
> <Group id="abc" member="cde" blah blah ....>
>   <Subgroup id="abc1" member ="fgh" blah blah ...>
>   <Subgroup id="abc2" member ="fgh" blah blah ...>
>   <Subgroup id="abc3" member ="fgh" blah blah ...>
>   ......
> </Group>
>
> Things to be noted are:
>
> The root element of Group has 30 or so attributes, and it usually has over
> 2000 Subgroup elements, which in turn also have more than 20 attributes.
>
> I want to create one Document object which holds the contents of the Group
> element, and one Document object which holds all the Subgroup elements.
>
> Here are my challenges however:
>
> 1. How many fields are advised for a Document to be indexed by Lucene? Will
> over 30 fields (for the Group element) be too many?
>
> 2. How to create a Document object and fields for holding all the Subgroup
> elements? Is this a good way to think of?
>
> 3. How can I link the Document object of the Group element to the Document
> object of all the Subgroup elements?
>
> Please note that I intend to use such two Document objects to achieve the
> group while I don't know whether it is a good solution or not. I am open to
> using more than two Documents to do the job, but I don't know how to connect
> all the objects in Lucene.
>
> Many thanks!
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message