lucene-java-user mailing list archives

From Cheng Zhou <>
Subject How to create document objects in our case
Date Sat, 21 May 2011 00:27:11 GMT

I have a large number of XML files to index with Lucene. All of the files
share a similar structure, like this:

<Group id="abc" member="cde" blah blah ....>
   <Subgroup id="abc1" member ="fgh" blah blah ...>
   <Subgroup id="abc2" member ="fgh" blah blah ...>
   <Subgroup id="abc3" member ="fgh" blah blah ...>

Some things to note:

The root Group element has 30 or so attributes, and it usually has more than
2000 Subgroup child elements, each of which has more than 20 attributes.
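A minimal sketch of the first step, assuming the JDK's built-in DOM parser (javax.xml.parsers) and a simplified, well-formed sample with hypothetical attribute values. It copies every attribute of the root Group element into a name/value map. In Lucene, each entry of that map would become one field added to an org.apache.lucene.document.Document via Document.add; the map only stands in for that here, so the sketch runs without Lucene on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

public class GroupAttributes {
    /** Copies every attribute of an element into a name -> value map (one future field each). */
    static Map<String, String> attributesOf(Element element) {
        Map<String, String> fields = new LinkedHashMap<>();
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            fields.put(attr.getNodeName(), attr.getNodeValue());
        }
        return fields;
    }

    /** Parses an XML string and returns the attributes of its root (Group) element only. */
    static Map<String, String> rootAttributes(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return attributesOf(root);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Simplified, well-formed sample; attribute values are hypothetical.
        String xml = "<Group id=\"abc\" member=\"cde\">"
                + "<Subgroup id=\"abc1\" member=\"fgh\"/>"
                + "</Group>";
        Map<String, String> groupFields = rootAttributes(xml);
        groupFields.forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

With 30 or so attributes on a real Group element, the same loop simply produces 30 or so entries; nothing in the sketch depends on the count.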

I want to create one Document object that holds the contents of the Group
element, and another Document object that holds all of the Subgroup elements.
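For the second Document, one possible reading of "one Document which holds all the Subgroup elements" is a multi-valued layout: Lucene lets a Document contain several fields with the same name, so each Subgroup attribute name could map to a list of values. The sketch below models that with a Map<String, List<String>>, which is a stand-in for a Lucene Document and not its API; the class name and sample values are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class FlattenedSubgroups {
    /** Pools the attributes of every Subgroup element into one multi-valued map. */
    static Map<String, List<String>> flatten(String xml) {
        try {
            Element group = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            Map<String, List<String>> fields = new LinkedHashMap<>();
            // getElementsByTagName searches descendants, so the Group itself is excluded.
            NodeList subgroups = group.getElementsByTagName("Subgroup");
            for (int i = 0; i < subgroups.getLength(); i++) {
                NamedNodeMap attrs = ((Element) subgroups.item(i)).getAttributes();
                for (int j = 0; j < attrs.getLength(); j++) {
                    Node attr = attrs.item(j);
                    // Same attribute name across Subgroups -> one more value for that field.
                    fields.computeIfAbsent(attr.getNodeName(), k -> new ArrayList<>())
                          .add(attr.getNodeValue());
                }
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample values.
        String xml = "<Group id=\"abc\">"
                + "<Subgroup id=\"abc1\" member=\"fgh\"/>"
                + "<Subgroup id=\"abc2\" member=\"xyz\"/>"
                + "</Group>";
        System.out.println(flatten(xml));
    }
}
```

One consequence worth weighing: after flattening, the values from different Subgroups are pooled per attribute, so a query such as id:abc1 AND member:xyz would still match this single Document, even though no single Subgroup in the sample has both values.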

However, here are my questions:

1. How many fields per Document are advisable when indexing with Lucene?
Would more than 30 fields (for the Group element) be too many?

2. How should I create a Document object, and its fields, to hold all of the
Subgroup elements? Is this a good way to approach the problem?

3. How can I link the Document object of the Group element to the Document
object of all the Subgroup elements?
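One alternative to a single flattened Document, sketched here under the assumption that one Document per Subgroup is acceptable, is to store the parent Group's id in every child, like a foreign key. The field names docType and groupId are hypothetical, and plain maps again stand in for Lucene Documents. In a real index each map would be passed to IndexWriter.addDocument, and subgroupsOf would roughly correspond to a BooleanQuery with two required TermQuery clauses on those untokenized fields.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;

public class LinkedDocs {
    // Hypothetical field names used to tag and link the documents.
    static final String TYPE = "docType";
    static final String PARENT = "groupId";

    /** Copies an element's attributes into a flat field map. */
    static Map<String, String> fieldsOf(Element e) {
        Map<String, String> fields = new LinkedHashMap<>();
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            fields.put(attrs.item(i).getNodeName(), attrs.item(i).getNodeValue());
        }
        return fields;
    }

    /** Builds one "document" for the Group plus one per Subgroup, each linked by groupId. */
    static List<Map<String, String>> toDocuments(String xml) {
        try {
            Element group = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            List<Map<String, String>> docs = new ArrayList<>();
            Map<String, String> groupDoc = fieldsOf(group);
            groupDoc.put(TYPE, "group");
            docs.add(groupDoc);
            NodeList subgroups = group.getElementsByTagName("Subgroup");
            for (int i = 0; i < subgroups.getLength(); i++) {
                Map<String, String> sub = fieldsOf((Element) subgroups.item(i));
                sub.put(TYPE, "subgroup");
                sub.put(PARENT, group.getAttribute("id")); // foreign key back to the Group
                docs.add(sub);
            }
            return docs;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** In-memory stand-in for a docType:subgroup AND groupId:<id> query. */
    static List<Map<String, String>> subgroupsOf(List<Map<String, String>> docs, String groupId) {
        return docs.stream()
                .filter(d -> "subgroup".equals(d.get(TYPE)) && groupId.equals(d.get(PARENT)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String xml = "<Group id=\"abc\" member=\"cde\">"
                + "<Subgroup id=\"abc1\" member=\"fgh\"/>"
                + "<Subgroup id=\"abc2\" member=\"fgh\"/>"
                + "</Group>";
        List<Map<String, String>> docs = toDocuments(xml);
        System.out.println(docs.size() + " documents");
        subgroupsOf(docs, "abc").forEach(d -> System.out.println(d.get("id")));
    }
}
```

With over 2000 Subgroups per Group, this layout yields over 2000 small Documents per file plus one for the Group; getting from a Group to its Subgroups is then a lookup on groupId rather than a structural link inside Lucene.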

Please note that I intend to use these two Document objects to represent the
group, but I don't know whether this is a good solution. I am open to using
more than two Documents, but I don't know how to connect them in Lucene.

Many thanks!
