From: Otis Gospodnetic
Subject: Re: Indexing within an XML document
To: lucene-user@jakarta.apache.org
Cc: m.altheim@open.ac.uk
Date: Wed, 10 Nov 2004 06:46:51 -0800 (PST)
Message-ID: <20041110144651.68115.qmail@web12701.mail.yahoo.com>
In-Reply-To: <418F60BD.6040609@open.ac.uk>

Redirecting to lucene-user, which is more appropriate.

I'm not sure what exactly the question is here, but:

Parse your XML document and, for each element you want to index,
create a new Document instance and populate its fields with some data,
like the URI data you mentioned.

If you parse with DOM, just walk the node tree and make a new Document
whenever you encounter an element you want as a separate Document.

If you are using the SAX API, you'll probably want some logic in the
startElement, endElement, and characters methods. When you reach the
end of the element you are done with your Document instance, so add it
to the IndexWriter instance that you opened once, before starting the
parser.

When you are done with the whole XML document, close the IndexWriter.

Otis

--- Murray Altheim wrote:

> Hi,
>
> I'm trying to develop a class to handle an XML document, where
> the contents aren't so much indexed on a per-document basis,
> rather on an element basis. Each element has a unique ID, so
> I'm looking to create a class/method similar to Lucene's
> Document.Document(). By way of example, I'll use some XHTML
> markup to illustrate what I'm trying to do:
>
> [...]

> some text to index...
>
> some more text to index...
>
> even more text to index...
>
> I'd very much appreciate any help in explaining how I'd go about
> creating a method to return a Lucene Document to index this via
> ID. Would I want a separate Document per element? (There are many
> thousands of such elements.) Everything in my system, both at the
> document and the individual element level is done via URL, so
> the method should create URLs for each element like
>
>    http://purl.org/ceryle/blat.xml#p1
>    http://purl.org/ceryle/blat.xml#p2
>    http://purl.org/ceryle/blat.xml#p3
>    etc.
>
> I don't need anyone to go to the trouble of coding this, just point
> me to how it might be done, or to any existing examples that do this
> kind of thing.
>
> Thanks very much!
>
> Murray
>
> ......................................................................
> Murray Altheim
> http://kmi.open.ac.uk/people/murray/
> Knowledge Media Institute
> The Open University, Milton Keynes, Bucks, MK7 6AA, UK
> .
>
> "If we can just get the people that can reconcile themselves
> to the new dispensation out of the way and then kill the few
> thousand people who can't reconcile themselves, then we can
> let the remaining 98 percent come back and live out their
> lives," Pike said. "If we bomb the place to the ground, those
> peace-loving people won't have a home to live in. [...] If we
> simply pulverize the city, it would look bad on TV." -- John Pike
>
> U.S., Iraqi troops mass for assault on Fallujah
> STRATEGY: U.S. to employ snipers, robots to cut down casualties
> Matthew B. Stannard, San Francisco Chronicle
>
> http://www.sfgate.com/cgi-bin/article.cgi?file=/c/a/2004/11/06/MNGHL9NBU11.DTL
>
> "We have a growing, maturing insurgency group. We see larger
> and more coordinated military attacks. They are getting better
> and they can self-regenerate. The idea there are x number of
> insurgents, and that when they're all dead we can get out is
> wrong. The insurgency has shown an ability to regenerate itself
> because there are people willing to fill the ranks of those who
> are killed. The political culture is more hostile to the US
> presence. The longer we stay, the more they are confirmed in
> that view."
> -- W Andrew Terrill
>
> Far Graver Than Vietnam, Sidney Blumenthal, The Guardian
> http://www.guardian.co.uk/comment/story/0,,1305360,00.html
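
For anyone reading this thread later, the SAX approach described in the
reply could be sketched roughly as below. This is only an illustration
under assumptions, not code from either poster: the class names
ElementIndexer and PerElementHandler, the ElementSink callback, the
element name "p", and the attribute name "id" are all hypothetical. The
sketch uses only the JDK, so the Lucene step is left as a comment: the
sink is the place where a Lucene Document would be built and handed to
IndexWriter.addDocument().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ElementIndexer {

    /**
     * Hypothetical callback, called once per matching element.
     * A Lucene-backed implementation would create a new Document here,
     * add a field for the URI and a field for the text, and pass it to
     * an IndexWriter that was opened once before parsing started.
     */
    public interface ElementSink {
        void add(String uri, String text);
    }

    /** SAX handler that emits one (uri, text) record per target element. */
    static class PerElementHandler extends DefaultHandler {
        private final String baseUri;
        private final String elementName;
        private final ElementSink sink;
        private StringBuilder text; // non-null only while inside a target element
        private String currentId;
        private int depth;          // nesting depth below the current target element

        PerElementHandler(String baseUri, String elementName, ElementSink sink) {
            this.baseUri = baseUri;
            this.elementName = elementName;
            this.sink = sink;
        }

        @Override
        public void startElement(String ns, String local, String qName, Attributes atts) {
            if (text != null) {
                depth++; // a child element inside the element being collected
            } else if (elementName.equals(qName) && atts.getValue("id") != null) {
                currentId = atts.getValue("id");
                text = new StringBuilder();
                depth = 0;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String ns, String local, String qName) {
            if (text == null) {
                return;
            }
            if (depth > 0) {
                depth--; // closing a child element
            } else {
                // End of the target element: this record is complete.
                sink.add(baseUri + "#" + currentId, text.toString().trim());
                text = null;
            }
        }
    }

    /** Parses the XML string and reports each matching element to the sink. */
    public static void parse(String xml, String baseUri, String elementName,
                             ElementSink sink) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new PerElementHandler(baseUri, elementName, sink));
        // With Lucene, the IndexWriter would be closed here, after the whole
        // XML document has been parsed.
    }

    public static void main(String[] args) throws Exception {
        String xml = "<html><body>"
                + "<p id=\"p1\">some text to index...</p>"
                + "<p id=\"p2\">some more text to index...</p>"
                + "</body></html>";
        List<String> seen = new ArrayList<>();
        parse(xml, "http://purl.org/ceryle/blat.xml", "p",
                (uri, text) -> seen.add(uri + " -> " + text));
        seen.forEach(System.out::println);
    }
}
```

Keeping the Lucene calls behind a small callback is just one way to
arrange it; the handler could equally hold the IndexWriter directly.
The text of child elements is folded into the enclosing element's
record, which may or may not be what a given index needs.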