Date: Sun, 19 Apr 2009 02:09:24 +0800 (SGT)
From: Daniel Susanto
Subject: Re: Indexing Complex XML
To: java-user@lucene.apache.org

Thanks Erick,

By more complex XML I mean, for example, this XML:

    Lucene Book
    Book author 1
    Book author 2
    Book for Lucene
    Lucene Book 2
    Book 2 author 1
    Book 2 author 2
    Book 2 for Lucene

Each 'book' node is handled by one Document, right? And how should I handle the 'authors' node? Should I put it in a new Document, or something else?

thx.
:)

Daniel

Daniel Susanto
http://susantodaniel.wordpress.com

--- On Sun, 4/19/09, Erick Erickson wrote:

From: Erick Erickson
Subject: Re: Indexing Complex XML
To: java-user@lucene.apache.org
Date: Sunday, April 19, 2009, 12:01 AM

Lucene is an *engine*, not an application. *You* have to process the XML, decide what the structure of your index is, and index the data. There are many XML parser options; this is just straight Java code. You decide which elements are relevant, add their contents to a Lucene Document, and then add that Document to your index. Searching works the same way.

So, say you have the following simple XML doc:

    ele 1 text
    ele 2 text

You'd have to parse that text, then add, for example:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    // writer is an IndexWriter opened elsewhere; pick the Store/Index options that suit your data
    Document doc = new Document();
    doc.add(new Field("ele1field", "ele 1 text", Field.Store.YES, Field.Index.ANALYZED));
    doc.add(new Field("ele2field", "ele 2 text", Field.Store.YES, Field.Index.ANALYZED));
    writer.addDocument(doc);

Then at search time you'd form your queries on "ele1field" and "ele2field".

HTH
Erick

On Sat, Apr 18, 2009 at 11:19 AM, daniel susanto wrote:

> Hi,
>
> I need advice or an example for indexing a complex XML file. I mean XML whose nodes are not just one level deep but several, for example RSS or Atom.
>
> thx b4.
> Daniel Susanto
> http://susantodaniel.wordpress.com
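Applying Erick's advice to Daniel's 'book'/'authors' question, one common approach (not the only one) is one Lucene Document per <book>. Wrapper elements such as <authors> are flattened away, and each author is added as another value of the same field name, since a Lucene Document may hold several fields with the same name. The minimal sketch below uses only the JDK's DOM parser and stops just before Lucene. It assumes hypothetical markup: only <book> and <authors> are named in the message, while <books>, <title>, <author> and <description> are guesses, because the archived copy lost the original tags. With Lucene 2.4, each (name, value) pair would become a `doc.add(new Field(name, value, ...))` call followed by `writer.addDocument(doc)`, as in Erick's snippet.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class BookFlattener {
    // Hypothetical markup: <book> and <authors> come from the message;
    // <books>, <title>, <author> and <description> are assumed for illustration.
    static final String SAMPLE =
          "<books>"
        + "<book><title>Lucene Book</title>"
        + "<authors><author>Book author 1</author><author>Book author 2</author></authors>"
        + "<description>Book for Lucene</description></book>"
        + "<book><title>Lucene Book 2</title>"
        + "<authors><author>Book 2 author 1</author><author>Book 2 author 2</author></authors>"
        + "<description>Book 2 for Lucene</description></book>"
        + "</books>";

    /** Returns one map per <book>: field name -> values, in document order. */
    static List<Map<String, List<String>>> flatten(String xml) throws Exception {
        org.w3c.dom.Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<Map<String, List<String>>> result = new ArrayList<>();
        NodeList books = dom.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            // Each <book> would become one Lucene Document.
            Map<String, List<String>> fields = new LinkedHashMap<>();
            collectLeaves((Element) books.item(i), fields);
            result.add(fields);
        }
        return result;
    }

    // Walks the subtree; every element without child elements becomes a field
    // named after its tag. Wrappers such as <authors> are skipped, so each
    // <author> ends up as one more value of a repeated "author" field.
    private static void collectLeaves(Element el, Map<String, List<String>> fields) {
        boolean hasChildElements = false;
        NodeList children = el.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElements = true;
                collectLeaves((Element) child, fields);
            }
        }
        if (!hasChildElements) {
            fields.computeIfAbsent(el.getTagName(), k -> new ArrayList<>())
                  .add(el.getTextContent().trim());
        }
    }

    public static void main(String[] args) throws Exception {
        for (Map<String, List<String>> book : flatten(SAMPLE)) {
            System.out.println(book);
        }
    }
}
```

Running it prints one line per book, starting with `{title=[Lucene Book], author=[Book author 1, Book author 2], description=[Book for Lucene]}`. Keeping each author as a separate value of one "author" field, rather than a separate Document, means a query on "author" still returns the whole book it belongs to.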