From: "Digy"
Subject: RE: Indexing Complex XML
Date: Sat, 18 Apr 2009 22:25:21 +0300

doc.add(new Field("authors", "name1 surname1 name2 surname2", Field.Store.YES, Field.Index.ANALYZED));

So you can make a search like

authors:"name1 surname1"

(Disadvantage: you will also get a result for a search like authors:"surname1 name2".)

DIGY

-----Original Message-----
From: Daniel Susanto [mailto:daniel_sus777@yahoo.com]
Sent: Saturday, April 18, 2009 9:09 PM
To: java-user@lucene.apache.org
Subject: Re: Indexing Complex XML

Thanks Erick,

By more complex XML I mean, for example, this XML:

  Lucene Book
    Book author 1
    Book author 2
  Book for Lucene

  Lucene Book 2
    Book 2 author 1
    Book 2 author 2
  Book 2 for Lucene

Each 'book' node is handled by one Document, right? And now how do I handle the 'authors' node? Should I put it in a new Document, or how?

thx.
:) Daniel

Daniel Susanto
http://susantodaniel.wordpress.com

--- On Sun, 4/19/09, Erick Erickson wrote:

From: Erick Erickson
Subject: Re: Indexing Complex XML
To: java-user@lucene.apache.org
Date: Sunday, April 19, 2009, 12:01 AM

Lucene is an *engine*, not an application. *You* have to process the XML, decide what the structure of your index is, and index the data. There are many XML parser options; this is just straight Java code. You decide what's relevant, add the contents of the relevant elements to a Lucene document, and then add that document to your index. Similarly for searching.

So, say you have the following simple XML doc:

   ele 1 text
   ele 2 text

You'd have to parse that text, then, say, add (semi-pseudo-code):

Document doc = new Document();
doc.add(new Field("ele1field", "ele 1 text", Field.Store.YES, Field.Index.ANALYZED));
doc.add(new Field("ele2field", "ele 2 text", Field.Store.YES, Field.Index.ANALYZED));
writer.addDocument(doc); // writer: an already-open IndexWriter

Then at search time you'd form your queries on "ele1field" and "ele2field".

HTH
Erick

On Sat, Apr 18, 2009 at 11:19 AM, daniel susanto wrote:

> Hi,
>
> I need advice or an example for indexing a complex XML file. I mean XML
> that is not just one level of nodes but more than one, for example RSS or Atom.
>
> thx b4.
> Daniel Susanto
> http://susantodaniel.wordpress.com
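The two suggestions in this thread can be tried together in a small stdlib-only sketch. It parses each book into its own record with a multi-valued author field, as Erick describes, and uses a toy positional index to show why Digy's single concatenated field also matches authors:"surname1 name2". The element names (book, title, author, description) and the class and method names are hypothetical, because the archived XML lost its tags. The positional index is a simplified model, not Lucene's API. In Lucene itself, the corresponding approach would be to add one Field per author under the same field name and have the Analyzer return a non-zero value from getPositionIncrementGap(String fieldName). That pushes the values apart so a phrase query cannot span two authors.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class BookXmlSketch {
    // Hypothetical element names: the archived message lost the original XML tags.
    static final String[] FIELDS = {"title", "author", "description"};

    // One Map per <book> element: field name -> list of values (multi-valued fields allowed).
    static List<Map<String, List<String>>> parseBooks(String xml) {
        try {
            org.w3c.dom.Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<Map<String, List<String>>> docs = new ArrayList<>();
            NodeList books = dom.getElementsByTagName("book");
            for (int i = 0; i < books.getLength(); i++) {
                Element book = (Element) books.item(i);
                Map<String, List<String>> fields = new LinkedHashMap<>();
                for (String tag : FIELDS) {
                    // getElementsByTagName searches all descendants, so <author>
                    // elements nested inside an <authors> wrapper are found too.
                    NodeList nodes = book.getElementsByTagName(tag);
                    List<String> values = new ArrayList<>();
                    for (int j = 0; j < nodes.getLength(); j++) {
                        values.add(nodes.item(j).getTextContent().trim());
                    }
                    fields.put(tag, values);
                }
                docs.add(fields);
            }
            return docs;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Toy positional index for one multi-valued field: tokens of a value get
    // consecutive positions, and `gap` extra positions are skipped between values.
    static Map<String, List<Integer>> positions(List<String> values, int gap) {
        Map<String, List<Integer>> pos = new HashMap<>();
        int p = 0;
        for (String value : values) {
            for (String token : value.toLowerCase(Locale.ROOT).split("\\s+")) {
                pos.computeIfAbsent(token, t -> new ArrayList<>()).add(p++);
            }
            p += gap;
        }
        return pos;
    }

    // A phrase matches when its terms occur at consecutive positions.
    static boolean phraseMatches(Map<String, List<Integer>> pos, String phrase) {
        String[] terms = phrase.trim().toLowerCase(Locale.ROOT).split("\\s+");
        for (int start : pos.getOrDefault(terms[0], List.of())) {
            boolean all = true;
            for (int k = 1; k < terms.length && all; k++) {
                all = pos.getOrDefault(terms[k], List.of()).contains(start + k);
            }
            if (all) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String xml = "<books><book><title>Lucene Book</title>"
                + "<authors><author>Book author 1</author><author>Book author 2</author></authors>"
                + "<description>Book for Lucene</description></book></books>";
        System.out.println(parseBooks(xml));

        List<String> authors = List.of("name1 surname1", "name2 surname2");
        System.out.println(phraseMatches(positions(authors, 0), "surname1 name2"));   // true
        System.out.println(phraseMatches(positions(authors, 100), "surname1 name2")); // false
        System.out.println(phraseMatches(positions(authors, 100), "name1 surname1")); // true
    }
}
```

With a gap of 0 the two author values behave like Digy's single concatenated string, so "surname1 name2" matches across the boundary. With a gap of 100 it no longer matches, while a phrase inside one author still does. The toLowerCase(Locale.ROOT) call avoids locale-dependent case mapping (for example, the Turkish dotless i).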