From: Erick Erickson
To: java-user@lucene.apache.org
Date: Sat, 18 Apr 2009 13:01:36 -0400
Subject: Re: Indexing Complex XML
Message-ID: <359a92830904181001l97da9a6gefbb9d741140786@mail.gmail.com>
In-Reply-To: <73526.84423.qm@web76013.mail.sg1.yahoo.com>

Lucene is an *engine*, not an application. *You* have to process the XML, decide what the structure of your index is, and index the data. There are many XML parser options; that part is just straight Java code. You decide what's relevant, add the contents of the relevant elements to a Lucene document, then add that document to your index. Similarly for searching.

So, say you have a simple XML doc with two elements whose text is "ele 1 text" and "ele 2 text". You'd have to parse that text, then, say, add (semi-pseudo-code):

    Document doc = new Document();
    // StoreOption and IndexOption are placeholders for your choices,
    // e.g. Field.Store.YES and Field.Index.ANALYZED.
    doc.add(new Field("ele1field", "ele 1 text", StoreOption, IndexOption));
    doc.add(new Field("ele2field", "ele 2 text", StoreOption, IndexOption));
    writer.addDocument(doc);  // writer is your IndexWriter

Then at search time you'd form your queries on "ele1field" and "ele2field".

HTH
Erick

On Sat, Apr 18, 2009 at 11:19 AM, daniel susanto wrote:

> Hi,
>
> I need advise or example to index complex XML file, I mean the XML note
> just in one level node but more than one. for example indexing rss or atom.
>
> thx b4.
> Daniel Susanto
> http://susantodaniel.wordpress.com
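
For multi-level XML such as RSS or Atom, the "parse it yourself, then decide on fields" step from the reply might look roughly like the sketch below. It uses only the JDK's DOM parser and does not call Lucene at all. The class name XmlFieldExtractor, the "item" record tag, and the dotted field names (for example "author.name") are assumptions made for illustration, not anything from the thread. Each name/value pair it returns is what you would then pass to new Field(...) on a Lucene Document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: flattens nested XML into field-name -> text pairs.
class XmlFieldExtractor {

    /** Returns one field map per element named recordTag (e.g. "item"). */
    static List<Map<String, String>> extract(String xml, String recordTag) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<Map<String, String>> records = new ArrayList<>();
        NodeList nodes = dom.getElementsByTagName(recordTag);
        for (int i = 0; i < nodes.getLength(); i++) {
            Map<String, String> fields = new LinkedHashMap<>();
            collect((Element) nodes.item(i), "", fields);
            records.add(fields);
        }
        return records;
    }

    /** Walks child elements; leaf elements become fields named by their dotted path. */
    private static void collect(Element parent, String prefix, Map<String, String> fields) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes, comments, etc.
            }
            Element el = (Element) child;
            String name = prefix.isEmpty() ? el.getTagName() : prefix + "." + el.getTagName();
            if (hasChildElements(el)) {
                collect(el, name, fields); // descend one level deeper
            } else {
                // Repeated leaf names are joined with a space into one field value.
                fields.merge(name, el.getTextContent().trim(), (a, b) -> a + " " + b);
            }
        }
    }

    private static boolean hasChildElements(Element el) {
        NodeList children = el.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        String rss = "<rss><channel><title>Feed</title>"
                + "<item><title>First</title><author><name>Ann</name></author></item>"
                + "<item><title>Second</title><author><name>Bob</name></author></item>"
                + "</channel></rss>";
        // Each map here would become one Lucene Document, each entry one Field.
        for (Map<String, String> record : extract(rss, "item")) {
            System.out.println(record);
        }
    }
}
```

Flattening nested paths into dotted names is only one possible mapping. Which elements become fields, and whether each is stored or analyzed, is still your decision, as the reply says.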