From: Jeff Linwood
Date: Fri, 14 Feb 2003 08:38:45 -0600
To: Lucene Users List
Subject: Re: Indexing XML with Lucene

Hi,

To use Lucene, you will have to
have some way of creating Lucene Document objects, which you can then add to a Lucene index. Most of the translation work is custom, because everyone has different XML DTDs and schemas. There are examples of indexing XML with DOM and SAX for Lucene in the Lucene Sandbox.

There are probably a few steps you will need to take:

1) Figure out how your XML Schema or DTD maps to a Lucene Document - which XML elements are going to be which Lucene Fields, and how are they going to be indexed and stored?

2) Write a JUnit test for your document conversion utility. It should take your XML documents, run them through your converter, and then check the fields in the resulting Lucene Document to make sure they are what you want. Do this for each type of XML document that you have.

3) Write conversion code that translates an XML document to a Lucene Document. Do this for each type of XML document that you have.

4) Write an indexer utility that goes through your XML database and feeds XML documents through the conversion utility and then into the Lucene indexer.

I might be forgetting one or two steps here :)

Jeff

Pierre Lacchini wrote:

>Hello,
>
>I'm using Lucene, and I need to index an XML Database (Tamino).
>How can I do that ? Do i have to use an XML parser as Digester ?
>
>I'm kinda noob with Lucene, and I really need help ;)
>
>Thx !
>
>Pierre Lacchini
>Consultant développement
>
>PeopleWare
>12, rue du Cimetière
>L-8413 Steinfort
>Phone : + 352 399 968 35
>http://www.peopleware.lu
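
As a rough sketch of steps 1 and 3 above, here is one way the conversion could start out, using only the SAX parser that ships with the JDK. The class name XmlFieldExtractor and the "one field per leaf element" mapping are assumptions made up for illustration, not something taken from the Sandbox examples. The returned map only stands in for a Lucene Document: each entry would still have to be turned into a Lucene Field with whatever indexed/stored settings you decided on in step 1, and how you construct those depends on your Lucene version.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical helper: collects the text of each text-bearing element
 * into a (field name -> value) map. The map is a stand-in for a Lucene
 * Document; no Lucene classes are used here.
 */
public class XmlFieldExtractor extends DefaultHandler {
    private final Map<String, String> fields = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // A new element begins: discard any text collected so far.
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver text in several chunks, so accumulate them.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            // Repeated elements are joined with a space so no value is lost.
            fields.merge(qName, value, (a, b) -> a + " " + b);
        }
        text.setLength(0);
    }

    /** Parses an XML string and returns the collected field map. */
    public static Map<String, String> extract(String xml) throws Exception {
        XmlFieldExtractor handler = new XmlFieldExtractor();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.fields;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document, purely for demonstration.
        String xml = "<book><title>Indexing XML</title><author>Jeff</author></book>";
        System.out.println(extract(xml));
    }
}
```

Returning a plain map keeps the conversion easy to check from a JUnit test (step 2) before any Lucene code is involved. Container elements with no text of their own, like book in the sample, do not produce a field.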