lucene-java-user mailing list archives

From Erik Hatcher <>
Subject Re: enquiries - pls help, thanks
Date Wed, 02 Feb 2005 11:10:39 GMT

On Feb 2, 2005, at 2:40 AM, jac jac wrote:
> May I know whether Lucene currently supports indexing of xml documents?

That's a loaded question.  Lucene "supports" it in the sense that it can
index text, sure.  But Lucene does not include an XML parser or a facility
to automatically turn an XML file into a Lucene Document, nor would you
want that.  For example, in my current project I'm parsing XML
documents and indexing pieces of them individually as Lucene Documents,
and I'm doing that in several different ways.
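
The approach described above (parse the XML yourself, then turn each piece into its own document) might look roughly like the following minimal sketch. It uses only the JDK's built-in DOM parser. The sample XML, the `contact` record element, and the class name XmlRecordSplitter are hypothetical. Lucene itself is left out on purpose: each record is collected into a plain field map, which you would then copy into a Lucene Document (for example with Field.Keyword(name, value) in the Lucene 1.4 API).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/**
 * Hypothetical sketch: split one XML file into several "documents",
 * one per record element. Each record becomes a map of field name to
 * text value; a real indexer would copy each map into its own Lucene Document.
 */
public class XmlRecordSplitter {

    public static List<Map<String, String>> split(String xml, String recordTag) {
        org.w3c.dom.Document dom;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            dom = builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }

        List<Map<String, String>> records = new ArrayList<>();
        NodeList recordNodes = dom.getElementsByTagName(recordTag);
        for (int i = 0; i < recordNodes.getLength(); i++) {
            Element recordElem = (Element) recordNodes.item(i);
            Map<String, String> fields = new LinkedHashMap<>();

            // Attributes of the record element become fields (e.g. type="business").
            NamedNodeMap attrs = recordElem.getAttributes();
            for (int a = 0; a < attrs.getLength(); a++) {
                Node attr = attrs.item(a);
                fields.put(attr.getNodeName(), attr.getNodeValue());
            }

            // Each direct child element becomes one field: <city>Zagreb</city> -> city=Zagreb.
            NodeList children = recordElem.getChildNodes();
            for (int c = 0; c < children.getLength(); c++) {
                Node child = children.item(c);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    fields.put(child.getNodeName(), child.getTextContent().trim());
                }
            }
            records.add(fields);
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, loosely modeled on the fields in the output below.
        String xml = "<address-book>"
            + "<contact type=\"business\"><name>SAMOFIX d.o.o.</name><city>Zagreb</city></contact>"
            + "<contact type=\"home\"><name>Example Ltd.</name><city>Split</city></contact>"
            + "</address-book>";
        for (Map<String, String> fields : split(xml, "contact")) {
            // With Lucene 1.4 you would build one Document here and add
            // one Field.Keyword(name, value) per map entry.
            System.out.println(fields);
        }
    }
}
```

Running main prints one field map per contact element. Keeping one document per record, rather than one per file, is what lets a search hit point at the individual piece of XML that matched.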

The demo applications that you've tried are designed only as a very
basic demonstration of how to use Lucene.  They were never intended to
be used as-is; they are code you can borrow and learn from to build your
own custom solutions.

If you want a quick jump on processing XML with Lucene, try out the
code that comes with Lucene in Action (grab it from the book's web site).
When you get the code, run this:

$ ant ExtensionFileHandler
Buildfile: build.xml


      [echo]       This example demonstrates the file extension document 
      [echo]       Documents with extensions .xml, .rtf, .doc, .pdf, .html, and .txt are
      [echo]       all handled by the framework.  The contents of the Lucene Document
      [echo]       built for the specified file is displayed.
     [input] Press return to continue...

     [input] File: [src/lia/handlingtypes/data/HTML.html]
      [echo] Running lia.handlingtypes.framework.ExtensionFileHandler...
      [java] log4j:WARN No appenders could be found for logger 
      [java] log4j:WARN Please initialize the log4j system properly.
      [java] Document<Keyword<type:business> Keyword<name:SAMOFIX d.o.o.> Keyword<address:Ilica 47-2> Keyword<city:Zagreb> Keyword<province:> Keyword<postalcode:10000> Keyword<country:Croatia> Keyword<telephone:+385 1 123 4567>>

Total time: 18 seconds

Note that I typed in the path to an XML file where it asks for [input].
Now dig into the source tree and borrow what you need from
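
The ExtensionFileHandler run above handles .xml, .rtf, .doc, .pdf, .html, and .txt files through one framework, presumably by picking a handler based on the file's extension. The sketch below shows that dispatch idea in isolation. It is hypothetical: the DocHandler interface, the ExtensionDispatch class, and the handler strings are made up for illustration and are not the classes in the Lucene in Action source tree. A real handler would parse the file and return a Lucene Document instead of a description string.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch of choosing a document handler by file extension.
 * Names are illustrative only, not the Lucene in Action classes.
 */
public class ExtensionDispatch {

    /** Stand-in for "something that turns a file into a Lucene Document". */
    interface DocHandler {
        String describe(String path);
    }

    private final Map<String, DocHandler> handlers = new HashMap<>();

    /** Registers a handler for an extension such as "xml" (case-insensitive). */
    public void register(String extension, DocHandler handler) {
        handlers.put(extension.toLowerCase(Locale.ROOT), handler);
    }

    /** Returns the handler for the file's extension, or null if none is registered. */
    public DocHandler handlerFor(String path) {
        int dot = path.lastIndexOf('.');
        int slash = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        // No dot in the file name itself (only in a directory name), or a trailing dot.
        if (dot <= slash || dot == path.length() - 1) {
            return null;
        }
        String ext = path.substring(dot + 1).toLowerCase(Locale.ROOT);
        return handlers.get(ext);
    }

    public static void main(String[] args) {
        ExtensionDispatch dispatch = new ExtensionDispatch();
        dispatch.register("xml", p -> "XML handler for " + p);
        dispatch.register("html", p -> "HTML handler for " + p);
        dispatch.register("txt", p -> "plain-text handler for " + p);

        String[] paths = {"data/addressbook.xml", "data/HTML.html", "notes.doc"};
        for (String path : paths) {
            DocHandler h = dispatch.handlerFor(path);
            System.out.println(h == null ? "no handler for " + path : h.describe(path));
        }
    }
}
```

Keeping the extension-to-handler mapping in one table means adding a new file type is a single register call, and unknown types can be skipped rather than indexed as garbage.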


> I tried building an index of all my directories in webapps
> via:
> java org.apache.lucene.demo.IndexFiles /homedir/tomcat/webapps
> Then I used the following command to search:
> java org.apache.lucene.demo.SearchFiles
> and I typed in my query. I was able to see the files, which showed me
> the path that holds my data.
> However, when I ran
> java org.apache.lucene.demo.IndexHTML -create -index /homedir/index ..
> and went to my website, I realised it couldn't search for the data I
> wanted.
> I want to search data within XML documents... May I know if the
> current demo version allows indexing of XML documents?
> Why is it that after I run "java org.apache.lucene.demo.IndexHTML
> -create -index /homedir/index .." the data I wanted can't be
> searched? Thanks a lot!
> jac
