lucene-java-user mailing list archives

From "jim shirreffs" <>
Subject Indexing help needed
Date Fri, 25 May 2007 17:12:58 GMT
I've been working on this for a while. I am trying to get the demo code that 
comes with Lucene to index OpenOffice documents. I've looked at the LIUS code 
and at the Nutch code, but can't find an easy way. So I am digging into the 

I wrote a KcmiDocument class that returns a Document. In it I do a doc.add() 
where I specify "contents" and a FileReader:


* Add the contents of the file to a field named "contents". Specify a 

* so that the text of the file is tokenized and indexed, but not stored.

* Note that FileReader expects the file to be in the system's default 

* If that's not the case searching for special characters will fail.

* FileReader is the key, need to add the correct reader for non-text 


doc.add(new Field("contents", new FileReader(f)));
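On the encoding note in the comment above: a minimal sketch of opening the file with an explicit charset instead of the system default. The class name ExplicitCharsetReader is made up for illustration, and UTF-8 is only an assumption about how the files are encoded.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the Lucene demo.
public class ExplicitCharsetReader {

    /** Opens f as UTF-8 text regardless of the platform default encoding. */
    public static Reader open(File f) throws IOException {
        // FileReader would use the platform default here; InputStreamReader
        // lets the caller name the charset (UTF-8 is an assumption).
        return new BufferedReader(
                new InputStreamReader(new FileInputStream(f), StandardCharsets.UTF_8));
    }
}
```

The result is a plain Reader, so it could be passed in place of new FileReader(f) in the doc.add() call.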

Now if I could just add a file reader for OpenOffice, say OOFileReader(), that 
unzips the file and does all the DOM stuff, then everything would work and the 
code changes would be minimal, right? My question is, am I correct in my 
thinking? And if so, does anyone know of an OOFileReader? If I am not 
correct, what am I missing here? It is kind of important that I learn how to 
add different file types like OO or AutoCad, so we can make a build (with 
Lucene) or buy call.
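
To make the idea concrete, here is a rough sketch of what such a reader might look like, using only the JDK. It is not an existing class from Lucene, LIUS, or Nutch. It assumes the file is an OpenDocument zip (.odt and friends) with its text in an entry named content.xml, and that paragraphs and headings use the conventional text:p and text:h element names. It uses SAX rather than DOM and keeps the whole text in memory.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: the name OOFileReader comes from the question above,
// the implementation is only an illustration.
public class OOFileReader {

    /** Returns a Reader over the plain text of an OpenDocument file. */
    public static Reader open(File f) throws IOException {
        return new StringReader(extractText(f));
    }

    /** Unzips content.xml and collects its character data. */
    public static String extractText(File f) throws IOException {
        final StringBuilder text = new StringBuilder();
        try (ZipFile zip = new ZipFile(f)) {
            ZipEntry entry = zip.getEntry("content.xml");
            if (entry == null) {
                throw new IOException("no content.xml in " + f);
            }
            DefaultHandler handler = new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    // The default parser is not namespace-aware, so qName is the
                    // raw tag name. Break lines after paragraphs and headings so
                    // adjacent words do not run together.
                    if (qName.equals("text:p") || qName.equals("text:h")) {
                        text.append('\n');
                    }
                }
            };
            try (InputStream in = zip.getInputStream(entry)) {
                SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
            } catch (SAXException | ParserConfigurationException e) {
                throw new IOException("cannot parse content.xml", e);
            }
        }
        return text.toString();
    }
}
```

With something like this, the only change to the demo code would be passing OOFileReader.open(f) instead of new FileReader(f) to the Field constructor. Formats that are not zipped XML, such as AutoCad, would need their own extractor behind the same kind of Reader.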

Thanks to all that try to help me out

Jim S

P.S. If I get it working I will be happy to email or post the code.
