lucene-java-user mailing list archives

From Daniel Naber <>
Subject Re: Filters for Openoffice File Indexing available (Java)
Date Wed, 10 Nov 2004 14:05:43 GMT
On Monday 08 November 2004 11:30, Joachim Arrasz wrote:

> So now we are looking for search and index filters for Lucene, so that
> we're able to integrate our OpenOffice files into the search results as well.

I don't know of any existing solutions, but it's not difficult to write
one: extract the ZIP file using Java's built-in ZIP classes and parse
content.xml and meta.xml. I'm not sure whether whitespace handling might
become tricky. For example, two paragraphs could be stored in the file as
"<p>one</p><p>two</p>", but for indexing there needs to be whitespace
between them, otherwise "one" and "two" end up as a single token ("<p>" is
just an example; I don't know which element OpenOffice actually uses).
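The approach above can be sketched with nothing but the JDK's java.util.zip and SAX classes. This is a minimal sketch, not an existing filter: the class and method names are hypothetical, and the set of word-breaking element names (text:p, text:h, text:s, text:tab / text:tab-stop, text:line-break) is an assumption about the OpenOffice.org / OpenDocument text schema. To address the whitespace problem it inserts a space only at those paragraph-level boundaries, so inline markup such as text:span does not split a word in two. For meta.xml, where each element (title, keywords, ...) is a separate value, it breaks at every element instead.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: pulls plain text out of an OpenOffice ZIP container for indexing. */
public class OpenOfficeTextExtractor {

    // ASSUMPTION: local names of elements that separate words in content.xml.
    // Inline elements like text:span are deliberately not listed, so
    // "Hel<text:span>lo</text:span>" stays one word.
    private static final Set<String> WORD_BREAKS =
            Set.of("p", "h", "s", "tab", "tab-stop", "line-break");

    /** Reads one ZIP entry (e.g. "content.xml") and returns its text, or "" if the entry is missing. */
    public static String extractText(File officeFile, String entryName, boolean breakOnEveryElement)
            throws IOException, SAXException, ParserConfigurationException {
        try (ZipFile zip = new ZipFile(officeFile)) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                return "";
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return extractText(in, breakOnEveryElement);
            }
        }
    }

    /** Concatenates the XML's character data, adding a space at word-breaking element boundaries. */
    public static String extractText(InputStream xml, boolean breakOnEveryElement)
            throws IOException, SAXException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so localName is populated
        SAXParser parser = factory.newSAXParser();
        StringBuilder text = new StringBuilder();

        parser.parse(xml, new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // OpenOffice 1.x files reference "office.dtd", which is not in the ZIP:
                // resolve external entities to an empty document instead of fetching them.
                return new InputSource(new StringReader(""));
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                breakIfNeeded(localName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                breakIfNeeded(localName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            private void breakIfNeeded(String localName) {
                if (breakOnEveryElement || WORD_BREAKS.contains(localName)) {
                    text.append(' ');
                }
            }
        });
        // Collapse whitespace runs so the analyzer sees "one two", not "  one  two ".
        return text.toString().replaceAll("\\s+", " ").trim();
    }
}
```

Calling extractText(file, "content.xml", false) and extractText(file, "meta.xml", true) gives two plain strings that can then be added to a Lucene document as separate fields. Matching on local names only keeps the sketch working for both the old OpenOffice.org 1.x namespaces and the OpenDocument ones, at the cost of not checking which namespace an element belongs to.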
