lucene-java-user mailing list archives

From Joachim Arrasz <i...@arrasz.de>
Subject Re: Filters for Openoffice File Indexing available (Java)
Date Wed, 10 Nov 2004 14:18:32 GMT
Hi Daniel,

>I don't know of any existing solutions, but it's not so difficult to write
>one: Extract the ZIP file using Java's built-in ZIP classes and parse
>content.xml and meta.xml. I'm not sure if whitespace issues might become
>tricky, e.g. two paragraphs could be in the file as
>"<p>one</p><p>two</p>", but for indexing a whitespace needs to be inserted
>between them ("<p>" was just an example, I don't know what OpenOffice.org
>actually uses).

That doesn't seem too hard, but I have never developed anything like
that, so I think I need a tutorial for doing this. Why should I parse
meta.xml? I thought content.xml would be enough.

Thanks a lot

Bye Achim
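
For reference, here is a minimal sketch of the approach described in the
quoted text: open the document as a ZIP archive with java.util.zip, find
one XML entry (content.xml or meta.xml), and collect its character data
with a SAX parser. The class name OdfTextExtractor and the method
extractText are hypothetical names chosen for this example, not part of
Lucene or OpenOffice.org. The whitespace issue is handled by appending a
space after every closing element, which is an assumption that keeps
adjacent paragraphs apart but would also split a word that contains
inline markup. Resolving every external entity to an empty source is a
precaution for files that declare an external DTD.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class OdfTextExtractor {

    /**
     * Returns the plain text of one XML entry inside a ZIP archive,
     * or null if the archive has no entry with that name.
     */
    public static String extractText(InputStream in, String entryName)
            throws IOException, SAXException, ParserConfigurationException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    // The stream is now positioned at this entry's data.
                    return parseText(zip);
                }
            }
        }
        return null;
    }

    private static String parseText(InputStream xml)
            throws IOException, SAXException, ParserConfigurationException {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Assumption: a space after every element keeps
                // "<p>one</p><p>two</p>" from being indexed as "onetwo".
                text.append(' ');
            }

            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Precaution: never try to load an external DTD.
                return new InputSource(new StringReader(""));
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
        // Collapse the runs of whitespace produced above.
        return text.toString().trim().replaceAll("\\s+", " ");
    }
}
```

Calling extractText(stream, "content.xml") would give the body text to
index. The same call with "meta.xml" would return metadata such as the
title and author, which is only worth parsing if those values should be
searchable as well.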



