lucene-java-user mailing list archives

From Peter Becker <pe...@peterbecker.de>
Subject Re: Bridge with OpenOffice
Date Tue, 20 Apr 2004 04:25:06 GMT
We did a simple one a while ago. It could probably be a bit more
sophisticated, but it seems to do its job in the little bit of testing we
did.

See 
http://cvs.sourceforge.net/viewcvs.py/toscanaj/docco/source/org/tockit/docco/documenthandler/OpenOfficeDocumentHandler.java?rev=1.4&view=auto
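For anyone who wants to roll their own, here is a minimal sketch of the general approach. It is not the code behind the link above. An OpenOffice.org document is a ZIP archive whose body text lives in the content.xml entry, so you can SAX-parse that entry and collect the character data. The class name OpenOfficeTextExtractor and its extractText method are hypothetical. The sketch assumes Java 9+ (for Set.of) and an OpenOffice.org 1.x (.sxw) or OpenDocument (.odt) text document. OOo 1.x content.xml declares an external "office.dtd" that is not shipped inside the archive, so the handler stubs out that entity instead of trying to load it.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: extracts plain text from an OpenOffice document's content.xml. */
public class OpenOfficeTextExtractor {

    // Text namespaces: OpenOffice.org 1.x (.sxw) and OpenDocument (.odt).
    private static final Set<String> TEXT_NS = Set.of(
            "http://openoffice.org/2000/text",
            "urn:oasis:names:tc:opendocument:xmlns:text:1.0");
    // Paragraphs and headings: end each one with a line break.
    private static final Set<String> BLOCK_ELEMENTS = Set.of("p", "h");
    // Empty elements standing for whitespace inside a paragraph; emit a space
    // so that words on either side are not glued together for tokenizing.
    private static final Set<String> SPACE_ELEMENTS = Set.of("s", "tab", "tab-stop", "line-break");

    public static String extractText(File file) throws Exception {
        try (ZipFile zip = new ZipFile(file)) {
            ZipEntry entry = zip.getEntry("content.xml");
            if (entry == null) {
                throw new IOException("No content.xml in " + file);
            }
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // match on namespace URI + local name, not prefix
            SAXParser parser = factory.newSAXParser();
            TextCollector collector = new TextCollector();
            try (InputStream in = zip.getInputStream(entry)) {
                parser.parse(in, collector);
            }
            return collector.text.toString().trim();
        }
    }

    private static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            // Replace the external DTD ("office.dtd" in OOo 1.x) with an empty one.
            return new InputSource(new StringReader(""));
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (TEXT_NS.contains(uri) && SPACE_ELEMENTS.contains(localName)) {
                text.append(' ');
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (TEXT_NS.contains(uri) && BLOCK_ELEMENTS.contains(localName)) {
                text.append('\n');
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }
}
```

The resulting string could then be handed to a Lucene field as the document body. Along the lines Tatu suggests below, metadata such as the title could be read the same way from the meta.xml entry into a separate field.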

HTH,
  Peter


PS: sorry for the broken whitespace -- I just noticed that myself.


Tatu Saloranta wrote:

>On Monday 19 April 2004 14:01, Mario Ivankovits wrote:
>
>>Stephane James Vaucher wrote:
>>
>>>Has anyone tried what Joerg suggested here?
>>>http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lucene-user@jakarta.apache.org&msgNo=6231
>>>
>>Don't know what you would like to do, but if you just want to
>>extract text, you could try this snippet:
>
>This leads to a question I was thinking about: it seems that this thread was
>originally started by someone pointing out that OO can be used as a converter
>from other formats... but how about a tokenizer for native OO documents? I
>have written full-featured converters from OO to (simplified) DocBook and
>HTML, and creating one just for tokenizing, to be used by Lucene, would be
>much easier. Even if it tokenized into separate fields (document metadata,
>content, maybe bibliography separately, etc.), it would be easy to do.
>
>Would anyone find a full-featured, customizable OpenOffice document tokenizer
>useful?
>
>-+ Tatu +-
>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-user-help@jakarta.apache.org

