lucene-java-user mailing list archives

From "amigo@max3d.com" <am...@max3d.com>
Subject Re: How to index Windows' Compiled HTML Help (CHM) Format
Date Sun, 12 Dec 2004 03:31:50 GMT
I suggest you look at chmlib, a library for reading CHM files:
http://66.93.236.84/~jedwin/projects/chmlib/

-pedja
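
A rough sketch of how the direct route could look, under stated assumptions: the CHM has already been unpacked into a directory of ordinary HTML pages (chmlib ships an extraction utility for this, extract_chmLib if I recall the name correctly), and the pages are UTF-8 or close to it. The names html_to_text and extracted_pages are made up for illustration and are not part of chmlib or Lucene. Each (path, text) pair could then go to the indexer the same way plain text does.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Chunks are joined with a space so adjacent blocks do not run
        # together; runs of whitespace are then collapsed.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the visible text of one HTML page as a single string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def extracted_pages(root):
    """Yield (relative_path, text) for every .htm/.html file under root.

    Assumption: root is the directory a CHM file was extracted into.
    Assumption: pages decode as UTF-8; undecodable bytes are replaced.
    """
    root = Path(root)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in (".htm", ".html"):
            html = path.read_text(encoding="utf-8", errors="replace")
            yield str(path.relative_to(root)), html_to_text(html)
```

Older CHM files often use a Windows code page rather than UTF-8, so the decoding step is the part most likely to need adjusting.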


Tom said the following on 12/11/2004 11:20 AM:

>Hi,
>
>Does anybody know how to index CHM files?
>One solution I know of is to convert the CHM files to PDF (converters are
>available for this) and then use the usual tools (e.g. PDFBox) to index
>the resulting PDF files, which contain the content of the CHM files. Are
>there any tools that can extract the text directly from the (binary)
>CHM files?
>
>I think CHM support is a big missing piece in the set of file types that
>can currently be indexed (XML, HTML, PDF, MS Word DOC, RTF, plain text).
>
>WBR,
>Tom.  
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-user-help@jakarta.apache.org