lucene-java-user mailing list archives

From "" <>
Subject Re: How to index Windows' Compiled HTML Help (CHM) Format
Date Sun, 12 Dec 2004 03:31:50 GMT
I suggest you look at the chmlib at:


Tom said the following on 12/11/2004 11:20 AM:

>Does anybody know how to index chm-files? 
>A possible solution I know is to convert chm-files to pdf-files (there are
>converters available for this job) and then use the known tools (e.g.
>PDFBox) to index the content of the pdf files (which contain the content of
>the chm-files). Are there any tools which can directly grab the textual
>content out of the (binary) chm-files?
>I think support for indexing chm-files is a big missing piece in the
>set of currently indexable file types (XML, HTML, PDF, MS Word DOC,
>RTF, plain text).
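
For what it's worth, here is a rough sketch of the "grab the text directly" route Tom asks about. It assumes the .chm has already been unpacked into a directory of HTML pages (with chmlib or any other extractor), so it does not read the binary format itself. The class name ChmTextSketch, both method names, the regex-based tag stripping, and the ISO-8859-1 decoding are my own assumptions for illustration; a real HTML parser would be more robust.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Lucene or chmlib.
public class ChmTextSketch {

    /** Crude HTML-to-text: drops script/style bodies, comments, tags, and a few entities. */
    public static String htmlToText(String html) {
        String text = html
            .replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ") // script/style bodies
            .replaceAll("(?s)<!--.*?-->", " ")                      // comments
            .replaceAll("(?s)<[^>]+>", " ")                         // remaining tags
            .replace("&nbsp;", " ")
            .replace("&lt;", "<")
            .replace("&gt;", ">")
            .replace("&quot;", "\"")
            .replace("&amp;", "&"); // decoded last so "&amp;lt;" stays "&lt;"
        return text.replaceAll("\\s+", " ").trim();
    }

    /** Maps each .htm/.html file under root (relative path) to its plain text. */
    public static Map<String, String> extractAll(Path root) throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            List<Path> pages = paths
                .filter(Files::isRegularFile)
                .filter(p -> {
                    String name = p.getFileName().toString().toLowerCase();
                    return name.endsWith(".htm") || name.endsWith(".html");
                })
                .sorted()
                .collect(Collectors.toList());
            for (Path page : pages) {
                // Assumption: ISO-8859-1 never fails on arbitrary bytes; the
                // real encoding of the help pages may well differ.
                String html = new String(Files.readAllBytes(page), StandardCharsets.ISO_8859_1);
                result.put(root.relativize(page).toString(), htmlToText(html));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: directory holding the pages extracted from the .chm
        Map<String, String> pages = extractAll(Paths.get(args[0]));
        for (Map.Entry<String, String> e : pages.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().length() + " chars of text");
        }
    }
}
```

Each (path, text) pair can then be added to the index the same way you would index any plain-text file, which avoids the detour through PDF.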
