lucene-java-user mailing list archives

From Nick Burch <n...@torchbox.com>
Subject Re: Indexing MS Powerpoint files with Lucene
Date Thu, 07 Sep 2006 12:04:45 GMT
On Thu, 7 Sep 2006, Tomi NA wrote:
> On 9/7/06, Venkateshprasanna <prasannahmv@yahoo.co.in> wrote:
>> Is there any filter available for extracting text from MS Powerpoint files
>> and indexing them?
>> The lucene website suggests the POI project, which, it seems does not
>> support PPT files as of now.
>
> http://jakarta.apache.org/poi/hslf/index.html
>
> It doesn't say poi doesn't support ppt. It just says support is limited. 
> Don't know exactly how limited, but certainly not useless for indexing 
> purposes.

Support for editing PowerPoint files and adding content to them is limited, 
as is extracting the finer points of fonts and positioning.

Getting text out should "just work" for you. The only thing you'll need to 
decide is whether you want hslf.PowerPointExtractor to give you slide and 
notes text, or just slide text :)
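To make the slides-versus-notes choice concrete, here is a minimal JDK-only sketch. In the POI releases of this era the switch is, as far as I know, the two-boolean `getText(boolean getSlideText, boolean getNotesText)` method on `org.apache.poi.hslf.extractor.PowerPointExtractor`. The `SlideText` and `IndexableText` classes below are hypothetical stand-ins so the example runs without POI on the classpath; they model only the shape of the decision, not POI's exact output format.

```java
import java.util.List;

/**
 * Hypothetical stand-in for the text pulled out of one slide.
 * In real code these strings would come from POI's HSLF extractor;
 * here they are supplied directly so the sketch runs with only the JDK.
 */
final class SlideText {
    final String body;   // text on the slide itself
    final String notes;  // speaker-notes text, possibly empty

    SlideText(String body, String notes) {
        this.body = body;
        this.notes = notes;
    }
}

/** Hypothetical helper that builds the string handed to a Lucene field. */
final class IndexableText {
    private IndexableText() {}

    /**
     * Concatenates all slide text and, if includeNotes is true,
     * appends the non-empty notes afterwards. includeNotes mirrors
     * the "slides only" vs "slides and notes" choice.
     */
    static String build(List<SlideText> slides, boolean includeNotes) {
        StringBuilder sb = new StringBuilder();
        for (SlideText s : slides) {
            sb.append(s.body).append('\n');
        }
        if (includeNotes) {
            for (SlideText s : slides) {
                if (!s.notes.isEmpty()) {
                    sb.append(s.notes).append('\n');
                }
            }
        }
        return sb.toString();
    }
}
```

With POI itself, the equivalent would presumably be something like `new PowerPointExtractor("talk.ppt").getText(true, true)` for slides plus notes (the file name is a placeholder). Indexing notes tends to help recall when speakers put most of the detail there, at the cost of matching text the audience never saw.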

Nick


