Date: Sun, 1 Aug 2004 18:08:39 -0400 (EDT)
From: Stephane James Vaucher
To: Lucene Developers List
Subject: RE: Powerpoint search using Lucene
In-Reply-To: <200407291249.i6TCng203896@smtp-mclean.mitre.org>

I've seen a post on poi-user list with some more code. The links have been
added to the wiki.

http://wiki.apache.org/jakarta-lucene/PowerPoint

sv

On Thu, 29 Jul 2004, Divya S. Jesuraj wrote:

> The second link - does things a bit differently than one would expect.
>
> It creates multiple files "1.txt", "2.txt", so on, extracts the text and
> keeps it only in "1.txt" and doesn't save the name of the initial powerpoint
> file so it can't link to it when you search for it.
>
> What would be ideal is to extract the powerpoint text into an object
> {String?} and create a Lucene Doc that would add it to the index...
>
> I have been playing with the idea of using the code by Mr.Koundinya and
> somehow storing those contents to a string object which then got added as
> "content" to the Lucene Doc. The file name ( .ppt ) and path would get added
> too...will let you folks know how it goes...
>
> ~Divya
>
> -----Original Message-----
> From: Stephane James Vaucher [mailto:vauchers@cirano.qc.ca]
> Sent: Wednesday, July 28, 2004 11:41 PM
> To: Lucene Developers List
> Subject: Re: Powerpoint search using Lucene
>
> I haven't, I've found a few link though...
>
> I just saw this on the poi list. I can't confirm if it works or not (if
> you try it, can you tell us)
>
> http://www.mail-archive.com/poi-user@jakarta.apache.org/msg04782.html
>
> This is a reference to some code that I found works on some ppts:
> http://nagoya.apache.org/eyebrowse/ReadMsg?listName=poi-dev@jakarta.apache.org&msgNo=4326
>
> sv
>
> On Wed, 28 Jul 2004, Divya S. Jesuraj wrote:
>
> > Hello,
> >
> > I am a VERY new Java Programmer and have now been thrust into development
> > using Lucene. I was able to figure out parsing/indexing of MS Word, MS
> > Excel, RTF, Text files, and PDFs with a lot of reading and using Poi & PDF
> > Sandbox. I however haven't been able to do anything with PPTs [or htmls -
> > that is the least of my worries]...
> >
> > I am indexing a directory on my machine and have a user interface with a
> > JSP. Has anyone figured out how to get a Powerpoint search to work? I
> > searched the forums but I can't find anything that would help my situation.
> > Some sample code would be appreciated.
> >
> > Thank you.
> >
> > ~Divya Jesuraj
> > Technical Summer Intern 2004
> > MITRE Corporation
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-dev-help@jakarta.apache.org