Date: Fri, 3 May 2002 11:35:10 +0200
From: petite_abeille
To: "Lucene Users List"
Subject: Re: indexing PDF files

On Wednesday, May 1, 2002, at 05:41 PM, Otis Gospodnetic wrote:

> Wouldn't you want to convert to XML instead and use XSLT to transform
> the XML representation to any desired format by just applying a style
> sheet?
> Sounds like less work with bigger document type coverage.

Sounds good... But what does it mean?
I'm not that familiar with any of the XML and XSLT hype, so I don't really understand what you are getting at... I just want to convert any type of document to text for indexing purposes... I'm not planning to do anything else with it...

However, converting everything to PDF as a first step allows you to provide a "preview" of any document, even if you happen not to understand the original format (e.g. MS Office)...

PA