From: Pradeep Kumar K
To: Lucene Users List <lucene-user@jakarta.apache.org>
Date: Sat, 24 Aug 2002 11:05:18 +0530
Subject: Re: Parsers

Thanks, Joshua, for the information.
By "Simple text" I meant a plain text file.

-Pradeep

Joshua O'Madadhain wrote:

>On Sat, 24 Aug 2002, Pradeep Kumar K wrote:
>
>>Hi friends,
>>
>>I need parsers for the following file formats:
>>1. HTML
>>2. PDF
>>3. MSWord
>>4. RTF
>>5. Simple text
>>
>>Has anybody developed parsers (in Java) for all or any of these file formats?
>>If you have, please send me the links so that I can download them.
>
>A simple HTML parser is part of the download package (one of the
>examples). Check the contrib section on the Lucene web page; I believe a
>couple of different PDF parsers are there, and perhaps others.
>
>Not sure what you mean by a "simple text" parser. Do you mean something
>more complicated than what you can do with StringTokenizer?
>
>Joshua O'Madadhain
>
> jmadden@ics.uci.edu...Obscurium Per Obscurius...www.ics.uci.edu/~jmadden
> Joshua O'Madadhain: Information Scientist, Musician, Philosopher-At-Tall
> It's that moment of dawning comprehension that I live for--Bill Watterson
>My opinions are too rational and insightful to be those of any organization.
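For a plain text file, the StringTokenizer approach Joshua mentions needs very little code. The following is a minimal sketch, not a Lucene API: the class name PlainTextTokenizer is hypothetical, the input file is assumed to be UTF-8, and the delimiter set (whitespace plus common punctuation) is an assumption you would adjust to your own data.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.StringTokenizer;

/**
 * Hypothetical helper (not part of Lucene): turns plain text into a list of
 * lowercase word tokens using java.util.StringTokenizer.
 */
class PlainTextTokenizer {

    // Assumed delimiters: whitespace plus common punctuation.
    private static final String DELIMITERS = " \t\n\r\f.,;:!?\"()[]{}<>";

    /** Splits text on DELIMITERS and lowercases each token. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, DELIMITERS);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    /** Reads the whole file (assumed to be UTF-8) and tokenizes its contents. */
    static List<String> tokenizeFile(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return tokenize(new String(bytes, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Tokenize the file named on the command line.
            System.out.println(tokenizeFile(Paths.get(args[0])));
        } else {
            // No file given: tokenize a sample sentence instead.
            System.out.println(tokenize("Thanks, Joshua, for the information."));
        }
    }
}
```

Running `java PlainTextTokenizer notes.txt` prints the tokens of that file; with no argument it prints the tokens of the sample sentence. StringTokenizer is kept here only because it is what the thread mentions; for indexing, Lucene's own analyzers do this kind of splitting and normalization for you.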