Subject: Re: Indexing MSword Documents
From: Donna L Gresh
To: java-user@lucene.apache.org
Date: Fri, 8 Jun 2007 13:52:41 -0400
In-Reply-To: <00b701c7a9f1$bb4bada0$2e01a8c0@dorthy>

I do this exact thing. "text" (the second input to the Field constructor) is MSWord text that I've extracted from the Word document:

    textField = new org.apache.lucene.document.Field(textFieldName, text,
        org.apache.lucene.document.Field.Store.NO,
        org.apache.lucene.document.Field.Index.TOKENIZED);
    doc.add(textField);

Donna L. Gresh
Services Research, Mathematical Sciences Department
IBM T.J. Watson Research Center
(914) 945-2472
http://www.research.ibm.com/people/g/donnagresh
gresh@us.ibm.com