From: Antony Bowesman
Date: Mon, 26 Mar 2007 22:28:03 +1000
To: java-user@lucene.apache.org
Subject: Re: index word files ( doc )

Ryan Ackley wrote:
> The 512 byte thing is a limitation of POIFS I think. I could be wrong
> though. Have you tried opening the file with just POIFS?

It was some time ago, but it looks like I used both

org.apache.poi.hwpf.extractor.WordExtractor
org.apache.poi.hdf.extractor.WordDocument

with the same problem.

Antony
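A quick check that needs neither extractor can help tell a file problem from an extractor problem. The sketch below assumes that "the 512 byte thing" means POIFS expecting the file to be made up of whole 512-byte blocks; the message does not spell this out. It uses only the JDK, and the class and method names (Ole2Check, hasOle2Signature, isWholeBlocks) are hypothetical helpers, not part of POI. It tests whether the file starts with the OLE2 compound-document signature and whether its length is a multiple of 512.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper (not a POI class): sanity-checks a .doc file
// before it is handed to POIFS or a Word extractor.
class Ole2Check {
    // The 8-byte signature at the start of an OLE2 compound document.
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Assumption: the block size the "512 byte thing" refers to.
    static final int BLOCK_SIZE = 512;

    // True if the given leading bytes begin with the OLE2 signature.
    static boolean hasOle2Signature(byte[] head) {
        return head.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(head, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    // True if the length is at least one block and a whole number of blocks.
    static boolean isWholeBlocks(long length) {
        return length >= BLOCK_SIZE && length % BLOCK_SIZE == 0;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Ole2Check <file.doc>");
            return;
        }
        Path file = Paths.get(args[0]);

        // Read up to the first 8 bytes of the file.
        byte[] head = new byte[OLE2_MAGIC.length];
        int off = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while (off < head.length
                    && (n = in.read(head, off, head.length - off)) > 0) {
                off += n;
            }
        }

        long length = Files.size(file);
        System.out.println("OLE2 signature:       " + hasOle2Signature(Arrays.copyOf(head, off)));
        System.out.println("length:               " + length + " bytes");
        System.out.println("whole 512-byte blocks: " + isWholeBlocks(length));
    }
}
```

If the signature is present but the length is not a whole number of blocks, the file itself (truncated or padded on the way in) is the first thing to suspect rather than either extractor.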