From: bugzilla@apache.org
To: dev@poi.apache.org
Subject: [Bug 54790] Word Document loading strategy is memory hungry and causes OutOfMemoryError
Date: Tue, 02 Apr 2013 20:20:39 +0000
X-Bugzilla-Product: POI
X-Bugzilla-Component: HWPF
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2
X-Bugzilla-Who: vlsergey@gmail.com

https://issues.apache.org/bugzilla/show_bug.cgi?id=54790

--- Comment #1 from Sergey Vladimirov ---

Dmitry,

How much memory does your JVM have? Is it the standard (JVM-default) 64/128 MB setting, or is it some kind of mobile system?

Sometimes loading the whole file into memory is the only way to process it. For example, you can't even break the text into paragraphs without checking the TextPiece content. And using TextPiece as just a lightweight proxy to the DocumentStream is going to be very inefficient (because of the character encoding/decoding it requires).

Also, disabling preserveTextTable means the whole text is reconstructed into a single buffer (StringBuilder). And in most cases there is no single pointer into the document stream: the text is a reconstruction of a pretty complex structure using data from the ComplexFileTable.

Perhaps it is possible to use a "lightweight" "TextPieceProxy" when "preserveTextTable=true" if we only need to read the text. But from my point of view, it is not a nice way to go.
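
To make the decoding-cost concern above concrete, here is a rough sketch of what a lazy proxy might look like. It is not HWPF code: LazyTextProxySketch, LazyPiece and their methods are hypothetical names invented for this illustration, and the sketch assumes a piece stored as UTF-16LE with '\r' as the paragraph mark. The proxy holds no decoded text, so every operation that needs characters has to decode the byte range again.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LazyTextProxySketch {

    /** Hypothetical proxy: keeps a reference to the shared stream bytes and decodes on demand. */
    static final class LazyPiece {
        private final byte[] stream;   // shared document stream bytes (not copied)
        private final int offset;      // start of this piece within the stream
        private final int length;      // length of this piece in bytes
        private final Charset charset; // assumed encoding of this piece
        private int decodeCount = 0;   // how many times the bytes were decoded

        LazyPiece(byte[] stream, int offset, int length, Charset charset) {
            this.stream = stream;
            this.offset = offset;
            this.length = length;
            this.charset = charset;
        }

        /** Decodes the whole byte range on every call; nothing is cached. */
        String text() {
            decodeCount++;
            return new String(stream, offset, length, charset);
        }

        /** Character count; needs a full decode. */
        int charLength() {
            return text().length();
        }

        /** Counts paragraph marks ('\r'); needs another full decode. */
        int countParagraphMarks() {
            String decoded = text();
            int marks = 0;
            for (int i = 0; i < decoded.length(); i++) {
                if (decoded.charAt(i) == '\r') {
                    marks++;
                }
            }
            return marks;
        }

        int decodeCount() {
            return decodeCount;
        }
    }

    public static void main(String[] args) {
        byte[] stream = "First paragraph.\rSecond paragraph.\r"
                .getBytes(StandardCharsets.UTF_16LE);
        LazyPiece piece = new LazyPiece(stream, 0, stream.length, StandardCharsets.UTF_16LE);

        System.out.println("chars:      " + piece.charLength());
        System.out.println("paragraphs: " + piece.countParagraphMarks());
        System.out.println("decodes:    " + piece.decodeCount());
    }
}
```

Caching the decoded String inside the proxy would avoid the repeated work, but then the proxy holds the text in memory again, which is the trade-off discussed in the comment.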