poi-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 51946] New: [BUG] TextPieceTable <init> ArrayIndexOutOfBoundsException and IllegalStateException - Hong Kong encoding?
Date Tue, 04 Oct 2011 00:27:55 GMT
https://issues.apache.org/bugzilla/show_bug.cgi?id=51946

             Bug #: 51946
           Summary: [BUG] TextPieceTable <init>
                    ArrayIndexOutOfBoundsException and
                    IllegalStateException - Hong Kong encoding?
           Product: POI
           Version: 3.8-dev
          Platform: PC
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HWPF
        AssignedTo: dev@poi.apache.org
        ReportedBy: hockey69guy@yahoo.com
    Classification: Unclassified


I am unable to include a sample document because the content is sensitive.

If there are any pointers to utilities that can further investigate the
documents, let me know and I'll see what further information I can supply.

A few of my documents trigger an arraycopy with a length that is greater than
the amount remaining in the stream buffer.  The files open successfully in
Word 2010 and may predate the Word 97 format.  The documents likely use an
encoding from the Hong Kong region.
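
To make the failure mode concrete, here is a minimal sketch of the same kind of copy in isolation. It is not POI's code: the class and method names are hypothetical, and clamping is only one possible way to handle a short buffer. It shows that System.arraycopy rejects a length that runs past the end of the source array, and how a guard could limit the copy to the bytes that are actually present.

```java
final class PieceCopySketch {

    /** Unchecked copy: fails if start + length runs past the end of the buffer. */
    static byte[] copyUnchecked(byte[] stream, int start, int length) {
        byte[] out = new byte[length];
        System.arraycopy(stream, start, out, 0, length);
        return out;
    }

    /** Guarded copy: clamps the requested length to the bytes remaining after start. */
    static byte[] copyClamped(byte[] stream, int start, int length) {
        if (start < 0 || length < 0 || start > stream.length) {
            throw new IllegalArgumentException(
                    "bad range: start=" + start + ", length=" + length);
        }
        int available = Math.min(length, stream.length - start);
        byte[] out = new byte[available];
        System.arraycopy(stream, start, out, 0, available);
        return out;
    }

    public static void main(String[] args) {
        byte[] stream = new byte[100];   // stand-in for the document stream
        try {
            copyUnchecked(stream, 90, 20);   // asks for 20 bytes, only 10 remain
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unchecked copy failed: " + e);
        }
        System.out.println("clamped copy length: " + copyClamped(stream, 90, 20).length);
    }
}
```

Clamping avoids the exception but silently truncates the piece, so a real fix might instead need to work out why the declared length and the stream disagree.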


A couple produce the following stack trace (daily build):
Caused by: java.lang.ArrayIndexOutOfBoundsException
    at java.lang.System.arraycopy(Native Method)
    at org.apache.poi.hwpf.model.TextPieceTable.<init>(TextPieceTable.java:108)
    at org.apache.poi.hwpf.model.ComplexFileTable.<init>(ComplexFileTable.java:70)
    at org.apache.poi.hwpf.HWPFOldDocument.<init>(HWPFOldDocument.java:71)
    at org.apache.tika.parser.microsoft.WordExtractor.parseWord6(WordExtractor.java:410)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:69)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:200)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)



More than a handful are caught earlier on and produce this stack trace:
Caused by: java.lang.IllegalStateException: Told we're for characters 0 -> 6385, but actually covers 6373 characters!
    at org.apache.poi.hwpf.model.TextPiece.<init>(TextPiece.java:73)
    at org.apache.poi.hwpf.model.TextPieceTable.<init>(TextPieceTable.java:115)
    at org.apache.poi.hwpf.model.ComplexFileTable.<init>(ComplexFileTable.java:70)
    at org.apache.poi.hwpf.HWPFOldDocument.<init>(HWPFOldDocument.java:71)
    at org.apache.tika.parser.microsoft.WordExtractor.parseWord6(WordExtractor.java:410)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:69)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:200)
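
The message above reports a declared range of 6385 characters against 6373 actually covered, a shortfall of 12. One possible explanation, unconfirmed and in line with the encoding guess in this report, is that some bytes belong to multi-byte characters, so decoding yields fewer characters than a one-byte-per-character count expects. The sketch below illustrates only that general effect. It is not POI's code, the class and method names are hypothetical, and UTF-8 stands in for a Hong Kong code page because it is guaranteed to exist in every JRE.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class PieceLengthCheckSketch {

    /** Number of characters obtained when the raw bytes are decoded with the given charset. */
    static int decodedLength(byte[] raw, Charset charset) {
        return new String(raw, charset).length();
    }

    /** Consistency check modelled on the message in the trace: declared range vs. decoded text. */
    static void checkCovers(int start, int end, int actualChars) {
        if (end - start != actualChars) {
            throw new IllegalStateException("Told we're for characters " + start + " -> " + end
                    + ", but actually covers " + actualChars + " characters!");
        }
    }

    public static void main(String[] args) {
        // Two CJK characters plus "ab": 3 + 3 + 1 + 1 = 8 bytes in UTF-8.
        byte[] raw = "\u4e2d\u6587ab".getBytes(StandardCharsets.UTF_8);

        // Treated as single-byte text, every byte becomes one character.
        int singleByte = decodedLength(raw, StandardCharsets.ISO_8859_1);
        // Decoded with the charset that actually produced the bytes, only 4 characters result.
        int multiByte = decodedLength(raw, StandardCharsets.UTF_8);

        checkCovers(0, raw.length, singleByte);      // consistent: 8 == 8
        try {
            checkCovers(0, raw.length, multiByte);   // inconsistent: 8 != 4
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the affected documents do mix single-byte and double-byte text, comparing the byte count of a piece with its decoded character count under the suspected code page may help narrow down where the 12-character difference comes from.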


