poi-dev mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject RE: [VOTE] Apache POI 3.15-beta3
Date Fri, 09 Sep 2016 18:44:29 GMT
Thank you, Dominik, for catching these!  3 cheers for mass regression testing!


I'm finally back from break and catching up on emails...

-----Original Message-----
From: Dominik Stadler [mailto:dominik.stadler@gmx.at] 
Sent: Monday, August 15, 2016 6:09 AM
To: POI Developers List <dev@poi.apache.org>
Subject: Re: [VOTE] Apache POI 3.15-beta3

Hi,

Running the regression tests for POI 3.15-beta3 against the CommonCrawl corpus is now finished,
initial results are as follows:

* 11966 fail because I did not add commons-collections4; I'll trigger a re-run so that the
document counts correctly show the number of regressing documents

* 456 times: ArrayIndexOutOfBoundsException in SprmOperation.getOperand()

java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: *
	at o.a.p.hwpf.extractor.WordExtractor.getText(WordExtractor.java:317)
	at o.a.p.stress.AbstractFileHandler.handleExtractingInternal(AbstractFileHandler.java:85)
	at o.a.p.stress.AbstractFileHandler.handleExtracting(AbstractFileHandler.java:60)
	at org.dstadler.commoncrawl.FileHandlingRunnable.run(FileHandlingRunnable.java:58)

Caused by: java.lang.ArrayIndexOutOfBoundsException: 4
	at o.a.p.hwpf.sprm.SprmOperation.getOperand(SprmOperation.java:113)
	at o.a.p.hwpf.sprm.SectionSprmUncompressor.unCompressSEPOperation(SectionSprmUncompressor.java:62)
	at o.a.p.hwpf.sprm.SectionSprmUncompressor.uncompressSEP(SectionSprmUncompressor.java:44)
	at o.a.p.hwpf.model.SEPX.getSectionProperties(SEPX.java:61)
	at o.a.p.hwpf.usermodel.Section.<init>(Section.java:36)
	at o.a.p.hwpf.usermodel.Range.getSection(Range.java:745)
	at o.a.p.hwpf.converter.AbstractWordConverter.processDocument(AbstractWordConverter.java:721)
	at o.a.p.hwpf.extractor.WordExtractor.getText(WordExtractor.java:299)
	... 9 more

* 4 times NullPointerException in XSLFTextParagraph.getDefaultFontSize()

java.lang.NullPointerException
	at o.a.p.xslf.usermodel.XSLFTextParagraph.getDefaultFontSize(XSLFTextParagraph.java:935)
	at o.a.p.sl.draw.DrawTextParagraph.getAttributedString(DrawTextParagraph.java:567)
	at o.a.p.sl.draw.DrawTextParagraph.breakText(DrawTextParagraph.java:235)
	at o.a.p.sl.draw.DrawTextShape.drawParagraphs(DrawTextShape.java:158)
	at o.a.p.sl.draw.DrawTextShape.getTextHeight(DrawTextShape.java:219)
	at o.a.p.sl.draw.DrawTextShape.drawContent(DrawTextShape.java:102)
	at o.a.p.sl.draw.DrawSimpleShape.draw(DrawSimpleShape.java:93)
	at o.a.p.sl.draw.DrawSheet.draw(DrawSheet.java:67)
	at o.a.p.sl.draw.DrawSlide.draw(DrawSlide.java:39)
	at o.a.p.xslf.usermodel.XSLFSlide.draw(XSLFSlide.java:301)
	at o.a.p.stress.SlideShowHandler.renderSlides(SlideShowHandler.java:120)
	at o.a.p.stress.SlideShowHandler.handleSlideShow(SlideShowHandler.java:43)
	at o.a.p.stress.XSLFFileHandler.handleFile(XSLFFileHandler.java:43)
	at org.dstadler.commoncrawl.FileHandlingRunnable.run(FileHandlingRunnable.java:58)



The others are probably flaky cases: files that caused an OOM or timeout in earlier runs and thus
were not reported with these errors before.


See http://people.apache.org/~centic/poi_regression/reports/ and http://people.apache.org/~centic/poi_regression/reportsAll/
for detailed results.
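
The trace header for the first failure above reads "ArrayIndexOutOfBoundsException: *", which suggests that the offending index is masked so that all such failures are counted as one group. A minimal Java sketch of that idea follows. The class FailureTally, its methods, and the sample input are hypothetical illustrations, not code from POI or from the regression harness.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not part of POI or the CommonCrawl regression harness. */
class FailureTally {

    /** Replaces a trailing numeric detail (such as the bad array index) with "*". */
    static String normalize(String firstLine) {
        return firstLine.replaceAll(":\\s*-?\\d+$", ": *");
    }

    /** Counts failures per normalized first line, keeping first-seen order. */
    static Map<String, Integer> tally(List<String> firstLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : firstLines) {
            counts.merge(normalize(line), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample input: first lines of three stack traces.
        List<String> sample = List.of(
            "java.lang.ArrayIndexOutOfBoundsException: 4",
            "java.lang.ArrayIndexOutOfBoundsException: 17",
            "java.lang.NullPointerException");
        tally(sample).forEach((msg, n) -> System.out.println(n + " times: " + msg));
    }
}
```

Masking only a trailing number keeps distinct exception types apart while merging traces that differ only in the index, which is presumably what makes a per-exception count such as "456 times" meaningful.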


Thanks... Dominik.


On Mon, Aug 15, 2016 at 4:16 AM, Javen O'Neal <onealj@apache.org> wrote:

> Correction: HSLF. This is a ppt/OLE2 file.
>
> On Sun, Aug 14, 2016 at 6:58 PM, Javen O'Neal <onealj@apache.org> wrote:
> > Tim,
> >
> > I have extracted the pptx PowerPoint file containing the Prague 
> > footer. I want to write a unit test for POI to find the Prague 
> > string so I can figure out why Prague was not included in the Tika 
> > regression test using POI 3.15 beta 3 but was found by POI 3.15 beta 
> > 1.
> >
> > Could you point me to the Tika code that generated the potential 
> > regressions zip file in TIKA-2013, or the POI class/function that is 
> > used to extract the text from a document?
> >
> > Also, is the pptx file shareable and ASL 2.0 licensed so that it can 
> > be included as part of POI's unit test suite?
> >
> > On Fri, Aug 12, 2016 at 6:52 PM, Javen O'Neal <javenoneal@gmail.com> wrote:
> >> On Aug 12, 2016 11:39, "Allison, Timothy B." <tallison@mitre.org> wrote:
> >>>...the two potential content regressions may be caused by something at the
> >>> Tika level.  If anyone has time to take a look, that'd be great.
> >>
> >> I can take a look this weekend.
> >>
> >> Did you use the same Tika code with different POI versions for these tests
> >> (so that we can attribute the change in behavior to a POI commit, regardless
> >> of whether the bug is in Tika or POI)?
>