poi-dev mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject FW: [jira] [Commented] (TIKA-2058) Memory Leak in Tika version 1.13 when parsing millions of files
Date Fri, 16 Sep 2016 13:12:57 GMT

Thank you, Luis!

-----Original Message-----
From: Tim Barrett (JIRA) [mailto:jira@apache.org] 
Sent: Friday, September 16, 2016 9:11 AM
To: tallison@apache.org
Subject: [jira] [Commented] (TIKA-2058) Memory Leak in Tika version 1.13 when parsing millions of files

    [ https://issues.apache.org/jira/browse/TIKA-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496297#comment-15496297 ]

Tim Barrett commented on TIKA-2058:

It completed an hour ago without any OOM problems, so very good.

Re. my question about poifsFileSystem.close(): I’m pretty sure that call was commented out
because the method was unavailable. Not closing the poifsFileSystem sounds dangerous to me.
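
To illustrate the concern, here is a minimal sketch of the try-with-resources pattern, which guarantees close() runs even when parsing throws. FakeFileSystem and its OPEN counter are hypothetical stand-ins, not POI classes; whether poifsFileSystem can be used this way depends on the POI version on the classpath exposing close() and implementing java.io.Closeable, which is an assumption to verify.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class CloseDemo {

    // Hypothetical stand-in for a file-system-like resource (NOT a POI class).
    static final class FakeFileSystem implements Closeable {
        // Counts instances that have been opened but not yet closed.
        static final AtomicInteger OPEN = new AtomicInteger();

        FakeFileSystem() {
            OPEN.incrementAndGet();
        }

        @Override
        public void close() {
            OPEN.decrementAndGet();
        }
    }

    // Parses one "file"; the resource is closed on both the normal
    // and the exceptional path because of try-with-resources.
    static String parse(boolean fail) throws IOException {
        try (FakeFileSystem fs = new FakeFileSystem()) {
            if (fail) {
                throw new IOException("simulated parse failure");
            }
            return "ok";
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            try {
                parse(i % 2 == 0); // every other file fails
            } catch (IOException e) {
                // A real batch run would log the failure and continue.
            }
        }
        System.out.println("Open handles after run: " + FakeFileSystem.OPEN.get());
    }
}
```

If the close were skipped on the failure path instead, the counter would grow with every failed file, which is the kind of slow accumulation that only shows up after millions of files.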

> Memory Leak in Tika version 1.13 when parsing millions of files
> ---------------------------------------------------------------
>                 Key: TIKA-2058
>                 URL: https://issues.apache.org/jira/browse/TIKA-2058
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.13
>            Reporter: Tim Barrett
>         Attachments: Yourkit screenshot.png, poi-3.15-beta1-p1.jar, poi-3.15-beta1-p1.pom, prevents-OOM-when-writable-is-false.patch, screenshot-1.png, screenshot-2.png, screenshot-3.png
> We have an application using Tika which parses roughly 7,000,000 files of different types;
> many of the files are MSG files with attachments. This works correctly with Tika 1.9, and
> has been in production for over a year, with parsing runs taking place every few weeks. The
> same application runs into insufficient memory problems (Java heap) when using Tika 1.13.
> I have used lsof and file leak detector to track down open files, but neither shows
> any open files when the application is running. I did find an issue with open files (https://issues.apache.org/jira/browse/TIKA-2015);
> however, there was a workaround for this and this is not the issue.
> I am sorry to have to report this with a level of vagueness, but with lsof turning nothing
> up I am a bit stuck as to how to investigate further. We are more than willing to help by
> testing on the basis of any ideas provided.

This message was sent by Atlassian JIRA