tika-dev mailing list archives

From "Michael McCandless (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-529) IBM420 charset detection's isLamAlef is allocation-happy
Date Sat, 05 Nov 2011 10:58:51 GMT

    [ https://issues.apache.org/jira/browse/TIKA-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144652#comment-13144652 ]

Michael McCandless commented on TIKA-529:

This patch looks safe and avoids the needless per-byte allocations inside this detector. Can we
commit it, with the first two conditions reversed?
> IBM420 charset detection's isLamAlef is allocation-happy
> --------------------------------------------------------
>                 Key: TIKA-529
>                 URL: https://issues.apache.org/jira/browse/TIKA-529
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.8
>            Reporter: Radek
>            Assignee: Ken Krugler
>            Priority: Minor
>         Attachments: isLamAlef.diff
> Two IBM420 charset detectors (rtl and ltr) run isLamAlef() for each byte examined during detection.
> The code allocates and fills a byte array every time it runs, which makes it responsible
> for approximately 70% of all object allocations in my current test case (many text files).
> Since the array is identical every time, and the entire check can be done without any
> array, this is wasteful.
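To make the problem concrete, here is a minimal sketch contrasting the two patterns described in the issue: a check that builds a fresh array on every call, and an equivalent check that compares against constants and allocates nothing. It is not the code from isLamAlef.diff. The class name `LamAlefCheck`, the method `isLamAlefAllocating`, and the byte values 0xb2 to 0xb5 are hypothetical placeholders, not the actual IBM420 Lam-Alef code points used by the detector.

```java
// Hypothetical sketch of the TIKA-529 pattern. The byte values are
// placeholders, NOT the real IBM420 Lam-Alef code points.
final class LamAlefCheck {

    // Allocation-happy pattern: a new array is created and filled on every
    // call, so scanning N input bytes creates N short-lived arrays.
    static boolean isLamAlefAllocating(byte b) {
        byte[] lamAlef = {(byte) 0xb2, (byte) 0xb3, (byte) 0xb4, (byte) 0xb5};
        for (byte candidate : lamAlef) {
            if (b == candidate) {
                return true;
            }
        }
        return false;
    }

    // Allocation-free alternative: compare directly against constant bytes.
    static boolean isLamAlef(byte b) {
        switch (b) {
            case (byte) 0xb2:
            case (byte) 0xb3:
            case (byte) 0xb4:
            case (byte) 0xb5:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // Verify that both versions agree on every possible byte value.
        for (int i = Byte.MIN_VALUE; i <= Byte.MAX_VALUE; i++) {
            byte b = (byte) i;
            if (isLamAlefAllocating(b) != isLamAlef(b)) {
                throw new AssertionError("mismatch at byte " + i);
            }
        }
        System.out.println("both versions agree on all 256 byte values");
    }
}
```

Hoisting the array into a `private static final byte[]` field would also remove the per-call allocation; comparing against constants, as above, drops the array entirely, which matches the issue's point that the check needs no array at all.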


