manifoldcf-dev mailing list archives

From "Mr.Keuz (JIRA)" <>
Subject [jira] [Created] (CONNECTORS-1317) Hang parsing on some ZIP document
Date Sat, 21 May 2016 01:31:12 GMT
Mr.Keuz created CONNECTORS-1317:

             Summary: Hang parsing on some ZIP document
                 Key: CONNECTORS-1317
             Project: ManifoldCF
          Issue Type: Bug
          Components: File system connector
    Affects Versions: ManifoldCF 2.3
         Environment: Ubuntu 14.04 Linux 3.13.0-86-generic i686 i686

java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)

DB: Postgres 9.5.1

            Reporter: Mr.Keuz

I use ManifoldCF as a file crawler, but I found that the crawling process hangs on some zip files.
Other zip files are parsed normally.

1. Run ManifoldCF from "example/" with Postgres as the database
2. Create a ManifoldCF pipeline: File -> Tika -> Solr
3. Put the zip file (attached below) in the crawled folder
4. Run the job

Here is the zip file that should reproduce the bug: 

As far as I could tell (using strace), the crawler process opens and parses the same zip file
again and again, apparently from different worker threads. It also seems that the document is
never removed from the queue.
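
The repeated opens can be tallied from a saved strace log with a short script. This is only a minimal sketch, not the exact procedure used for this report: it assumes the log contains plain `open(...)` or `openat(AT_FDCWD, ...)` lines, optionally prefixed with a thread id as strace prints when following child threads, and no timestamps. The function name `count_opens` and the sample paths are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Matches strace lines such as (paths are made-up examples):
#   [pid  4321] open("/data/docs/sample.zip", O_RDONLY) = 57
#   4322 open("/data/docs/sample.zip", O_RDONLY) = 58
#   openat(AT_FDCWD, "/data/docs/other.zip", O_RDONLY) = 3
OPEN_RE = re.compile(
    r'^(?:\[pid\s+(?P<pid_a>\d+)\]\s+|(?P<pid_b>\d+)\s+)?'  # optional thread id
    r'(?:open|openat)\((?:AT_FDCWD,\s*)?'                   # the syscall name
    r'"(?P<path>[^"]+)"'                                    # the quoted path
)


def count_opens(lines):
    """Return (opens per path, set of thread ids per path) for strace lines."""
    counts = Counter()
    pids = defaultdict(set)
    for line in lines:
        match = OPEN_RE.match(line)
        if match is None:
            continue  # not an open/openat call, skip it
        path = match.group("path")
        # Lines without a prefix come from the traced main process.
        pid = match.group("pid_a") or match.group("pid_b") or "main"
        counts[path] += 1
        pids[path].add(pid)
    return counts, pids
```

Calling `count_opens(open("trace.log"))` on a saved log (the file name is an assumption) and printing `counts.most_common(5)` would show whether one zip file dominates the opens, and `pids[path]` would show how many distinct threads touched it.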

I am new to ManifoldCF, so it is hard for me to find the problem in the source code.

I can send some additional info if needed.

This message was sent by Atlassian JIRA
