tika-user mailing list archives

From "Adam Rauch" <a...@labkey.com>
Subject RE: AutoDetectParser not thread-safe?
Date Wed, 27 Jan 2010 16:00:11 GMT
Issue created: http://issues.apache.org/jira/browse/TIKA-374

-----Original Message-----
From: Jukka Zitting [mailto:jukka.zitting@gmail.com] 
Sent: Tuesday, January 26, 2010 10:24 AM
To: tika-user@lucene.apache.org
Subject: Re: AutoDetectParser not thread-safe?

Hi,

On Tue, Jan 26, 2010 at 6:50 PM, Adam Rauch <adam@labkey.com> wrote:
> We are using Tika 0.5 to parse files that are added to a Lucene index.  If
> we assign multiple threads to the parsing task we find that the
> AutoDetectParser.parse() method will occasionally fail to return.  In our
> case, it appears that a HashMap inside Xerces gets corrupted, causing an
> infinite loop inside HashMap.get().  This seems to be a concurrency problem;
> we have not seen the issue when running single threaded.

Hmm, that's indeed quite troublesome.

> I can open a JIRA issue if you’d prefer.

That would be great. Thanks to your in-depth analysis of the problem
it should be easy to come up with a fix.

BR,

Jukka Zitting
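
Until a fix is available, one possible stopgap for the symptom described above is to keep the worker threads but let only one of them call the parser at a time. The sketch below is not Tika code and not the fix tracked in TIKA-374. It uses a plain Function<String, String> as a hypothetical stand-in for the real parse call, and the names SerializedParsing, serialized, parseAll and PARSE_LOCK are invented for illustration. It assumes the corrupted state is only touched from inside the parse call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class SerializedParsing {
    // One shared lock: at most one thread is inside the parser at any time.
    private static final Object PARSE_LOCK = new Object();

    // Wraps a parser (hypothetical stand-in for the real parse call)
    // so that concurrent callers are serialized on PARSE_LOCK.
    static Function<String, String> serialized(Function<String, String> parser) {
        return input -> {
            synchronized (PARSE_LOCK) {
                return parser.apply(input);
            }
        };
    }

    // Submits every document to a fixed-size pool and collects the
    // results in submission order.
    static List<String> parseAll(List<String> docs,
                                 Function<String, String> parser,
                                 int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String doc : docs) {
                Callable<String> task = () -> parser.apply(doc);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

This trades parsing throughput for safety: file reading and index updates can still overlap across threads, but the parse step itself runs one call at a time. Giving each thread its own parser instance would keep more parallelism, though it only helps if the corrupted map is per-instance rather than shared, which this thread does not establish.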

