manifoldcf-user mailing list archives

From Kamil Żyta <kamil.z...@pwr.edu.pl>
Subject Re: agents process ran out of memory
Date Wed, 15 Apr 2015 15:51:39 GMT
On Wed, Apr 15, 2015 at 11:16:44AM -0400, Karl Wright wrote:
> Hi Kamil,
> 
> I bet that it is one specific file that was causing the problem.  By
> increasing the stack space, you allowed the file to be processed.  Now it
> won't get processed again until it changes.
> 
> My thought is that this is *probably* related to Tika.  Are you using the
> Tika transformer?

Yes, I use the Tika transformation, and I also think this is related to Tika, but I
don't know which file causes the problem. I have two identical jobs (one for continuous
crawl and one for deletion); they report different document counts, and only the
continuous job causes the regex errors.

Another job gives me "agents process ran out of memory - shutting down", but
this is related to Tika too. I excluded one file and now it is working.

K
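
As background to the stack-space point quoted below: java.util.regex can match some patterns (alternation inside a repeated group is a common case) by recursing roughly once per input character, so a long enough input can exhaust the thread stack. The following is a minimal Java sketch of that effect, not ManifoldCF or Tika code. The class name RegexStackDemo, the pattern (a|b)*, the input length, and the stack sizes are all assumptions chosen for illustration; the regex that actually failed in this thread was never identified. The stack size passed to the Thread constructor is only a hint that a JVM may ignore, and whether the small-stack run overflows depends on the JVM version.

```java
import java.util.regex.Pattern;

final class RegexStackDemo {

    // Runs the match on a dedicated thread created with the requested stack size.
    // Returns TRUE/FALSE for a completed match, or null if the match overflowed the stack.
    static Boolean matchWithStack(final Pattern pattern, final String input, long stackBytes) {
        final Boolean[] result = new Boolean[1];
        Thread worker = new Thread(null, () -> {
            try {
                result[0] = pattern.matcher(input).matches();
            } catch (StackOverflowError e) {
                result[0] = null; // recursion in the regex engine ran out of stack
            }
        }, "regex-demo", stackBytes);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        return result[0];
    }

    // Builds "abab..." of the given length (hypothetical test input).
    static String alternating(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(i % 2 == 0 ? 'a' : 'b');
        }
        return sb.toString();
    }

    static String describe(Boolean r) {
        return r == null ? "StackOverflowError" : "matched=" + r;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(a|b)*"); // illustrative pattern only
        String input = alternating(20000);
        System.out.println("256 KB stack: " + describe(matchWithStack(pattern, input, 256L * 1024)));
        System.out.println("256 MB stack: " + describe(matchWithStack(pattern, input, 256L * 1024 * 1024)));
    }
}
```

For a single thread, the larger stack does roughly what -Xss does for every thread in the process, which is one way raising -Xss can let a problematic file through without identifying it.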

> 
> 
> On Wed, Apr 15, 2015 at 9:11 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> 
> > I stopped all agents, removed all logs, added '-Xss500m' to the options file,
> > started the agents, and the errors are gone. Then I removed '-Xss500m' from the
> > options to trap the source of the problem, restarted all agents, and still no errors.
> >
> > *magic*
> >
> > Thx Karl for your patience with my weird problems.
> >
> > K
> >
> > On Wed, Apr 15, 2015 at 08:39:52AM -0400, Karl Wright wrote:
> > > Hi Kamil,
> > >
> > > I believe your logs are probably "rolling".  This means that when the log
> > > gets full, or another day starts, a new log file starts.  I don't know, of
> > > course, because I did not configure your system.
> > >
> > > What I *do* know is that the stack trace that you are providing me is
> > > incomplete, and while it is clear that the Java regular expression parser
> > > is failing in some way (by doing infinite recursion), I have no idea what
> > > *context* this is occurring in, without the end of that stack trace.
> > >
> > > This may be occurring almost anywhere, which is why I need the trace.  Even
> > > String.replace() and String.split() use regexps and can be at fault.
> > > Without a definitive source, there's little I can do.
> > >
> > > One thing you can certainly try is to provide a larger amount of stack
> > > space to the JVM and just hope the problem goes away.  That would mean
> > > editing one of the options files and adding a parameter:
> > >
> > > -Xss500m
> > >
> > > (for instance)
> > >
> > > If you would rather get to the source of the problem, I suggest the
> > > following:
> > >
> > > (1) Shut down all agents processes
> > > (2) Remove all logs
> > > (3) Start the agents process
> > > (4) Tail the log looking for "FATAL": tail -f manifoldcf.log | grep FATAL
> > > (5) As soon as you see that, shut down the agents process
> > > (6) Look at the log file produced
> > >
> > > References:
> > >
> > > http://stackoverflow.com/questions/7509905/java-lang-stackoverflowerror-while-using-a-regex-to-parse-big-strings
> > >
> > > Karl
> > >
> > >
> > > On Wed, Apr 15, 2015 at 8:28 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl>
> > wrote:
> > >
> > > > # java -version
> > > > java version "1.8.0_45"
> > > > Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
> > > > Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
> > > >
> > > > Is it broken? I don't know. How can I prevent the backtrace from rolling?
> > > > It looks like an infinite loop to me.
> > > >
> > > > K
> > > >
> > > > On Wed, Apr 15, 2015 at 07:41:37AM -0400, Karl Wright wrote:
> > > > > Clearly the logs must have rolled then?  Either that or you are using a
> > > > > broken jdk.
> > > > >
> > > > > Karl
> > > > >
> > > > >
> > > > > On Wed, Apr 15, 2015 at 7:37 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > > > >
> > > > > > On Wed, Apr 15, 2015 at 07:27:56AM -0400, Karl Wright wrote:
> > > > > > > Hi Kamil:
> > > > > > >
> > > > > > > kawright@duck76:/data/kawright/analysis$ gzip --version
> > > > > > > gzip 1.4
> > > > > > > Copyright (C) 2007 Free Software Foundation, Inc.
> > > > > > > Copyright (C) 1993 Jean-loup Gailly.
> > > > > > > This is free software.  You may redistribute copies of it under the terms of
> > > > > > > the GNU General Public License <http://www.gnu.org/licenses/gpl.html>.
> > > > > > > There is NO WARRANTY, to the extent permitted by law.
> > > > > > >
> > > > > > > Written by Jean-loup Gailly.
> > > > > > > kawright@duck76:/data/kawright/analysis$
> > > > > > >
> > > > > > >
> > > > > > > But in any case the key part of the stack trace is further down, probably
> > > > > > > MUCH further down.
> > > > > > >
> > > > > > > If I were you, I'd unzip the whole log and use head, tail, and grep to find
> > > > > > > where the exception trace ends.
> > > > > >
> > > > > > I used grep -v and sent you the logs before, but you don't believe me.
> > > > > > These are all the mcf logs: http://pastebin.com/T54NKwTh
> > > > > > http://pastebin.com/uMxaUnGi
> > > > > >
> > > > > > K
> > > > > >
> > > > > > >
> > > > > > > On Wed, Apr 15, 2015 at 7:18 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > > > > > >
> > > > > > > > Hmm, try tar -xf manifoldcf.log.gz or maybe zless?
> > > > > > > > It works for me with:
> > > > > > > > > gzip --version
> > > > > > > > gzip 1.6
> > > > > > > >
> > > > > > > > To be sure, I attached the uncompressed file.
> > > > > > > >
> > > > > > > > K
> > > > > > > >
> > > > > > > > On Wed, Apr 15, 2015 at 07:10:07AM -0400, Karl Wright wrote:
> > > > > > > > > Hi Kamil,
> > > > > > > > >
> > > > > > > > > >>>>>>
> > > > > > > > > kawright@duck76:~$ cd /data/kawright/analysis/
> > > > > > > > > kawright@duck76:/data/kawright/analysis$ gunzip manifoldcf.log.gz
> > > > > > > > >
> > > > > > > > > gzip: manifoldcf.log.gz: invalid compressed data--crc error
> > > > > > > > >
> > > > > > > > > gzip: manifoldcf.log.gz: invalid compressed data--length error
> > > > > > > > > kawright@duck76:/data/kawright/analysis$
> > > > > > > > >
> > > > > > > > > <<<<<<
> > > > > > > > >
> > > > > > > > > Karl
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Wed, Apr 15, 2015 at 6:41 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > > > > > > > >
> > > > > > > > > > These 1k lines are all the same. I attached the full manifoldcf.log.
> > > > > > > > > >
> > > > > > > > > > K
> > > > > > > > > >
> > > > > > > > > > On Wed, Apr 15, 2015 at 06:33:06AM -0400, Karl Wright wrote:
> > > > > > > > > > > Hi Kamil,
> > > > > > > > > > >
> > > > > > > > > > > There is a complete trace in there, believe me.  The JVM did not say:
> > > > > > > > > > > "(...) ~1k lines".  What I need is at the bottom of those 1K lines.
> > > > > > > > > > >
> > > > > > > > > > > Karl
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Apr 15, 2015 at 6:23 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > How can I provide a usable stack trace? I can only copy what the logs say.
> > > > > > > > > > > > Now it's a lot of:
> > > > > > > > > > > > FATAL 2015-04-15 12:14:35,645 (Worker thread '5') - Error tossed: null
> > > > > > > > > > > > java.lang.StackOverflowError
> > > > > > > > > > > >         at java.util.regex.Pattern$CharProperty.match(Pattern.java:3776)
> > > > > > > > > > > >         at java.util.regex.Pattern$Curly.match0(Pattern.java:4250)
> > > > > > > > > > > >         at java.util.regex.Pattern$Curly.match0(Pattern.java:4263)
> > > > > > > > > > > >         (...) ~1k lines
> > > > > > > > > > > >
> > > > > > > > > > > > for the continuous job, but the agents process is not exiting. Probably these
> > > > > > > > > > > > two errors below aren't correlated (patterns and agents oom).
> > > > > > > > > > > >
> > > > > > > > > > > > K
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Apr 14, 2015 at 05:28:18PM -0400, Karl Wright wrote:
> > > > > > > > > > > > > Without some kind of usable stack trace I can't really help you.  It looks
> > > > > > > > > > > > > like some regular expression is going completely haywire, but I have no
> > > > > > > > > > > > > idea which one.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Karl
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Apr 14, 2015 at 4:31 PM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Apr 14, 2015 at 04:12:55PM -0400, Karl Wright wrote:
> > > > > > > > > > > > > > > Hi Kamil,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Without the bottom of the stack trace, I can't even tell what it is doing.
> > > > > > > > > > > > > > > Where are you supplying a regular expression?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > That's all I have; the only regular expression is in 'Paths':
> > > > > > > > > > > > > > 3. Exclude file(s) or directory(s) matching */.*
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I found files (~500MB, logs) where the solr logs end;
> > > > > > > > > > > > > > excluding them solves the problem. mcf uses tika for extracting
> > > > > > > > > > > > > > and only /update to solr; these files caused problems before,
> > > > > > > > > > > > > > when using solr to extract docs. Now mcf dies and I do not even
> > > > > > > > > > > > > > know why.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > K
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Running out of memory might be a side effect of running out of stack.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Karl
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, Apr 14, 2015 at 2:49 PM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > > > the agents process exits with:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > agents process ran out of memory - shutting down
> > > > > > > > > > > > > > > > java.lang.OutOfMemoryError: Java heap space
> > > > > > > > > > > > > > > >         at java.util.Arrays.copyOfRange(Arrays.java:3664)
> > > > > > > > > > > > > > > >         at java.lang.String.<init>(String.java:201)
> > > > > > > > > > > > > > > >         at java.lang.StringBuilder.toString(StringBuilder.java:407)
> > > > > > > > > > > > > > > >         at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.buildSolrDocument(HttpPoster.java:987)
> > > > > > > > > > > > > > > >         at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:882)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Worker threads:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > FATAL 2015-04-14 18:59:11,172 (Worker thread '32') - Error tossed: null
> > > > > > > > > > > > > > > > java.lang.StackOverflowError
> > > > > > > > > > > > > > > >         at java.util.regex.Pattern$CharProperty.match(Pattern.java:3776)
> > > > > > > > > > > > > > > >         at java.util.regex.Pattern$Curly.match0(Pattern.java:4250)
> > > > > > > > > > > > > > > >         at java.util.regex.Pattern$Curly.match0(Pattern.java:4263)
> > > > > > > > > > > > > > > >         at java.util.regex.Pattern$Curly.match0(Pattern.java:4263)
> > > > > > > > > > > > > > > >         at java.util.regex.Pattern$Curly.match0(Pattern.java:4263)
> > > > > > > > > > > > > > > >         (...) ~1k lines
> > > > > > > > > > > > > > > >         at java.util.regex.Pattern$Curly.match0(Pattern.java:4263)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > No errors/warnings in the solr logs.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Is it a bug or just a corrupted file?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > K
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> >
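
Much of the quoted thread turns on finding where a ~1k-line StackOverflowError trace ends. One way to make such a trace readable is to collapse runs of identical consecutive frames so that whatever follows the repetition stands out. The sketch below is only an illustration of that idea: the class name TraceCollapser, the "[xN]" count format, and the final example.Caller frame in the sample are hypothetical and do not come from the logs in this thread. It collapses only exact consecutive duplicates, so a trace that repeats a cycle of several different frames would need a smarter approach.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TraceCollapser {

    // Collapses runs of identical consecutive lines (compared after trimming)
    // into a single line followed by a repeat count.
    static List<String> collapse(List<String> lines) {
        List<String> out = new ArrayList<>();
        String current = null;
        int count = 0;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.equals(current)) {
                count++;
            } else {
                if (current != null) {
                    out.add(format(current, count));
                }
                current = line;
                count = 1;
            }
        }
        if (current != null) {
            out.add(format(current, count));
        }
        return out;
    }

    private static String format(String line, int count) {
        return count > 1 ? line + "  [x" + count + "]" : line;
    }

    public static void main(String[] args) {
        // Frames copied from the thread, plus one hypothetical caller frame at the bottom.
        List<String> sample = Arrays.asList(
            "java.lang.StackOverflowError",
            "        at java.util.regex.Pattern$CharProperty.match(Pattern.java:3776)",
            "        at java.util.regex.Pattern$Curly.match0(Pattern.java:4250)",
            "        at java.util.regex.Pattern$Curly.match0(Pattern.java:4263)",
            "        at java.util.regex.Pattern$Curly.match0(Pattern.java:4263)",
            "        at java.util.regex.Pattern$Curly.match0(Pattern.java:4263)",
            "        at example.Caller.run(Caller.java:1)"); // hypothetical frame
        for (String line : collapse(sample)) {
            System.out.println(line);
        }
    }
}
```

Note that a collapsed trace can only show frames that were actually logged. As far as I know, HotSpot records a limited number of frames per exception (controlled by -XX:MaxJavaStackTraceDepth), so for very deep recursion the calling frames at the bottom may never have been written to the log at all.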
