lucene-java-user mailing list archives

From "Aad Nales" <aad.na...@rotterdam-cs.com>
Subject RE: Indexing process causes Tomcat to stop working
Date Wed, 27 Oct 2004 15:14:28 GMT
James,

How do you kick off your reindex? Could it be a session timeout? 

cheers,
Aad


Hello,

I am a Java/Lucene/Tomcat newbie. I know that does not bode well as the
start of a post, but I really am in dire straits as far as Lucene goes,
so bear with me. I am working on indexing and replacing the search
functionality for a website (about 10 GB in size, although only about
7 GB is indexed). I presently have a working model based on the
luceneweb demo distributed with Lucene; this has already proven
functional when tested on various sites (admittedly much smaller,
200-400 MB). However, issues occur when indexing the main site that I
haven't found explained on any of the Lucene forums thus far.

After a successful index and optimisation of the website (takes around
4 hrs 40 min unoptimised) I can't get to index.jsp or even access
Tomcat. My first thought was to restart Tomcat: no joy and no access.
Thinking the larger index had killed the test server, I accessed Apache
on port 80, which worked perfectly. After a few checks I realised the
test server was fine and Apache was fine, and I used the same
application to create an index of the Tomcat docs, so Java was working.
Confused, I went back to the forums, FAQs and groups to see if anyone
had had similar problems, and have come up with a brief list of what my
problem is not:

There are no index write.lock files for Lucene in either the /tmp or
opt/tomcat/temp directories, so the index is open to be searched. Nor
does 'top' reveal anything overloading the system. Apache is running
fine and displays all relevant pages. Tomcat cannot be reached with a
browser (neither the default congratulations page nor the luceneweb
application). Tomcat was a fresh install, as was Java, and the Tomcat
logs show nothing different from standard startup logs. So I logged the
entire indexing process and saw two errors occurring infrequently.

Parse Aborted: Encountered "\"" at line 6, column 129. //where these values vary
Was expecting one of:
   <ArgName> ...
   "=" ...
   <TagEnd> ...

I'm satisfied this is just the HTML parser complaining about some badly
formatted HTML and that it only affects what is indexed, but it's here
for completeness. The other error is more serious:

java.io.IOException: Pipe closed
       at java.io.PipedInputStream.receive(PipedInputStream.java:136)
       at java.io.PipedInputStream.receive(PipedInputStream.java:176)
       at java.io.PipedOutputStream.write(PipedOutputStream.java:129)
       at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(StreamEncoder.java:336)
       at sun.nio.cs.StreamEncoder$CharsetSE.implWrite(StreamEncoder.java:395)
       at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:136)
       at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:146)
       at java.io.OutputStreamWriter.write(OutputStreamWriter.java:204)
       at java.io.Writer.write(Writer.java:126)
       at org.apache.lucene.demo.html.HTMLParser.addText(HTMLParser.java:137)
       at org.apache.lucene.demo.html.HTMLParser.HTMLDocument(HTMLParser.java:203)
       at org.apache.lucene.demo.html.ParserThread.run(ParserThread.java:31)

Again, I'm pretty sure that this is the same error that occurred once
before, when I was using maxFieldLength to limit the number of terms
recorded. I'm also confident it's a threading error, and I found the
following post by Doug Cutting that seemed to explain it:
http://java2.5341.com/msg/80502.html
However, I am only assuming that's what it is, and I haven't yet
attempted to change the threading system of the demo due to my lack of
Java knowledge.
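
A minimal sketch of one way a "Pipe closed" IOException like the one in
the trace above can arise, assuming the consumer of a pipe stops reading
and closes its end while the producer still has text to write (for
example, once a term limit has been reached). The class and method names
are hypothetical and are not taken from the Lucene demo; only standard
java.io classes are used.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// Hypothetical demo class, not part of Lucene.
public class PipeClosedDemo {

    /**
     * Writes to a pipe after the reading side has been closed and
     * returns the message of the resulting IOException, or null if
     * the write unexpectedly succeeds.
     */
    static String writeAfterReaderCloses() {
        try {
            PipedInputStream in = new PipedInputStream();
            PipedOutputStream out = new PipedOutputStream(in);

            out.write('a');   // producer writes one byte into the pipe buffer
            in.read();        // consumer reads it
            in.close();       // consumer decides it has seen enough and closes

            try {
                out.write('b');   // producer keeps writing to the closed pipe
                return null;
            } catch (IOException e) {
                return e.getMessage();
            }
        } catch (IOException setup) {
            // Not expected: nothing above should fail before the reader closes.
            throw new IllegalStateException("unexpected failure before close", setup);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAfterReaderCloses());
    }
}
```

If the demo behaves this way, the exception would be a side effect of
the reader finishing early rather than a sign of a damaged index, but
that is an assumption to check against the demo source.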

The strange thing is that after restarting the server all aspects of
the Lucene web application work perfectly: stemming, alphanumeric
indexing, summaries etc. are all as expected. So I am left assuming
(because of this, and because I am running out of options) that Lucene
has somehow done something to Tomcat by building such a large index.
Since both run on Java I guess it's something to do with that, but I
have nowhere near enough experience in Java to work out what.

The system I am currently running on is Java 1.4.2_05, Tomcat 5.0.27,
Lucene 1.4.1, Linux version 2.4.20-8 (gcc version 3.2.2 20030222 (Red
Hat Linux 3.2.2-5)), Apache 2.0.42. I have not modified the mergeFactor
or MaxMergeDocuments, nor am I using RAMDirectories. The processor is
800 MHz and there is 128 MB of RAM.

If more info is required on setup, source code etc., or you think this
should be moved to a Tomcat forum, just post.

Best regards and thanks in advance for any advice you can offer,

J Tyrrell



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org




