lucene-java-user mailing list archives

From "Rishabh Bajpai" <r_baj...@lycos.com>
Subject Strange problem while indexing?
Date Sat, 14 Jun 2003 04:57:24 GMT

I am using Lucene to index XML+HTML files. Each XML file contains the metadata associated with
an HTML file.

The process, at a high level, is:
- create a list of all XML files in a folder
- parse each XML file using a SAX parser
- create name:value pairs out of the tags and their values, and index them
- one of the tags contains the URL of the HTML page
- when that tag is encountered, parse the HTML file as well
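
To make the XML step concrete, here is a rough sketch of how the name:value pairs could be pulled out of one file with the JDK's SAX parser. It is only an illustration, not the actual indexing code: the class and method names (MetadataPairs, PairHandler, parse) and the sample tags (title, url) are made up for the example, and the Lucene indexing and HTML parsing steps are left out entirely.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: names and sample tags are assumptions, not the original code.
public class MetadataPairs {

    /** Collects the text of each non-empty element as a name:value pair. */
    static class PairHandler extends DefaultHandler {
        final Map<String, String> pairs = new LinkedHashMap<>();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // start collecting text for this element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // SAX may deliver one text node in several chunks, so append
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String value = text.toString().trim();
            if (!value.isEmpty()) {
                pairs.put(qName, value); // tag name -> tag value
            }
            text.setLength(0);
        }
    }

    /** Parses one XML document with a fresh parser and a fresh handler. */
    public static Map<String, String> parse(byte[] xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        PairHandler handler = new PairHandler();
        parser.parse(new ByteArrayInputStream(xml), handler);
        return handler.pairs;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><title>Example</title><url>pages/a.html</url></doc>";
        Map<String, String> pairs = parse(xml.getBytes(StandardCharsets.UTF_8));
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            // Each pair would be added to the index here; the "url" pair
            // would additionally trigger parsing of the HTML file.
            System.out.println(e.getKey() + ":" + e.getValue());
        }
    }
}
```

The sketch creates a new parser and handler for every file, so no state carries over from one file to the next.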

When I do this for a few files, it seems to work fine. However, as the number of files increases,
it starts to throw errors.
First, I get a "SAXException: Content is not allowed in trailing section." But I checked,
and the XML file seems to be well-formed. I even tried indexing this file on its own, and
it worked.
Then I get "Index locked for write: Lock@/export/home.../write.lock".
At times, I also get "Timed out waiting for: Lock@/export/home/.../commit.lock".

As a result, the index doesn't get updated and the search results are incorrect. I also observed
once that while the index is being built, I get results, but once the indexing process exits, I stop
getting results. My hunch is that the index update did not get committed?

What is particularly interesting is that this problem occurs only some of the time. Another
observation is that it worked fine for around 50 files, but not for about 100 files.

Can anyone help me, or give pointers as to what is going on here?

-rishabh


 

