lucene-java-user mailing list archives

From "Karthik N S" <kart...@controlnet.co.in>
Subject RE: Time to index documents
Date Thu, 26 Aug 2004 03:57:04 GMT
Hi Hetan


   That's the major problem with the non-standardized tags in the HTML
   documents you are indexing: they slow down the indexing process.


   If you can, tweak the HTMLParser.jj file under '/demo/html' within
   lucene.zip.
   [You need some knowledge of JavaCC for this.]
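
   If editing the grammar is not practical, another option is to catch the
   parse failure and fall back to a crude tag stripper for that one file, so
   that a stray quote inside a tag does not abort the document. The sketch
   below is only an illustration under assumptions: the class
   FallbackTextExtractor and its stripTags method are hypothetical names, not
   part of Lucene or its demo, and it uses only java.util.regex. It does not
   decode HTML entities or handle comments specially.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of Lucene): a lenient fallback for files
// that the demo HTML parser rejects because of malformed tags.
final class FallbackTextExtractor {
    // Drop <script> and <style> blocks together with their contents.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    // Anything between '<' and the next '>' is treated as a tag, even if it
    // contains stray quotes or line breaks.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    private FallbackTextExtractor() {}

    // Returns the visible text with tags removed and whitespace collapsed.
    // HTML entities such as &amp; are left as they are.
    static String stripTags(String html) {
        String noScripts = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        String noTags = TAG.matcher(noScripts).replaceAll(" ");
        return WHITESPACE.matcher(noTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // A tag with a stray quote at the start of a line, similar in spirit
        // to the input that triggers "Parse Aborted".
        String malformed =
                "<a href=\"x.html\"\n\" class=foo>Lucene</a> <b>index</b>";
        System.out.println(stripTags(malformed)); // prints "Lucene index"
    }
}
```

   The returned string could then be indexed as the document body in place
   of the parser's output. This trades accuracy for robustness, so it is
   probably best used only for the files that fail to parse.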



Karthik

-----Original Message-----
From: Hetan Shah [mailto:Hetan.Shah@Sun.COM]
Sent: Thursday, August 26, 2004 3:01 AM
To: Lucene Users List
Subject: Time to index documents


Hello all,

Is there a way to reduce the indexing time when the indexer is
indexing 30,000+ files? It currently takes roughly 6-7 hours to
do this. I am using the IndexHTML class to create the index from HTML files.

Another issue is that every once in a while I get the following
output on the screen:

adding ../31/1104852.html
Parse Aborted: Encountered "\"" at line 7, column 1.
Was expecting one of:
     <ArgName> ...
     "=" ...
     <TagEnd> ...

Any suggestions on preventing this from happening?

Thanks in advance.
-H


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

