lucene-java-user mailing list archives

From "Karthik N S"
Subject RE: Time to index documents
Date Thu, 26 Aug 2004 03:57:04 GMT
Hi Hetan

That is the main problem with non-standardized tags in the HTML documents
you are indexing; they add to the time the indexing process takes.

If you can, tweak the HTMLParser.jj file within '/demo/html'
[you need some knowledge of JavaCC for this].
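
While the grammar still rejects some files, it may help to isolate failures per
file so that one malformed document neither stops the run nor goes unnoticed.
The following is only a rough sketch of that pattern, not the demo's actual
code: TolerantIndexer, HtmlTextExtractor and Result are hypothetical names made
up for illustration, and the extractor is assumed to throw when it hits
markup it cannot parse.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TolerantIndexer {

    /** Hypothetical stand-in for an HTML parser; this is not a Lucene type. */
    interface HtmlTextExtractor {
        String extractText(String path) throws Exception;
    }

    /** Outcome of one batch run: files that parsed and files that were skipped. */
    static final class Result {
        final List<String> indexed = new ArrayList<>();
        final List<String> skipped = new ArrayList<>();
    }

    /** Tries every path; a failure on one file is recorded and the loop goes on. */
    static Result indexAll(List<String> paths, HtmlTextExtractor extractor) {
        Result result = new Result();
        for (String path : paths) {
            try {
                extractor.extractText(path);
                // A real indexer would add the extracted text to the index here.
                result.indexed.add(path);
            } catch (Exception e) {
                // Keep the file name so the malformed HTML can be inspected later.
                System.err.println("skipping " + path + ": " + e.getMessage());
                result.skipped.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy extractor (an assumption for the demo): fails on names starting with "bad".
        HtmlTextExtractor toy = path -> {
            if (path.startsWith("bad")) {
                throw new Exception("Parse Aborted");
            }
            return "text of " + path;
        };
        Result r = indexAll(Arrays.asList("a.html", "bad.html", "c.html"), toy);
        System.out.println("indexed=" + r.indexed.size() + " skipped=" + r.skipped.size());
    }
}
```

The skipped list then gives you the set of files whose tags need fixing, or
that the grammar needs to tolerate.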


-----Original Message-----
From: Hetan Shah [mailto:Hetan.Shah@Sun.COM]
Sent: Thursday, August 26, 2004 3:01 AM
To: Lucene Users List
Subject: Time to index documents

Hello all,

Is there a way to reduce the indexing time when the indexer is
indexing about 30,000+ files? It currently takes roughly 6-7 hours to
do this. I am using the IndexHTML class to create the index out of HTML files.

Another issue that I see is every once in a while I get the following
output on the screen.

adding ../31/1104852.html
Parse Aborted: Encountered "\"" at line 7, column 1.
Was expecting one of:
     <ArgName> ...
     "=" ...
     <TagEnd> ...

Any suggestions on preventing this from happening?

Thanks in advance.
