lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Hetan Shah <Hetan.S...@Sun.COM>
Subject Re: Time to index documents
Date Wed, 25 Aug 2004 22:07:30 GMT
Do you have any pointers for sample code for them?
Would highly appreciate it.
Thanks.
-H

Stephane James Vaucher wrote:

> I don't think that the demo parser is meant as a production 
> system component. You can look at Tidy or NekoHtml. They cleanup your html 
> and are probably optimised.
> 
> sv
> 
> On Wed, 25 Aug 2004, Hetan Shah wrote:
> 
> 
>>Hello all,
>>
>>Is there a way to reduce the indexing time taken when the indexer is 
>>indexing about 30,000 + files. It is roughly taking around 6-7 hours to 
>>do this. I am using IndexHTML class to create the index out of HTML files.
>>
>>Another issue that I see is every once in a while I get the following 
>>output on the screen.
>>
>>adding ../31/1104852.html
>>Parse Aborted: Encountered "\"" at line 7, column 1.
>>Was expecting one of:
>>     <ArgName> ...
>>     "=" ...
>>     <TagEnd> ...
>>
>>Any suggestions on preventing this from happening?
>>
>>Thanks in advance.
>>-H
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message