lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vladimir Olenin" <VOle...@cihi.ca>
Subject RE: experiences with lingpipe
Date Fri, 03 Nov 2006 15:03:06 GMT
> You need to increase the memory for java. I think 32-bit jave is
limited to a 1.3 gig heap but
> could be wrong. No heuristics at the tip of my fingers.

32-bit JVM under Linux/Windows. Solaris runs OK. Limit on the heap is
~1.7 - 1.8Gb.

-----Original Message-----
From: Breck Baldwin [mailto:breck@alias-i.com] 
Sent: Friday, November 03, 2006 9:59 AM
To: java-user@lucene.apache.org
Subject: Re: experiences with lingpipe



Martin Braun wrote:
> Hi Breck,
> 
> i have tried your tutorial and built (hopefully) a successful 
> SpellCheck.model File with 49M.
> My Lucene Index directory is 2,4G. When I try to read the Model with 
> the readmodel function, i get an "Exception in thread "main" 
> java.lang.OutOfMemoryError: Java heap space", though I started java 
> with -Xms1024m -Xmx1024m.
> 
> How many RAM will I need for the Model (I only have 2 GB of physical 
> RAM, and lucene's also using some memory).

You need to increase the memory for java. I think 32-bit jave is limited
to a 1.3 gig heap but could be wrong. No heuristics at the tip of my
fingers.

To make the spell checker smaller you can prune the tokens using the
pruneLM method in the TrainSpellChecker. Pruning the 1 counts should
make a big difference and not hurt spelling too much (depends on how
things are paramterized). Probably up to 5 counts won't matter.

Also look at my tuning tutorial that is in very rough shape but will get
you going on tuning at:

cvs -d:pserver:anonymous@alias-i.com:/usr/local/sandbox co
querySpellCheckTuner

I will try to get another pass at it over the weekend.

b reck


> 
> Is there a "rule of thumb" to calculate the needed amount of memory of

> the model?
> 
> thanks in advance,
> 
> martin
> 
> 
> 
>>>>Tuning params dominate the performance space. A small beam (16 
>>>>active
>>>>hypotheses) will be quite snappy (I have 200 queries/sec with a 32
beam.
>>>>over a 80 gig text collection that with some pruning was 5 gig in 
>>>>memory running an 8 gram model)
>>>>
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

-

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message