lucene-dev mailing list archives

From Karl Wettin <>
Subject Output from a small Snowball benchmark
Date Thu, 08 Oct 2009 08:04:44 GMT
There have been a few small comments in Jira about the reflection
in Snowball's Among class. There is very little to do about this
unless one wants to redesign the stemmers so they include an inner
class that handles the method callbacks. That's quite a bit of work and
I don't even know how much CPU one would save by doing this.

So I was thinking maybe it would save some resources if one reused
the stemmers instead of reinstantiating them, which I presume is what
everybody does.

I thought it would make most sense to simulate query-time stemming, so
my benchmark contained 4 words, 2 of them plural. Each test
ran 1 000 000 times. The amount of CPU time used is barely noticeable
relative to what other things cost: 0.0109 ms/iteration when
reinstantiating, 0.0067 ms/iteration when reusing.
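
A minimal sketch of how such a comparison could be set up. The original
benchmark code was not posted, so everything here is an assumption:
ToyStemmer is a hypothetical stand-in that only strips a trailing "s",
not a real Snowball stemmer, and the four query words are made up. A
naive timing loop like this has no JIT warm-up, so the numbers it prints
are only a rough indication.

```java
public class StemmerReuseBenchmark {

    /** Hypothetical stand-in for a generated Snowball stemmer (not the real API). */
    static final class ToyStemmer {
        // Per-instance working buffer, allocated once per stemmer.
        private final StringBuilder buffer = new StringBuilder(64);

        String stem(String word) {
            buffer.setLength(0);
            buffer.append(word);
            int n = buffer.length();
            // Toy rule: strip one trailing plural "s" (but not "ss").
            if (n > 3 && buffer.charAt(n - 1) == 's' && buffer.charAt(n - 2) != 's') {
                buffer.setLength(n - 1);
            }
            return buffer.toString();
        }
    }

    // Four query terms, two of them plural (made-up words).
    static final String[] QUERY = {"stemmers", "heap", "benchmarks", "time"};

    /** Creates a fresh stemmer for every query; returns a checksum of stemmed lengths. */
    static long reinstantiating(int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            ToyStemmer stemmer = new ToyStemmer();
            for (String word : QUERY) {
                checksum += stemmer.stem(word).length();
            }
        }
        return checksum;
    }

    /** Reuses one stemmer for all queries; returns the same kind of checksum. */
    static long reusing(int iterations) {
        long checksum = 0;
        ToyStemmer stemmer = new ToyStemmer();
        for (int i = 0; i < iterations; i++) {
            for (String word : QUERY) {
                checksum += stemmer.stem(word).length();
            }
        }
        return checksum;
    }

    public static void main(String[] args) {
        int iterations = 1_000_000;
        long t0 = System.nanoTime();
        long a = reinstantiating(iterations);
        long t1 = System.nanoTime();
        long b = reusing(iterations);
        long t2 = System.nanoTime();
        // The checksums are printed so the JIT cannot discard the stemming work.
        System.out.printf("reinstantiating: %.4f ms/iteration (checksum %d)%n",
                (t1 - t0) / 1e6 / iterations, a);
        System.out.printf("reusing:         %.4f ms/iteration (checksum %d)%n",
                (t2 - t1) / 1e6 / iterations, b);
    }
}
```

Both strategies must produce the same checksum; only the allocation
pattern differs, which is what the heap comparison is about.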

The heap consumption was, however, rather different. At the end of
the reinstantiation run it had consumed about 10x more than when
reusing: ~20MB vs. ~2MB.

I realize people don't usually run 1 000 000 queries in so short a time,
but at least this is an indication that one could save some GC time
here. Many a mickle makes a muckle...

So I was thinking that perhaps it would make sense to have something like
a singleton concurrent queue in the SnowballFilter and a new
constructor that takes the Snowball program implementation class as an

But this might also be way premature optimization.
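
One way the queue idea could look, as a rough sketch only. Nothing here
is existing Lucene or Snowball API: StemmerPool, borrow, release and
ToyStemmer are hypothetical names, and the real SnowballFilter
constructor is not shown. The pool keeps one concurrent queue per
stemmer class and only instantiates a new stemmer when the queue is
empty.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;

public class StemmerPool {

    /** Hypothetical stand-in for a generated Snowball stemmer (not the real API). */
    public static final class ToyStemmer {
        private final StringBuilder buffer = new StringBuilder(64);

        public ToyStemmer() {}

        public String stem(String word) {
            buffer.setLength(0);
            buffer.append(word);
            int n = buffer.length();
            // Toy rule: strip one trailing plural "s" (but not "ss").
            if (n > 3 && buffer.charAt(n - 1) == 's' && buffer.charAt(n - 2) != 's') {
                buffer.setLength(n - 1);
            }
            return buffer.toString();
        }
    }

    // One shared (singleton) queue of idle stemmers per implementation class.
    private static final ConcurrentMap<Class<?>, ConcurrentLinkedQueue<Object>> POOLS =
            new ConcurrentHashMap<>();

    /** Returns an idle stemmer of the given class, or a new one if none is pooled. */
    static <T> T borrow(Class<T> type) {
        ConcurrentLinkedQueue<Object> queue = POOLS.get(type);
        Object pooled = (queue == null) ? null : queue.poll();
        if (pooled != null) {
            return type.cast(pooled);
        }
        try {
            // Assumes the stemmer class has an accessible no-arg constructor.
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + type, e);
        }
    }

    /** Hands a stemmer back so a later borrow() can reuse it. */
    static void release(Object stemmer) {
        POOLS.computeIfAbsent(stemmer.getClass(), k -> new ConcurrentLinkedQueue<>())
             .add(stemmer);
    }

    public static void main(String[] args) {
        ToyStemmer stemmer = borrow(ToyStemmer.class);
        try {
            System.out.println(stemmer.stem("queries"));
        } finally {
            release(stemmer); // always return the instance, even on failure
        }
    }
}
```

A filter using this would borrow in its constructor and release when it
is closed. The pool is unbounded in this sketch; whether that matters in
practice would need measuring.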

