lucene-general mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Custom Filter Indexing Slow
Date Fri, 14 Sep 2012 10:10:47 GMT
The problem is that your transformation method needs Strings, but your incrementToken method
also has a serious bug: it does not respect the length of the buffer (charTermAttr.length()),
so new String(buffer) may pick up additional garbage left over from earlier tokens!
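
To illustrate the buffer-length bug outside of Lucene, here is a minimal sketch in plain Java. The class name BufferLengthDemo and the sample call numbers are made up for the example; the reused array only simulates what CharTermAttribute.buffer() hands back, where just the first length() characters belong to the current token.

```java
public class BufferLengthDemo {

    // Buggy variant: converts the whole backing array, including stale characters.
    public static String fromWholeBuffer(char[] buffer) {
        return new String(buffer);
    }

    // Correct variant: converts only the first `length` characters (the current term).
    public static String fromValidPrefix(char[] buffer, int length) {
        return new String(buffer, 0, length);
    }

    public static void main(String[] args) {
        // Simulate a reused term buffer: a longer term was written first...
        char[] buffer = "QA76.73.J38".toCharArray();

        // ...then a shorter term overwrote only the beginning of the array.
        String shorter = "PR6019";
        shorter.getChars(0, shorter.length(), buffer, 0);
        int length = shorter.length();

        System.out.println(fromWholeBuffer(buffer));         // prints "PR60193.J38"
        System.out.println(fromValidPrefix(buffer, length)); // prints "PR6019"
    }
}
```

In a real filter, charTermAttr.toString() already gives you the valid prefix, so you do not have to handle the length yourself.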


The easiest way to do this with far less code and without those bugs:

     public boolean incrementToken() throws IOException {
         if (!input.incrementToken()) {
             return false;
         }
         final String normalizedLCcallnum = getLCShelfkey(charTermAttr.toString());
         charTermAttr.setEmpty().append(normalizedLCcallnum);
         return true;
     }

This fixes part of your performance problem: the result of your transformation is no longer
converted twice between char arrays and Strings.

To further improve speed, make the method getLCShelfkey operate directly on char[] and length.
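
As a rough sketch of what "operate on char[] and length" can look like, the plain-Java example below normalizes a buffer in place and returns the new length. It is not the real getLCShelfkey logic, which is not shown in this thread: normalizeInPlace and its rule (drop whitespace, uppercase everything else) are hypothetical stand-ins, and the sketch assumes the output is never longer than the input.

```java
public class InPlaceNormalizeDemo {

    // Hypothetical stand-in for a char[]-based shelf-key routine.
    // Rewrites the first `length` chars of `buffer` in place and returns
    // the number of valid chars afterwards. No String is allocated.
    public static int normalizeInPlace(char[] buffer, int length) {
        int out = 0;
        for (int in = 0; in < length; in++) {
            char c = buffer[in];
            if (Character.isWhitespace(c)) {
                continue; // drop whitespace
            }
            buffer[out++] = Character.toUpperCase(c);
        }
        return out;
    }

    public static void main(String[] args) {
        // The array is larger than the term; only the first 12 chars are valid.
        char[] buffer = "qa 76.73 j38XXXX".toCharArray();
        int length = 12;

        int newLength = normalizeInPlace(buffer, length);
        System.out.println(new String(buffer, 0, newLength)); // prints "QA76.73J38"
    }
}
```

Inside the filter you would then call something like charTermAttr.setLength(normalizeInPlace(charTermAttr.buffer(), charTermAttr.length())). If the real shelf key can be longer than the input, the buffer has to be grown first (charTermAttr.resizeBuffer), so a pure in-place rewrite like this is not enough on its own.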

Uwe
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Osullivan L. [mailto:L.Osullivan@swansea.ac.uk]
> Sent: Friday, September 14, 2012 11:58 AM
> To: general@lucene.apache.org
> Subject: Custom Filter Indexing Slow
> 
> Hi Folks,
> 
> I have a custom filter which does everything I need it to but it has reduced my
> indexing speed to a crawl. Are there any methods I need to call to clear / clean
> things up once my script (details below) has done its work?
> 
> Thanks,
> 
> Luke
> 
>   public LCCNormalizeFilter(TokenStream input)
>     {
>         super(input);
>         this.charTermAttr = addAttribute(CharTermAttribute.class);
>     }
> 
>     public boolean incrementToken() throws IOException {
> 
>     	if (!input.incrementToken()) {
> 	    return false;
>     	}
> 
>     	char[] buffer = charTermAttr.buffer();
>     	String rawLCcallnum = new String(buffer);
>     	String normalizedLCcallnum = getLCShelfkey(rawLCcallnum);
>     	char[] newBuffer = normalizedLCcallnum.toCharArray();
>         charTermAttr.setEmpty();
>         charTermAttr.copyBuffer(newBuffer, 0, newBuffer.length);
>         return true;
>     }

