lucene-general mailing list archives

From Osullivan L. <L.Osulli...@swansea.ac.uk>
Subject RE: Custom Filter Indexing Slow
Date Fri, 14 Sep 2012 10:36:49 GMT
Hi Uwe,

Thanks for the advice! My indexing routine is back up to speed.

If I ever make it to Bremen or nearby, I definitely owe you a beer!

Kind Regards,

Luke



________________________________________
From: Uwe Schindler [uwe@thetaphi.de]
Sent: 14 September 2012 11:10
To: general@lucene.apache.org
Subject: RE: Custom Filter Indexing Slow

The problem is that your transformation method needs Strings, but your incrementToken method
also has a serious bug: it does not respect the length of the buffer, so it may pick up
additional garbage!
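
To make the buffer-length bug concrete, here is a minimal plain-Java sketch (no Lucene needed). It assumes a reused char[] whose valid content is shorter than the array, which is the situation with the array returned by charTermAttr.buffer(); the class name, method names and sample call numbers are made up for illustration.

```java
class BufferLengthDemo {
    // Mimics the buggy conversion: reads the entire backing array,
    // including whatever was left behind by an earlier, longer term.
    static String wholeBuffer(char[] buffer) {
        return new String(buffer);
    }

    // Reads only the first `length` chars, i.e. the valid term.
    static String validPart(char[] buffer, int length) {
        return new String(buffer, 0, length);
    }

    public static void main(String[] args) {
        // Hypothetical reused buffer: an earlier 7-char term, then a
        // 5-char term written over its start. Only 5 chars are valid.
        char[] buffer = "QA76.89".toCharArray();
        "PR601".getChars(0, 5, buffer, 0);
        int length = 5;

        System.out.println(wholeBuffer(buffer));       // PR60189 (stale "89" included)
        System.out.println(validPart(buffer, length)); // PR601
    }
}
```

In the filter, the valid length comes from charTermAttr.length(); charTermAttr.toString() already takes it into account.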


The easiest way to write the filter with much less code and without those bugs:

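     @Override
     public boolean incrementToken() throws IOException {
         if (!input.incrementToken()) {
             return false; // no more tokens
         }
         // toString() only reads the valid part of the term buffer
         final String normalizedLCcallnum = getLCShelfkey(charTermAttr.toString());
         // replace the term with the normalized form
         charTermAttr.setEmpty().append(normalizedLCcallnum);
         return true;
     }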

This fixes part of your performance problem: it no longer converts the result of your
transformation back and forth between char arrays and Strings.

To further improve speed, make the method getLCShelfkey operate directly on char[] and length.
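
A rough sketch of what "operate on char[] and length" could look like. The real getLCShelfkey logic is not shown in this thread, so normalizeInPlace below is a hypothetical stand-in (it upper-cases and drops whitespace); only the shape of the method matters: it takes the buffer plus the valid length and returns the new length, without creating any Strings.

```java
class CharArrayNormalizeSketch {
    // Hypothetical stand-in for getLCShelfkey: upper-cases each char and
    // drops whitespace, writing the result back into the same array.
    // Only the first `length` chars are read; returns the new valid length.
    static int normalizeInPlace(char[] buffer, int length) {
        int out = 0;
        for (int i = 0; i < length; i++) {
            char c = buffer[i];
            if (Character.isWhitespace(c)) {
                continue; // skip, output shrinks
            }
            buffer[out++] = Character.toUpperCase(c);
        }
        return out;
    }

    // Inside the filter the wiring might then be (Lucene calls, not compiled here):
    //   int newLength = normalizeInPlace(charTermAttr.buffer(), charTermAttr.length());
    //   charTermAttr.setLength(newLength);

    public static void main(String[] args) {
        // 9 valid chars followed by 2 chars of stale garbage.
        char[] buffer = "qa 76 .b3??".toCharArray();
        int newLength = normalizeInPlace(buffer, 9);
        System.out.println(new String(buffer, 0, newLength)); // QA76.B3
    }
}
```

This in-place form only works while the output is never longer than the input. If the real shelf key can grow, the buffer would have to be enlarged first (CharTermAttribute has resizeBuffer for that) or the result written to a separate array.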

Uwe
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Osullivan L. [mailto:L.Osullivan@swansea.ac.uk]
> Sent: Friday, September 14, 2012 11:58 AM
> To: general@lucene.apache.org
> Subject: Custom Filter Indexing Slow
>
> Hi Folks,
>
> I have a custom filter which does everything I need it to but it has reduced my
> indexing speed to a crawl. Are there any methods I need to call to clear / clean
> things up once my script (details below) has done its work?
>
> Thanks,
>
> Luke
>
>   public LCCNormalizeFilter(TokenStream input)
>     {
>         super(input);
>         this.charTermAttr = addAttribute(CharTermAttribute.class);
>     }
>
>     public boolean incrementToken() throws IOException {
>
>       if (!input.incrementToken()) {
>           return false;
>       }
>
>       char[] buffer = charTermAttr.buffer();
>       String rawLCcallnum = new String(buffer);
>       String normalizedLCcallnum = getLCShelfkey(rawLCcallnum);
>       char[] newBuffer = normalizedLCcallnum.toCharArray();
>         charTermAttr.setEmpty();
>         charTermAttr.copyBuffer(newBuffer, 0, newBuffer.length);
>         return true;
>     }

