lucene-dev mailing list archives

From eks dev <eks...@yahoo.co.uk>
Subject Re: new TokenStream api Question
Date Sun, 26 Apr 2009 21:34:45 GMT

thanks Uwe, 

Looks like a nice use case to cover, but I was (and still am) not sure what the way around it would be.

Your proposal sounds OK to me, but I am not familiar enough with this API to say for sure...

I guess we would need to make this put() safer, to prevent people from making silly mistakes such as
registering an instance of a completely unrelated class under an attribute's key.
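A minimal sketch of such a safer put, assuming a simplified class-keyed registry. AttributeRegistry, TermAttr and MyTermAttr are hypothetical stand-ins, not Lucene's AttributeSource or TermAttribute. It illustrates the two points from Uwe's mail quoted below: asking for a subclass key creates a separate instance, while registering your own instance under the base-class key lets everyone who asks for the base class see your object. The generic signature plus Class.isInstance() rejects instances of unrelated classes, and a second registration for the same key is refused.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified registry keyed by class; NOT Lucene's AttributeSource.
public class AttributeRegistry {
    private final Map<Class<?>, Object> attributes = new HashMap<>();

    // "Class is the key": a subclass key gets its own, separate instance.
    public <T> T addAttribute(Class<T> key) {
        Object existing = attributes.get(key);
        if (existing != null) {
            return key.cast(existing);
        }
        try {
            T created = key.getDeclaredConstructor().newInstance();
            attributes.put(key, created);
            return created;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + key, e);
        }
    }

    // The "safer put": the instance must be a key instance (checked at compile
    // time by the generics and again at runtime for unchecked callers), and
    // only the first registration for a key is accepted.
    public <T> T addAttribute(Class<T> key, T instance) {
        if (!key.isInstance(instance)) {
            throw new IllegalArgumentException(instance.getClass() + " is not a " + key);
        }
        if (attributes.containsKey(key)) {
            throw new IllegalStateException(key + " is already registered");
        }
        attributes.put(key, instance);
        return instance;
    }

    public <T> T getAttribute(Class<T> key) {
        return key.cast(attributes.get(key));
    }

    // Hypothetical attribute classes, only for the demo.
    public static class TermAttr { }
    public static class MyTermAttr extends TermAttr { /* extra helpers would go here */ }

    public static void main(String[] args) {
        AttributeRegistry reg = new AttributeRegistry();
        MyTermAttr mine = new MyTermAttr();
        reg.addAttribute(TermAttr.class, mine);                  // register under the base key
        System.out.println(reg.getAttribute(TermAttr.class) == mine); // true
        MyTermAttr other = reg.addAttribute(MyTermAttr.class);   // subclass key: new instance
        System.out.println(other == mine);                       // false
    }
}
```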

For this particular case, I would argue it makes sense to add methods you usually find in
String-like classes to TermAttribute, like
startsWith(char)/endsWith(char)... but this feels wrong, as it encourages code duplication.
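One way around the duplication, sketched under the assumption that callers can get at the buffer and its length: a single static helper over any (char[], length) pair, instead of String-like methods on TermAttribute itself. TermChars is a hypothetical name, not a Lucene class.

```java
// Hypothetical utility, not part of Lucene: works on any (buffer, length) pair,
// so String-like helpers need not be copied into TermAttribute.
public final class TermChars {
    private TermChars() { }

    /** True if the first {@code length} chars of {@code buffer} start with {@code c}. */
    public static boolean startsWith(char[] buffer, int length, char c) {
        return length > 0 && buffer[0] == c;
    }

    /** True if the first {@code length} chars of {@code buffer} end with {@code c}. */
    public static boolean endsWith(char[] buffer, int length, char c) {
        return length > 0 && buffer[length - 1] == c;
    }

    public static void main(String[] args) {
        char[] buf = {'-', 'f', 'o', 'o', 'x', 'x'}; // buffer may be longer than the term
        int len = 4;                                 // the term itself is "-foo"
        System.out.println(startsWith(buf, len, '-')); // true
        System.out.println(endsWith(buf, len, 'o'));   // true
        System.out.println(endsWith(buf, len, 'x'));   // false: 'x' lies past length
    }
}
```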

The original motivation is to get the char[] and length of the TermAttribute quickly. Hmm, maybe simply
adding:
char[] rawTermBuffer() {
  return termBuffer;
}

and the same for the length...

with javadoc "feel free to shoot yourself" :)
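A rough sketch of what such raw accessors could look like, on a simplified stand-in class. RawTermAttribute, MIN_BUFFER_SIZE and the lazy initTermBuffer() here only mimic the idea of lazy allocation; they are assumptions, not Lucene's actual TermAttribute code. The raw getters skip initTermBuffer(), which is exactly why they need the "shoot yourself" javadoc: rawTermBuffer() can return null before any term has been set.

```java
// Hypothetical, simplified term attribute; NOT Lucene's TermAttribute implementation.
public class RawTermAttribute {
    private static final int MIN_BUFFER_SIZE = 10; // assumed initial capacity
    private char[] termBuffer;                     // allocated lazily
    private int termLength;

    // Lazy allocation, standing in for the initTermBuffer() call discussed below.
    private void initTermBuffer() {
        if (termBuffer == null) {
            termBuffer = new char[MIN_BUFFER_SIZE];
            termLength = 0;
        }
    }

    public void setTermBuffer(char[] buffer, int offset, int length) {
        initTermBuffer();
        if (termBuffer.length < length) {
            termBuffer = new char[length]; // grow; contents are overwritten below
        }
        System.arraycopy(buffer, offset, termBuffer, 0, length);
        termLength = length;
    }

    // Safe accessors: always go through initTermBuffer().
    public char[] termBuffer() {
        initTermBuffer();
        return termBuffer;
    }

    public int termLength() {
        initTermBuffer();
        return termLength;
    }

    /** Feel free to shoot yourself: may return null if no term was ever set. */
    public char[] rawTermBuffer() {
        return termBuffer;
    }

    /** Feel free to shoot yourself: only meaningful together with rawTermBuffer(). */
    public int rawTermLength() {
        return termLength;
    }

    public static void main(String[] args) {
        RawTermAttribute att = new RawTermAttribute();
        System.out.println(att.rawTermBuffer() == null); // true: nothing allocated yet
        att.setTermBuffer("-foo".toCharArray(), 0, 4);
        char[] raw = att.rawTermBuffer();
        int len = att.rawTermLength();
        System.out.println(len > 0 && raw[0] == '-');    // true
    }
}
```

The caller has to pair rawTermBuffer() with rawTermLength(), since the buffer is usually longer than the term.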




 




----- Original Message ----
> From: Uwe Schindler <uwe@thetaphi.de>
> To: java-dev@lucene.apache.org
> Sent: Sunday, 26 April, 2009 23:03:06
> Subject: RE: new TokenStream api Question
> 
> There is one problem: if you extend TermAttribute, the class is different
> (which is the key in the attributes list). So when you initialize the
> TokenStream and do a
> 
> YourClass termAtt = (YourClass) addAttribute(YourClass.class)
> 
> ...you create a new attribute. So one possibility would be to also specify
> the instance and save the attribute by class (as key), but with your
> instance. If you are the first one to create the attribute, everything is
> ok (if it is a TokenStream and not a TokenFilter, you will be the first, as
> it adds the attribute in the ctor). Register the attribute yourself
> (maybe we should add a specialized addAttribute that takes an instance
> as the default?):
> 
> YourClass termAtt = new YourClass();
> attributes.put(TermAttribute.class, termAtt);
> 
> In this case, for the indexer it is a standard TermAttribute, but you can
> do more with it.
> 
> Replacing TermAttribute with your own class is not possible, as the indexer will
> get a ClassCastException when using the instance retrieved with
> getAttribute(TermAttribute.class).
> 
> Uwe
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
> 
> > -----Original Message-----
> > From: eks dev [mailto:eksdev@yahoo.co.uk]
> > Sent: Sunday, April 26, 2009 10:39 PM
> > To: java-dev@lucene.apache.org
> > Subject: new TokenStream api Question
> > 
> > 
> > I am just looking into the new TermAttribute usage and wonder what would be
> > the best way to implement a PrefixFilter that filters out Terms
> > that start with some prefix,
> > 
> > something like this, where '-' represents my prefix:
> > 
> >   public final boolean incrementToken() throws IOException {
> >     // the first word we found
> >     while (input.incrementToken()) {
> >       int len = termAtt.termLength();
> > 
> >       if (len > 0 && termAtt.termBuffer()[0] != '-') // only length > 0 and non LFs
> >         return true;
> >       // note: else we ignore it
> >     }
> >     // reached EOS
> >     return false;
> >   }
> > 
> > 
> > 
> > 
> > 
> > The question would be:
> > 
> > Can I extend TermAttribute and add boolean startsWith(char c)?
> > 
> > The point is speed, and my code gets smaller.
> > TermAttribute calls a method in termLength() and termBuffer() that I do
> > not understand (back compatibility, I guess):
> >   public int termLength() {
> >     initTermBuffer(); // I'd like to avoid it...
> >     return termLength;
> >   }
> > 
> > 
> > I'd like to get rid of initTermBuffer(). The first option is to *extend*
> > TermAttribute (but its fields are private, so no help there); or can I
> > implement my own MyTermAttribute (will the indexer know how to deal with it?)
> > 
> > Must I extend TermAttribute, or can I add my own?
> > 
> > thanks,
> > eks
> > 
> > 
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-dev-help@lucene.apache.org
> 
> 
> 
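For reference, the PrefixFilter loop from the original question, made self-contained so it runs without Lucene. TokenSource, ListTokenSource and PrefixFilterSketch are hypothetical stand-ins; in the question's code the filter reads the termAtt it shares with its input, rather than calling methods on the input directly. The logic is the same: keep pulling tokens until one is non-empty and does not start with the prefix, and return false at end of stream.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Standalone sketch; TokenSource and ListTokenSource are hypothetical, not Lucene classes.
public class PrefixFilterSketch {

    interface TokenSource {
        boolean incrementToken() throws IOException;
        char[] termBuffer();
        int termLength();
    }

    // Feeds a fixed list of terms, reusing one char buffer between tokens.
    static final class ListTokenSource implements TokenSource {
        private final Iterator<String> terms;
        private char[] buffer = new char[16];
        private int length;

        ListTokenSource(List<String> terms) { this.terms = terms.iterator(); }

        public boolean incrementToken() {
            if (!terms.hasNext()) return false;
            String t = terms.next();
            if (buffer.length < t.length()) buffer = new char[t.length()];
            t.getChars(0, t.length(), buffer, 0);
            length = t.length();
            return true;
        }
        public char[] termBuffer() { return buffer; }
        public int termLength() { return length; }
    }

    // Drops empty terms and terms whose first char is the prefix.
    static final class PrefixFilter {
        private final TokenSource input;
        private final char prefix;

        PrefixFilter(TokenSource input, char prefix) { this.input = input; this.prefix = prefix; }

        public boolean incrementToken() throws IOException {
            while (input.incrementToken()) {
                int len = input.termLength();
                if (len > 0 && input.termBuffer()[0] != prefix) {
                    return true; // keep this token
                }
                // otherwise skip it and pull the next one
            }
            return false; // reached end of stream
        }
    }

    static List<String> filter(List<String> terms, char prefix) throws IOException {
        ListTokenSource src = new ListTokenSource(terms);
        PrefixFilter f = new PrefixFilter(src, prefix);
        List<String> out = new ArrayList<>();
        while (f.incrementToken()) {
            out.add(new String(src.termBuffer(), 0, src.termLength()));
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(filter(Arrays.asList("foo", "-bar", "", "baz"), '-')); // [foo, baz]
    }
}
```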



      
