lucene-pylucene-dev mailing list archives

From Andi Vajda <va...@apache.org>
Subject Re: accessing to protected elements in PythonTokenizer
Date Fri, 10 Jul 2015 11:00:25 GMT

On Fri, 10 Jul 2015, Roxana Danger wrote:

> Hello,
>       I am trying to construct a custom PythonTokenizer (see below), but I
> am getting the error: "attribute 'reader' of 'Tokenizer' objects is not
> readable" when accessing it in the reset method.
>       reader is a protected member of Tokenizer. I assumed it would be
> exposed through PythonTokenizer, since it is passed to the superclass in
> the constructor. Am I wrong?

You're right, but there is no accessor that makes the reader object stored
on the Java side usable from the Python side.
You can either:
   - add a getReader() method to the PythonTokenizer Java class that returns
     it (and rebuild PyLucene after 'make clean')
   - store the 'input' variable that is passed to your constructor on the
     Python side, on your ComposerTokenizer instance. That 'input' is the
     reader (at least, it's passed on to the Tokenizer Java class)

The first option is probably safer, as it doesn't assume that
Tokenizer(reader) stores the reader without changing it in some way.
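
A minimal sketch of the second option, written so that it runs without
PyLucene installed. StandInTokenizer is a hypothetical stand-in for
PythonTokenizer (it hides the reader from subclasses, much like the
protected Java field), io.StringIO stands in for the Java reader, and the
_input attribute name and the whitespace split are assumptions made only
for illustration. With a real Java reader you would still loop on read()
as in the original code.

```python
import io


class StandInTokenizer(object):
    """Hypothetical stand-in for PythonTokenizer. It keeps the reader in a
    name-mangled attribute, so subclasses cannot reach it as self.reader."""

    def __init__(self, input):
        self.__reader = input  # stored as _StandInTokenizer__reader


class ComposerTokenizer(StandInTokenizer):

    def __init__(self, input):
        StandInTokenizer.__init__(self, input)
        # Keep our own reference to the reader on the Python side.
        self._input = input
        self.reset()

    def reset(self):
        # Read from the reference we stored, not from the base class.
        text = self._input.read()
        self.index = 0
        # Placeholder tokenization; the original post elides this step.
        self.finaltokens = text.split()

    def incrementToken(self):
        if self.index < len(self.finaltokens):
            self.index += 1
            return True
        return False


tokenizer = ComposerTokenizer(io.StringIO(u"alpha beta gamma"))
```

Note that this relies on the base class not wrapping or replacing the
reader before storing it, which is the assumption mentioned above.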

Andi..

>       Thanks, best regards,
>             Roxana
>
> class ComposerTokenizer(PythonTokenizer):
>
>     def __init__(self, input):
>         PythonTokenizer.__init__(self, input)
>         self.reset()
>
>     def incrementToken(self):
>         if self.index < len(self.finaltokens):
>             self.clearAttributes()
>             offsetAttr = OffsetAttributeImpl()
>             offsetAttr.setOffset( ... )
>             self.index = self.index + 1
>             return True
>         else:
>             return False
>
>     def reset(self):
>         s = ''
>         ch = self.reader.read()
>         while ch != -1:
>             s = s + ch
>             ch = self.reader.read()
>         self.index = 0
>         self.finaltokens = ... # processing s to extract self.finaltokens
