lucene-pylucene-dev mailing list archives

From Roxana Danger <roxana.dan...@reedonline.co.uk>
Subject Re: accessing to protected elements in PythonTokenizer
Date Fri, 10 Jul 2015 13:05:07 GMT
Hi Andi,
    Thank you very much. I will use the first solution.
    Best regards.
         Roxana

On 10 July 2015 at 12:00, Andi Vajda <vajda@apache.org> wrote:

>
> On Fri, 10 Jul 2015, Roxana Danger wrote:
>
>> Hello,
>>       I am trying to construct a custom PythonTokenizer (see below), but I
>> am getting the error "attribute 'reader' of 'Tokenizer' objects is not
>> readable" when accessing it in the reset method.
>>       reader is a protected member of Tokenizer. I assumed it would be
>> exposed through PythonTokenizer, since it is passed to the superclass in
>> the constructor. Am I wrong?
>>
>
> You're right but there is no accessor for the reader object stored on the
> Java side that makes it usable from the Python side.
> You can either:
>   - add a getReader() method to the PythonTokenizer Java class that returns
>     it (and rebuild PyLucene after 'make clean')
>   - store the 'input' variable that is passed to your constructor on the
>     Python side, on your ComposerTokenizer instance. That 'input' is the
>     reader (at least, it's passed on to the Tokenizer Java class)
>
> The first option is probably safer as it doesn't assume that
> Tokenizer(reader) is not changing it in some way before storing it.
>
> Andi..
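
The second option above (keeping your own reference to 'input' on the Python side) might look roughly like the sketch below. It is only an illustration that runs without PyLucene: StubReader and StubTokenizerBase are hypothetical stand-ins for java.io.Reader and PythonTokenizer, and the whitespace split is a placeholder for the token extraction that was elided in the original code. As noted above, it assumes the base class does not change the reader before storing it.

```python
class StubReader(object):
    """Hypothetical stand-in for java.io.Reader: read() returns one
    character code per call, or -1 at end of input."""

    def __init__(self, text):
        self._text = text
        self._pos = 0

    def read(self):
        if self._pos >= len(self._text):
            return -1
        code = ord(self._text[self._pos])
        self._pos += 1
        return code


class StubTokenizerBase(object):
    """Hypothetical stand-in for PythonTokenizer: it receives the reader
    but gives subclasses no readable 'reader' attribute."""

    def __init__(self, input):
        self.__hidden_reader = input  # name-mangled, so not visible as self.reader


class ComposerTokenizer(StubTokenizerBase):

    def __init__(self, input):
        StubTokenizerBase.__init__(self, input)
        # Option 2: store the reader on the Python instance as well.
        self._input = input
        self.reset()

    def reset(self):
        # Drain the reader one character at a time until read() returns -1.
        chars = []
        code = self._input.read()
        while code != -1:
            chars.append(chr(code))
            code = self._input.read()
        self.index = 0
        # Placeholder tokenization (assumption): split on whitespace.
        self.finaltokens = "".join(chars).split()
```

With the real classes, the only change to the original ComposerTokenizer would be the extra assignment in __init__ and reading from self._input instead of self.reader.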
>
>>       Thanks, best regards,
>>             Roxana
>>
>> class ComposerTokenizer(PythonTokenizer):
>>
>>     def __init__(self, input):
>>         PythonTokenizer.__init__(self, input)
>>         self.reset()
>>
>>     def incrementToken(self):
>>         if self.index < len(self.finaltokens):
>>             self.clearAttributes()
>>             offsetAttr = OffsetAttributeImpl()
>>             offsetAttr.setOffset( ... )
>>             self.index = self.index + 1
>>             return True
>>         else:
>>             return False
>>
>>     def reset(self):
>>         s = ''
>>         ch = self.reader.read()  # this access raises the "not readable" error
>>         while ch != -1:
>>             s = s + ch
>>             ch = self.reader.read()
>>         self.index = 0
>>         self.finaltokens = ...  # processing s to extract self.finaltokens
>>
