Date: Fri, 10 Jul 2015 04:00:25 -0700 (PDT)
From: Andi Vajda
To: pylucene-dev@lucene.apache.org
Subject: Re: accessing to protected elements in PythonTokenizer

On Fri, 10 Jul 2015, Roxana Danger wrote:

> Hello,
> I am trying to construct a custom PythonTokenizer (see above), but I
> am getting the error: "attribute 'reader' of 'Tokenizer' objects is not
> readable" when accessing to it in reset class.
> reader is a protected member in Tokenizer, I was supposing it to be
> exposed through PythonTokenizer, and it is passed to the super class in the
> constructor.
> Am I wrong?

You're right, but there is no accessor for the reader object stored on the
Java side that would make it usable from the Python side.

You can either:
  - add a getReader() method to the PythonTokenizer Java class that returns
    it (and rebuild PyLucene after 'make clean')
  - store the 'input' variable that is passed to your constructor on the
    Python side, on your ComposerTokenizer instance. That 'input' is the
    reader (at least, it's passed on to the Tokenizer Java class)

The first option is probably safer as it doesn't assume that
Tokenizer(reader) stores the reader without changing it in some way.

Andi..

> Thanks, best regards,
> Roxana
>
> class ComposerTokenizer(PythonTokenizer):
>     def __init__(self, input):
>         PythonTokenizer.__init__(self, input)
>         self.reset()
>
>     def incrementToken(self):
>         if self.index < len(self.finaltokens):
>             self.clearAttributes()
>             offsetAttr = OffsetAttributeImpl()
>             offsetAttr.setOffset( ... )
>             self.index = self.index + 1
>             return True
>         else:
>             return False
>
>     def reset(self):
>         s = ''
>         ch = self.reader.read()
>         while ch <> -1:
>             s = s + ch
>             ch = self.reader.read()
>         self.index = 0
>         self.finalTokens = ... #processing s to extract self.finaltokens
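
A minimal sketch of the second option from the reply (keeping the reader on the Python side) might look like the following. So that it runs without PyLucene installed, it uses two hypothetical stand-ins that are not part of PyLucene: StubTokenizerBase in place of PythonTokenizer, and StubReader, which mimics java.io.Reader.read() by returning a character code or -1 at end of input. The whitespace split is only a placeholder for the processing elided in the original code, and the attribute handling in incrementToken is left out.

```python
class StubReader:
    """Hypothetical stand-in for a java.io.Reader (not a PyLucene class)."""

    def __init__(self, text):
        self._text = text
        self._pos = 0

    def read(self):
        # Like java.io.Reader.read(): one character code, or -1 at the end.
        if self._pos >= len(self._text):
            return -1
        code = ord(self._text[self._pos])
        self._pos += 1
        return code


class StubTokenizerBase:
    """Hypothetical stand-in for PythonTokenizer (not a PyLucene class)."""

    def __init__(self, input):
        # The real base class hands `input` to the Java Tokenizer; nothing
        # readable from Python is kept here, which is the situation at issue.
        pass


class ComposerTokenizer(StubTokenizerBase):
    def __init__(self, input):
        StubTokenizerBase.__init__(self, input)
        # Keep our own Python-side reference instead of using self.reader.
        self._input = input
        self.reset()

    def reset(self):
        chars = []
        ch = self._input.read()
        while ch != -1:
            # read() yields an integer code, so convert it before collecting.
            chars.append(chr(ch))
            ch = self._input.read()
        self.index = 0
        # Placeholder for the elided processing: split on whitespace.
        self.finaltokens = ''.join(chars).split()

    def incrementToken(self):
        # Attribute handling (clearAttributes, offsets) omitted in this sketch.
        if self.index < len(self.finaltokens):
            self.index += 1
            return True
        return False
```

With real PyLucene the base class would be PythonTokenizer and the reader would come from the caller. As the reply notes, this approach assumes the Java Tokenizer keeps the reader it was given unchanged.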