lucene-pylucene-dev mailing list archives

From Andi Vajda <va...@apache.org>
Subject Re: Tokenizer text source
Date Thu, 27 Oct 2016 10:49:23 GMT

On Tue, 25 Oct 2016, Marc Jeurissen wrote:

> I have a custom Analyzer and Tokenizer which I'm trying to migrate from 
> PyLucene 4.10 to 6.2.
>
> The problem is that it is no longer possible to grab the text source from 
> either the createComponents method or the Tokenizer constructor. The 
> documentation says the Tokenizer has a field 'input' which contains the text 
> source, but in PyLucene a Tokenizer does not seem to have an attribute 'input'.
>
> Any idea how I can address the text source?

I have now extended JCC so that a wrapper can be explicitly requested for a 
non-public field, such as 'input', which is a protected field. That field 
is then available as an attribute on the corresponding Python wrapper class.

I then added
   org.apache.lucene.analysis.Tokenizer:input
to the list of explicitly requested wrappers in PyLucene's Makefile.

     >>> from lucene import *
     >>> initVM()
     <jcc.JCCEnv object at 0x10028a0f0>
     >>> from org.apache.lucene.analysis import Tokenizer
     >>> Tokenizer.input
     <attribute 'input' of 'Tokenizer' objects>

This is available from svn trunk rev 1766805.
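
As a rough sketch of how a Python tokenizer might then pull the text out of
that field: in Lucene, 'input' is a java.io.Reader, whose no-argument read()
returns one character code or -1 at end of stream. The helper name
drain_reader and the FakeReader stand-in below are hypothetical, used only so
the snippet runs without a JVM; it also assumes Python 3 and that the wrapped
field exposes read() the same way.

```python
def drain_reader(reader):
    """Collect all text from an object with a java.io.Reader-style read().

    Assumption: reader.read() returns one UTF-16 code unit as an int,
    or -1 once the stream is exhausted.
    """
    chars = []
    while True:
        code = reader.read()
        if code == -1:  # end of stream
            break
        # Note: characters outside the BMP arrive as two surrogate code
        # units and are not recombined by this simple sketch.
        chars.append(chr(code))
    return "".join(chars)


class FakeReader:
    """Hypothetical pure-Python stand-in for java.io.Reader (illustration only)."""

    def __init__(self, text):
        self._text = text
        self._pos = 0

    def read(self):
        if self._pos >= len(self._text):
            return -1
        code = ord(self._text[self._pos])
        self._pos += 1
        return code


text = drain_reader(FakeReader("value of testing"))
```

Inside a custom tokenizer, the same helper would presumably be called as
drain_reader(self.input) before the first token is produced.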

To get this new feature, svn update to HEAD on trunk and:
  - rebuild jcc
  - rebuild pylucene

If you have questions, don't hesitate to ask (but subscribe to 
pylucene-dev@ first so that your message doesn't sit in a moderation queue).

Thanks!

Andi..

>
> analyzer = MyAnalyzer()    # 'createComponents' sets MyTokenizer
> config = IndexWriterConfig(analyzer)
> config.setOpenMode(IndexWriterConfig.OpenMode.CREATE)
> store = SimpleFSDirectory(....)
> writer = IndexWriter(store, config)
> doc = Document()
> doc.add(Field("title", "value of testing", TextField.TYPE_NOT_STORED))
> writer.addDocument(doc)    # calls incrementToken of MyTokenizer, but I need
>                            # to grab the text source in order to create my tokens
>
> Thank you
>
> -- 
> Kind regards,
>
> Marc Jeurissen
>
> <http://anet.be>
> Bibliotheek UAntwerpen
> Stadscampus - S.A.085
> Prinsstraat 9 - 2000 Antwerpen
> marc.jeurissen@uantwerpen.be <mailto:marc.jeurissen@uantwerpen.be>
> T +32 3 265 49 71
>
