lucene-pylucene-dev mailing list archives

From Andi Vajda <va...@apache.org>
Subject Re: ShingleAnalyzerWrapper in PyLucene
Date Sun, 29 Jan 2017 20:38:30 GMT

On Sun, 29 Jan 2017, marco turchi wrote:

> It is strange because I can see the attached files in the email I sent
> you...
>
> I am attaching the Java code again. In case the attachment is dropped
> again, you can download it from this link:
> https://www.dropbox.com/s/o7ocygrdv8dqksl/CopyOfTest.java?dl=0
> The file is called CopyOfTest.java

I haven't run your programs yet, but one difference I noticed is that in
Python you do:
   analyzer = ShingleAnalyzerWrapper(WhitespaceAnalyzer(), 2, 6, ' ', True, False, None)
and in Java you do:
   analyzer = new ShingleAnalyzerWrapper(new WhitespaceAnalyzer(), 2, 4, " ", true, false, null);

The numeric parameters (minimum and maximum shingle size) are not the same:
2, 6 in Python vs 2, 4 in Java.
Please use the same values in both versions and let us know if that solves
the problem.
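
As a rough illustration of what those two numbers control, here is a minimal
pure-Python sketch of word shingling. It is not Lucene's implementation and
does not call PyLucene at all; the function name `shingles` and its keyword
arguments are made up for this example, and the outputUnigramsIfNoShingles
and filler token arguments are ignored.

```python
def shingles(text, min_size=2, max_size=4, sep=" ", output_unigrams=True):
    """Hypothetical helper: emulate word shingling on whitespace tokens.

    Assumption: this only mimics the general idea of shingling and is not
    the algorithm used by Lucene's ShingleAnalyzerWrapper.
    """
    tokens = text.split()
    result = []
    for start in range(len(tokens)):
        if output_unigrams:
            # Emit the single token first, as in the Java output above.
            result.append(tokens[start])
        for size in range(min_size, max_size + 1):
            # Only emit shingles that fit inside the token list.
            if start + size <= len(tokens):
                result.append(sep.join(tokens[start:start + size]))
    return result


query_text = "welcome world is at on"
with_max_4 = shingles(query_text, 2, 4)
with_max_6 = shingles(query_text, 2, 6)

# Terms produced only when the maximum shingle size is 6.
extra = [term for term in with_max_6 if term not in with_max_4]
print(len(with_max_4), len(with_max_6), extra)
```

In this sketch the larger maximum only adds the longer shingles; every term
produced with 2, 4 is still produced with 2, 6.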
Thanks !

Andi..

>
> Thanks a lot!
> Marco
>
>
>
> On Sun, Jan 29, 2017 at 7:14 PM, Andi Vajda <vajda@apache.org> wrote:
>
>>
>>> On Jan 29, 2017, at 03:50, marco turchi <marco.turchi@gmail.com> wrote:
>>>
>>> Dear Andi,
>>> please find attached the Java and the Python code. Both create an index
>>> with two records using the Shingle analyzer and then query it, printing
>>> the query and the terms of the query.
>>
>> It looks like you attached only the python program, only one attachment.
>>
>> Andi..
>>
>>>
>>> Thanks a lot for your help
>>> Marco
>>>
>>>
>>>
>>>> On Sun, Jan 29, 2017 at 3:10 AM, Andi Vajda <vajda@apache.org> wrote:
>>>>
>>>> On Sat, 28 Jan 2017, marco turchi wrote:
>>>>
>>>>> Dear All,
>>>>> I need to use the ShingleAnalyzerWrapper in PyLucene.
>>>>>
>>>>> I have built the analyzer as in Lucene:
>>>>> self.analyzer = ShingleAnalyzerWrapper(WhitespaceAnalyzer(), 2, 4, " ", True, False, None)
>>>>>
>>>>> and I have used it inside QueryParser:
>>>>> query = QueryParser("source", self.analyzer).parse("welcome world is at on")
>>>>>
>>>>> the output is:
>>>>> source:welcome source:world source:is source:at source:on
>>>>>
>>>>> I have run the same code in Java and the output is how I would expect it:
>>>>> source:welcome source:welcome world source:welcome world is
>>>>> source:welcome world is at source:world source:world is
>>>>> source:world is at source:world is at on source:is content:is at
>>>>> source:is at on source:at source:at on source:on
>>>>>
>>>>> Do you have any idea what I'm doing wrong in PyLucene?
>>>>
>>>> Please, help me help you by including two simple programs that I can
>>>> run to reproduce the problem. One in Java producing the output you
>>>> expect, one in Python producing the output you're reporting.
>>>>
>>>> Thanks !
>>>>
>>>> Andi..
>>>>
>>>>
>>>>>
>>>>> Thanks a lot in advance for your help
>>>>> Marco
>>>>>
>>>
>>> <TestShingle.py>
>>
>
