lucene-pylucene-dev mailing list archives

From marco turchi <marco.tur...@gmail.com>
Subject Re: ShingleAnalyzerWrapper in PyLucene
Date Sun, 29 Jan 2017 20:45:09 GMT
Hi Andi,
while I was changing the parameter value, I noticed another problem. I
have fixed it and it works now.

Thanks a lot and sorry for bothering you!
Marco

On Sun, Jan 29, 2017 at 9:38 PM, Andi Vajda <vajda@apache.org> wrote:

>
> On Sun, 29 Jan 2017, marco turchi wrote:
>
>> It is strange because I can see the attached files in the email I sent
>> you...
>>
>> I am attaching the Java code again. In case the attachment is missing
>> again, you can download it from this link:
>> https://www.dropbox.com/s/o7ocygrdv8dqksl/CopyOfTest.java?dl=0
>> The file is called CopyOfTest.java.
>>
>
> I haven't tried to run your programs yet, but one difference I noticed
> is that in Python you do:
>   analyzer = ShingleAnalyzerWrapper(WhitespaceAnalyzer(), 2, 6, ' ', True, False, None)
> and in Java you do:
>   analyzer = new ShingleAnalyzerWrapper(new WhitespaceAnalyzer(), 2, 4, " ", true, false, null);
>
> The numeric parameters are not the same: 2, 6 vs 2, 4.
> Please use the same values in both versions and let us know if that solves
> the problem.
>
> Thanks !
>
> Andi..
>
>
>> Thanks a lot!
>> Marco
>>
>>
>>
>> On Sun, Jan 29, 2017 at 7:14 PM, Andi Vajda <vajda@apache.org> wrote:
>>
>>
>>> On Jan 29, 2017, at 03:50, marco turchi <marco.turchi@gmail.com> wrote:
>>>>
>>>> Dear Andi,
>>>> please find attached the Java and the Python code. Both of them
>>>> create an index with two records using the Shingle analyser and then
>>>> query it, printing the query and the terms of the query.
>>>
>>> It looks like you attached only the python program, only one attachment.
>>>
>>> Andi..
>>>
>>>
>>>> Thanks a lot for your help
>>>> Marco
>>>>
>>>>
>>>>
>>>> On Sun, Jan 29, 2017 at 3:10 AM, Andi Vajda <vajda@apache.org> wrote:
>>>>>
>>>>> On Sat, 28 Jan 2017, marco turchi wrote:
>>>>>
>>>>> Dear All,
>>>>>> I need to use the ShingleAnalyzerWrapper in PyLucene.
>>>>>>
>>>>>> I have built the analyzer the same way as in Lucene:
>>>>>> self.analyzer = ShingleAnalyzerWrapper(WhitespaceAnalyzer(), 2, 4, " ", True, False, None)
>>>>>>
>>>>>> and I have used it inside QueryParser:
>>>>>> query = QueryParser("source", self.analyzer).parse("welcome world is at on")
>>>>>>
>>>>>> the output is:
>>>>>> source:welcome source:world source:is source:at source:on
>>>>>>
>>>>>> I have run the same code in Java and the output is what I would expect:
>>>>>> source:welcome source:welcome world source:welcome world is
>>>>>> source:welcome world is at source:world source:world is
>>>>>> source:world is at source:world is at on source:is content:is at
>>>>>> source:is at on source:at source:at on source:on
>>>>>>
>>>>>> Do you have any idea what I'm doing wrong in PyLucene?
>>>>>>
>>>>>
>>>>> Please help me help you by including two simple programs that I can
>>>>> run to reproduce the problem: one in Java producing the output you
>>>>> expect, one in Python producing the output you're reporting.
>>>>>
>>>>> Thanks !
>>>>>
>>>>> Andi..
>>>>>
>>>>>
>>>>>
>>>>>> Thanks a lot in advance for your help
>>>>>> Marco
>>>>>>
>>>>>>
>>>> <TestShingle.py>
>>>>
>>>
>>>
>>
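
To see what the two numeric arguments discussed above change (2, 4 in the Java program versus 2, 6 in the Python one), here is a minimal pure-Python sketch of shingle generation. It does not use PyLucene or reproduce Lucene's implementation; the helper name `shingles` and its parameters are hypothetical, and it only assumes that the two numbers are the minimum and maximum shingle sizes and that unigrams are also emitted, as the expected Java output in the thread suggests. The thread does not say what the other problem Marco fixed was, so this only illustrates the parameter difference Andi pointed out.

```python
def shingles(tokens, min_size, max_size, separator=" ", output_unigrams=True):
    """Return word n-grams ("shingles") grouped by start position.

    Hypothetical helper for illustration only; it approximates the term
    order seen in the thread's Java output, not Lucene's actual code.
    """
    result = []
    for start in range(len(tokens)):
        if output_unigrams:
            # Emit the single token first, as in the expected output.
            result.append(tokens[start])
        for size in range(min_size, max_size + 1):
            end = start + size
            if end > len(tokens):
                break  # not enough tokens left for a shingle this long
            result.append(separator.join(tokens[start:end]))
    return result


# Whitespace tokenization of the query text used in the thread.
tokens = "welcome world is at on".split()

with_max_4 = shingles(tokens, 2, 4)  # sizes used in the Java program
with_max_6 = shingles(tokens, 2, 6)  # sizes used in the Python program

print(with_max_4)
# Shingles that appear only with the larger maximum size.
print(sorted(set(with_max_6) - set(with_max_4)))
```

For a five-word query the larger maximum only adds the single five-word shingle, so the two settings give nearly the same terms here; on longer text they would diverge more.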
