lucene-pylucene-dev mailing list archives

From Andi Vajda <va...@apache.org>
Subject Re: pass compressed string
Date Fri, 25 Feb 2011 15:30:45 GMT

On Feb 25, 2011, at 5:57, Roman Chyla <roman.chyla@gmail.com> wrote:

> Hi Andi,
>
> Thanks, JArray_byte() does what I needed. I was (wrongly) passing a
> bytestring (which I think got automatically converted to unicode), and
> trying to get the bytes of that string was not correct.
>
> Though it would be interesting to find out whether it is possible to
> pass a string and get the bytes in Java,

A Java String is not made of bytes but of 16-bit Unicode chars (UTF-16
code units). If I remember correctly, calling String.getBytes() without
an explicit charset is discouraged in Java because it falls back to the
platform default encoding. Whenever a Python string (type str, made of
bytes) is passed to Java, it is assumed to be encoded as UTF-8 and is
converted to 16-bit Unicode on the fly.
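
To illustrate why compressed data does not survive that conversion, here
is a minimal sketch in plain Python 3 (an assumption; the sessions in
this thread are Python 2 and need no JVM to follow). The byte literal is
copied from the example quoted below, and signed() is a hypothetical
helper that only mimics how Java displays bytes.

```python
# Compressed bytes quoted in this thread for zlib.compress("python").
raw = b"x\x9c+\xa8,\xc9\xc8\xcf\x03\x00\tW\x02\xa3"


def signed(data):
    """Hypothetical helper: show bytes as Java does, in the range -128..127."""
    return [b - 256 if b > 127 else b for b in data]


# Arbitrary binary data is generally not valid UTF-8, so treating it as
# UTF-8 encoded text cannot work.
try:
    raw.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

print(is_valid_utf8)
print(signed(raw))
```

The signed view of the untouched bytes is the same list that
JArray_byte(s) reports in the quoted session, which is why a byte array
is the safer way to hand binary data to Java.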

Andi..

> I don't know whether the conversion happens on the JNI side or only
> in Java; I shall do some reading.
>
> Example in python:
>
> In [4]: s = zlib.compress("python")
>
> In [5]: repr(s)
> Out[5]: "'x\\x9c+\\xa8,\\xc9\\xc8\\xcf\\x03\\x00\\tW\\x02\\xa3'"
>
> In [6]: lucene.JArray_byte(s)
> Out[6]: JArray<byte>(120, -100, 43, -88, 44, -55, -56, -49, 3, 0, 9,  
> 87, 2, -93)
>
> The same thing in Jython:
>
> >>> s = zlib.compress("python")
> >>> s
> 'x\x9c+\xa8,\xc9\xc8\xcf\x03\x00\tW\x02\xa3'
> >>> repr(s)
> "'x\\x9c+\\xa8,\\xc9\\xc8\\xcf\\x03\\x00\\tW\\x02\\xa3'"
> >>> String(s).getBytes()
> array('b', [120, -62, -100, 43, -62, -88, 44, -61, -119, -61, -120,
> -61, -113, 3, 0, 9, 87, 2, -62, -93])
> >>> String(s).getBytes('utf8')
> array('b', [120, -62, -100, 43, -62, -88, 44, -61, -119, -61, -120,
> -61, -113, 3, 0, 9, 87, 2, -62, -93])
> >>> String(s).getBytes('utf16')
> array('b', [-2, -1, 0, 120, 0, -100, 0, 43, 0, -88, 0, 44, 0, -55, 0,
> -56, 0, -49, 0, 3, 0, 0, 0, 9, 0, 87, 0, 2, 0, -93])
> >>> String(s).getBytes('ascii')
> array('b', [120, 63, 43, 63, 44, 63, 63, 63, 3, 0, 9, 87, 2, 63])
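
The Jython results above look consistent with each byte first becoming
one character and the characters then being re-encoded. A rough sketch
in plain Python 3 that reproduces the same numbers follows. The Latin-1
step is an assumption about what Jython does when building the String,
and signed() is a hypothetical helper, not part of any library.

```python
# Compressed bytes quoted in this thread for zlib.compress("python").
raw = b"x\x9c+\xa8,\xc9\xc8\xcf\x03\x00\tW\x02\xa3"


def signed(data):
    """Hypothetical helper: show bytes as Java does, in the range -128..127."""
    return [b - 256 if b > 127 else b for b in data]


# Assumption: one character per byte, i.e. code points 0..255 (Latin-1).
text = raw.decode("latin-1")

# Re-encoding the characters is what changes the byte sequence.
utf8 = signed(text.encode("utf-8"))
# Java's "UTF-16" charset writes a big-endian byte order mark first.
utf16 = signed(b"\xfe\xff" + text.encode("utf-16-be"))
# Characters outside ASCII are replaced by '?' (63).
ascii_replaced = signed(text.encode("ascii", errors="replace"))

print(utf8)
print(utf16)
print(ascii_replaced)
```

None of the three results equals the original 14 bytes, so getBytes() on
such a String cannot give back data that zlib could decompress.
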
>
>
>
>
> Roman
>
> On Thu, Feb 24, 2011 at 3:42 AM, Andi Vajda <vajda@apache.org> wrote:
>>
>> On Thu, 24 Feb 2011, Roman Chyla wrote:
>>
>>> I would like to transfer results from python to java:
>>>
>>> hello = zlib.compress("hello")
>>>
>>> on the java side do:
>>>
>>> byte[] data = string.getBytes()
>>>
>>> But I am not successful. Is there any translation going on  
>>> somewhere?
>>
>> Can you be more specific ?
>> Actual lines of code, errors, expected results, actual results...
>>
>> An array of bytes in JCC is not created with a string but a
>> JArray('byte')(len or str)
>>
>>  >>> import lucene
>>  >>> lucene.initVM()
>>  <jcc.JCCEnv object at 0x1004100d8>
>>  >>> lucene.JArray('byte')(10)
>>  JArray<byte>(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
>>  >>> lucene.JArray('byte')("abcd")
>>  JArray<byte>(97, 98, 99, 100)
>>  >>>
>>
>> Andi..
>>
