From: Andi Vajda
To: "pylucene-dev@lucene.apache.org"
Subject: Re: pass compressed string
Date: Fri, 25 Feb 2011 10:30:45 -0500

On Feb 25, 2011, at 5:57, Roman Chyla wrote:

> Hi Andi,
>
> Thanks, the JArray_byte() does what I needed - I was
> (wrongly) passing a bytestring (which I think got automatically
> converted to unicode), and trying to get the bytes of that string was
> not correct.
>
> Though it would be interesting to find out if it is possible to pass
> a string and get the bytes in Java,

A Java String is not made of bytes but of 16-bit unicode chars. If I
remember correctly, String.getBytes() without an explicit charset is
discouraged in Java because of encoding issues: it uses the platform's
default charset (only the old getBytes(int, int, byte[], int) overload
is formally deprecated). Whenever a Python string (type str, made of
bytes) is passed to Java, it is assumed to be encoded utf-8 and
converted to 16-bit unicode on the fly.

Andi..

> I don't know what conversion is happening on the JNI side, or only
> in Java - I shall do some reading
>
> Example in python:
>
> In [4]: s = zlib.compress("python")
>
> In [5]: repr(s)
> Out[5]: "'x\\x9c+\\xa8,\\xc9\\xc8\\xcf\\x03\\x00\\tW\\x02\\xa3'"
>
> In [6]: lucene.JArray_byte(s)
> Out[6]: JArray(120, -100, 43, -88, 44, -55, -56, -49, 3, 0, 9,
> 87, 2, -93)
>
> The same thing in Jython:
>
> >>> s = zlib.compress("python")
> >>> s
> 'x\x9c+\xa8,\xc9\xc8\xcf\x03\x00\tW\x02\xa3'
> >>> repr(s)
> "'x\\x9c+\\xa8,\\xc9\\xc8\\xcf\\x03\\x00\\tW\\x02\\xa3'"
> >>> String(s).getBytes()
> array('b', [120, -62, -100, 43, -62, -88, 44, -61, -119, -61, -120,
> -61, -113, 3, 0, 9, 87, 2, -62, -93])
> >>> String(s).getBytes('utf8')
> array('b', [120, -62, -100, 43, -62, -88, 44, -61, -119, -61, -120,
> -61, -113, 3, 0, 9, 87, 2, -62, -93])
> >>> String(s).getBytes('utf16')
> array('b', [-2, -1, 0, 120, 0, -100, 0, 43, 0, -88, 0, 44, 0, -55, 0,
> -56, 0, -49, 0, 3, 0, 0, 0, 9, 0, 87, 0, 2, 0, -93])
> >>> String(s).getBytes('ascii')
> array('b', [120, 63, 43, 63, 44, 63, 63, 63, 3, 0, 9, 87, 2, 63])
>
> Roman
>
> On Thu, Feb 24, 2011 at 3:42 AM, Andi Vajda wrote:
>>
>> On Thu, 24 Feb 2011, Roman Chyla wrote:
>>
>>> I would like to transfer results from python to java:
>>>
>>> hello = zlib.compress("hello")
>>>
>>> on the java side do:
>>>
>>> byte[] data = string.getBytes()
>>>
>>> But I am not successful.
>>> Is there any translation going on somewhere?
>>
>> Can you be more specific?
>> Actual lines of code, errors, expected results, actual results...
>>
>> An array of bytes in JCC is not created with a string but a
>> JArray('byte')(len or str)
>>
>> >>> import lucene
>> >>> lucene.initVM()
>>
>> >>> lucene.JArray('byte')(10)
>> JArray(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
>> >>> lucene.JArray('byte')("abcd")
>> JArray(97, 98, 99, 100)
>> >>>
>>
>> Andi..
>>
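
The byte arrays in the Jython session quoted above look consistent with each byte of the compressed string first becoming one character and the resulting text then being re-encoded. Below is a minimal Python 3 sketch of that reading. The helper name as_java_bytes is made up for illustration, and the Latin-1 step is an assumption about what Jython does with a byte string, not something confirmed in this thread. The raw bytes are copied from the session rather than recomputed, since zlib output can differ between builds.

```python
# Raw zlib output for "python", copied from the session quoted above.
raw = b"x\x9c+\xa8,\xc9\xc8\xcf\x03\x00\tW\x02\xa3"


def as_java_bytes(data):
    """Render bytes the way Java prints a byte[] (signed, -128..127)."""
    return [b - 256 if b > 127 else b for b in data]


# What a byte array keeps: the 14 original bytes, unchanged.
direct = as_java_bytes(raw)

# Assumed model of the String round trip: each byte is treated as one
# character (equivalent to a Latin-1 decode), then the text is encoded
# as UTF-8, so every byte above 127 turns into two bytes.
via_string = as_java_bytes(raw.decode("latin-1").encode("utf-8"))

# Same text encoded as ASCII: unencodable characters become '?' (63).
via_ascii = as_java_bytes(raw.decode("latin-1").encode("ascii", "replace"))

print(direct)
print(via_string)
print(via_ascii)
```

Under that assumption, direct matches the JArray_byte output, while via_string and via_ascii match the getBytes('utf8') and getBytes('ascii') results, which is why compressed data has to travel as a byte array rather than as a string.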