lucene-dev mailing list archives

From Charlie <charlie...@gmail.com>
Subject Re[2]: (byte)((i & 0x7f) | 0x80) == (byte)(i | 0x80)
Date Wed, 26 Apr 2006 17:21:23 GMT
ok, thanks for your reply.

But I thought the method
public void writeVInt(int i)
is not about UTF-8; it is about writing an int in a variable-length format.
Is it also part of the planned changes to how Unicode characters are written?
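As background for the question: a variable-length int of this kind is usually stored 7 bits per byte, low-order bits first, with the high bit (0x80) set on every byte except the last to signal that another byte follows. That matches the (i & 0x7f) | 0x80 expression discussed in this thread. The sketch below assumes that scheme. The class VIntSketch and its writeVInt/readVInt helpers are hypothetical and write to a java.io.ByteArrayOutputStream rather than Lucene's own output class, so this is not Lucene's actual code.

```java
import java.io.ByteArrayOutputStream;

public class VIntSketch {
    // Hypothetical helper: writes i with 7 data bits per byte, low-order
    // group first. Every byte except the last has its high bit (0x80) set
    // to signal that another byte follows.
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {        // more than 7 significant bits left
            out.write((i & 0x7F) | 0x80); // low 7 bits plus continuation bit
            i >>>= 7;                     // unsigned shift, so negatives terminate
        }
        out.write(i);                     // final byte, high bit clear
    }

    // Hypothetical inverse: reassembles the 7-bit groups starting at pos[0]
    // and advances pos[0] past the bytes it consumed.
    static int readVInt(byte[] buf, int[] pos) {
        byte b = buf[pos[0]++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos[0]++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    public static void main(String[] args) {
        int[] samples = {0, 127, 128, 16383, 16384, Integer.MAX_VALUE};
        for (int v : samples) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            writeVInt(out, v);
            byte[] bytes = out.toByteArray();
            int decoded = readVInt(bytes, new int[] {0});
            System.out.println(v + " -> " + bytes.length + " byte(s), decoded as " + decoded);
        }
    }
}
```

With this scheme, values 0 to 127 fit in one byte, 128 needs two (0x80 0x01), and any int needs at most five.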

-- 
Best regards,
 Charlie


---

>> I thought
>>
>>   (byte)((i & 0x7f) | 0x80) == (byte)(i | 0x80)
>>
>> Since the (byte) cast already keeps only the low-order byte, the (& 0x7f)
>> is not needed. If so, we could change that line to
>>
>>    writeByte((byte)(i | 0x80));
>>
>> which might speed things up a little. Correct me if (i & 0x7f) is necessary.
>> Thank you.
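The claimed equivalence can be checked directly. A (byte) cast keeps only bits 0 to 7. In both expressions bit 7 is forced to 1 by | 0x80 and bits 0 to 6 come from i, so the bits that & 0x7f clears (8 to 31) are exactly the ones the cast discards anyway. The hypothetical class MaskCheck below compares both forms for every possible low byte under several different high-order bit patterns, including negative values. It says nothing about whether dropping the mask is measurably faster.

```java
public class MaskCheck {
    // The two forms discussed in the thread.
    static byte masked(int i)   { return (byte) ((i & 0x7f) | 0x80); }
    static byte unmasked(int i) { return (byte) (i | 0x80); }

    static boolean agreeOn(int i) { return masked(i) == unmasked(i); }

    public static void main(String[] args) {
        // The cast only keeps bits 0-7, so try every low byte under several
        // different high-order patterns (including negative ints).
        int[] highParts = {0x00000000, 0x00000100, 0x12345600, 0x7FFFFF00, 0x80000000, 0xFFFFFF00};
        int mismatches = 0;
        for (int high : highParts) {
            for (int low = 0; low < 256; low++) {
                if (!agreeOn(high | low)) mismatches++;
            }
        }
        System.out.println("mismatches: " + mismatches); // prints "mismatches: 0"
    }
}
```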

> I wouldn't bother optimizing these methods... I think they will be
> changed in the future anyway.
> 1) The current code outputs modified UTF-8 instead of true UTF-8
> 2) I think we may be moving to byte-oriented counts for length (away
> from counting Java chars; under the latest Unicode standards a single
> character can take one or two Java chars)
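To make the difference between counting Java chars and counting bytes concrete, the sketch below measures one string four ways. It uses java.io.DataOutputStream.writeUTF as a stand-in for modified UTF-8; that is an assumption, since Lucene's own character-writing code is not shown here and may differ in detail. The class name CharCounts is hypothetical. The sample string holds 'a', U+1D11E (a supplementary character stored as a surrogate pair) and U+0000.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class CharCounts {
    // Length in bytes of Java's modified UTF-8 encoding of s, as produced by
    // DataOutputStream.writeUTF, minus the 2-byte length prefix it writes first.
    static int modifiedUtf8Length(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(buf);
        data.writeUTF(s);
        data.flush();
        return buf.size() - 2;
    }

    public static void main(String[] args) throws IOException {
        // 'a', U+1D11E (stored as the surrogate pair D834 DD1E), and U+0000.
        String s = "a\uD834\uDD1E\u0000";
        System.out.println("Java chars:           " + s.length());                                // 4
        System.out.println("code points:          " + s.codePointCount(0, s.length()));           // 3
        System.out.println("UTF-8 bytes:          " + s.getBytes(StandardCharsets.UTF_8).length); // 6
        System.out.println("modified UTF-8 bytes: " + modifiedUtf8Length(s));                     // 9
    }
}
```

Standard UTF-8 writes the supplementary character as a single 4-byte sequence and U+0000 as one byte, while modified UTF-8 encodes each surrogate separately (3 bytes each) and U+0000 as 2 bytes. A length stored as a count of Java chars (4) matches neither byte count.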

> Marvin Humphrey has done the first, and seems close to finishing #2.

> http://www.mail-archive.com/java-dev@lucene.apache.org/msg01970.html
> http://www.mail-archive.com/java-dev@lucene.apache.org/msg02109.html
> http://www.mail-archive.com/java-dev@lucene.apache.org/msg02468.html
> http://www.mail-archive.com/java-dev@lucene.apache.org/msg03801.html

> -Yonik
> http://incubator.apache.org/solr Solr, the open-source Lucene search server





