fleece-dev mailing list archives

From Hendrik Dev <hendrikde...@gmail.com>
Subject Re: JsonLocation.getStreamOffset() return value unclear
Date Thu, 24 Jul 2014 10:39:32 GMT
Doing this efficiently is more complicated than I thought. Can we not
simply count 2 bytes per char? ;-)
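
To illustrate why the shortcut would not give a real byte offset, here is a
rough sketch. It assumes the input is known to be UTF-8 and well-formed;
Utf8OffsetSketch and utf8Length are made-up names, not part of JSR 353 or
the RI. It adds up the encoded length of each code point as it is consumed:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of JSR 353 or the RI.
final class Utf8OffsetSketch {

    // UTF-8 encoded length of the text, summed per code point.
    // Assumes well-formed UTF-16 input (no unpaired surrogates).
    static long utf8Length(CharSequence text) {
        long bytes = 0;
        int i = 0;
        while (i < text.length()) {
            int cp = Character.codePointAt(text, i);
            if (cp < 0x80) {
                bytes += 1;      // ASCII
            } else if (cp < 0x800) {
                bytes += 2;
            } else if (cp < 0x10000) {
                bytes += 3;
            } else {
                bytes += 4;      // supplementary plane, two Java chars
            }
            i += Character.charCount(cp);
        }
        return bytes;
    }

    public static void main(String[] args) {
        String json = "{\"k\":\"\u00e9\"}";
        System.out.println("chars:           " + json.length());
        System.out.println("2 bytes per char: " + 2L * json.length());
        System.out.println("UTF-8 (summed):   " + utf8Length(json));
        System.out.println("UTF-8 (getBytes): "
                + json.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

For that sample the char count is 9, the UTF-8 length is 10, and the
2-bytes-per-char shortcut gives 18, so the shortcut matches neither offset.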

BTW, it seems the JsonLocation column value also leaves room for interpretation:

Is the leftmost column 0 or 1? Text editors, for example, start with
column 1 (there is never a column 0), but the RI starts with 0.
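
To make the two conventions concrete, a minimal sketch of a position tracker
with a configurable first-column value follows. ColumnTracker and its methods
are hypothetical names, not taken from JSR 353 or the RI. The sketch also
assumes that lines start at 1, that '\n' is the only line terminator, and
that the reported column is that of the next char to be read:

```java
// Hypothetical tracker for illustration only; not part of JSR 353 or the RI.
final class ColumnTracker {
    private final long firstColumn; // 0 (RI-style) or 1 (text-editor-style)
    private long line = 1;          // assumption: first line is line 1
    private long column;
    private long charOffset = 0;

    ColumnTracker(long firstColumn) {
        this.firstColumn = firstColumn;
        this.column = firstColumn;
    }

    // Record one consumed char and update line, column and char offset.
    void advance(char c) {
        charOffset++;
        if (c == '\n') {
            line++;
            column = firstColumn; // new line: back to the leftmost column
        } else {
            column++;
        }
    }

    // Convenience overload: consume every char of the given text.
    void advance(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            advance(text.charAt(i));
        }
    }

    long line() { return line; }
    long column() { return column; }
    long charOffset() { return charOffset; }
}
```

After consuming "ab\nc", both variants report line 2 and char offset 4, but
the column is 1 with firstColumn = 0 and 2 with firstColumn = 1.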

Regards
Hendrik


On Wed, Jul 23, 2014 at 1:49 PM, Hendrik Dev <hendrikdev22@gmail.com> wrote:
> agree, will make it so
>
> On Wed, Jul 23, 2014 at 1:28 PM, Romain Manni-Bucau
> <rmannibucau@gmail.com> wrote:
>> Hi
>>
>> I agree the wording is wrong, but IMO it is not ambiguous: we get an
>> InputStream or a Reader (and we *don't* want to check whether it is a file),
>> so we just count the chars or bytes we read. Any other implementation would
>> lead to confusion IMO (make default text file reader compliant friendly).
>>
>> We can start this way and if we have issues go further but I really doubt
>> we need it.
>>
>> What's your opinion?
>>
>>
>>
>>
>> Romain Manni-Bucau
>> Twitter: @rmannibucau
>> Blog: http://rmannibucau.wordpress.com/
>> LinkedIn: http://fr.linkedin.com/in/rmannibucau
>> Github: https://github.com/rmannibucau
>>
>>
>> 2014-07-23 13:21 GMT+02:00 Hendrik Dev <hendrikdev22@gmail.com>:
>>
>>> Hi,
>>>
>>> the JSR 353 API says about JsonLocation.getStreamOffset()
>>>
>>> "long getStreamOffset()
>>>
>>> Return the stream offset into the input source this location is
>>> pointing to. If the input source is a file or a byte stream then this
>>> is the byte offset into that stream, but if the input source is a
>>> character media then the offset is the character offset. Returns -1 if
>>> there is no offset available."
>>>
>>> There are IMHO two issues here:
>>>
>>> 1) How can we know that the input source is a file (stream)? We can
>>> only know whether the parser reads from an InputStream (= byte stream) or
>>> from a Reader (= character stream). The wording here is unclear/ambiguous.
>>>
>>> 2) Since a UTF-8 or UTF-16 character can map to one, two, three or four
>>> bytes, the output can be very confusing (especially if the user doesn't
>>> know whether the parser was constructed from a byte or character
>>> stream and which charset is used).
>>>
>>> It seems that the RI does not implement these distinctions; if I looked
>>> correctly, it always returns character offsets.
>>>
>>> So what do we want to do?
>>>
>>> Thanks
>>> Hendrik
>>>
>>>
>>> --
>>> Hendrik Saly (salyh, hendrikdev22)
>>> @hendrikdev22
>>> PGP: 0x22D7F6EC
>>>
>
>
>
> --
> Hendrik Saly (salyh, hendrikdev22)
> @hendrikdev22
> PGP: 0x22D7F6EC



-- 
Hendrik Saly (salyh, hendrikdev22)
@hendrikdev22
PGP: 0x22D7F6EC
