cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-10189) Unify read/writeUTF code paths
Date Thu, 27 Aug 2015 16:55:46 GMT


Ariel Weisberg commented on CASSANDRA-10189:

bq. The problem with that suggestion is that we cannot know upfront how large it really is
exactly, without buffering more than max buffer size bytes. So we can't avoid the possibility
of having to resize it at the end, 
Is that really true? We know that every UTF-8 character decodes to at most 2 UTF-16 chars, and
conversely every UTF-16 char encodes to at most 3 bytes. So a 4k buffer would be enough to avoid
an allocation for every string shorter than 1k characters. That should cover a good share of
strings, I would think, and it's a modest memory commitment.
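A minimal sketch of that sizing argument follows. It assumes the {{DataOutput.writeUTF}} wire format: a 2-byte length prefix, then modified UTF-8 at up to 3 bytes per char. The class name {{UtfWriteSketch}} and the 4096-byte thread-local are hypothetical, not existing Cassandra code. Because the worst case follows from {{s.length()}} alone, the string is encoded in one pass with no resize at the end:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;

// Hypothetical helper illustrating the buffer-sizing bound; not Cassandra's actual code.
final class UtfWriteSketch {
    static final int BUFFER_SIZE = 4096; // assumed per-thread scratch size (4 KiB)

    private static final ThreadLocal<byte[]> BUFFER =
            ThreadLocal.withInitial(() -> new byte[BUFFER_SIZE]);

    // Every UTF-16 char becomes at most 3 bytes of modified UTF-8.
    static int maxEncodedLength(int chars) {
        return chars * 3;
    }

    static void writeUTF(String s, DataOutput out) throws IOException {
        int chars = s.length();
        // Each char needs at least 1 byte, so this string can never fit the 16-bit length prefix.
        if (chars > 65535)
            throw new UTFDataFormatException("string too long: " + chars + " chars");
        int bound = maxEncodedLength(chars);
        // Reuse the thread-local buffer when the worst case fits; otherwise allocate once.
        byte[] buf = bound <= BUFFER_SIZE ? BUFFER.get() : new byte[bound];
        int n = 0;
        for (int i = 0; i < chars; i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                buf[n++] = (byte) c;                      // 1 byte
            } else if (c <= 0x07FF) {                     // includes '\u0000' (modified UTF-8)
                buf[n++] = (byte) (0xC0 | (c >> 6));
                buf[n++] = (byte) (0x80 | (c & 0x3F));
            } else {                                      // 3 bytes, surrogates included
                buf[n++] = (byte) (0xE0 | (c >> 12));
                buf[n++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                buf[n++] = (byte) (0x80 | (c & 0x3F));
            }
        }
        if (n > 65535)
            throw new UTFDataFormatException("encoded string too long: " + n + " bytes");
        out.writeShort(n);
        out.write(buf, 0, n);
    }
}
```

With a 4096-byte buffer the worst-case bound admits strings of up to 1365 chars, so the 1k figure leaves some headroom.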

How would we avoid allocations for ASCII on the read side? We don't know whether the input is
ASCII without first making a pass over it. If we are going to make a second pass anyway, why not
make it a memory copy, which should be much faster?
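One reading of the memory-copy suggestion, sketched under assumptions: copy the {{utflen}} bytes into a thread-local byte array with a single {{readFully}}, then decode from memory with an ASCII fast path that runs until the first non-ASCII byte. The name {{UtfReadSketch}} and the 4096-element scratch arrays are hypothetical; the decoding rules follow the modified UTF-8 format documented for {{DataInput.readUTF}}.

```java
import java.io.DataInput;
import java.io.IOException;
import java.io.UTFDataFormatException;

// Hypothetical sketch, not Cassandra code: bulk-copy the encoded bytes, then decode from memory.
final class UtfReadSketch {
    static final int BUFFER_SIZE = 4096; // assumed scratch size

    private static final ThreadLocal<byte[]> BYTES =
            ThreadLocal.withInitial(() -> new byte[BUFFER_SIZE]);
    private static final ThreadLocal<char[]> CHARS =
            ThreadLocal.withInitial(() -> new char[BUFFER_SIZE]);

    static String readUTF(DataInput in) throws IOException {
        int utflen = in.readUnsignedShort();
        // Decoding never yields more chars than bytes, so utflen bounds both scratch arrays.
        boolean fits = utflen <= BUFFER_SIZE;
        byte[] bytes = fits ? BYTES.get() : new byte[utflen];
        char[] chars = fits ? CHARS.get() : new char[utflen];
        in.readFully(bytes, 0, utflen); // the single bulk copy

        int i = 0;
        int n = 0;
        // ASCII fast path: bytes 0x00-0x7F are non-negative Java bytes and map 1:1 to chars.
        while (i < utflen && bytes[i] >= 0) {
            chars[n++] = (char) bytes[i++];
        }
        // General path for the remainder (modified UTF-8, 1 to 3 bytes per char).
        while (i < utflen) {
            int b = bytes[i] & 0xFF;
            if (b < 0x80) {
                chars[n++] = (char) b;
                i += 1;
            } else if ((b & 0xE0) == 0xC0) {
                if (i + 2 > utflen) throw new UTFDataFormatException("partial character at end");
                int b2 = bytes[i + 1];
                if ((b2 & 0xC0) != 0x80) throw new UTFDataFormatException("malformed input around byte " + i);
                chars[n++] = (char) (((b & 0x1F) << 6) | (b2 & 0x3F));
                i += 2;
            } else if ((b & 0xF0) == 0xE0) {
                if (i + 3 > utflen) throw new UTFDataFormatException("partial character at end");
                int b2 = bytes[i + 1];
                int b3 = bytes[i + 2];
                if ((b2 & 0xC0) != 0x80 || (b3 & 0xC0) != 0x80)
                    throw new UTFDataFormatException("malformed input around byte " + i);
                chars[n++] = (char) (((b & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F));
                i += 3;
            } else {
                throw new UTFDataFormatException("malformed input around byte " + i);
            }
        }
        return new String(chars, 0, n); // copies out of the scratch array
    }
}
```

For pure-ASCII input only the first loop runs, and the only per-call allocation left is the {{String}} itself.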

My thinking is: give it a shot, and if we don't like the result, don't include it. Another factor
is that if there were a way to get a char view into a byte array, we could use the same thread-local
buffer for reading and writing, and space wouldn't be an issue. I can sort of see how you could
do it with Unsafe, but I wonder how well it would compile in practice.
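For the char-view idea, the public {{ByteBuffer.asCharBuffer()}} API is one alternative to Unsafe that might be worth trying. This is a swapped-in technique, not something proposed in the thread. A hypothetical {{SharedScratch}} holder keeps one heap buffer per thread: the write side uses its backing {{byte[]}}, and the read side uses a {{CharBuffer}} view over the same storage at 2 bytes per char. Because both views alias the same memory, a reader would have to decode from its input rather than from {{bytes()}}.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;

// Hypothetical holder, not Cassandra code: one per-thread region used either as bytes or as chars.
final class SharedScratch {
    static final int SIZE_BYTES = 8192; // assumed size: 8192 bytes, i.e. 4096 chars

    private static final ThreadLocal<ByteBuffer> SCRATCH =
            ThreadLocal.withInitial(() -> ByteBuffer.allocate(SIZE_BYTES));

    // Write side: the heap buffer's backing array, used directly as a byte[].
    static byte[] bytes() {
        return SCRATCH.get().array();
    }

    // Read side: a char view over the same storage; puts through it are visible in bytes().
    static CharBuffer chars() {
        ByteBuffer bb = SCRATCH.get();
        bb.clear();               // the view starts at position 0 and spans the whole buffer
        return bb.asCharBuffer(); // big-endian, the ByteBuffer default
    }
}
```

Whether this compiles as well as a plain {{char[]}} store is exactly the open question: each access through the view goes through its index and byte-order handling, and {{asCharBuffer()}} allocates a small view object per call, so it would need benchmarking against the two-array approach.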

> Unify read/writeUTF code paths
> ------------------------------
>                 Key: CASSANDRA-10189
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Robert Stupp
>            Assignee: Robert Stupp
> (Follow-up to CASSANDRA-9738 and CASSANDRA-8670)
> CASSANDRA-9738 requires {{writeUTF}} functionality that has been improved in CASSANDRA-8670 plus {{readUTF}} functionality. But we need slightly different signatures: one taking {{DataInput}}/{{DataOutput}} and one taking {{ByteBuffer}}.
> We can combine both code paths and benefit from a shared, thread-local byte buffer.
> Slightly different implementations are needed for array-backed and direct BBs (as we can directly access the backing array, bypassing the direct BB's boundary checks).
> (Part of this has already been done for CASSANDRA-9738 in {{OHCKeyCache.SerializationUtil}})
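A sketch of the unified code path described in the issue, under stated assumptions. It has one shared encoder and two entry points. The {{DataOutput}} entry point encodes into a thread-local scratch array. The {{ByteBuffer}} entry point encodes straight into the backing array for heap buffers, and for direct buffers falls back to the scratch array plus one bulk {{put}}. {{UnifiedUtf}} and its methods are hypothetical; this is not the {{OHCKeyCache.SerializationUtil}} code. The {{ByteBuffer}} path assumes the default big-endian byte order so that its length prefix matches {{DataOutput.writeShort}}.

```java
import java.io.DataOutput;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

// Hypothetical sketch: one encoder shared by a DataOutput path and a ByteBuffer path.
final class UnifiedUtf {
    static final int BUFFER_SIZE = 4096; // assumed per-thread scratch size
    private static final ThreadLocal<byte[]> BUFFER =
            ThreadLocal.withInitial(() -> new byte[BUFFER_SIZE]);

    // Exact modified-UTF-8 length of s; rejects strings too long for the 16-bit prefix.
    static int encodedLength(String s) throws UTFDataFormatException {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            n += (c >= 0x0001 && c <= 0x007F) ? 1 : (c <= 0x07FF ? 2 : 3);
        }
        if (n > 65535) throw new UTFDataFormatException("encoded string too long: " + n + " bytes");
        return n;
    }

    // Shared encoder: writes s into dst starting at off and returns the number of bytes written.
    static int encode(String s, byte[] dst, int off) {
        int p = off;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                dst[p++] = (byte) c;
            } else if (c <= 0x07FF) {
                dst[p++] = (byte) (0xC0 | (c >> 6));
                dst[p++] = (byte) (0x80 | (c & 0x3F));
            } else {
                dst[p++] = (byte) (0xE0 | (c >> 12));
                dst[p++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                dst[p++] = (byte) (0x80 | (c & 0x3F));
            }
        }
        return p - off;
    }

    static void writeUTF(String s, DataOutput out) throws IOException {
        int n = encodedLength(s);
        byte[] buf = n <= BUFFER_SIZE ? BUFFER.get() : new byte[n];
        encode(s, buf, 0);
        out.writeShort(n);
        out.write(buf, 0, n);
    }

    // Assumes bb uses big-endian order (the default), matching DataOutput.writeShort.
    static void writeUTF(String s, ByteBuffer bb) throws IOException {
        int n = encodedLength(s);
        if (bb.remaining() < 2 + n) throw new BufferOverflowException();
        bb.putShort((short) n);
        if (bb.hasArray()) {
            // Array-backed: encode straight into the backing array, skipping per-byte put() checks.
            encode(s, bb.array(), bb.arrayOffset() + bb.position());
            bb.position(bb.position() + n);
        } else {
            // Direct: encode into the thread-local scratch array, then do one bulk put().
            byte[] buf = n <= BUFFER_SIZE ? BUFFER.get() : new byte[n];
            encode(s, buf, 0);
            bb.put(buf, 0, n);
        }
    }
}
```

Computing the exact length first costs an extra scan over the chars, but it lets the heap path check capacity once and then write into the array without any per-byte bounds checks from {{ByteBuffer}}.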

This message was sent by Atlassian JIRA
