hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: toStringBinary output is painful
Date Tue, 14 Apr 2015 18:31:58 GMT
On Mon, Apr 13, 2015 at 3:16 PM, Dave Latham <latham@davelink.net> wrote:

> Wish I had started this conversation 5 years ago...
> When we're using binary data, especially in row keys (and therefore
> region boundaries) the output created by toStringBinary is very
> painful to use:
>  - mix of ascii / hex representation is trouble
>  - it's quite long (4 characters per binary byte)
>  - it mixes in punctuation characters (quotes are not good when
> pasting in to shell commands)
> Does this bother anyone else?  Is it too late to do anything about it
> due to compatibility fears?

> Can we change to something that is always binary e.g. hex?
> 0x746f537472696e6742696e617279206d616b657320636c75737465727320637279
> Or if we have to auto-detect, only use ascii if the entire string is ascii?

I believe toStringBinary outputs plain ASCII if the input is ASCII-only.
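
To make the mixed format concrete, here is a rough sketch of the kind of escaping under discussion. It is not the actual HBase source: the class name BinaryEscape and the exact set of bytes treated as printable are assumptions for illustration only.

```java
// Illustrative sketch only; NOT the HBase implementation.
// BinaryEscape and its notion of "printable" are hypothetical.
final class BinaryEscape {
    // Bytes from space through '~' (except backslash) are emitted as-is.
    // Every other byte becomes \xNN, which costs 4 characters per byte.
    static String escape(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int ch = b & 0xFF; // treat the byte as unsigned
            if (ch >= 0x20 && ch <= 0x7E && ch != '\\') {
                sb.append((char) ch);
            } else {
                sb.append(String.format("\\x%02X", ch));
            }
        }
        return sb.toString();
    }
}
```

Under these assumptions an ASCII-only key comes out unchanged, while a key with binary bytes comes out as a mix of literal characters and \xNN escapes, which is the output Dave is describing.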

Our output was in part shaped by Ruby's binary String representation. It was
thought useful that you could copy from the shell and find in the UI, and
vice-versa. I don't think many make use of this fact.

We could change the toStringBinary format. While rarely needed (now), it is
probably best if the format allows us to get back to the original bytes.
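
As a sketch of what "get back to the original bytes" could look like with the always-hex form Dave suggests: the HexKeys class and its method names below are hypothetical, not an existing HBase API, and the 0x prefix simply follows his example.

```java
// Illustrative sketch only; HexKeys is a hypothetical helper, not HBase code.
final class HexKeys {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    // Encode every byte as two lowercase hex digits, prefixed with "0x".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(2 + bytes.length * 2).append("0x");
        for (byte b : bytes) {
            sb.append(DIGITS[(b >> 4) & 0xF]).append(DIGITS[b & 0xF]);
        }
        return sb.toString();
    }

    // Decode the output of toHex back into the original bytes.
    static byte[] fromHex(String s) {
        if (!s.startsWith("0x") || s.length() % 2 != 0) {
            throw new IllegalArgumentException("expected 0x followed by hex pairs: " + s);
        }
        byte[] out = new byte[(s.length() - 2) / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(s.charAt(2 + 2 * i), 16);
            int lo = Character.digit(s.charAt(3 + 2 * i), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit in: " + s);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }
}
```

This form is always 2 characters per byte plus the prefix, contains no quotes or backslashes, and round-trips exactly. The cost is that ASCII keys are no longer readable at a glance.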

We could also work on hiding the ugliness from users, since it hurts: relying
on region id instead and/or making smarter key choices... doing the minimal
necessary rather than the full row key (we have the code already in our hfile


Where is the pain in particular, Dave?

Using the shell or looking at UIs and logs?

What if we hid the ugliness?
toStringBinary will do ASCII if the input is all ASCII.
