hbase-user mailing list archives

From Dave Latham <lat...@davelink.net>
Subject Re: toStringBinary output is painful
Date Tue, 14 Apr 2015 20:14:52 GMT
> I believe toStringBinary does all ascii if input ascii-only.

Right, but it will also mix ASCII-range characters with escaped binary
bytes in the same string.

> Our output was in part shaped by ruby binary String representation. It was
> thought useful that you could copy from shell and find in UI, and
> vice-versa. I don't think many make use of this fact.

Interesting - didn't realize the motivation.

> We could change toStringBinary format. While rarely needed (now), probably
> best if the format allows us to get back to the original bytes.

Absolutely - this would still be a requirement for us.  It should be a
reversible encoding.
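
To make "reversible" concrete, here is a minimal sketch of a mixed
ASCII/escape format together with its inverse. EscapedBytes and its two
methods are hypothetical names for illustration; this is not the code
behind Bytes.toStringBinary, and the set of bytes it leaves unescaped
(printable ASCII other than backslash) is an assumption.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper for illustration; not part of HBase.
final class EscapedBytes {
    // Printable ASCII (space through '~') other than backslash passes
    // through unchanged; every other byte is written as \xNN.
    static String encode(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            int ch = b & 0xFF;
            if (ch >= ' ' && ch <= '~' && ch != '\\') {
                sb.append((char) ch);
            } else {
                sb.append(String.format("\\x%02X", ch));
            }
        }
        return sb.toString();
    }

    // Inverse of encode: turns each \xNN escape back into one byte.
    static byte[] decode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '\\') {
                if (i + 3 >= text.length() || text.charAt(i + 1) != 'x') {
                    throw new IllegalArgumentException("bad escape at " + i);
                }
                int hi = Character.digit(text.charAt(i + 2), 16);
                int lo = Character.digit(text.charAt(i + 3), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("bad hex at " + i);
                }
                out.write((hi << 4) | lo);
                i += 4;
            } else if (c >= ' ' && c <= '~') {
                out.write(c);
                i++;
            } else {
                throw new IllegalArgumentException("unexpected char at " + i);
            }
        }
        return out.toByteArray();
    }
}
```

Escaping the backslash itself is what keeps the round trip unambiguous:
without it, a key containing the literal characters \x41 would decode
to a different byte sequence than the one that was encoded.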

> We could also work on hiding the ugliness from users since it hurts relying
> on region id instead and/or making smarter key choices... doing minimal
> necessary rather than full row key (we have the code already in our hfile
> index).

That would help with some cases, especially shell admin commands, but
I don't think it is enough.

> Where is the pain in particular Dave?
> Using the shell or looking at UIs and logs?

It's a dull ache in many places, which is why we have tolerated it for
years.  Region names and boundaries (especially ones auto-generated
from splits) make the UI annoying to read, and copy/pasting them into
other places that aren't as happy with the whitespace or punctuation is
error-prone.  The same goes for region server logs and verify
replication logs.

> What if we hid the ugliness.
> toStringBinary will do ascii if all ascii

That was one option I suggested.  There may still be unusual cases of
binary keys that happen to be ascii, but they would be much rarer.
For us internally, I'd just do hex everywhere, but I understand if
enough people use ascii data encodings and find it helpful for those
to show up in the UI/logs.
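
A rough sketch of that option, to show it can stay reversible: print
the key as plain ASCII only when every byte is printable, otherwise
print the whole key as hex. KeyDisplay, render, parse, and the "0x"
prefix are all assumptions made up for this illustration; nothing here
is an existing HBase API or an agreed format.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of HBase.
final class KeyDisplay {
    // Assumed marker for hex output; any unambiguous marker would do.
    private static final String HEX_PREFIX = "0x";

    // Plain ASCII when every byte is printable, otherwise all hex.
    // A printable key that itself starts with the prefix is also shown
    // as hex, so that parse() can always tell the two forms apart.
    static String render(byte[] key) {
        boolean printable = true;
        for (byte b : key) {
            if (b < ' ' || b > '~') { // negative bytes are 0x80..0xFF
                printable = false;
                break;
            }
        }
        String ascii = new String(key, StandardCharsets.US_ASCII);
        if (printable && !ascii.startsWith(HEX_PREFIX)) {
            return ascii;
        }
        StringBuilder sb = new StringBuilder(HEX_PREFIX);
        for (byte b : key) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Inverse of render.
    static byte[] parse(String text) {
        if (!text.startsWith(HEX_PREFIX)) {
            return text.getBytes(StandardCharsets.US_ASCII);
        }
        String hex = text.substring(HEX_PREFIX.length());
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("bad hex digit");
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }
}
```

With this rule a key is never shown as a mix of text and escapes, and
the hex form contains no whitespace or punctuation to trip up
copy/paste.  The cost is that a key with a single non-printable byte
loses its readable part entirely.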

