hbase-dev mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-901) Add a limit to key length, check key and value length on client side.
Date Fri, 26 Sep 2008 16:31:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634923#action_12634923 ]

Jean-Daniel Cryans commented on HBASE-901:
------------------------------------------

We have a problem: getting the HTD requires a meta scan at each invocation. See this conversation I had with stack:


<jdcryans> st^ack: the HTD we get in HTable is obtained through a meta scan
<jdcryans> validating a batch of rows would be shitty
<st^ack> and makes no sense caching the HTD
<jdcryans> I think we should cache it, while validating columns, if we get a null HCD we try to refresh once
<st^ack> just getting it would be expensive
<jdcryans> get it while you build HTable
<st^ack> And if it gets changed by another client instance?
<jdcryans> that's what I said, cache only while validating rows
<jdcryans> I mean for this reason only
<jdcryans> If asked directly on HTable, do the scan
<st^ack> But what if we are not batching... doing one row at a time?
<st^ack> Do we get HTD each time?
<jdcryans> I would still leave it as class property
<st^ack> So, on construction of HTable, you'd get the HTD.
<st^ack> You'd keep using this same HTD during life of the HTable instance?
<jdcryans> unless you get a null HCD during validation
<jdcryans> at that moment you refresh the cache
<jdcryans> once so if you get it another time it's because it's not a family
<jdcryans> this would be a very optimistic caching
<st^ack> Why a null HCD, because a new one was added (or removed)
<jdcryans> public HColumnDescriptor getFamily(final byte [] column) {
<jdcryans>     return this.families.get(HStoreKey.getFamilyMapKey(column));
<jdcryans>   }
<jdcryans> if the get returns nothing
<st^ack> I think caching HCD in client is going to burn us in many interesting ways
<st^ack> Users on list will be showing up w/ questions about why changed attributes are not being picked up in clients
<jdcryans> I repeat, the cached HCD would only be used for commits validation
<jdcryans> HTD
<st^ack> Since you have to disable to change HCD and HTD, if we could figure some way of sending a signal to clients when the table is re-enabled... that'd help
<st^ack> size of cell, right
<st^ack> What if user changes size of cell in table schema
<st^ack> client won't see it
<jdcryans> ok, well we refresh it upon any error of that kind
<st^ack> well, if cell size is set down, the server will throw an exception if client is sending over cells too big...
<st^ack> but if cell size is made bigger, client will continue to send cells that are too small if operating with an old version of HTD
<st^ack> by the way, thanks for figuring out the HTD is made by scanning meta... imagine if we'd committed the patch w/o knowing this
<jdcryans> if cell size set down, we'd get an exception client side, refresh the cached HTD, then it would be good
<jdcryans> but yeah, the other situation is a problem
<st^ack> What if we added an offline/onlining exception
<st^ack> It'd be like NSRE
<st^ack> When client sees it, it 'recalibrates'
<st^ack> Refreshes its cache of HTD
<st^ack> Clients would need to send identifiers though I suppose
<jdcryans> would only happen if client tries to do stuff while it's offline
<st^ack> jdcryans: HRS could keep a map of the client ids it's currently talking to
<jdcryans> I know what we need... ZK
<jdcryans> it should manage that kind of stuff
st^ack thinking... if a big table, we'd be sending exception per HRS ... that'd be silly
<st^ack> How do you think it would work here? ZK lookups are cheap so just go get the table schema periodically or before a big batch?
<jdcryans> something like that
<st^ack> Ok.
<jdcryans> or it would keep track of table offlining/enabling
<st^ack> Sounds million times better than my dumbass suggestion
<jdcryans> well there is a reason why that kind of software exists
<jdcryans> because distributed computing is a zoo
<st^ack> Whats that mean for batching?  No batching till ZK?
<jdcryans> or be very careful when batching
<st^ack> k
<jdcryans> we can still check some stuff like row key length
<jdcryans> if columns are well formed
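
For illustration, a minimal sketch of the optimistic caching discussed above: fetch the HTD once when the HTable is constructed, use it to validate commits, and refresh it at most once when a family lookup comes back null. This is not part of any patch; CachedTableSchema and SchemaLoader are made-up names.

{code}
import java.io.IOException;

import org.apache.hadoop.hbase.HTableDescriptor;

/** Sketch only: optimistic HTD cache that refreshes at most once on a miss. */
public class CachedTableSchema {

  /** Re-reads the table schema with a meta scan; hypothetical callback. */
  public interface SchemaLoader {
    HTableDescriptor reloadFromMeta() throws IOException;
  }

  private HTableDescriptor htd;   // obtained once when the HTable is constructed

  public CachedTableSchema(final HTableDescriptor initial) {
    this.htd = initial;
  }

  /** Checks that the column's family exists, refreshing the cache once before failing. */
  public void checkFamily(final byte [] column, final SchemaLoader loader)
  throws IOException {
    if (this.htd.getFamily(column) != null) {
      return;                             // family known to the cached descriptor
    }
    this.htd = loader.reloadFromMeta();   // a family may have been added since construction
    if (this.htd.getFamily(column) == null) {
      throw new IOException("Unknown column family in " + new String(column));
    }
  }
}
{code}

The bigger-cell-size case stack raises is exactly what this does not catch, since no error ever comes back to trigger the refresh.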

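Independent of any caching, the checks mentioned at the end of the chat (row key length, well-formed column names) can be done purely on the client side. A rough sketch, assuming the maximum key length that this issue would add to HTableDescriptor; the constant and class names are made up:

{code}
import java.io.IOException;

/** Sketch only: client-side checks that need no schema cache. */
public final class ClientSideChecks {

  // Assumed limit; per this issue it would really be a parameter on HTableDescriptor.
  private static final int MAX_KEY_LENGTH = Short.MAX_VALUE;

  private ClientSideChecks() {
  }

  /** Fails early, before the request is ever sent to the region server. */
  public static void checkRow(final byte [] row) throws IOException {
    if (row == null || row.length == 0) {
      throw new IOException("Row key must not be empty");
    }
    if (row.length > MAX_KEY_LENGTH) {
      throw new IOException("Row key is " + row.length +
        " bytes, over the limit of " + MAX_KEY_LENGTH);
    }
  }

  /** A column must be of the form family:qualifier. */
  public static void checkColumn(final byte [] column) throws IOException {
    if (column == null || column.length == 0) {
      throw new IOException("Column must not be empty");
    }
    final String s = new String(column);
    if (s.indexOf(':') <= 0) {
      throw new IOException("Column '" + s + "' is not of the form family:qualifier");
    }
  }
}
{code}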

> Add a limit to key length, check key and value length on client side.
> ---------------------------------------------------------------------
>
>                 Key: HBASE-901
>                 URL: https://issues.apache.org/jira/browse/HBASE-901
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: client
>            Reporter: Jim Kellerman
>            Assignee: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.19.0
>
>
> Currently there is no limit on key length and there should be. It should be a parameter in HTableDescriptor since the row key length needs to be considered in addition to the column key.
> It should be trivial to add, since HTD can be upgraded without requiring a migration.
> Checking of the key length (and the value length) should be done on the client side so that it fails early rather than after the request is sent to the server.
> This means that a BatchUpdate needs a reference to either the HTable or to the HTD. It can be a transient reference so that the HTable (or HTD) need not be serialized/deserialized.
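
On the last point, a transient reference keeps the descriptor out of whatever gets shipped to the server. A sketch only; the class and methods below are made-up stand-ins for the real BatchUpdate:

{code}
import java.io.IOException;

import org.apache.hadoop.hbase.HTableDescriptor;

/** Sketch only: a batch that carries a transient schema reference for local validation. */
public class ValidatingBatch {

  // transient: used for client-side checks, never serialized with the batch
  private transient HTableDescriptor schema;

  public void setSchema(final HTableDescriptor htd) {
    this.schema = htd;
  }

  /** Validates the column locally before the edit is added to the batch. */
  public void put(final byte [] column, final byte [] value) throws IOException {
    if (this.schema != null && this.schema.getFamily(column) == null) {
      throw new IOException("Unknown column family in " + new String(column));
    }
    // ... append (column, value) to the batch as BatchUpdate would ...
  }
}
{code}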

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

