hbase-dev mailing list archives

From "Jonathan Gray (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1304) New client server implementation of how gets and puts are handled.
Date Sat, 16 May 2009 17:55:45 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710127#action_12710127 ]

Jonathan Gray commented on HBASE-1304:
--------------------------------------

Dropped some thoughts on IRC, figured I'd post here:

[10:42am] jgray2: dj_ryan: i don't think v7 patch contains changes to compactions yet... not following your questions exactly but compactions need to be merged with scan code
[10:43am] jgray2: gets can be redone as scans
[10:43am] jgray2: and that's probably the direction we'll need to go
[10:43am] jgray2: if millions of columns in a single row
[10:44am] jgray2: you basically need to scan them, even within the row
[10:44am] jgray2: QueryMatcher makes the decision about what to do with a KV given the parameters of the query
[10:45am] jgray2: the two complex bits of it are a DeleteTracker and the ColumnTracker
[10:45am] jgray2: two implementations of each
[10:46am] jgray2: ScanDT and GetDT are different because, right now, a Get is not a low-level KV merge like a Scan is
[10:46am] jgray2: so when you're scanning (or compacting) you actually look at a Store's keys in strict sorted order
[10:46am] jgray2: merging all storefiles + memcache
[10:46am] jgray2: so when tracking deletes
[10:46am] jgray2: you need to track very little
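The point about scans needing very little delete state can be sketched as follows. This is a minimal illustration, not code from the patch: the `KV`, `Cursor`, and `scan` names are hypothetical, a single row is assumed, and every delete is treated as masking all versions of its column at or below its timestamp. Because the merged stream is strictly sorted (qualifier ascending, newest timestamp first, deletes before puts on ties), the tracker only has to remember the newest delete for the current column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch; none of these names are HBase classes.
class ScanMergeSketch {
    // Simplified key for one row: qualifier, timestamp, delete flag.
    static final class KV {
        final String qualifier;
        final long timestamp;
        final boolean isDelete;
        KV(String qualifier, long timestamp, boolean isDelete) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.isDelete = isDelete;
        }
        @Override public String toString() {
            return (isDelete ? "del:" : "put:") + qualifier + "@" + timestamp;
        }
    }

    // Assumed order: qualifier ascending, newest timestamp first, deletes before puts on ties.
    static int compare(KV a, KV b) {
        int c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        c = Long.compare(b.timestamp, a.timestamp);
        if (c != 0) return c;
        return Boolean.compare(b.isDelete, a.isDelete);
    }

    // One cursor per sorted source (memcache or a storefile).
    static final class Cursor {
        final Iterator<KV> it;
        KV head;
        Cursor(Iterator<KV> it) { this.it = it; this.head = it.next(); }
        boolean advance() {
            if (!it.hasNext()) return false;
            head = it.next();
            return true;
        }
    }

    // Merge all sources in strict sorted order, dropping puts masked by a delete.
    static List<KV> scan(List<List<KV>> sortedSources) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>((x, y) -> compare(x.head, y.head));
        for (List<KV> source : sortedSources) {
            Iterator<KV> it = source.iterator();
            if (it.hasNext()) heap.add(new Cursor(it));
        }
        List<KV> result = new ArrayList<>();
        String deletedQualifier = null;      // the only delete state kept
        long deletedUpTo = Long.MIN_VALUE;
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            KV kv = c.head;
            if (c.advance()) heap.add(c);
            if (kv.isDelete) {
                // The first delete seen for a column is its newest; older ones add nothing.
                if (!kv.qualifier.equals(deletedQualifier)) {
                    deletedQualifier = kv.qualifier;
                    deletedUpTo = kv.timestamp;
                }
                continue;
            }
            boolean masked = kv.qualifier.equals(deletedQualifier) && kv.timestamp <= deletedUpTo;
            if (!masked) result.add(kv);
        }
        return result;
    }

    public static void main(String[] args) {
        List<KV> memcache = Arrays.asList(new KV("a", 5, true));
        List<KV> file1 = Arrays.asList(new KV("a", 7, false), new KV("a", 3, false), new KV("b", 2, false));
        List<KV> file2 = Arrays.asList(new KV("a", 5, false), new KV("b", 4, false));
        System.out.println(scan(Arrays.asList(memcache, file1, file2)));
    }
}
```

The delete state here is two variables, regardless of how many sources are merged.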
[10:47am] jgray2: in a Get, you grab all keys from each storefile, starting at memcache, then going through them newest to oldest
[10:47am] jgray2: so deletes you read in one storefile will apply to any storefiles that are older
[10:47am] jgray2: so GetDT is quite a bit more complex
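For contrast, the Get path described above can be sketched like this. Again the names (`KV`, `get`) are hypothetical and deletes are simplified to "mask this column at or below my timestamp". Since sources are read one at a time, newest to oldest, the tracker has to carry every delete seen so far forward into the older sources, here as a map from qualifier to the highest delete timestamp.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names are HBase classes.
class GetDeleteSketch {
    static final class KV {
        final String qualifier;
        final long timestamp;
        final boolean isDelete;
        KV(String qualifier, long timestamp, boolean isDelete) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.isDelete = isDelete;
        }
        @Override public String toString() {
            return (isDelete ? "del:" : "put:") + qualifier + "@" + timestamp;
        }
    }

    // Sources are visited in order: memcache first, then storefiles newest to oldest.
    static List<KV> get(List<List<KV>> newestToOldest) {
        // Accumulated delete state: qualifier -> highest delete timestamp seen so far.
        Map<String, Long> deletes = new HashMap<>();
        List<KV> result = new ArrayList<>();
        for (List<KV> source : newestToOldest) {
            // Fold this source's deletes into the state before reading its puts.
            for (KV kv : source) {
                if (kv.isDelete) deletes.merge(kv.qualifier, kv.timestamp, Long::max);
            }
            for (KV kv : source) {
                if (kv.isDelete) continue;
                Long deletedUpTo = deletes.get(kv.qualifier);
                if (deletedUpTo == null || kv.timestamp > deletedUpTo) result.add(kv);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<KV> memcache = Arrays.asList(new KV("a", 5, true));
        List<KV> file1 = Arrays.asList(new KV("a", 7, false), new KV("a", 3, false), new KV("b", 2, false));
        List<KV> file2 = Arrays.asList(new KV("a", 5, false), new KV("b", 4, false));
        System.out.println(get(Arrays.asList(memcache, file1, file2)));
    }
}
```

In this sketch a put that was already emitted from a newer source is never revisited, so a delete with a higher timestamp sitting in an older source would not mask it. That is one way results could depend on where keys happen to live rather than on their sort order.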
[10:47am] jgray2: we need to benchmark and see if scans are better
[10:47am] jgray2: because they are much more "correct"
[10:47am] jgray2: if you do manual timestamp setting, gets can give you indeterminate results
[10:48am] jgray2: but scans are always strictly sorted
[10:48am] jgray2: ColumnTracker is implemented as either ExplicitCT or WildcardCT
[10:48am] jgray2: explicit is when qualifiers are given, wildcard if all in a family
[10:48am] jgray2: so it tracks that, and then max versions for each
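A rough sketch of the two column-tracker flavors, assuming a single `include` decision per key. The interface and class names below are hypothetical stand-ins for ExplicitCT and WildcardCT, not the patch's actual API. The explicit tracker keeps a remaining-version count per requested qualifier; the wildcard tracker accepts any qualifier and relies on keys arriving in sorted order, so it only counts versions of the current column.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; these are not the classes from the patch.
class ColumnTrackerSketch {
    interface ColumnTracker {
        // Returns true if this version of the column should be returned.
        boolean include(String qualifier);
    }

    // Explicit: only the requested qualifiers, up to maxVersions each.
    static final class ExplicitTracker implements ColumnTracker {
        private final Map<String, Integer> remaining = new HashMap<>();
        ExplicitTracker(List<String> qualifiers, int maxVersions) {
            for (String q : qualifiers) remaining.put(q, maxVersions);
        }
        @Override public boolean include(String qualifier) {
            Integer left = remaining.get(qualifier);
            if (left == null || left == 0) return false;
            remaining.put(qualifier, left - 1);
            return true;
        }
    }

    // Wildcard: every qualifier in the family, up to maxVersions each.
    // Assumes keys arrive sorted, so all versions of a column are adjacent.
    static final class WildcardTracker implements ColumnTracker {
        private final int maxVersions;
        private String current;
        private int seen;
        WildcardTracker(int maxVersions) { this.maxVersions = maxVersions; }
        @Override public boolean include(String qualifier) {
            if (!qualifier.equals(current)) {
                current = qualifier;
                seen = 0;
            }
            if (seen >= maxVersions) return false;
            seen++;
            return true;
        }
    }

    public static void main(String[] args) {
        ColumnTracker explicit = new ExplicitTracker(Arrays.asList("a"), 2);
        ColumnTracker wildcard = new WildcardTracker(1);
        for (String q : Arrays.asList("a", "a", "a", "b")) {
            System.out.println(q + " explicit=" + explicit.include(q) + " wildcard=" + wildcard.include(q));
        }
    }
}
```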
[10:49am] jgray2: honestly i've not looked at compactions since i wrote scanners but have had it in mind
[10:50am] jgray2: it will use QueryMatcher and CT/DT directly
[10:50am] jgray2: wildcardCT where maxVersions = family setting
[10:50am] jgray2: ScanDT
[10:50am] jgray2: QueryMatcher already does TTL enforcement and such
[10:51am] jgray2: the only difference is in a minor compaction you still need to output deletes
[10:51am] jgray2: that are not fully enforced or overridden
[10:51am] jgray2: so then we'll probably have a CompactDT
[10:52am] jgray2: might need a slight modification here and there, i don't think QM is written to ever permit deletes out to the result
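
The compaction idea above might look roughly like the following. This is only a guess at the shape of a possible CompactDT, with hypothetical names (`KV`, `compact`, `retainDeletes`) and the same simplified delete semantics as a scan: input is already merged in sorted order, masked puts are dropped, and when `retainDeletes` is set the newest delete per column is written to the output instead of being swallowed. Older deletes for the same column are overridden by the newer one and dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the CompactDT from any patch.
class CompactionDeleteSketch {
    static final class KV {
        final String qualifier;
        final long timestamp;
        final boolean isDelete;
        KV(String qualifier, long timestamp, boolean isDelete) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.isDelete = isDelete;
        }
        @Override public String toString() {
            return (isDelete ? "del:" : "put:") + qualifier + "@" + timestamp;
        }
    }

    // Input must already be merged in scan order:
    // qualifier ascending, newest timestamp first, deletes before puts on ties.
    static List<KV> compact(List<KV> merged, boolean retainDeletes) {
        List<KV> out = new ArrayList<>();
        String deletedQualifier = null;
        long deletedUpTo = Long.MIN_VALUE;
        for (KV kv : merged) {
            if (kv.isDelete) {
                if (!kv.qualifier.equals(deletedQualifier)) {
                    deletedQualifier = kv.qualifier;
                    deletedUpTo = kv.timestamp;
                    // Keep the delete so it can still mask keys in files not being compacted.
                    if (retainDeletes) out.add(kv);
                }
                continue; // an older delete for the same column is overridden
            }
            if (kv.qualifier.equals(deletedQualifier) && kv.timestamp <= deletedUpTo) continue;
            out.add(kv);
        }
        return out;
    }

    public static void main(String[] args) {
        List<KV> merged = Arrays.asList(
                new KV("a", 7, false), new KV("a", 5, true), new KV("a", 5, false),
                new KV("a", 4, true), new KV("a", 3, false), new KV("b", 2, false));
        System.out.println(compact(merged, true));
        System.out.println(compact(merged, false));
    }
}
```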

> New client server implementation of how gets and puts are handled. 
> -------------------------------------------------------------------
>
>                 Key: HBASE-1304
>                 URL: https://issues.apache.org/jira/browse/HBASE-1304
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.20.0
>            Reporter: Erik Holstad
>            Assignee: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: hbase-1304-v1.patch, HBASE-1304-v2.patch, HBASE-1304-v3.patch, HBASE-1304-v4.patch, HBASE-1304-v5.patch, HBASE-1304-v6.patch, HBASE-1304-v7.patch
>
>
> Creating an issue where the implementation of the new client and server will go. Leaving HBASE-1249 as a discussion forum and will put code and patches here.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

