hbase-issues mailing list archives

From "Jacques (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5993) Add a no-read Append
Date Wed, 30 May 2012 15:51:23 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285764#comment-13285764 ]

Jacques commented on HBASE-5993:
--------------------------------

The reason this can make sense is data overhead. When we are capturing a large number of
small values, the KeyValue overhead is substantial. The original use case is one where I'm
adding to a list of documents that contain a certain term (a search index). Let's say that
each document number is a four-byte int. Right now there are two options. The first is to use
the existing Append, which means one becomes swamped with reads as the cell value grows over
time (this would also wreak havoc on memstore flushes as the cell value becomes megabytes in
size while we're only adding another four bytes once a day). The second is to use separate
columns, which creates a substantial amount of overhead for each value added. The utility of
this functionality also extends to monitoring applications that capture a long sequence of
small values. (Sematext are basically trying to build this functionality already with their
HBaseHUT work.)
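To put rough numbers on that overhead, here is a small Java sketch that estimates the serialized size of a single KeyValue. It assumes the 0.94-era layout (key-length and value-length ints, then a 2-byte row length, the row, a 1-byte family length, the family, the qualifier, an 8-byte timestamp and a 1-byte type) and ignores HFile block, index and compression effects. The row key "lucene" and family "d" are made-up examples, and storing the doc id as the qualifier is just one possible column-per-document layout.

```java
// Rough serialized size of one HBase KeyValue, assuming the 0.94-era layout
// (no tags) and ignoring HFile block, index and compression overhead.
class KeyValueOverhead {
    // key length (4) + value length (4) + row length (2)
    // + family length (1) + timestamp (8) + type (1)
    static final int FIXED_BYTES = 4 + 4 + 2 + 1 + 8 + 1;

    static int serializedSize(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return FIXED_BYTES + rowLen + familyLen + qualifierLen + valueLen;
    }

    public static void main(String[] args) {
        int rowLen = "lucene".length(); // hypothetical search term used as the row key
        int familyLen = "d".length();   // hypothetical one-letter column family
        // Column per document: the 4-byte doc id is the qualifier, the value is empty.
        int perIdAsColumn = serializedSize(rowLen, familyLen, 4, 0);
        // Packed into one growing cell: each id adds only its own 4 bytes.
        int perIdPacked = 4;
        System.out.println("column per id: " + perIdAsColumn + " bytes");
        System.out.println("packed in one cell: " + perIdPacked + " bytes");
    }
}
```

Under those assumptions a column-per-document layout costs about 31 bytes per four-byte id, versus 4 bytes per id when the ids are packed into one cell.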

Yes, an additional KeyValue.Type is needed. When a cell of this type is read, the read path
gathers all the appended values (and the last full value) and combines them before returning
the result. As compactions run, the complete merged values are written out.
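A minimal sketch of that read-time merge and the compaction step, modelling one column's versions in memory. CellType.APPEND_DELTA stands in for the proposed new KeyValue.Type; every class and method name here is hypothetical and not part of HBase, and deletes and duplicate timestamps are ignored.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// In-memory model of one column's versions under the proposed no-read append.
// CellType.APPEND_DELTA stands in for the new KeyValue.Type; all names are hypothetical.
class NoReadAppendSketch {
    enum CellType { PUT, APPEND_DELTA }

    static final class Cell {
        final long timestamp;
        final CellType type;
        final byte[] value;

        Cell(long timestamp, CellType type, byte[] value) {
            this.timestamp = timestamp;
            this.type = type;
            this.value = value;
        }
    }

    // Read path: versions arrive newest first. Collect deltas back to the most
    // recent full value (older versions are shadowed by it), then concatenate
    // oldest to newest. With no full value at all, the base is treated as empty.
    static byte[] mergeOnRead(List<Cell> newestFirst) {
        List<byte[]> parts = new ArrayList<>();
        for (Cell c : newestFirst) {
            parts.add(c.value);
            if (c.type == CellType.PUT) {
                break;
            }
        }
        Collections.reverse(parts);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] p : parts) {
            out.write(p, 0, p.length);
        }
        return out.toByteArray();
    }

    // Compaction: collapse the full value and its deltas into a single full
    // value stamped with the newest timestamp.
    static Cell compact(List<Cell> newestFirst) {
        if (newestFirst.isEmpty()) {
            throw new IllegalArgumentException("no versions to compact");
        }
        return new Cell(newestFirst.get(0).timestamp, CellType.PUT, mergeOnRead(newestFirst));
    }

    public static void main(String[] args) {
        List<Cell> versions = List.of(
                new Cell(3, CellType.APPEND_DELTA, new byte[] {7}),
                new Cell(2, CellType.APPEND_DELTA, new byte[] {5}),
                new Cell(1, CellType.PUT, new byte[] {1, 2}),
                new Cell(0, CellType.PUT, new byte[] {9})); // shadowed by the newer PUT
        System.out.println(Arrays.toString(mergeOnRead(versions))); // [1, 2, 5, 7]
        System.out.println(Arrays.toString(compact(versions).value)); // [1, 2, 5, 7]
    }
}
```

Treating a missing base value as empty is one way to handle the case where no value exists yet: the merged result is still correct, it just requires reading every delta until a compaction collapses them.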

I'm swamped right now but am going to try to write up a short design doc in the next couple
of weeks and get you guys to review my approach, since this will have to touch a number of
components. I also need to make sure to handle edge cases, such as a no-read append when no
value exists yet (probably OK, even though read-back performance will be poor).


                
> Add a no-read Append
> --------------------
>
>                 Key: HBASE-5993
>                 URL: https://issues.apache.org/jira/browse/HBASE-5993
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: Jacques
>            Priority: Critical
>
> HBASE-4102 added an atomic append. For high-performance situations, it would be helpful to
> be able to do appends that don't actually require a read of the existing value. This would
> be useful in building a growing set of values. Our original use case was implementing a form
> of search in HBase where a cell would contain a list of document ids associated with a
> particular search keyword. However, it seems like this would also provide substantial
> performance improvements for most Append scenarios.
> Within the client API, the simplest way to implement this would be to leverage the existing
> Append API. If the Append is marked with setReturnResults(false), use this code path. If
> results are requested, use the existing Append implementation.
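The dispatch proposed in the description can be modelled with a toy single-cell store that counts bytes read and written on each path. The class and its counters are hypothetical; on the client the switch would be Append.setReturnResults(false), while the server-side no-read path is the part that does not exist yet. The existing path is approximated as a read-modify-write of the full value, as with the HBASE-4102 append.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Toy single-cell store contrasting the existing Append (read-modify-write)
// with the proposed no-read Append. Hypothetical; not HBase code.
class AppendPathSketch {
    private byte[] base = new byte[0];                     // last full value written
    private final List<byte[]> deltas = new ArrayList<>(); // fragments from no-read appends
    long bytesRead = 0;
    long bytesWritten = 0;

    // returnResults mirrors Append.setReturnResults: false selects the no-read path.
    byte[] append(byte[] fragment, boolean returnResults) {
        if (!returnResults) {
            deltas.add(fragment.clone()); // write only the new bytes
            bytesWritten += fragment.length;
            return null;                  // nothing read, nothing returned
        }
        byte[] current = currentValue();  // existing path: read the whole value first
        bytesRead += current.length;
        byte[] updated = new byte[current.length + fragment.length];
        System.arraycopy(current, 0, updated, 0, current.length);
        System.arraycopy(fragment, 0, updated, current.length, fragment.length);
        base = updated;                   // ...then write the whole new value back
        deltas.clear();
        bytesWritten += updated.length;
        return updated.clone();
    }

    // What a reader sees: the last full value followed by any later deltas.
    byte[] currentValue() {
        int len = base.length;
        for (byte[] d : deltas) {
            len += d.length;
        }
        byte[] out = new byte[len];
        System.arraycopy(base, 0, out, 0, base.length);
        int pos = base.length;
        for (byte[] d : deltas) {
            System.arraycopy(d, 0, out, pos, d.length);
            pos += d.length;
        }
        return out;
    }

    public static void main(String[] args) {
        AppendPathSketch readModifyWrite = new AppendPathSketch();
        AppendPathSketch noRead = new AppendPathSketch();
        for (int docId = 0; docId < 1000; docId++) {
            byte[] id = ByteBuffer.allocate(4).putInt(docId).array();
            readModifyWrite.append(id, true);
            noRead.append(id, false);
        }
        System.out.println("read-modify-write: read=" + readModifyWrite.bytesRead
                + " written=" + readModifyWrite.bytesWritten);
        System.out.println("no-read: read=" + noRead.bytesRead
                + " written=" + noRead.bytesWritten);
    }
}
```

In this model, appending 1,000 four-byte ids reads 1,998,000 bytes and writes 2,002,000 bytes on the read-modify-write path, versus nothing read and 4,000 bytes written on the no-read path; both stores end up holding the same 4,000-byte value.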

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
