hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background
Date Wed, 03 Jul 2013 20:00:21 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699375#comment-13699375 ]

stack commented on HBASE-6295:


the defaults in the code
hbase-default.xml in hbase-common (seems to be used when you run the integration test against a cluster)
hbase-site.xml in hbase-server/test (seems to be used when you run the integration test with
a minicluster)
hbase-site.xml in hbase-client
hbase-site.xml in conf

Removing hbase-default.xml is a radical notion.  hbase-default.xml used to be in conf for
all to view and adapt into an hbase-site.xml.  HBASE-3090 moved it out of conf and into the jar
so that new installs picked up new defaults.  This made the hbase-default.xml content effectively
opaque unless you unpacked the jar or went to the refguide to read the doc we generate from
it (see http://hbase.apache.org/book.html#hbase.site).  My guess is no one looks at the refguide.
This would seem to render hbase-default.xml near useless?  Yet we have to maintain it.
In the configuration code, we favor the hbase-default* setting over what we have in code.
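
To make that precedence concrete, here is a minimal sketch of the lookup order.
LayeredConfig is a hypothetical stand-in, not Hadoop's Configuration class; it only
assumes that resources added later override earlier ones and that the in-code default
passed to getInt is used only when no loaded resource defines the key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a layered configuration; not the Hadoop/HBase API. */
final class LayeredConfig {
    // Merged view of every resource added so far; later resources win.
    private final Map<String, String> merged = new LinkedHashMap<>();

    /** Adds one resource (for example the parsed content of an XML file). */
    LayeredConfig addResource(Map<String, String> properties) {
        merged.putAll(properties);
        return this;
    }

    /** Returns the configured value, or codeDefault only if no resource sets the key. */
    int getInt(String key, int codeDefault) {
        String value = merged.get(key);
        return value == null ? codeDefault : Integer.parseInt(value.trim());
    }
}
```

With the hbase-default.xml content added first and an hbase-site.xml added second, a
getInt call returns the site value if present, then the hbase-default.xml value, and
falls back to the default in code only when both files are silent.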

If we remove it, then we'll only use what is in code.  That means we won't have a list of
configs in the doc with their descriptions.

We could generate a class from the hbase-default.xml src that wrote out a Constants java file
which had in it the defines that we'd use as defaults whenever we did Configuration#getInt.  If
you added something to hbase-default.xml, you'd have to use a constant.  It would mean a script
run against the src that would fail if it found something in hbase-default.xml that had a
default in code that was not an upper-case constant?
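
A rough sketch of what such a generator could look like, assuming the usual
<property><name>/<value> layout of hbase-default.xml.  The class name
DefaultsConstantsGenerator, the naming rule (upper-case, non-alphanumerics turned into
underscores) and emitting every default as a String are assumptions made for
illustration; this is not an existing HBase tool.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical generator: turns <property> entries into Java constants. */
final class DefaultsConstantsGenerator {

    /** Assumed naming rule: "a.b-c" becomes "A_B_C". */
    static String toConstantName(String key) {
        return key.toUpperCase(Locale.ROOT).replaceAll("[^A-Z0-9]+", "_");
    }

    /** Returns Java source for a constants class built from the given XML text. */
    static String generate(String xml, String className) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse defaults XML", e);
        }
        NodeList props = doc.getElementsByTagName("property");
        StringBuilder out = new StringBuilder("public final class " + className + " {\n");
        for (int i = 0; i < props.getLength(); i++) {
            Element property = (Element) props.item(i);
            String name = childText(property, "name");
            String value = childText(property, "value");
            out.append("  public static final String ")
               .append(toConstantName(name))
               .append(" = \"")
               .append(value.replace("\\", "\\\\").replace("\"", "\\\""))
               .append("\";\n");
        }
        return out.append("}\n").toString();
    }

    // Text of the first child element with the given tag, or "" if absent.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }
}
```

The check script mentioned above would then be a separate step that scans the source
for getInt-style calls whose default argument is not one of the generated names.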

The hbase-site.xml in conf is always empty.  It would probably be better named hbase-site.xml.template.

The other hbase-site.xmls are configs for the local tests.  The notion is that tests have shorter
timeouts and fewer retries than what we ship as our defaults.  Do we want to reexamine this and
have the hbase defaults hold true for tests too?

Thanks Elliott and Nicolas for figuring this one out.

> Possible performance improvement in client batch operations: presplit and send in background
> --------------------------------------------------------------------------------------------
>                 Key: HBASE-6295
>                 URL: https://issues.apache.org/jira/browse/HBASE-6295
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, Performance
>    Affects Versions: 0.95.2
>            Reporter: Nicolas Liochon
>            Assignee: Nicolas Liochon
>              Labels: noob
>             Fix For: 0.98.0, 0.95.2
>         Attachments: 6295.addendum.patch, 6295.v11.patch, 6295.v12.patch, 6295.v14.patch,
> 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch,
> 6295.v6.patch, 6295.v8.patch, 6295.v9.patch, hbase-ycsb-workloads Build time trend.png
>
> Today the batch algorithm is:
> {noformat}
> for Operation o: List<Op>{
>   add o to todolist
>   if todolist > maxsize or o last in list
>     split todolist per location
>     send split lists to region servers
>     clear todolist
>     wait
> }
> {noformat}
> We could:
> - create immediately the final object instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send it when there is enough data
> for a single location
> It would be:
> {noformat}
> for Operation o: List<Op>{
>   get location
>   add o to todo location.todolist
>   if (location.todolist > maxLocationSize)
>     send location.todolist to region server 
>     clear location.todolist
>     // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write if you add error management: retried list must be shared with
> the operations added in the todolist. But it's doable.
> It's interesting mainly for 'big' writes
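
The second loop in the quoted description could be sketched roughly as below.
PerLocationBatcher, the locator function and the sender callback are hypothetical
names, not the HBase client API.  The sketch flushes at >= maxLocationSize rather than
the '>' in the pseudocode, and it leaves out the error management and shared retry list
that the description calls the non-trivial part.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical sketch of "split per location and send in background". */
final class PerLocationBatcher<Op> {
    private final int maxLocationSize;
    // Maps an operation to the name of the server hosting it (assumed lookup).
    private final Function<Op, String> locator;
    // Starts an asynchronous send of one batch to one location (assumed callback).
    private final BiFunction<String, List<Op>, CompletableFuture<Void>> sender;
    private final Map<String, List<Op>> pending = new HashMap<>();
    private final List<CompletableFuture<Void>> inFlight = new ArrayList<>();

    PerLocationBatcher(int maxLocationSize,
                       Function<Op, String> locator,
                       BiFunction<String, List<Op>, CompletableFuture<Void>> sender) {
        this.maxLocationSize = maxLocationSize;
        this.locator = locator;
        this.sender = sender;
    }

    /** Queues every operation, sending a location's batch as soon as it is full. */
    void submitAll(List<Op> operations) {
        for (Op op : operations) {
            String location = locator.apply(op);
            List<Op> todo = pending.computeIfAbsent(location, k -> new ArrayList<>());
            todo.add(op);
            if (todo.size() >= maxLocationSize) {
                flush(location); // don't wait, continue the loop
            }
        }
        // Send remaining partial batches.
        for (String location : new ArrayList<>(pending.keySet())) {
            flush(location);
        }
        // Wait once, at the end, for everything that was sent.
        CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
        inFlight.clear();
    }

    // Hands the location's current batch to the sender and forgets it locally.
    private void flush(String location) {
        List<Op> batch = pending.remove(location);
        if (batch != null && !batch.isEmpty()) {
            inFlight.add(sender.apply(location, batch));
        }
    }
}
```

Compared with the current algorithm, the only blocking point is the final join, so a
slow location does not hold back batches that are already full for other locations.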
