hbase-issues mailing list archives

From "Matt Corgan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7978) Merge hbase-prefixtree into hbase-server
Date Sun, 03 Mar 2013 00:31:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591577#comment-13591577 ]

Matt Corgan commented on HBASE-7978:
------------------------------------

{quote}but -1 for having a module per codec type{quote}
Seems like that is the consensus, and it makes sense to me.  I don't like having tons of
separate projects in Eclipse either.

I'm not familiar with the Hadoop code base, but if they have a module per package that sounds
weird to me.  The point of a module is not to just be a heavyweight package of some sort.
A module should only be considered when there is an opportunity to limit the compile-time
visibility of one module into another.  I've left some comments on a bunch of other JIRAs
about that so won't reiterate here.  Unfortunately, in Eclipse every class in a single project
can access any other class in the project.  There's no way to limit the interdependencies
by tweaking the src directories, build path, libs, etc.  The only way to isolate visibility
from one class to another is to put them in separate projects, which correspond to the Maven modules.

Elliott can probably speak to it best, but even though each developer thinks he's putting
a constant in the right place, or it's ok to reach over and call a method from one place to
another, after dozens of developers contribute stuff for years you end up with a big bowl
of spaghetti.  Using public/private/protected keywords, interfaces, packages and all that
good stuff will get you pretty far, but modules take you a step further and make it impossible
for anyone to cheat the above mechanisms, despite their best intentions at the time.

I would argue that carving out complex pieces of hbase into modules is critical to growing
the code base and the number of developers.  Take the memstore for example - it should eventually
be a very sophisticated piece of machinery that's 10x as efficient as it is today, and while
the interface to it has only a few methods relating to Cells and such, the code to implement
it will be pretty low-level and fragile.  The hbase-server module should not be able to reach
into the inner workings of the memstore, nor should it care how it's implemented.  It should
just request that someone give it an implementation to back up the memstore interface from
hbase-common.  Pulling that memstore code into a module is a perfect way to enforce those
principles.  Further, it's important that the correctness of the memstore be established without
relying on grander tests in hbase-server.  Tests that prove memstore correctness should be
in the memstore module.
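
To make the memstore example concrete, here is a minimal sketch of the layering described
above.  All of the names (MemStoreSketch, SkipListMemStoreSketch, RegionSketch) are
hypothetical and are not HBase's actual classes or API.  The comments mark which module each
type would live in; in a single file the compiler cannot enforce that split, which is exactly
the part that separate Maven modules would add.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentSkipListMap;

// Would live in hbase-common: the only memstore type hbase-server can see.
// Hypothetical interface, not the real HBase MemStore API.
interface MemStoreSketch {
    void put(String rowKey, String value);
    Optional<String> get(String rowKey);
    int size();
}

// Would live in a separate memstore module. hbase-server would have no
// compile-time dependency on that module, so it could not name this class.
final class SkipListMemStoreSketch implements MemStoreSketch {
    private final ConcurrentSkipListMap<String, String> cells =
            new ConcurrentSkipListMap<>();

    @Override
    public void put(String rowKey, String value) {
        cells.put(rowKey, value);
    }

    @Override
    public Optional<String> get(String rowKey) {
        return Optional.ofNullable(cells.get(rowKey));
    }

    @Override
    public int size() {
        return cells.size();
    }
}

// Would live in hbase-server: it is handed an implementation and only
// ever talks to the interface.
final class RegionSketch {
    private final MemStoreSketch memStore;

    RegionSketch(MemStoreSketch memStore) {
        this.memStore = memStore;
    }

    void write(String rowKey, String value) {
        memStore.put(rowKey, value);
    }

    Optional<String> read(String rowKey) {
        return memStore.get(rowKey);
    }
}

// Wiring code: the one place that knows about both sides.
final class ModuleBoundaryDemo {
    public static void main(String[] args) {
        RegionSketch region = new RegionSketch(new SkipListMemStoreSketch());
        region.write("row1", "value1");
        System.out.println(region.read("row1").orElse("<missing>"));
    }
}
```

With real modules, a test of SkipListMemStoreSketch would sit next to it in the memstore
module and would not need anything from hbase-server to run.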

Having modules pulled out and strictly isolated like that speeds up development too.  Now
someone working on the hbase-server module can work a little faster knowing that there's no
way they're going to break the memstore.  I'd say the hbase-server code is more readable too
because memstore implementation details aren't there anymore.

Anyway, just some arguments for modules improving understandability, testability, and development
speed.  It all boils down to reducing compile-time visibility between modules to a minimum.

{quote}The question here is whether to rename the current module to hbase-codec and have the
rest there? How close are we to extracting the DBE as generic codecs?{quote}
Renaming hbase-prefix-tree to hbase-codec sounds great to me.  I think pulling the delta
encoder's core functionality out into hbase-codec while leaving the hfile interaction stuff
in hbase-server should be doable, and doing so will have the added benefit of getting the
encoders more ready for RPC usage.
                
> Merge hbase-prefixtree into hbase-server
> ----------------------------------------
>
>                 Key: HBASE-7978
>                 URL: https://issues.apache.org/jira/browse/HBASE-7978
>             Project: HBase
>          Issue Type: Improvement
>          Components: HFile
>    Affects Versions: 0.95.0, 0.98.0
>            Reporter: Enis Soztutar
>
> I would like to discuss the possibility of merging the prefix tree module into the
> hbase-server module.
> Ideally, I think we should have hbase-mapreduce and hbase-storage modules, the latter
> one containing most of HFile code. hbase-mapreduce depends on hbase-storage so that it
> knows how to encode hfiles. prefix-tree belongs to hbase-storage.
> prefix tree is just another DBE, although a big one, and it rightfully belongs with her
> sisters. The fact that the code is independent from the rest of the code base does not
> mean that it should have its own module. We should keep the number of modules manageable,
> and stay away from hadoop trunk's one-module-per-package policy.
> Related: HBASE-7936

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
