hbase-dev mailing list archives

From Matt Corgan <mcor...@hotpads.com>
Subject Re: HBase Developer's Pow-wow.
Date Mon, 10 Sep 2012 19:13:36 GMT
Can indexing be boiled down to these questions to start?

1) Per-region or Per-table
2) Sync or Async
3) Client-managed or Server-managed
4) Schema or Schema-less


- Per-region: the index entries are stored on the same machine as the
primary rows
- Per-table: each index is stored in a separate table, requiring
cross-server consistency

- Sync: the client blocks until all index entries exist
- Async: the client returns when the primary row has been inserted, but
indexes are guaranteed to be created eventually

- Client-managed: client pushes index entries directly to regions, possibly
utilizing some server-side locks or id generators
- Server-managed: client pushes index entries to the same server as the
primary row, letting the server push the index entries on to the
destination regions

- Schema: (complex to even define) client and/or server have info about
column names, value formats, etc.  (Taking this route opens a world of
follow-on questions)
- Schema-less: client provides the index entries which are rows with opaque
row/family/qualifier/timestamp like in normal hbase
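For concreteness, a common per-table layout (a sketch only, not anything HBase ships) concatenates the indexed value and the primary row key into the index row key, so a prefix scan on the value recovers the matching primary keys. The class, method names, and the 0x00 separator (which assumes values never contain that byte) are all illustrative:

```java
// Sketch of a per-table, schema-less index row key: indexedValue + 0x00 + primaryKey.
// A prefix scan on indexedValue then yields every matching primary key in sorted order.
final class IndexKeys {
    private static final byte SEP = 0x00; // assumption: indexed values contain no 0x00

    static byte[] indexRowKey(byte[] indexedValue, byte[] primaryKey) {
        byte[] out = new byte[indexedValue.length + 1 + primaryKey.length];
        System.arraycopy(indexedValue, 0, out, 0, indexedValue.length);
        out[indexedValue.length] = SEP;
        System.arraycopy(primaryKey, 0, out, indexedValue.length + 1, primaryKey.length);
        return out;
    }

    // Recover the primary key from an index row key by skipping past the separator.
    static byte[] primaryKeyOf(byte[] indexRowKey) {
        int i = 0;
        while (indexRowKey[i] != SEP) i++;
        byte[] pk = new byte[indexRowKey.length - i - 1];
        System.arraycopy(indexRowKey, i + 1, pk, 0, pk.length);
        return pk;
    }
}
```

Because the index entry is just an opaque row key like any other, this layout works identically for the per-region and per-table cases; only where the entry lands differs.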

Personal opinions:

All of my use-cases would require Per-table indexes.  Per-region is easier
to keep consistent at write-time, but it seems useless to me for the large
tables that hbase is designed for (because you have to hit every region for
each read).

I think Synchronous writes are important for high-consistency (OLTP style)
use cases while Async is important for high-throughput (OLAP style).  I'd
say sync is a more desirable feature because it's easier to roll your own
async.  I would love to see the difference reduced to a per-index-entry
flag on the Put object.

Client-managed vs Server-managed isn't tremendously important.
 Client-managed seems admirable for the sync case, but server-managed is
better for async.  Therefore, probably better to keep the api simple and do
server-managed for both cases with a flag for sync/async.

The notion of adding a schema to hbase for secondary indexing scares me a
little.  Many of us already have ORM-type layers above hbase that do all
sorts of custom serializations.  It would be more flexible to let the
client generate arbitrary index entries and ship them to the server inside
the Put object.
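Combining the two suggestions above (client-generated opaque entries shipped inside the Put, plus a per-index-entry sync flag), the client-facing shape might look like the following. This is purely a hypothetical API sketch: HBase's real Put has no such methods, and every name here is invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not HBase API: the client attaches schema-less index
// entries (opaque row/family/qualifier/value bytes) to a put, each with its
// own sync flag deciding whether the write blocks until that entry exists.
final class IndexedPut {
    static final class Entry {
        final byte[] row, family, qualifier, value;
        final boolean sync; // true: block until this index entry is durable
        Entry(byte[] row, byte[] family, byte[] qualifier, byte[] value, boolean sync) {
            this.row = row; this.family = family;
            this.qualifier = qualifier; this.value = value; this.sync = sync;
        }
    }

    final byte[] primaryRow;
    final List<Entry> indexEntries = new ArrayList<>();

    IndexedPut(byte[] primaryRow) { this.primaryRow = primaryRow; }

    IndexedPut addIndexEntry(byte[] row, byte[] family, byte[] qualifier,
                             byte[] value, boolean sync) {
        indexEntries.add(new Entry(row, family, qualifier, value, sync));
        return this;
    }
}
```

The point of the sketch is that the server never needs to know what the index bytes mean; an ORM layer with custom serialization can generate whatever entries it likes.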

Anyway - my abbreviated 2 cents on a big topic.

On Mon, Sep 10, 2012 at 11:09 AM, Andrew Purtell <apurtell@apache.org> wrote:

> On Mon, Sep 10, 2012 at 12:03 AM, Jacques <whshub@gmail.com> wrote:
> >    - How important is indexing column qualifiers themselves (similar to
> >    Cassandra where people frequently utilize column qualifiers as
> "values"
> >    with no actual values stored)?
> It would be good to have a secondary indexing option that can build an
> index from some transform of family+qualifier.
> >    - In general it seems like there is tension between the main low level
> >    approaches of (1) leverage as much HBase infrastructure as possible
> (e.g.
> >    secondary tables) and (2) leverage an efficient indexing library e.g.
> >    Lucene.
> Regarding option #2, Jason Rutherglen's experiences may be of
> interest: https://issues.apache.org/jira/browse/HBASE-3529 . The new
> Codec and CodecProvider classes of Lucene 4 could conceivably support
> storage of postings in HBase proper now
> (http://wiki.apache.org/lucene-java/FlexibleIndexing) so HDFS hacks
> for bringing indexes local for mmapping may not be necessary, though
> this is a huge hand-wave.
> The remainder of your mail is focused on option #1, I have no comment
> to add there, lots of food for thought.
> > *Approach Thoughts*
> > Trying to leverage HBase as much as possible is hard if we want to
> utilize
> > the approach above and have consistent indexing.  However, I think we can
> > do it if we add support for what I will call a "local shadow family".
> >  These are additional, internally managed families for a table.  However,
> > they have the special characteristic that they belong to the region
> despite
> > their primary keys being outside the range of the region's.  Otherwise
> they
> > look like a typical family.  On splits, they are regenerated (somehow).
>  If
> > we take advantage of Lars'
> > HBASE-5229<https://issues.apache.org/jira/browse/HBASE-5229>,
> > we then have the opportunity to consistently insert one or more rows into
> > these local shadow families for the purpose of secondary indexing. The
> > structure of these secondary families could use row keys as the indexed
> > values, qualifiers for specific store files and the value of each being a
> > list of originating keys (using read-append or
> > HBASE-5993<https://issues.apache.org/jira/browse/HBASE-5993>).
> >  By leveraging the existing family infrastructure, we get things like
> > optional in-memory indexes and basic scanners for free and don't have to
> > swallow a big chunk of external indexing code.
> >
> > The simplest approach for integration of these for queries would
> > internally be a ScannerBasedFilter (a filter that is based on a scanner)
> > and a GroupingScanner (a Scanner that does intersection and/or union of
> > scanners for multi criteria queries).  Implementation of these scanners
> > could happen at one of two levels:
> >
> >    - StoreScanner level: A more efficient approach using the store file
> >    qualifier approach above (this allows easier maintenance of index
> >    deletions)
> >    - RegionScanner level: A simpler implementation with less violation of
> >    existing encapsulation.  We'd store row keys in qualifiers instead of
> >    values to ensure ordering that works iteratively with RegionScanner.
>  The
> >    weaknesses of this approach are less efficient scanning and figuring
> out
> >    how to manage primary value deletes.
> >
> > In general, the best way to deal with deletes is probably to age them out
> > per storefile and just filter "near misses" as a secondary filter that
> > works with ScannerBasedFilter.  The client side would be TBD but would
> > probably offer some kind of criteria filters that on server side had all
> > the lower level ramifications.
> >
> > *Future Optimizations*
> > In a perfect world, we'd actually use StoreFile block start locations as
> > the index pointer values in the secondary families.  This would make
> things
> > much more compact and efficient.  Especially if we used a smarter block
> > codec that took advantage of this nature.  However, this requires quite a
> > bit more work since we'd need to actually use the primary keys in the
> > secondary memstore and then "patch" the values to block locations as we
> > flushed the primary family that we were indexing (ugh).
> >
> > Assuming that the primary limiter of peak write throughput for HBase is
> > typically WAL writing and since indexes have no "real" data, we could
> > consider disabling WAL for local shadow families and simply regenerate
> this
> > data upon primary WAL playback.  I haven't spent enough time in that code
> > to know what kind of consistency pain this would cause  (my intuition is
> it
> > would be fine as long as we didn't fix
> > HBASE-3149<https://issues.apache.org/jira/browse/HBASE-3149>).
> > If consistency isn't a problem, this would be a nice option since it
> means
> > that indexing would have minimal impact on peak write throughput.
> >
> > *I haven't thought at all about...*
> >
> >    - How/whether this makes sense to be implemented as a coprocessor.
> >    - Weird timestamp impacts/considerations here.
> >    - Version handling/impacts.
> Best regards,
>    - Andy
> Problems worthy of attack prove their worth by hitting back. - Piet
> Hein (via Tom White)
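The GroupingScanner idea in the quoted proposal (a scanner that intersects or unions other scanners for multi-criteria queries) reduces to a sorted merge over the primary keys each index scanner emits. A minimal sketch in plain Java, with String keys standing in for real HBase scanner results:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the intersection half of a GroupingScanner: both inputs are
// assumed to emit primary row keys in sorted order, so a single linear
// merge pass finds the keys matching every criterion.
final class GroupingScanner {
    static List<String> intersect(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp == 0) { out.add(a.get(i)); i++; j++; } // key satisfies both criteria
            else if (cmp < 0) i++; // advance the scanner that is behind
            else j++;
        }
        return out;
    }
}
```

Union is the same merge emitting keys from either side; either variant works iteratively, which is what makes the RegionScanner-level integration described above plausible.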
