hbase-issues mailing list archives

From "Evert Arckens (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3247) Changes API: API for pulling edits from HBase
Date Thu, 25 Nov 2010 10:37:15 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935712#action_12935712 ]

Evert Arckens commented on HBASE-3247:

With the rowlog you can register a subscription, and then all messages that are put on the
rowlog will be kept for that subscription. If you then also register a listener (cf. RowLogMessageListener)
on that subscription, the rowlog processor will start feeding the messages to the listener.
If you can make a bulk load that only processes data that was changed before a certain point
in time, you can let that run and in the meantime let the rowlog record all changes that
are made after that point.
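
As a rough illustration of the subscribe-then-listen behaviour described above, here is a minimal in-memory sketch. SketchRowLog and all of its method names are hypothetical stand-ins invented for this example; they are not Lily's actual rowlog API, and the real RowLogMessageListener signature is not shown here.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for a rowlog; NOT Lily's actual API.
final class SketchRowLog {
    // One queue of retained messages per subscription id.
    private final Map<String, Queue<String>> pending = new HashMap<>();
    // At most one listener per subscription id (an assumption of this sketch).
    private final Map<String, Consumer<String>> listeners = new HashMap<>();

    void registerSubscription(String id) {
        pending.putIfAbsent(id, new ArrayDeque<>());
    }

    void registerListener(String id, Consumer<String> listener) {
        if (!pending.containsKey(id)) {
            throw new IllegalArgumentException("unknown subscription: " + id);
        }
        listeners.put(id, listener);
    }

    // Every message is retained for each subscription that exists at put time,
    // whether or not a listener has been registered yet.
    void putMessage(String message) {
        for (Queue<String> queue : pending.values()) {
            queue.add(message);
        }
    }

    // Stand-in for the processor: drains the queue of every subscription
    // that has a listener and returns the number of messages delivered.
    int processPending() {
        int delivered = 0;
        for (Map.Entry<String, Consumer<String>> entry : listeners.entrySet()) {
            Queue<String> queue = pending.get(entry.getKey());
            String message;
            while ((message = queue.poll()) != null) {
                entry.getValue().accept(message);
                delivered++;
            }
        }
        return delivered;
    }
}
```

In this sketch, messages put after registerSubscription but before registerListener stay queued, which is the property that lets a bulk load run first while later changes accumulate.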

Looking a bit further at how the Indexer in Lily uses the rowlog (http://docs.outerthought.org/lily-docs-current/415-lily.html):
When the indexer receives a message, it uses the record's current data and puts that data
in the index (IndexUpdater is the listener that is registered on the rowlog).
An index rebuild uses MapReduce to go over all the data again and update the index.
The bulk index rebuild and the index updater fed by the rowlog are allowed to
run in parallel. Both look at the current data of the record and put that in the index,
so there is no need for a transition point from bulk to incremental.
The indexer is written specifically to put Lily records into a Solr index. It is not yet
designed to plug in another index, but it should be doable to use this same framework with
something non-Lily on the one hand and a non-Solr index on the other. Looking at the classes
in the framework: the IndexUpdater is the implementation of the RowLogMessageListener, which
has knowledge of Lily records and decides 'what' to index. The Indexer class is responsible
for mapping the Lily schema onto the Solr schema and maintains the communication with Solr.
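
To make the "both paths index the current data" idea concrete, here is a small sketch. CurrentStateIndexer and its methods are hypothetical names made up for this example, with plain maps standing in for the record store and the index. It is not the Lily IndexUpdater or Indexer code, and it ignores the concurrency control a real system would need between reading a record and writing the index.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; the maps stand in for the record store and the index.
final class CurrentStateIndexer {
    private final Map<String, String> store;
    private final Map<String, String> index = new ConcurrentHashMap<>();

    CurrentStateIndexer(Map<String, String> store) {
        this.store = store;
    }

    // Shared by both paths: read the record as it is now and overwrite the
    // index entry, or drop the entry if the record no longer exists.
    void reindex(String rowId) {
        String current = store.get(rowId);
        if (current == null) {
            index.remove(rowId);
        } else {
            index.put(rowId, current);
        }
    }

    // Incremental path: called for each message delivered for a changed row.
    void onMessage(String rowId) {
        reindex(rowId);
    }

    // Bulk path: walk every row currently in the store.
    void rebuildAll() {
        for (String rowId : store.keySet()) {
            reindex(rowId);
        }
    }

    Map<String, String> indexView() {
        return index;
    }
}
```

Because neither path carries the changed data itself, replaying a message or re-running the rebuild just rewrites the same entry, which is why no hand-over point between the two is needed in this sketch.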

> Changes API: API for pulling edits from HBase
> ---------------------------------------------
>                 Key: HBASE-3247
>                 URL: https://issues.apache.org/jira/browse/HBASE-3247
>             Project: HBase
>          Issue Type: Task
>            Reporter: stack
> Talking to Shay from Elastic Search, he was asking where the Changes API is in HBase.
 Talking more -- there was a bit of beer involved, so apologies up front -- he wants to be
able to bootstrap an index and thereafter ask HBase for changes since time t. We thought
he could tie into the replication stream, but he would rather pull the edits than
have them pushed to him (in case he crashes, etc., so that on recovery he can start pulling again
from the last good edit received). He could do the bootstrap with a Scan. Thereafter, requests
to pull from HBase would pass a marker of some sort. HBase would then give out edits that
came in after this marker, in batches, along with an updated marker.
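
The pull protocol proposed above could be sketched as follows. ChangeFeed, Batch, pull and append are hypothetical names for illustration only; HBase does not expose such an API, and the marker here is simply assumed to be the count of edits already consumed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a pull-based changes feed; not an HBase API.
final class ChangeFeed {
    // A batch of edits plus the marker to pass on the next pull.
    static final class Batch {
        final List<String> edits;
        final long nextMarker;

        Batch(List<String> edits, long nextMarker) {
            this.edits = edits;
            this.nextMarker = nextMarker;
        }
    }

    private final List<String> log = new ArrayList<>();

    // Records an edit and returns the marker that points just past it.
    synchronized long append(String edit) {
        log.add(edit);
        return log.size();
    }

    // Returns up to maxEdits edits that came in after the given marker.
    synchronized Batch pull(long marker, int maxEdits) {
        if (marker < 0 || marker > log.size()) {
            throw new IllegalArgumentException("unknown marker: " + marker);
        }
        if (maxEdits <= 0) {
            throw new IllegalArgumentException("maxEdits must be positive");
        }
        int from = (int) marker;
        int to = (int) Math.min((long) log.size(), marker + maxEdits);
        return new Batch(new ArrayList<>(log.subList(from, to)), to);
    }
}
```

A client that crashes only needs to persist the last marker it fully processed; pulling again with that marker returns the same edits, so nothing is lost if a batch was received but not applied.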

