gora-dev mailing list archives

From "Alfonso Nishikawa (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (GORA-413) Support creation of dynamic columns within Gora datastore mapping designs
Date Mon, 02 Mar 2015 00:52:04 GMT

    [ https://issues.apache.org/jira/browse/GORA-413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342571#comment-14342571 ]

Alfonso Nishikawa edited comment on GORA-413 at 3/2/15 12:51 AM:
-----------------------------------------------------------------

Hi, Lewis.
To create columns dynamically within a column family in HBase, declare a Map
field in the entity, like [the metadata in Nutch|http://svn.apache.org/viewvc/nutch/tags/release-2.3/src/gora/webpage.avsc?view=markup#l259].
In [gora-hbase-mapping.xml|http://svn.apache.org/viewvc/nutch/tags/release-2.3/conf/gora-hbase-mapping.xml?view=markup#l82]
you can see that the mapping for metadata defines only the 'family', not the 'qualifier'. Each key
of the Map (metadata, for example) becomes one column name in that column family, so the set of
columns grows as you add new keys to the map in the entity.
Just write your timestamps as Strings and they will be persisted as column names.

One thing to note (I tried this once in 0.2.1) is that you cannot define static columns and
dynamic columns in the same column family.
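
As a rough illustration of the behaviour described above, the following plain Python sketch models a map field whose mapping gives only a family. It is not Gora's API: the helper name `map_to_cells` is hypothetical, and the family name `log` is borrowed from the issue description only as an example.

```python
from typing import Dict, List, Tuple

# Hypothetical helper, not part of Gora: models how a map field mapped with
# only a 'family' (no 'qualifier') spreads into one column per map key.
def map_to_cells(family: str, field: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (family, qualifier, value) triples, one per map entry."""
    return [(family, key, value) for key, value in sorted(field.items())]

# Keys are timestamps written as strings, as suggested above.
log = {"1358942490": "INFO This is a log entry."}
log["1358942555"] = "WARN Another entry."  # a new key means a new column

cells = map_to_cells("log", log)
for family, qualifier, value in cells:
    print(f"{family}:{qualifier} = {value}")
```

Adding a key to the map adds a triple to the result, which is the sense in which the columns "grow" with the entity.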



> Support creation of dynamic columns within Gora datastore mapping designs
> -------------------------------------------------------------------------
>
>                 Key: GORA-413
>                 URL: https://issues.apache.org/jira/browse/GORA-413
>             Project: Apache Gora
>          Issue Type: New Feature
>          Components: gora-hbase
>    Affects Versions: 0.6
>            Reporter: Lewis John McGibbney
>             Fix For: 0.7
>
>
> The conversation taking place on [dynamically generating HBase columns|http://www.mail-archive.com/dev%40gora.apache.org/msg05754.html]
> has shown that new functionality needs to be added in order to achieve this.
> The main driver for this issue is that Chukwa logs need to dynamically create many columns
> over time, directly dependent on the number of data chunks we get. Each data chunk has a
> [Sequence ID], and this Sequence ID should be the column name.
> The table design will look like this:
> {code}
> Row Key: [Invert Date]:[Data Type]:[Primary Key]
> Column Family: log
> Column Name: [Sequence ID]
> Timestamp: [log entry timestamp]
> Example:
> Row Key: 2132013102:TT:host1.example.com
> Column Family: log
> Column Name: 1230
> Cell Value: 2013-01-23 12:01:30 INFO This is a log entry.
> Timestamp: 1358942490
> {code}
> The inverted date allows the table to be partitioned more easily by hour, by day of the
> month, or by month.
> Using the column name for a consecutive sequence allows fast retrieval in a linear scan.
> This format is typically good for quickly retrieving an hour's worth of logs for a node.
> Hence, if we batch-scan the table in a rolling window via a MapReduce job at every hour
> interval, the workload is spread evenly across multiple MapReduce tasks.
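
The table design quoted above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: `Cell` and `make_cell` are hypothetical names, nothing here calls HBase or Gora, and the inverted date is passed in as an opaque string because the issue does not say how it is computed.

```python
from typing import NamedTuple

class Cell(NamedTuple):
    """One cell in the proposed layout (hypothetical type, for illustration)."""
    row_key: str
    family: str
    qualifier: str
    value: str
    timestamp: int

def make_cell(inverted_date: str, data_type: str, primary_key: str,
              sequence_id: int, value: str, timestamp: int) -> Cell:
    # Row Key: [Invert Date]:[Data Type]:[Primary Key]
    row_key = ":".join([inverted_date, data_type, primary_key])
    # Column Family is fixed to "log"; the Sequence ID becomes the column name.
    return Cell(row_key, "log", str(sequence_id), value, timestamp)

# Values taken from the example in the issue description.
cell = make_cell("2132013102", "TT", "host1.example.com", 1230,
                 "2013-01-23 12:01:30 INFO This is a log entry.", 1358942490)
print(cell.row_key, cell.family, cell.qualifier)
```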



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
