hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From grashmi13 <rashmi.maheshw...@rsystems.com>
Subject Re: RDBMS to HBASE schema migration
Date Fri, 29 Jun 2012 00:25:08 GMT

Thanks Doug Meil for your valuable comments.

Actually I need to provide some output to my manager by today.. so i asked
help from experts point of view.  

I will go thru with suggested maerial. But right now I dont have much time.

Also, this is my perception towards hbase desgin by so far surfing and
reading blogs and other material. 

As written by you, tables must have one PK.... I have also taken one PK
only. Also, I havent used any secondary index here.

I need your inputs if my change is correct or not. if not, what should be
hbase schema as per your perspective and expertise for above given tables?

I used nested entities in last hbase table. if that single table design for
all the RDBMS tables, is correct or not? if not, what should be it as per
your perception? How do we program nested entities using hbase API?


design suggested by you, would help in into further lEARNING ANd further
designing.

Thanks again, looking forward for your valuable inputs.

Regards,
Rashmi

Doug Meil-2 wrote:
> 
> 
> Hi there-
> 
> I commend your enthusiasm for the Hbase project.  For the "ground rules of
> Hbase" you probably want to read this closelyŠ
> 
> http://hbase.apache.org/book.html#datamodel
> 
> Š as it covers things like having one PK per table, no secondary indexes,
> etc.  With a solid understanding of these rules the table relationship and
> search questions you're asking will become clearer.
> 
> 
> 
> 
> 
> 
> On 6/28/12 3:01 PM, "grashmi13" <rashmi.maheshwari@rsystems.com> wrote:
> 
>>
>>Assets table is having numeric sequential ID and a one number out of
>>(1,2,3,4,5,6,7, 8, 9, 10) for AssetName. This is a master table with say
>>10
>>rows only. 
>>
>>
>>hmmm.. after some more surfing, i came to know that we have to manually
>>denormalize a relational DB. there are no preset rules for
>>denormalization. 
>>
>>today I was doing a denormalization.. can you please confirm if rdbms to
>>hbase change i have done is correct or not:
>>These tables are from stock trading application:
>>
>>1. I have two tables...
>>Assets 
>>========= 
>>AssetID | assetName
>>
>>
>>second table is: 
>>
>>INSTRUMENTS 
>>============ 
>>InstrumentID | assetID | Symbol | name | created | modifiedAt
>>
>>
>>While converting them into Hbase two questions were there in my mind:
>>1. Will there be two tables or one? If we have a web GUI and have two
>>screens, one for assets management and one for instuments management. In
>>instrument management, there would be a combo having assets. An asset can
>>be
>>assigned to multiple instruments.
>>
>>So if user will first populate all assets thru assets screen. So there
>>should be one master table for Assets. because it is not dependent on any
>>other entity. Am I right here?
>>
>>And now user will populate instruments thru instrument management screen.
>>Here another table would come into picture. "INSTRUMENTS". this table
>>would
>>be denormalized, as hbase doesnt support JOIN. Also, to maintain ACID
>>properties, all assets information should be duplicated here too. Now
>>question here, if I delete or update some asset in future, and there are
>>corresponding records in INSTRUMENTS table. what if asset is deletion or
>>updation goes successful for assets table but fails for instruments table?
>>How we can maintain consistancy and integrity here?
>>
>>
>>HBASE table conversion here:
>>
>>ASSETS 
>>==================
>>row key - AssetID 
>>================ 
>>Columns: CF1 - AssetName
>>================================================
>>
>>INSTRUMENTS 
>>=====================================
>>row key-- instrumentID(a sequential numeric value)
>>====================================
>>InstrumentInfo(family) -
>>Symbol 
>>name 
>>AssetsInfo(family2) -
>>AssetID 
>>AssetName 
>>=====================================
>>
>>
>>Please confirm, two tables would be required or one only? If one only, how
>>assets population and after that instruments population would be possible?
>>If two, how we can maintain integrity and consistancy during assets
>>deletion\updation?
>>
>>Please confirm if my approach is correct?
>>
>>Yes, as read everywhere, in hbase, we must think about purpose of data in
>>our application before creating tables. So purpose here is, to
>>populate\delete\update assets and instruments thru web UI and then run a
>>job
>>which extract instruments by asset name. Also, assets by instrument name.
>>
>>======================================
>>
>>After above conversion, I got stuck on one more point. "SELF-JOIN"
>>
>>Lets say, I have a table InstrumentsStock having 10 columns in it, out of
>>it
>>one primary key stockID and other normal column BaseCurrencyID, both
>>points
>>to an instrumentID.
>>
>>StockID is an ID of an instrument whose assetID is 1. and BaseCurrencyID
>>is
>>ID of an instrument whose assetID is 3.
>>
>>InstrumentsStock 
>>=======================
>>stockID | AssetID | BaseCurrencyID | 10 more columns here
>>========================================
>>
>>where StockID = InstrumentID of Instrument with AssetID 1
>>and BaseCurrencyID = InstrumentID of Instrument with AssetID 3
>>
>>StockID is primary key
>>
>>
>>E.g. I have below data in tables:
>>Instruments: 
>>======================================
>>InstrumentID | assetsID | Symbol | Name | Comment
>>======================================
>>22 | 1 | WMT | Ins1| createdByRashmi
>>23 | 1 | HOG | Ins2| createdByRashmi
>>.... 
>>.... 
>>.... 
>>40 | 3 | HPQ | Ins3| createdByRashmi
>>41 | 3 | KO | Ins4| createdByRashmi
>>
>>InstrumentsStock 
>>==========================================
>>StockID | assetsID | BasecurrencyID | x | y | z
>>======================================
>>22 | 1 | 41 | stock1 | y1 | z1
>>23 | 1 | 40 | stock2 | y2 | z2
>>24 | 1 | 41 | stock3 | y3 | z3
>>25 | 1 | 40 | stock4 | y4 | z4
>>... 
>>... 
>>
>>
>>Relationship between Instruments and InstrumentsStock is one to one.
>>
>>Question: retrieve all Stocks where baseCurrencyID is 40.
>>
>>
>>Change I made: Combined all three tables, Instruments, Assets,
>>InstrumentsStock and created one table "INstruments_Asset_Stocks" with
>>three
>>families (AssetInfo, InstrumentInfo, StockINfo)
>>
>>
>>
>>INstruments_Asset_Stocks
>>==============================================================
>>rowKey : | | | 
>>InstrumentID | CF1: AssetInfo | CF2: StockInfo |cf3: InstrumentInfo
>>===============================================================
>>| assetID | assetName | x, y, z, BaseCurrencyID | symbol, name
>>===============================================================
>>22 | 1 | assetOne |stock1 | y1 | z1 | 41 | WMT, Ins1
>>23 | 1 | assetOne |stock2 | y2 | z2 | 40 | HOG, Ins2
>>24 | 1 | assetOne |stock3 | y3 | z3 | 41 | WMT1, Ins3
>>25 | 1 | assetOne |stock4 | y4 | z4 | 40 | WMT2, Ins4
>>40 | 3 | assetTwo | | HPQ, Ins5
>>41 | 3 | assetTwo | | KO, Ins6
>>
>>
>>
>>On basis of StockInfo and qualifier BaseCurrencyID, we can retrieve all
>>Stock relation detail from this table. On basis of AssetInfo family,
>>assets
>>related stocks, on basis of IntrumentInfo, related stock we can
>>retrieve... 
>>
>>Is my understanding correct, or I am so far very wrong.
>>
>>
>>Here SELF-JOIn was used in relational schema... how do we achieve same
>>JOIn
>>in hbase, as above or some other way?
>>
>>
>>What is nested entities and how do we create nested entities using "Hbase
>>Shell" and using hbase java API. There is API to create family, column...
>>is
>>there API to create Nested Entities?
>>
>>
>>There is one more table:
>>
>>Same StockID and localCurrencyID are populated using InstrumentID only.
>>Data
>>of this table is like:
>>Listing 
>>==============================
>>ID | stockID | LocalCurrencyID | location
>>==============================
>>L1| 22 | 51 | India
>>L1| 22 | 52 | Japan
>>L1| 22 | 40 | London
>>L1| 23 | 57 | USA 
>>L1| 23 | 41 | Africa
>> 
>>Here again self-join is used and many-many relationship...
>>
>>How do I integrate this table with above INstruments_Asset_Stocks table? I
>>want to get All stocks details for given locaCurrency.
>>
>>
>>What I did: 
>>INstruments_Asset_Stocks_Listing
>>===============================================
>>rowKey - instrumentID_ListingID
>>===============================================
>>Cf1: AssetINnfo (assetID, assetName)
>>Cf2: InstrumentInfo(symbol, name, comment)
>>cf3: StockInfo (X1, Y1, Z1, BaseCurrencyID,
>>nested_enties_for_LocalCurrency
>>) 
>>
>>Nested entities for local currency would be having localCurrencyID as key
>>and location_listingID as value.
>>
>>Is this change correct to get all Stocks for given localCurrency. As read
>>somewhere, I have to use map\reduce to get this data from nested entities?
>>is it? 
>>
>>How do we implement nested entities? How(from which API or shell command)
>>client scans, searched these entities? Can I search it like I do for
>>column
>>qualifier BaseCurrencyID?
>>
>>
>>Your inputs would really help me in understanding this Hbase design.
>>
>>
>>Regards, 
>>Rashmi
>>
>>
>>grashmi13 wrote:
>>> 
>>> Hi,
>>> 
>>> I want to change my RDBMS to HBASE schema, to be used with Hadoop
>>> platform.
>>> 
>>> I have changed two RDBMS tables into HBASE tables. I have ignored
>>> constraints, indexes and foreign key relationship. Because I dont know
>>>how
>>> to convert these relationships in Hbase schema.
>>> 
>>> Please confirm if the change I have made is correct?
>>> 
>>> Relational Schema of tables are:
>>> ==========================
>>> 
>>> TABLE : ASSTES
>>> ------------------------------------------------------------------
>>> 
>>> CREATE TABLE [dbo].[Assets](
>>> [AssetId] [int] NOT NULL,
>>> [AssetName] [varchar](50) NOT NULL,
>>> CONSTRAINT [PK_Assets] PRIMARY KEY CLUSTERED
>>> (
>>> [AssetId] ASC
>>> )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY =
>>> OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
>>> ) ON [PRIMARY]
>>> 
>>>-------------------------------------------------------------------------
>>>------------------------------------------------------
>>> 
>>> 
>>> --------------------------------
>>> TABLE: Instruments
>>> ----------------------------------
>>> CREATE TABLE [dbo].[Instruments](
>>> [InstrumentId] [int] IDENTITY(1,1) NOT NULL,
>>> [AssetId] [int] NOT NULL,
>>> [Symbol] [varchar](50) NOT NULL,
>>> [Name] [varchar](250) NOT NULL,
>>> [Created] [datetime] NOT NULL,
>>> [Modified] [datetime] NULL,
>>> [Comments] [varchar](250) NULL,
>>> CONSTRAINT [PK_Instruments_InstrumentId] PRIMARY KEY CLUSTERED
>>> (
>>> [InstrumentId] ASC
>>> )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY =
>>> OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
>>> CONSTRAINT [PK_Instruments_InstrumentIdAssetId] UNIQUE NONCLUSTERED
>>> (
>>> [InstrumentId] ASC,
>>> [AssetId] ASC
>>> )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY =
>>> OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
>>> ) ON [PRIMARY]
>>> 
>>> GO
>>> 
>>> SET ANSI_PADDING ON
>>> GO
>>> 
>>> ALTER TABLE [dbo].[Instruments] WITH CHECK ADD CONSTRAINT
>>> [FK_Instruments_Assets] FOREIGN KEY([AssetId])
>>> REFERENCES [dbo].[Assets] ([AssetId])
>>> ON UPDATE CASCADE
>>> GO
>>> 
>>> ALTER TABLE [dbo].[Instruments] CHECK CONSTRAINT [FK_Instruments_Assets]
>>> GO
>>> 
>>> ALTER TABLE [dbo].[Instruments] ADD CONSTRAINT
>>>[DF_Instruments_Created_1]
>>> DEFAULT (getdate()) FOR [Created]
>>> GO
>>> 
>>> ====================================
>>> 
>>> 
>>> 
>>> 
>>> 
>>> HBASE CONVERSION OF ASSTES AND INSTRUMENTS tables:
>>> ====================================================================
>>> 
>>> Assets table
>>> ===============
>>> RowKey	-- AssetID
>>> ColumnFamilies	(AssetName	) ---- ColumnName - (Name)
>>> 
>>> 
>>> 
>>> Instruments table
>>> ==============
>>> RowKey	- InstrumentID
>>> ColumnFamilies - (Content) --- Columns: Symbol, Name, Created, Modified,
>>> Comments
>>> (Assets) ---	Columns: AssetID
>>> 
>>> 
>>> Please confirm if given conversion is proper?
>>> 
>>> Also, how do i convert constraints and indexes and foreign key
>>> relationship?
>>> 
>>> 
>>> 
>>> 
>>> Thanks in advance
>>> 
>>> Regards,
>>> rashmi
>>> 
>>
>>-- 
>>View this message in context:
>>http://old.nabble.com/RDBMS-to-HBASE-schema-migration-tp34087951p34087958.
>>html
>>Sent from the HBase User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> 

-- 
View this message in context: http://old.nabble.com/RDBMS-to-HBASE-schema-migration-tp34087951p34089135.html
Sent from the HBase User mailing list archive at Nabble.com.


Mime
View raw message