hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Adam Smieszny <ad...@cloudera.com>
Subject Re: RDBMS to HBASE schema migration
Date Fri, 29 Jun 2012 21:18:47 GMT
Hi Rashmi,

I know you were in a hurry so the following may not meet your time
constraints, but I found the following video to be helpful when considering
HBase Schema design
http://www.cloudera.com/resource/video-hbasecon-2012-hbasecon-2012/

Best,
Adam Smieszny

On Thu, Jun 28, 2012 at 8:25 PM, grashmi13
<rashmi.maheshwari@rsystems.com>wrote:

>
> Thanks Doug Meil for your valuable comments.
>
> Actually I need to provide some output to my manager by today.. so i asked
> help from experts point of view.
>
> I will go thru with suggested maerial. But right now I dont have much time.
>
> Also, this is my perception towards hbase desgin by so far surfing and
> reading blogs and other material.
>
> As written by you, tables must have one PK.... I have also taken one PK
> only. Also, I havent used any secondary index here.
>
> I need your inputs if my change is correct or not. if not, what should be
> hbase schema as per your perspective and expertise for above given tables?
>
> I used nested entities in last hbase table. if that single table design for
> all the RDBMS tables, is correct or not? if not, what should be it as per
> your perception? How do we program nested entities using hbase API?
>
>
> design suggested by you, would help in into further lEARNING ANd further
> designing.
>
> Thanks again, looking forward for your valuable inputs.
>
> Regards,
> Rashmi
>
> Doug Meil-2 wrote:
> >
> >
> > Hi there-
> >
> > I commend your enthusiasm for the Hbase project.  For the "ground rules
> of
> > Hbase" you probably want to read this closelyŠ
> >
> > http://hbase.apache.org/book.html#datamodel
> >
> > Š as it covers things like having one PK per table, no secondary indexes,
> > etc.  With a solid understanding of these rules the table relationship
> and
> > search questions you're asking will become clearer.
> >
> >
> >
> >
> >
> >
> > On 6/28/12 3:01 PM, "grashmi13" <rashmi.maheshwari@rsystems.com> wrote:
> >
> >>
> >>Assets table is having numeric sequential ID and a one number out of
> >>(1,2,3,4,5,6,7, 8, 9, 10) for AssetName. This is a master table with say
> >>10
> >>rows only.
> >>
> >>
> >>hmmm.. after some more surfing, i came to know that we have to manually
> >>denormalize a relational DB. there are no preset rules for
> >>denormalization.
> >>
> >>today I was doing a denormalization.. can you please confirm if rdbms to
> >>hbase change i have done is correct or not:
> >>These tables are from stock trading application:
> >>
> >>1. I have two tables...
> >>Assets
> >>=========
> >>AssetID | assetName
> >>
> >>
> >>second table is:
> >>
> >>INSTRUMENTS
> >>============
> >>InstrumentID | assetID | Symbol | name | created | modifiedAt
> >>
> >>
> >>While converting them into Hbase two questions were there in my mind:
> >>1. Will there be two tables or one? If we have a web GUI and have two
> >>screens, one for assets management and one for instuments management. In
> >>instrument management, there would be a combo having assets. An asset can
> >>be
> >>assigned to multiple instruments.
> >>
> >>So if user will first populate all assets thru assets screen. So there
> >>should be one master table for Assets. because it is not dependent on any
> >>other entity. Am I right here?
> >>
> >>And now user will populate instruments thru instrument management screen.
> >>Here another table would come into picture. "INSTRUMENTS". this table
> >>would
> >>be denormalized, as hbase doesnt support JOIN. Also, to maintain ACID
> >>properties, all assets information should be duplicated here too. Now
> >>question here, if I delete or update some asset in future, and there are
> >>corresponding records in INSTRUMENTS table. what if asset is deletion or
> >>updation goes successful for assets table but fails for instruments
> table?
> >>How we can maintain consistancy and integrity here?
> >>
> >>
> >>HBASE table conversion here:
> >>
> >>ASSETS
> >>==================
> >>row key - AssetID
> >>================
> >>Columns: CF1 - AssetName
> >>================================================
> >>
> >>INSTRUMENTS
> >>=====================================
> >>row key-- instrumentID(a sequential numeric value)
> >>====================================
> >>InstrumentInfo(family) -
> >>Symbol
> >>name
> >>AssetsInfo(family2) -
> >>AssetID
> >>AssetName
> >>=====================================
> >>
> >>
> >>Please confirm, two tables would be required or one only? If one only,
> how
> >>assets population and after that instruments population would be
> possible?
> >>If two, how we can maintain integrity and consistancy during assets
> >>deletion\updation?
> >>
> >>Please confirm if my approach is correct?
> >>
> >>Yes, as read everywhere, in hbase, we must think about purpose of data in
> >>our application before creating tables. So purpose here is, to
> >>populate\delete\update assets and instruments thru web UI and then run a
> >>job
> >>which extract instruments by asset name. Also, assets by instrument name.
> >>
> >>======================================
> >>
> >>After above conversion, I got stuck on one more point. "SELF-JOIN"
> >>
> >>Lets say, I have a table InstrumentsStock having 10 columns in it, out of
> >>it
> >>one primary key stockID and other normal column BaseCurrencyID, both
> >>points
> >>to an instrumentID.
> >>
> >>StockID is an ID of an instrument whose assetID is 1. and BaseCurrencyID
> >>is
> >>ID of an instrument whose assetID is 3.
> >>
> >>InstrumentsStock
> >>=======================
> >>stockID | AssetID | BaseCurrencyID | 10 more columns here
> >>========================================
> >>
> >>where StockID = InstrumentID of Instrument with AssetID 1
> >>and BaseCurrencyID = InstrumentID of Instrument with AssetID 3
> >>
> >>StockID is primary key
> >>
> >>
> >>E.g. I have below data in tables:
> >>Instruments:
> >>======================================
> >>InstrumentID | assetsID | Symbol | Name | Comment
> >>======================================
> >>22 | 1 | WMT | Ins1| createdByRashmi
> >>23 | 1 | HOG | Ins2| createdByRashmi
> >>....
> >>....
> >>....
> >>40 | 3 | HPQ | Ins3| createdByRashmi
> >>41 | 3 | KO | Ins4| createdByRashmi
> >>
> >>InstrumentsStock
> >>==========================================
> >>StockID | assetsID | BasecurrencyID | x | y | z
> >>======================================
> >>22 | 1 | 41 | stock1 | y1 | z1
> >>23 | 1 | 40 | stock2 | y2 | z2
> >>24 | 1 | 41 | stock3 | y3 | z3
> >>25 | 1 | 40 | stock4 | y4 | z4
> >>...
> >>...
> >>
> >>
> >>Relationship between Instruments and InstrumentsStock is one to one.
> >>
> >>Question: retrieve all Stocks where baseCurrencyID is 40.
> >>
> >>
> >>Change I made: Combined all three tables, Instruments, Assets,
> >>InstrumentsStock and created one table "INstruments_Asset_Stocks" with
> >>three
> >>families (AssetInfo, InstrumentInfo, StockINfo)
> >>
> >>
> >>
> >>INstruments_Asset_Stocks
> >>==============================================================
> >>rowKey : | | |
> >>InstrumentID | CF1: AssetInfo | CF2: StockInfo |cf3: InstrumentInfo
> >>===============================================================
> >>| assetID | assetName | x, y, z, BaseCurrencyID | symbol, name
> >>===============================================================
> >>22 | 1 | assetOne |stock1 | y1 | z1 | 41 | WMT, Ins1
> >>23 | 1 | assetOne |stock2 | y2 | z2 | 40 | HOG, Ins2
> >>24 | 1 | assetOne |stock3 | y3 | z3 | 41 | WMT1, Ins3
> >>25 | 1 | assetOne |stock4 | y4 | z4 | 40 | WMT2, Ins4
> >>40 | 3 | assetTwo | | HPQ, Ins5
> >>41 | 3 | assetTwo | | KO, Ins6
> >>
> >>
> >>
> >>On basis of StockInfo and qualifier BaseCurrencyID, we can retrieve all
> >>Stock relation detail from this table. On basis of AssetInfo family,
> >>assets
> >>related stocks, on basis of IntrumentInfo, related stock we can
> >>retrieve...
> >>
> >>Is my understanding correct, or I am so far very wrong.
> >>
> >>
> >>Here SELF-JOIn was used in relational schema... how do we achieve same
> >>JOIn
> >>in hbase, as above or some other way?
> >>
> >>
> >>What is nested entities and how do we create nested entities using "Hbase
> >>Shell" and using hbase java API. There is API to create family, column...
> >>is
> >>there API to create Nested Entities?
> >>
> >>
> >>There is one more table:
> >>
> >>Same StockID and localCurrencyID are populated using InstrumentID only.
> >>Data
> >>of this table is like:
> >>Listing
> >>==============================
> >>ID | stockID | LocalCurrencyID | location
> >>==============================
> >>L1| 22 | 51 | India
> >>L1| 22 | 52 | Japan
> >>L1| 22 | 40 | London
> >>L1| 23 | 57 | USA
> >>L1| 23 | 41 | Africa
> >>
> >>Here again self-join is used and many-many relationship...
> >>
> >>How do I integrate this table with above INstruments_Asset_Stocks table?
> I
> >>want to get All stocks details for given locaCurrency.
> >>
> >>
> >>What I did:
> >>INstruments_Asset_Stocks_Listing
> >>===============================================
> >>rowKey - instrumentID_ListingID
> >>===============================================
> >>Cf1: AssetINnfo (assetID, assetName)
> >>Cf2: InstrumentInfo(symbol, name, comment)
> >>cf3: StockInfo (X1, Y1, Z1, BaseCurrencyID,
> >>nested_enties_for_LocalCurrency
> >>)
> >>
> >>Nested entities for local currency would be having localCurrencyID as key
> >>and location_listingID as value.
> >>
> >>Is this change correct to get all Stocks for given localCurrency. As read
> >>somewhere, I have to use map\reduce to get this data from nested
> entities?
> >>is it?
> >>
> >>How do we implement nested entities? How(from which API or shell command)
> >>client scans, searched these entities? Can I search it like I do for
> >>column
> >>qualifier BaseCurrencyID?
> >>
> >>
> >>Your inputs would really help me in understanding this Hbase design.
> >>
> >>
> >>Regards,
> >>Rashmi
> >>
> >>
> >>grashmi13 wrote:
> >>>
> >>> Hi,
> >>>
> >>> I want to change my RDBMS to HBASE schema, to be used with Hadoop
> >>> platform.
> >>>
> >>> I have changed two RDBMS tables into HBASE tables. I have ignored
> >>> constraints, indexes and foreign key relationship. Because I dont know
> >>>how
> >>> to convert these relationships in Hbase schema.
> >>>
> >>> Please confirm if the change I have made is correct?
> >>>
> >>> Relational Schema of tables are:
> >>> ==========================
> >>>
> >>> TABLE : ASSTES
> >>> ------------------------------------------------------------------
> >>>
> >>> CREATE TABLE [dbo].[Assets](
> >>> [AssetId] [int] NOT NULL,
> >>> [AssetName] [varchar](50) NOT NULL,
> >>> CONSTRAINT [PK_Assets] PRIMARY KEY CLUSTERED
> >>> (
> >>> [AssetId] ASC
> >>> )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY =
> >>> OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
> >>> ) ON [PRIMARY]
> >>>
>
> >>>-------------------------------------------------------------------------
> >>>------------------------------------------------------
> >>>
> >>>
> >>> --------------------------------
> >>> TABLE: Instruments
> >>> ----------------------------------
> >>> CREATE TABLE [dbo].[Instruments](
> >>> [InstrumentId] [int] IDENTITY(1,1) NOT NULL,
> >>> [AssetId] [int] NOT NULL,
> >>> [Symbol] [varchar](50) NOT NULL,
> >>> [Name] [varchar](250) NOT NULL,
> >>> [Created] [datetime] NOT NULL,
> >>> [Modified] [datetime] NULL,
> >>> [Comments] [varchar](250) NULL,
> >>> CONSTRAINT [PK_Instruments_InstrumentId] PRIMARY KEY CLUSTERED
> >>> (
> >>> [InstrumentId] ASC
> >>> )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY =
> >>> OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
> >>> CONSTRAINT [PK_Instruments_InstrumentIdAssetId] UNIQUE NONCLUSTERED
> >>> (
> >>> [InstrumentId] ASC,
> >>> [AssetId] ASC
> >>> )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY =
> >>> OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
> >>> ) ON [PRIMARY]
> >>>
> >>> GO
> >>>
> >>> SET ANSI_PADDING ON
> >>> GO
> >>>
> >>> ALTER TABLE [dbo].[Instruments] WITH CHECK ADD CONSTRAINT
> >>> [FK_Instruments_Assets] FOREIGN KEY([AssetId])
> >>> REFERENCES [dbo].[Assets] ([AssetId])
> >>> ON UPDATE CASCADE
> >>> GO
> >>>
> >>> ALTER TABLE [dbo].[Instruments] CHECK CONSTRAINT
> [FK_Instruments_Assets]
> >>> GO
> >>>
> >>> ALTER TABLE [dbo].[Instruments] ADD CONSTRAINT
> >>>[DF_Instruments_Created_1]
> >>> DEFAULT (getdate()) FOR [Created]
> >>> GO
> >>>
> >>> ====================================
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> HBASE CONVERSION OF ASSTES AND INSTRUMENTS tables:
> >>> ====================================================================
> >>>
> >>> Assets table
> >>> ===============
> >>> RowKey      -- AssetID
> >>> ColumnFamilies      (AssetName      ) ---- ColumnName - (Name)
> >>>
> >>>
> >>>
> >>> Instruments table
> >>> ==============
> >>> RowKey      - InstrumentID
> >>> ColumnFamilies - (Content) --- Columns: Symbol, Name, Created,
> Modified,
> >>> Comments
> >>> (Assets) ---        Columns: AssetID
> >>>
> >>>
> >>> Please confirm if given conversion is proper?
> >>>
> >>> Also, how do i convert constraints and indexes and foreign key
> >>> relationship?
> >>>
> >>>
> >>>
> >>>
> >>> Thanks in advance
> >>>
> >>> Regards,
> >>> rashmi
> >>>
> >>
> >>--
> >>View this message in context:
> >>
> http://old.nabble.com/RDBMS-to-HBASE-schema-migration-tp34087951p34087958.
> >>html
> >>Sent from the HBase User mailing list archive at Nabble.com.
> >>
> >>
> >
> >
> >
> >
>
> --
> View this message in context:
> http://old.nabble.com/RDBMS-to-HBASE-schema-migration-tp34087951p34089135.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
>


-- 
Adam Smieszny
Cloudera | Systems Engineer | http://tiny.cloudera.com/about
917.830.4156 | http://www.linkedin.com/in/adamsmieszny

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message