hbase-user mailing list archives

From Fuad Efendi <f...@efendi.ca>
Subject Re: Hbase schema for many-to-many association
Date Wed, 24 Sep 2008 14:27:51 GMT
I agree with your design...

Without the indexes of a traditional row-oriented RDBMS (which are
themselves column-oriented structures, very similar to Hadoop), we
should have tables like:
STUDENT_COURSE: student, course
COURSE_STUDENT: course, student

I feel we can think of Hadoop tables as the index structures of a
traditional RDBMS...
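
As a rough illustration of the "table as index" idea, here is a minimal in-memory sketch in Python. It is not HBase code: SortedTable, put, scan and enroll are hypothetical names invented for this example, and the only assumption carried over is that rows are kept sorted by key, so that all entries sharing a key prefix can be read with one range scan. Each association is written twice, once per lookup direction.

```python
from bisect import bisect_left


class SortedTable:
    """Toy stand-in for a table whose rows are kept sorted by key."""

    def __init__(self):
        self._keys = []  # sorted list of (first, second) tuples

    def put(self, first, second):
        # Insert the pair at its sorted position; ignore duplicates.
        key = (first, second)
        i = bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def scan(self, first):
        # Range scan: (first,) sorts before every (first, x) pair,
        # so bisect_left lands on the first matching row.
        i = bisect_left(self._keys, (first,))
        out = []
        while i < len(self._keys) and self._keys[i][0] == first:
            out.append(self._keys[i][1])
            i += 1
        return out


student_course = SortedTable()  # STUDENT_COURSE: student, course
course_student = SortedTable()  # COURSE_STUDENT: course, student


def enroll(student, course):
    # Write both directions so either lookup is a single scan.
    student_course.put(student, course)
    course_student.put(course, student)


enroll("alice", "math101")
enroll("alice", "cs200")
enroll("bob", "math101")
```

With this layout, student_course.scan("alice") lists Alice's courses and course_student.scan("math101") lists the students in that course, without any secondary index. The cost is that every write has to touch both tables.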

Even with a traditional RDBMS, it is possible to design a full
(non-normalized) schema using only a single table:
CREATE TABLE my_database_schema (
    table_name  VARCHAR(256) NOT NULL,
    column_name VARCHAR(256) NOT NULL,
    cell_data   BLOB
);
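
To make the single-table idea concrete, here is a small sketch using Python's built-in sqlite3 module. It is only one possible reading of the schema above: the row_key column and the composite primary key are assumptions added so that individual rows can be told apart (the original definition has neither), and the put and get_row helpers are hypothetical names for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE my_database_schema (
        table_name  VARCHAR(256) NOT NULL,
        row_key     VARCHAR(256) NOT NULL,  -- assumption: not in the original
        column_name VARCHAR(256) NOT NULL,
        cell_data   BLOB,
        PRIMARY KEY (table_name, row_key, column_name)  -- assumption
    )
""")


def put(table, row, column, value):
    # One SQL row per logical cell; rewriting a cell replaces it.
    conn.execute(
        "INSERT OR REPLACE INTO my_database_schema VALUES (?, ?, ?, ?)",
        (table, row, column, value),
    )


def get_row(table, row):
    # Reassemble one logical row from its cells.
    cur = conn.execute(
        "SELECT column_name, cell_data FROM my_database_schema "
        "WHERE table_name = ? AND row_key = ? ORDER BY column_name",
        (table, row),
    )
    return dict(cur.fetchall())


put("STUDENT", "alice", "name", b"Alice")
put("STUDENT", "alice", "course:math101", b"1")
put("COURSE", "math101", "student:alice", b"1")
```

Here get_row("STUDENT", "alice") returns a dictionary of that student's cells. Every logical table lives in the same physical table, distinguished only by the table_name value.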

The only problem is surrogate primary keys... the modern pattern says:
do not use primary keys that have business meaning (such as Social
Insurance Number, Driving License Number, etc.). With an RDBMS,
transactionally changing a primary key in the 'parent' table will lock
a virtually unbounded number of records in the child table, so we
should use a generated STUDENT_ID instead of natural PKs.

With HBase we should probably revert to the old pattern: find a natural
primary key, and do not use a surrogate auto-generated STUDENT_ID...
(a student_id could appear in official transcripts; in that case it is
'natural', i.e. it has business meaning.)


Quoting Michael Dagaev <michael.dagaev@gmail.com>:

> Hi All
> How would you design an Hbase table for many-to-many association
> between two entities, for example Student and Course?
> I would define two tables:
> Student:
>     student id
>     student data (name, address, ...)
>     courses  (use course ids as column qualifiers here)
> Course:
>    course id
>    course data (name, syllabus, ...)
>    students (use student ids as column qualifiers here)
> Does it make sense?
> Thank you,
> Michael
