hbase-user mailing list archives

From "Jim R. Wilson" <wilson.ji...@gmail.com>
Subject Re: HBase Sample Schemas
Date Fri, 28 Mar 2008 13:28:50 GMT
Thanks Ankur!

Those are very helpful - finding example schemas has been a really
sore point for me as well in trying to learn all this.

I was wondering if you had an example that defines a bloom filter for
a column, and an example of how to query a bloom filter once it's set
up (a shell or REST example, if possible).

Thanks again!

-- Jim R. Wilson (jimbojw)

On Fri, Mar 28, 2008 at 1:33 AM, Goel, Ankur <Ankur.Goel@corp.aol.com> wrote:
>
>  > ....by adding a column.
>  Sorry, I meant colon ":"
>
>
>  -----Original Message-----
>  From: Goel, Ankur [mailto:Ankur.Goel@corp.aol.com]
>  Sent: Friday, March 28, 2008 12:01 PM
>  To: hbase-user@hadoop.apache.org
>  Subject: RE: HBase Sample Schemas
>
>  The tables below are RDBMS tables with column names simply converted to
>  column families by adding a column.
>  I'd like to share ideas on how these tables could best be modified (or
>  merged?) to take advantage of a column-oriented design.
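A minimal sketch of that conversion follows. It assumes only what the schemas further down show: each family name is the old column name followed by a colon (for example, "url:"). The class and method names are hypothetical, and no HBase API is used.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns plain RDBMS column names into HBase-style family names. */
public class FamilyNames {

    /** Appends the ':' that marks a name as a column family, e.g. "url" -> "url:". */
    static String toFamilyName(String rdbmsColumn) {
        if (rdbmsColumn.isEmpty() || rdbmsColumn.indexOf(':') >= 0) {
            throw new IllegalArgumentException("not a plain column name: " + rdbmsColumn);
        }
        return rdbmsColumn + ":";
    }

    /** Converts a whole list of column names, keeping their order. */
    static List<String> toFamilyNames(List<String> rdbmsColumns) {
        List<String> families = new ArrayList<>();
        for (String column : rdbmsColumns) {
            families.add(toFamilyName(column));
        }
        return families;
    }

    public static void main(String[] args) {
        System.out.println(toFamilyNames(List.of("url", "site", "status")));
        // prints [url:, site:, status:]
    }
}
```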
>
>  -----Original Message-----
>  From: Edward J. Yoon [mailto:edward@udanax.org]
>  Sent: Friday, March 28, 2008 11:48 AM
>  To: hbase-user@hadoop.apache.org
>  Subject: Re: HBase Sample Schemas
>
>  I don't think this is a good example.
>
>  Find the difference between two physical schemas for the same logical
>  data model: one that uses relationship tables on an RDBMS, and one that
>  uses a list of column qualifiers on Bigtable.
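The contrast can be made concrete with a small in-memory sketch (not HBase client code) of one logical relation, pages and the pages they link to, stored two ways. The first is rows of a hypothetical relationship table link(from_url, to_url). The second is one wide row per page, with one column qualifier per link target under a hypothetical "outlink:" family. TreeMap stands in for the sorted order in which Bigtable-style stores keep row keys and columns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LinkLayouts {

    /** One row of a hypothetical RDBMS relationship table link(from_url, to_url). */
    static final class Link {
        final String fromUrl;
        final String toUrl;

        Link(String fromUrl, String toUrl) {
            this.fromUrl = fromUrl;
            this.toUrl = toUrl;
        }
    }

    /**
     * Folds relationship-table rows into a Bigtable-style layout:
     * row key -> ("family:qualifier" -> value). Each target URL becomes its own
     * qualifier under "outlink:", so a single row holds the page's whole link list.
     */
    static Map<String, Map<String, String>> toWideRows(List<Link> links) {
        Map<String, Map<String, String>> table = new TreeMap<>();
        for (Link link : links) {
            table.computeIfAbsent(link.fromUrl, k -> new TreeMap<>())
                 .put("outlink:" + link.toUrl, ""); // the qualifier carries the data; value unused
        }
        return table;
    }

    /** Reads a page's outlinks back out of its row by stripping the family prefix. */
    static List<String> outlinksOf(Map<String, Map<String, String>> table, String url) {
        List<String> targets = new ArrayList<>();
        for (String column : table.getOrDefault(url, Map.of()).keySet()) {
            if (column.startsWith("outlink:")) {
                targets.add(column.substring("outlink:".length()));
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        List<Link> links = List.of(
            new Link("http://a.example/", "http://b.example/"),
            new Link("http://a.example/", "http://c.example/"),
            new Link("http://b.example/", "http://c.example/"));
        Map<String, Map<String, String>> table = toWideRows(links);
        System.out.println(outlinksOf(table, "http://a.example/"));
        // prints [http://b.example/, http://c.example/]
    }
}
```

In the wide-row layout, a page's outlinks are read from that page's single row, and the list of targets lives in the column names themselves.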
>
>  On Fri, Mar 28, 2008 at 2:28 PM, Goel, Ankur <Ankur.Goel@corp.aol.com> wrote:
>  > Hi Bryan,
>  >
>  > Here is the sample schema I have (it looks closer to an RDBMS schema,
>  > I know):
>  >
>  > TABLE:       seed_list
>  >
>  > DESCRIPTION: Used to store seed URLs (both old and newly discovered).
>  >              Initially populated with some seed URLs. The crawl
>  >              controller picks up the seeds from this table that have
>  >              status=0 (Not Visited) or status=2 (Visited, but ready
>  >              for re-crawl) and feeds these seeds in batches to the
>  >              different crawl engines it knows about.
>  >
>  > SCHEMA:      Column families below
>  >
>  >              {"referer_id:", "100"},  // Integer here is Max_Length
>  >              {"url:", "1500"},
>  >              {"site:", "500"},
>  >              {"last_crawl_date:", "1000"},
>  >              {"next_crawl_date:", "1000"},
>  >              {"create_date:", "100"},
>  >              {"status:", "100"},
>  >              {"strike:", "100"},
>  >              {"language:", "150"},
>  >              {"topic:", "500"},
>  >              {"depth:", "100000"}
>  >
>  > Common attributes are [max versions: 1, compression: NONE, in memory:
>  > false, block cache enabled: true, max length: 100, bloom filter: none]
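The controller's selection rule can be sketched as follows. This is an in-memory stand-in, not an HBase client. Each row is a map from "family:" to a string value, and the row keys and batch size are hypothetical. Any status other than 0 or 2 is simply treated as not ready.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SeedSelector {

    static final String NOT_VISITED = "0";
    static final String READY_FOR_RECRAWL = "2";

    /** Returns the row keys whose "status:" cell is 0 (Not Visited) or 2 (ready for re-crawl). */
    static List<String> crawlableSeeds(Map<String, Map<String, String>> seedList) {
        List<String> seeds = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : seedList.entrySet()) {
            String status = row.getValue().get("status:");
            if (NOT_VISITED.equals(status) || READY_FOR_RECRAWL.equals(status)) {
                seeds.add(row.getKey());
            }
        }
        return seeds;
    }

    /** Splits the selected seeds into batches of at most batchSize for the crawl engines. */
    static List<List<String>> batches(List<String> seeds, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < seeds.size(); i += batchSize) {
            result.add(new ArrayList<>(seeds.subList(i, Math.min(i + batchSize, seeds.size()))));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> seedList = new TreeMap<>(); // sorted by row key
        seedList.put("seed-1", Map.of("url:", "http://a.example/", "status:", "0"));
        seedList.put("seed-2", Map.of("url:", "http://b.example/", "status:", "1"));
        seedList.put("seed-3", Map.of("url:", "http://c.example/", "status:", "2"));
        seedList.put("seed-4", Map.of("url:", "http://d.example/", "status:", "0"));
        List<String> seeds = crawlableSeeds(seedList);
        System.out.println(seeds);             // prints [seed-1, seed-3, seed-4]
        System.out.println(batches(seeds, 2)); // prints [[seed-1, seed-3], [seed-4]]
    }
}
```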
>  >
>  >
>  > TABLE:       web_content
>  >
>  > DESCRIPTION: Used to store information retrieved after crawling a URL.
>  >              Each crawl engine provides information about the URL it
>  >              crawled. This information is then stored in this table
>  >              depending on the profile settings (what should be
>  >              stored?).
>  >
>  > SCHEMA:      Column families below
>  >
>  >              {"url:", "1500"},
>  >              {"site:", "500"},
>  >              {"content_type:", "100"},
>  >              {"title:", "1000"},
>  >              {"content:", Integer.MAX_VALUE + ""},
>  >              {"parsed_text:", Integer.MAX_VALUE + ""},
>  >              {"crawl_date:", "1000"},
>  >              {"last_modified_date:", "100"},
>  >              {"http_headers:", "10000"},
>  >              {"content_length:", "11"},
>  >              {"outlinks_count:", "100000"}
>  >
>  > Common attributes are [max versions: 1, compression: BLOCK, in memory:
>  > false, block cache enabled: true, max length: 100, bloom filter: none]
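The sketch below shows one way to apply "stored depending upon the profile settings", under the hypothetical assumption that a profile is just the set of families to keep. The max lengths are copied from the web_content schema above. For simplicity, lengths are counted in characters here, whereas HBase stores values as byte arrays.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class ProfileFilter {

    /** Max lengths copied from a few web_content families in the schema above. */
    static final Map<String, Integer> MAX_LENGTH = Map.of(
        "url:", 1500,
        "title:", 1000,
        "content_type:", 100,
        "content:", Integer.MAX_VALUE);

    /**
     * Keeps only the families the (hypothetical) profile asks for, and skips any
     * value longer than that family's max length. Unknown families are dropped.
     */
    static Map<String, String> cellsToStore(Map<String, String> crawlResult, Set<String> profile) {
        Map<String, String> row = new LinkedHashMap<>();
        for (Map.Entry<String, String> cell : crawlResult.entrySet()) {
            String family = cell.getKey();
            Integer max = MAX_LENGTH.get(family);
            if (profile.contains(family) && max != null && cell.getValue().length() <= max) {
                row.put(family, cell.getValue());
            }
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> crawlResult = new LinkedHashMap<>();
        crawlResult.put("url:", "http://a.example/");
        crawlResult.put("title:", "Example");
        crawlResult.put("content_type:", "text/html");
        crawlResult.put("content:", "<html>...</html>");
        Map<String, String> row = cellsToStore(crawlResult, Set.of("url:", "title:", "content_type:"));
        System.out.println(row); // prints {url:=http://a.example/, title:=Example, content_type:=text/html}
    }
}
```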
>  >
>  > Please feel free to suggest modifications/enhancements for a
>  > column-oriented design.
>  >
>  > Thanks
>  > -Ankur
>  >
>  >
>  > -----Original Message-----
>  > From: Bryan Duxbury [mailto:bryan@rapleaf.com]
>  > Sent: Friday, March 28, 2008 10:33 AM
>  > To: hbase-user@hadoop.apache.org
>  > Subject: HBase Sample Schemas
>  >
>  > All,
>  >
>  > One of the more common types of questions we get from people new to
>  > HBase is about the differences in the schema between HBase and
>  > relational databases. So that we can generate some good examples of
>  > RDBMS schemas and their counterparts as they might be represented in
>  > HBase, could you guys post some small (1-5 entities) schemas that you
>  > might be interested in using and a few sentences about how you'd like
>  > to consume them? We can then discuss possible options and see how
>  > things might look. This will also help Stack, Jim, and myself to
>  > notice interesting access patterns we might want to support.
>  >
>  > Thanks in advance,
>  >
>  > Bryan
>  >
>
>
>
>  --
>  B. Regards,
>  Edward J. Yoon
>
