hbase-user mailing list archives

From Dave Latham <lat...@davelink.net>
Subject Re: Schema to store graph
Date Wed, 01 Apr 2009 19:57:44 GMT
Can someone clarify the issues with the number of columns per column family
that HBase 0.19 can handle?  I'm a bit confused, because I feel like there's
some conflicting information.

In this post (Dec. 20), St.Ack says low hundreds of columns per family are
recommended, and refers to a bug (I'm guessing HBASE-867):
http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200812.mbox/%3C494D7D6F.2050903@duboce.net%3E

Then in this post (Dec. 21), Jonathan says they have hundreds of thousands
of columns per family in production:
http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200812.mbox/%3C60569.71.177.254.11.1229821338.squirrel@webmail.streamy.com%3E

And follows (Mar. 9) with 50,000 columns:
http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200903.mbox/%3C040701c9a0bd$fed90b70$fc8b2250$@com%3E

And now in this thread people are referring to a rough limit of 5000.

There are probably some differences based on available resources and
whatnot, but I wouldn't think they would make this level of difference.  I've
begun implementing a schema where I expect some rows to have potentially
tens of thousands of columns (in the same family) and want to make sure that
this is possible with HBase 0.19.  I don't at all mean to pin anyone down; I'm
just hoping someone can shed a bit more light.

Dave


On Wed, Apr 1, 2009 at 12:41 AM, stack <stack@duboce.net> wrote:

> Edward is referring to https://issues.apache.org/jira/browse/HBASE-867.
> We need to fix it for the 0.20.0 HBase release.
> St.Ack
>
> On Wed, Apr 1, 2009 at 5:02 AM, Edward J. Yoon <edwardyoon@apache.org>
> wrote:
>
> > One thing is that HBase 0.19 doesn't work with more than 5,000 qualifiers
> > in one column family, so I couldn't test/benchmark at large scale.
> >
> > On Tue, Mar 31, 2009 at 6:04 PM, Amandeep Khurana <amansk@gmail.com>
> > wrote:
> > > Response below
> > >
> > >
> > > Amandeep Khurana
> > > Computer Science Graduate Student
> > > University of California, Santa Cruz
> > >
> > >
> > > On Tue, Mar 31, 2009 at 1:58 AM, Edward J. Yoon <edwardyoon@apache.org>
> > > wrote:
> > >
> > >> Hama stores the sparse graph in HBase as a sparse adjacency matrix.
> > >> One reason is to perform matrix decomposition on large sparse
> > >> graphs. Anyway, I guess if you store the graph like that, you'll only
> > >> need to update the row 'v/w' to add v to w's/w to v's list of neighbors.
> > >
> > >
> > > I didn't quite understand the last line here.
> > >
> > > I did think of a sparse matrix as well, but I'm not sure which is the
> > > better approach. That's why I posted here...
> > >
> > > Please share your experiences with Hama...
> > >
> > >>
> > >>
> > >> Just FYI, you may also want to see --
> > >> http://blog.udanax.org/2009/02/breadth-first-search-mapreduce.html
> > >>
> > >> If you have any advice for us, please let us know.
> > >>
> > >> On Tue, Mar 31, 2009 at 5:09 PM, Amandeep Khurana <amansk@gmail.com>
> > >> wrote:
> > >> > What would be a good schema in HBase to store information
> > >> > pertaining to a many to many graph? I was thinking of having the
> > >> > node id as the row key, the type of relation as the column family,
> > >> > the relation name for the column identifier and the actual cell
> > >> > containing the key of the node that is being connected with.
> > >> >
> > >> >
> > >> > Amandeep Khurana
> > >> > Computer Science Graduate Student
> > >> > University of California, Santa Cruz
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Best Regards, Edward J. Yoon
> > >> edwardyoon@apache.org
> > >> http://blog.udanax.org
> > >>
> > >
> >
> >
> >
> > --
> > Best Regards, Edward J. Yoon
> > edwardyoon@apache.org
> > http://blog.udanax.org
> >
>
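
The schema proposed at the start of the thread (node id as row key, relation
type as column family, relation name as column qualifier, cell value holding
the connected node's key) can be sketched in plain Python to see how wide a
family gets. This is only a rough in-memory model, not HBase client code: the
names new_table, put_edge and wide_families are made up for illustration, and
the 5,000 default is just the rough figure mentioned in this thread, not a
documented limit.

```python
from collections import defaultdict

# Rough per-family qualifier figure quoted in this thread for HBase 0.19.
# Assumption for illustration only; adjust it to whatever your cluster tolerates.
ROUGH_QUALIFIER_LIMIT = 5000


def new_table():
    """Return a nested mapping: table[row_key][column_family][qualifier] = value."""
    return defaultdict(lambda: defaultdict(dict))


def put_edge(table, node_id, relation_type, relation_name, target_node_key):
    """Store one edge using the layout proposed in the thread."""
    table[node_id][relation_type][relation_name] = target_node_key


def wide_families(table, limit=ROUGH_QUALIFIER_LIMIT):
    """List (row_key, family, qualifier_count) for families with more than `limit` qualifiers."""
    return [
        (row_key, family, len(cells))
        for row_key, families in table.items()
        for family, cells in families.items()
        if len(cells) > limit
    ]


# Hypothetical sample data: two outgoing edges from the same node.
table = new_table()
put_edge(table, "node1", "friend", "met_at_school", "node2")
put_edge(table, "node1", "friend", "met_at_work", "node3")

# Use a tiny limit so the sample row is reported.
print(wide_families(table, limit=1))
```

Note that in this model a second edge with the same relation name in the same
row and family overwrites the first, because the relation name is the
qualifier.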
