hbase-user mailing list archives

From Dave Latham <lat...@davelink.net>
Subject Re: Schema to store graph
Date Wed, 01 Apr 2009 23:35:34 GMT
Thanks, Jonathan, that's very helpful!

Dave

On Wed, Apr 1, 2009 at 5:25 PM, Jonathan Gray <jlist@streamy.com> wrote:

> I will do my best to bring some clarity.
>
> First of all, HBase 0.20 will remove most, if not all, of the "limitations"
> on a single row's columns in a family.
>
> As far as 0.19 is concerned, there are no "limits".  We have several rows
> with 10s of thousands of columns in a family and this does not break
> anything.  The primary issue is that there are serious _performance_ issues
> when the family gets big.  There's nothing that will all of a sudden stop
> working, some things will just get very slow.
>
> So the reason you see varying opinions on the issue is that there is really
> no limit, things just progressively get slower and slower.  When they get
> slow and by how much is related to the size of your columns, if there are
> multiple versions of them, and how you are querying them.  I'm not 100%
> clear on which cases have the worst performance, and I'm not going to dig in
> the code now as this has radically changed in 0.20, but I think things are
> very bad if you specify explicit column lists, have high numbers of deletes
> and/or versions, etc.  I think this also has a negative impact on row
> seeking/scanning.
>
> I suggest you run some tests and benchmarks.  Figure out what your max
> is/will be, and run some performance tests.  Only you know if the
> performance hits from high numbers of columns are too much or not.  In my
> case, it was fine.  The query does not have significant slow-down compared
> with those with fewer (of course it's slower because it's reading more).
>
> And as long as things are not painfully slow, then you should be good moving
> forward with 0.19 and then watch everything get 10+X faster when you upgrade
> to 0.20 :)
>
> Hope that helps.
>
> JG
>
> > -----Original Message-----
> > From: Dave Latham [mailto:latham@davelink.net]
> > Sent: Wednesday, April 01, 2009 11:58 AM
> > To: hbase-user@hadoop.apache.org
> > Subject: Re: Schema to store graph
> >
> > Can someone clarify the issues with the number of columns per column family
> > that HBase 0.19 can handle?  I'm a bit confused, because I feel like there's
> > some conflicting information.
> >
> > In this post (Dec. 20), St.Ack says low hundreds of columns per family are
> > recommended, and refers to a bug (I'm guessing HBASE-867):
> > http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200812.mbox/%3C494D7D6F.2050903@duboce.net%3E
> >
> > Then in this post (Dec. 21), Jonathan says they have hundreds of thousands
> > of columns per family in production:
> > http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200812.mbox/%3C60569.71.177.254.11.1229821338.squirrel@webmail.streamy.com%3E
> >
> > And follows (Mar. 9) with 50,000 columns:
> > http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200903.mbox/%3C040701c9a0bd$fed90b70$fc8b2250$@com%3E
> >
> > And now in this thread people are referring to a rough limit of 5000.
> >
> > There are probably some differences based on resources available and what
> > not, but I wouldn't think it would make this level of difference.  I've
> > begun implementing a schema where I expect some rows to have potentially
> > 10,000s of columns (in the same family) and want to make sure that this is
> > possible with HBase 0.19.  I don't at all mean to pin anyone down, I'm just
> > hoping someone can shed a bit more light.
> >
> > Dave
> >
> >
> > On Wed, Apr 1, 2009 at 12:41 AM, stack <stack@duboce.net> wrote:
> >
> > > Edward is referring to https://issues.apache.org/jira/browse/HBASE-867.
> > > We need to fix it for 0.20.0 hbase release.
> > > St.Ack
> > >
> > > On Wed, Apr 1, 2009 at 5:02 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> > >
> > > > One thing is HBase 0.19 doesn't work with over 5,000 qualifiers of one
> > > > column so I couldn't test/benchmark for large scale.
> > > >
> > > > On Tue, Mar 31, 2009 at 6:04 PM, Amandeep Khurana <amansk@gmail.com> wrote:
> > > > > Response below
> > > > >
> > > > >
> > > > > Amandeep Khurana
> > > > > Computer Science Graduate Student
> > > > > University of California, Santa Cruz
> > > > >
> > > > >
> > > > > On Tue, Mar 31, 2009 at 1:58 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> > > > >
> > > > >> Hama stores the sparse graph using HBase as a sparse adjacency matrix.
> > > > >> One of the reasons is to perform matrix decomposition for large sparse
> > > > >> graphs. Anyway, I guess if you store the graph like that, you'll only
> > > > >> need to update the row 'v/w' to add v to w's/w to v's list of neighbors.
> > > > >
> > > > >
> > > > > I didn't quite understand the last line here.
> > > > >
> > > > > I did think of a sparse matrix as well but not sure which is a better
> > > > > approach. That's why I posted here...
> > > > >
> > > > > Share about your experiences with Hama...
> > > > >
> > > > >>
> > > > >>
> > > > >> Just FYI, you may also want to see --
> > > > >> http://blog.udanax.org/2009/02/breadth-first-search-mapreduce.html
> > > > >>
> > > > >> If you have any advice for us, Pls let us know.
> > > > >>
> > > > >> On Tue, Mar 31, 2009 at 5:09 PM, Amandeep Khurana <amansk@gmail.com> wrote:
> > > > >> > What would be a good schema in HBase to store information pertaining to a
> > > > >> > many to many graph? I was thinking of having the node id as the row key, the
> > > > >> > type of relation as the column family, the relation name for the column
> > > > >> > identifier and the actual cell containing the key of the node that is being
> > > > >> > connected with.
> > > > >> >
> > > > >> >
> > > > >> > Amandeep Khurana
> > > > >> > Computer Science Graduate Student
> > > > >> > University of California, Santa Cruz
> > > > >> >
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Best Regards, Edward J. Yoon
> > > > >> edwardyoon@apache.org
> > > > >> http://blog.udanax.org
> > > > >>
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards, Edward J. Yoon
> > > > edwardyoon@apache.org
> > > > http://blog.udanax.org
> > > >
> > >
>
>
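
For reference, the schema proposed at the start of this thread (node id as the row key, type of relation as the column family, relation name as the column qualifier, and the connected node's key as the cell value) can be sketched as a nested map. This is only a toy in-memory model in Python, not the HBase client API; the class name GraphTable and the methods put_edge, neighbors and column_count are hypothetical names chosen for illustration, as are the sample node ids and relation names.

```python
from collections import defaultdict


class GraphTable:
    """Toy in-memory model of the proposed layout (NOT the HBase API).

    row key          -> node id
    column family    -> type of relation
    column qualifier -> relation name
    cell value       -> key of the connected node
    """

    def __init__(self):
        # rows[node_id][family][qualifier] = key of the connected node
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put_edge(self, node_id, family, qualifier, target_id):
        """Store one edge as a single cell in the source node's row."""
        self.rows[node_id][family][qualifier] = target_id

    def neighbors(self, node_id, family):
        """Return a copy of {qualifier: target_id} for one family of one row."""
        if node_id not in self.rows:
            return {}
        return dict(self.rows[node_id].get(family, {}))

    def column_count(self, node_id, family):
        """Number of columns (qualifiers) in one family of one row."""
        return len(self.neighbors(node_id, family))


# Hypothetical sample data for illustration only.
table = GraphTable()
table.put_edge("node1", "friend", "knows", "node2")
table.put_edge("node1", "friend", "follows", "node3")
table.put_edge("node2", "friend", "knows", "node1")
```

In this toy model a second put_edge with the same row, family and qualifier overwrites the earlier cell, and column_count gives the per-family column count that the performance discussion above is concerned with.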
