hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Help needed - Adding HBase to architecture
Date Sun, 14 Jun 2009 16:40:23 GMT
Hi Schubert,

I have 2TB and 1TB storage densities, respectively, on my test environments,
so I very much understand your point of view.

I think 0.20 will be able to come a lot closer to the goal of utilizing all
of that space than 0.19 can. If you can wait for the release of 0.20, it may
be worth experimenting to try and achieve 2000 regions with 1GB store files.
I personally am planning to run such an experiment. 

  - Andy




________________________________
From: zsongbo <zsongbo@gmail.com>
To: hbase-user@hadoop.apache.org
Sent: Sunday, June 14, 2009 9:34:44 AM
Subject: Re: Help needed - Adding HBase to architecture

Thanks, Andy and stack, for sharing your experience. In my data management
system with HBase, I want to store a HUGE amount of data.
But assuming each node serves 1000 regions (250MB each), only 250GB of
storage is used.  We have 2TB of disk on each node.
So, for now, we store the data as files in HDFS and create a simple index to
query and locate these files.
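
The arithmetic behind these figures can be sketched roughly as below. This is
only an illustration: it assumes decimal units (1GB = 1000MB), treats the
quoted region size as the whole on-disk footprint of a region, and ignores
HDFS replication and compression. The function names are made up for this
example and are not part of HBase.

```python
MB_PER_GB = 1000  # assumption: decimal units, matching the round numbers in the thread


def served_data_gb(regions_per_node, region_size_mb):
    """Approximate data one node serves if every region is full."""
    return regions_per_node * region_size_mb / MB_PER_GB


def disk_utilization(regions_per_node, region_size_mb, disk_gb):
    """Fraction of a node's disk covered by the regions it serves."""
    return served_data_gb(regions_per_node, region_size_mb) / disk_gb


DISK_GB = 2000  # the 2TB per node mentioned above

# 1000 regions of 250MB each
print(served_data_gb(1000, 250), disk_utilization(1000, 250, DISK_GB))

# the 2000 regions with 1GB store files suggested in the reply
print(served_data_gb(2000, 1000), disk_utilization(2000, 1000, DISK_GB))
```

Under these assumptions, the first case fills one eighth of the disk and the
second fills all of it.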

Schubert

On Sun, Jun 14, 2009 at 3:18 AM, Andrew Purtell <apurtell@apache.org> wrote:

> 0.19 will have trouble compacting regions with large store files (> 1GB),
> especially if they are compressed.
>
> 0.20 is such a game changer that all the old experience and assumptions
> will have to be thrown out and all of this testing redone. That is a very
> good thing! :-) Kudos to all those who rebuilt the region server for this
> release.
>
>  - Andy
>
>
>
>
> ________________________________
> From: stack <stack@duboce.net>
> To: hbase-user@hadoop.apache.org
> Sent: Saturday, June 13, 2009 12:13:58 PM
> Subject: Re: Help needed - Adding HBase to architecture
>
> At powerset, we have ~80 regions per node on > 100 nodes.
>
> I've seen other clusters with hundreds and in testing have come close to a
> thousand per node.
>
> When a node has this many regions on board and it crashes, it's going to
> take a while to recover.
>
> We've not played with it in a while but regions could be fatter.  By
> default, the biggest store file in a region is < 256M.  Depending on the
> type of your data and your access patterns, we should probably look at
> doubling or quadrupling this size.  Then we could carry low hundreds of
> regions but they'd have more heft to them.
>
> St.Ack
>
> On Sat, Jun 13, 2009 at 11:59 AM, zsongbo <zsongbo@gmail.com> wrote:
>
> > Hi Billy,
> >
> > I very much agree that "Hbase would be better suited to store the meta
> > data in place of the images", with the files stored in HDFS or another
> > storage system such as S3. But for small files, an S3-like object storage
> > system will be better.
> >
> > Another issue to discuss with you:
> > How many tablets/regions does each of your HBase region servers serve in
> > practice? The Bigtable paper suggests at most hundreds.
> >
> > Schubert
> >
> > On Mon, Jun 8, 2009 at 2:28 PM, Billy Pearson <sales@pearsonwholesale.com>
> > wrote:
> >
> > > If I was going to use a RDBMS to store the meta data then I would just use
> > > hadoop hdfs to store the images/video
> > > I know that hadoop has a thrift api now
> > > http://wiki.apache.org/hadoop/HDFS-APIs
> > >
> > > Hbase would be better suited to store the meta data in place of the images.
> > > The biggest benefit to hbase is you can scale the reads and writes to the
> > > db, not just the reads as in most RDBMS
> > >
> > > So you should be able to work with the files in hadoop in any language as
> > > long as you can get hadoop working correctly on windows.
> > > The benefit of this is you can scale hadoop as needed to hold more data.
> > > The downside to this is the memory that will be required for the namenode
> > > I think it's like 3m files per gb of memory or something like that
> > >
> > >
> > >
> > >
> > >
> > > "Nitin Gupta" <nitingupta183@gmail.com> wrote in message
> > > news:003c01c9e7fc$2087df00$61979d00$@com...
> > >
> > >  Jonathan,
> > >>
> > > >> Thanks for the detailed explanation. Very helpful.
> > >>
> > > >> As far as file size is concerned, we may even be required to save videos
> > > >> in future. So we shall definitely go above the HBase size limit at some
> > > >> point in time. Any other solution or key-value database that you can
> > > >> recommend for our case?
> > >>
> > > >> I am not very knowledgeable about HDFS either. I think if we go with
> > > >> pure HDFS, then all the required DB operations would have to be custom
> > > >> developed on top of HDFS. For our needs, do you think that HDFS already
> > > >> has enough support that we will not need any major custom development? We
> > > >> are just saving the files/attachments and retrieving them with some basic
> > > >> search.
> > >>
> > >> Regards,
> > >> Nitin
> > >>
> > >> -----Original Message-----
> > >> From: Jonathan Gray [mailto:jlist@streamy.com]
> > >> Sent: Sunday, June 07, 2009 9:30 PM
> > >> To: hbase-user@hadoop.apache.org
> > >> Subject: Re: Help needed - Adding HBase to architecture
> > >>
> > >> Nitin,
> > >>
> > > >> HBase stores arbitrary binary values (row keys, column qualifiers, and
> > > >> column values), so it is certainly capable of storing and serving files
> > > >> and images.
> > >>
> > > >> My only real question before I would give you a +1 on your idea is what
> > > >> you expect the range of file sizes to be.  While HBase allows you to store
> > > >> values up to length Integer.MAX_VALUE, that is not recommended and in past
> > > >> versions has led to memory issues (OOME and such).
> > >>
> > > >> Images, text, word/excel docs, etc... should be no problem.  But I don't
> > > >> recommend storing things in the upper 10s or 100s of MB, though it's
> > > >> probably possible with a little work adjusting some configuration
> > > >> parameters.  In general, if you are approaching HDFS block size, then you
> > > >> really just want HDFS and not HBase :)
> > >>
> > > >> We are not currently running this in production, but we have had an
> > > >> experimental version of our media server that runs on top of HBase rather
> > > >> than the file system.  It has a series of Python scripts (connected to
> > > >> HBase through our custom interface, you could use Java directly or
> > > >> Thrift/REST/etc) that are responsible for generating various thumbnail
> > > >> sizes.  The originals are stored in HBase, and then a special query is run
> > > >> to grab the thumbnail of a certain size.  If it exists in HBase already,
> > > >> it is just fetched and returned.  Otherwise, it is generated (via PIL,
> > > >> Python Imaging Library, and some other custom tools), stored in HBase, and
> > > >> then returned to the client.
> > >>
> > > >> As far as HBase on Windows goes... It's currently not possible but there
> > > >> has been some effort from Powerset/Microsoft to make it happen.  I will
> > > >> yield to those more familiar with it.
> > >>
> > > >> Personally, I run Windows on my primary work desktop and spend a good
> > > >> chunk of my time on HBase development.  When I've wanted to spin up
> > > >> pseudo-distributed local clusters, I usually use a cheap Linux node or
> > > >> local Virtual Machine.  In both cases, I use a Windows X Server and
> > > >> redirect output to my local Windows machine so I can run Eclipse and unit
> > > >> tests from my Windows GUI.  Others have used Cygwin with some success, I
> > > >> believe.
> > >>
> > >> Hope that sheds some light for you.
> > >>
> > > >> You are almost certainly right about not wanting to store this in an
> > > >> RDBMS.  And a hybrid approach seems to make sense, especially as a first
> > > >> step.
> > >>
> > >> Jonathan Gray
> > >>
> > >>
> > >>
> > >> On Sun, June 7, 2009 6:44 am, Nitin Gupta wrote:
> > >>
> > >>> Hi All,
> > > >>> I am working on an application which is kind of a social network on
> > > >>> mobile WAP. Recently, we have incorporated files or attachments support in
> > > >>> our application. Right now, since we are not in production yet, we are
> > > >>> keeping all the files in the RDBMS which our application is using. But I
> > > >>> am more than convinced that this is not going to work once we are in
> > > >>> production mode.
> > >>>
> > > >>> I got to know about HBase and I am trying to convince myself about its
> > > >>> usage for the file storage, search and retrieval operations. I would like
> > > >>> my opinion to be endorsed by expert HBase users/developers. Just for
> > > >>> clarification, here is what I am planning to do:
> > >>>
> > > >>> Make use of an RDBMS for relational data in the application.
> > > >>> All the files/blob data to be saved in HBase.
> > > >>> When required, my application can query app data from the RDBMS and the
> > > >>> files can be retrieved from the HBase data store. I will keep the meta
> > > >>> data of the files in my RDBMS so that files can be associated with my
> > > >>> app's entities.
> > >>>
> > > >>> Please help me decide if this is the right approach. My app is supposed
> > > >>> to provide support for images as well, so I would appreciate advice on
> > > >>> whether HBase is the right solution for me, in conjunction with an imaging
> > > >>> tool.
> > >>>
> > > >>> Since my team is predominantly Windows based, I would like to know if it
> > > >>> is possible to run HBase on a Windows machine in standalone and in
> > > >>> clustered mode.
> > >>>
> > >>> Thanks for all your help.
> > >>>
> > >>>
> > >>> nitin
> > >>>
> > >>>
> > >>
> > >>
> > >
> > >
> >
>
>
>
>
>



      