hbase-user mailing list archives

From Jack Levin <magn...@gmail.com>
Subject RE: Storing images in Hbase
Date Sun, 13 Jan 2013 16:17:37 GMT
That's right: the memstore size, not the flush size, is increased.  Filesize is
10G. Overall, write cache is 60% of heap and read cache is 20%.  Flush size
is 15%.  64 maxlogs at 128MB. One namenode server, plus one secondary that can
be promoted.  On the way to HBase, images are written to a queue, so that we
can take HBase down for maintenance and still do the inserts later.  ImageShack
has ‘perma cache’ servers that allow writes and serving of data even when
HBase is down for hours; consider it a 4th replica 😉 outside of Hadoop.

Jack
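
The write queue described above can be pictured with a minimal Python sketch. It is an illustration only, not ImageShack's implementation: the class name `BufferedImageWriter` and the `store_put` callable are hypothetical, and the store is assumed to raise `ConnectionError` while it is down.

```python
from collections import deque


class BufferedImageWriter:
    """Hypothetical sketch: queue writes so inserts survive store downtime."""

    def __init__(self, store_put):
        # store_put is an assumed callable(key, data) that raises
        # ConnectionError while the backing store is unavailable.
        self._store_put = store_put
        self._pending = deque()

    def write(self, key, data):
        # Accept every write immediately; it is only queued here.
        self._pending.append((key, data))

    def drain(self):
        # Push queued writes in arrival order. Stop at the first failure
        # and keep the remaining items for a later attempt.
        written = 0
        while self._pending:
            key, data = self._pending[0]
            try:
                self._store_put(key, data)
            except ConnectionError:
                break
            self._pending.popleft()
            written += 1
        return written

    def pending(self):
        # Number of writes still waiting for the store.
        return len(self._pending)
```

Calling `drain()` while the store is down leaves the queue untouched; calling it again after maintenance replays the inserts in order.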

 *From:* Mohit Anchlia <mohitanchlia@gmail.com>
*Sent:* January 13, 2013 7:48 AM
*To:* user@hbase.apache.org
*Subject:* Re: Storing images in Hbase

Thanks, Jack, for sharing this information. This definitely makes sense when
using that type of caching layer. You mentioned increasing the write
cache; I am assuming you had to increase the following parameters in
addition to increasing the memstore size:

hbase.hregion.max.filesize
hbase.hregion.memstore.flush.size
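
To put the figures from Jack's reply in one place, here is a rough Python sketch of the arithmetic. The 4000 MB heap is purely an assumption (the thread gives 8 GB of RAM per node but no heap size), and the function name is hypothetical. The 15% flush figure is left out because the thread does not say what it is a fraction of.

```python
def memory_budget(heap_mb, write_pct=60, read_pct=20,
                  max_logs=64, log_mb=128):
    """Split a heap by the percentages quoted in the thread.

    Returns (write cache MB, read cache MB, remaining heap MB, total log MB).
    heap_mb is an assumed input; integer math keeps the results exact.
    """
    write_mb = heap_mb * write_pct // 100
    read_mb = heap_mb * read_pct // 100
    other_mb = heap_mb - write_mb - read_mb
    wal_mb = max_logs * log_mb  # 64 logs at 128 MB each
    return write_mb, read_mb, other_mb, wal_mb


# Assumed 4000 MB heap: 2400 MB write cache, 800 MB read cache,
# 800 MB left for everything else, and 8192 MB of logs in total.
print(memory_budget(4000))
```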

On Fri, Jan 11, 2013 at 9:47 AM, Jack Levin <magnito@gmail.com> wrote:

> We buffer all accesses to HBase with a Varnish SSD-based caching layer,
> so the impact for reads is negligible.  We have a 70-node cluster with 8 GB
> of RAM per node, relatively weak nodes (Intel Core 2 Duo), and
> 10-12TB of disk per server.  We insert 600,000 images per day.  We
> have relatively little compaction activity because we made our write
> cache much larger than the read cache, so we don't experience region file
> fragmentation as much.
>
> -Jack
>
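
A back-of-the-envelope check of the cluster figures, as a Python sketch. The node count, disk sizes, daily inserts and the 1 billion images come from the thread; the replication factor of 3 is an assumption (consistent with the "4th replica" remark, but not stated), and overhead and free space are ignored.

```python
def raw_capacity_tb(nodes, tb_per_node):
    """Total raw disk across the cluster, in TB."""
    return nodes * tb_per_node


def max_avg_image_kb(raw_tb, images, replication=3):
    """Upper bound on average image size (KB) if every disk were full.

    Decimal units (1 TB = 10**9 KB). replication=3 is an assumption,
    and HBase/HDFS overhead is ignored.
    """
    usable_kb = raw_tb * 10**9 // replication
    return usable_kb // images


def inserts_per_second(images_per_day):
    """Average insert rate over a 24-hour day."""
    return images_per_day / 86_400


low = raw_capacity_tb(70, 10)    # 700 TB raw
high = raw_capacity_tb(70, 12)   # 840 TB raw
print(low, high)
print(max_avg_image_kb(low, 10**9), max_avg_image_kb(high, 10**9))
print(round(inserts_per_second(600_000), 2))
```

Under these assumptions the raw capacity is 700-840 TB, the average image would have to stay below roughly 233-280 KB, and the average insert rate is about 7 images per second.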
> On Fri, Jan 11, 2013 at 9:40 AM, Mohit Anchlia <mohitanchlia@gmail.com>
> wrote:
> > I think it really depends on the volume of traffic, the data distribution
> > per region, how and when file compaction occurs, and the number of nodes in
> > the cluster. In my experience, when it comes to blob data where you are
> > serving tens of thousands or more requests/sec of writes and reads, it's
> > very difficult to manage HBase without heavy operations and maintenance
> > effort. Jack mentioned earlier that they have 1 billion images. It would be
> > interesting to know what they see in terms of compaction and number of
> > requests per sec. I'd be surprised if a high-volume site can do this
> > without any caching layer on top to alleviate the IO spikes that occur
> > because of GC and compactions.
> >
> > On Fri, Jan 11, 2013 at 7:27 AM, Mohammad Tariq <dontariq@gmail.com> wrote:
> >
> >> IMHO, if the image files are not too huge, HBase can serve the purpose
> >> efficiently. You can store some additional info along with the file,
> >> depending on your search criteria, to make the search faster. Say you want
> >> to fetch images by type: you can store the image in one column and its
> >> extension (jpg, tiff, etc.) in another column.
> >>
> >> BTW, what exactly is the problem you are facing? You have written
> >> "But I still cant do it"?
> >>
> >> Warm Regards,
> >> Tariq
> >> https://mtariq.jux.com/
> >>
> >>
> >> On Fri, Jan 11, 2013 at 8:30 PM, Michael Segel <michael_segel@hotmail.com> wrote:
> >>
> >> > That's a viable option.
> >> > HDFS reads are faster than HBase, but it would require first hitting the
> >> > index in HBase, which points to the file, and then fetching the file.
> >> > It could be faster... we found storing binary data in a sequence file
> >> > indexed in HBase to be faster than HBase alone. However, YMMV, and HBase
> >> > has been improved since we did that project....
> >> >
> >> >
> >> > On Jan 10, 2013, at 10:56 PM, shashwat shriparv <dwivedishashwat@gmail.com> wrote:
> >> >
> >> > > Hi Kavish,
> >> > >
> >> > > I have a better idea for you: copy your image files into a single file
> >> > > on HDFS, and when a new image comes, append it to the existing file and
> >> > > record its metadata and offset in HBase. If you put bigger images in
> >> > > HBase, it will lead to some issues.
> >> > >
> >> > >
> >> > >
> >> > > ∞
> >> > > Shashwat Shriparv
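
The append-and-index idea in this message can be sketched in a few lines of Python. This is a hedged illustration, not code from the thread: an in-memory `io.BytesIO` stands in for the HDFS file, a plain dict stands in for the HBase table holding the offsets, and the `PackFile` name is hypothetical.

```python
import io


class PackFile:
    """Append blobs to one file-like object and index them by (offset, length)."""

    def __init__(self, stream=None):
        # stream stands in for an HDFS file opened for append (assumption).
        self._stream = stream if stream is not None else io.BytesIO()
        # index stands in for an HBase table keyed by image id (assumption).
        self.index = {}

    def append(self, image_id, data):
        # Always write at the end of the file and remember where the blob starts.
        self._stream.seek(0, io.SEEK_END)
        offset = self._stream.tell()
        self._stream.write(data)
        self.index[image_id] = (offset, len(data))
        return offset

    def read(self, image_id):
        # Look up the offset first, then fetch exactly that byte range.
        offset, length = self.index[image_id]
        self._stream.seek(offset)
        return self._stream.read(length)
```

A read therefore costs one index lookup plus one ranged read of the packed file, which is the two-step access pattern Michael describes in his reply.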
> >> > >
> >> > >
> >> > >
> >> > > On Fri, Jan 11, 2013 at 9:21 AM, lars hofhansl <larsh@apache.org> wrote:
> >> > >
> >> > >> Interesting. That's close to a PB if my math is correct.
> >> > >> Is there a write-up about this somewhere? Something that we could
> >> > >> link from the HBase homepage?
> >> > >>
> >> > >> -- Lars
> >> > >>
> >> > >>
> >> > >> ----- Original Message -----
> >> > >> From: Jack Levin <magnito@gmail.com>
> >> > >> To: user@hbase.apache.org
> >> > >> Cc: Andrew Purtell <apurtell@apache.org>
> >> > >> Sent: Thursday, January 10, 2013 9:24 AM
> >> > >> Subject: Re: Storing images in Hbase
> >> > >>
> >> > >> We stored about 1 billion images in HBase, with file sizes up to 10MB.
> >> > >> It's been running for close to 2 years without issues and serves
> >> > >> delivery of images for Yfrog and ImageShack.  If you have any
> >> > >> questions about the setup, I would be glad to answer them.
> >> > >>
> >> > >> -Jack
> >> > >>
> >> > >> On Sun, Jan 6, 2013 at 1:09 PM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
> >> > >>> I have done extensive testing and have found that blobs don't belong
> >> > >>> in databases but are best left on the file system. Andrew outlined
> >> > >>> the issues that you'll face, not to mention the IO issues when
> >> > >>> compaction occurs over large files.
> >> > >>>
> >> > >>> On Sun, Jan 6, 2013 at 12:52 PM, Andrew Purtell <apurtell@apache.org> wrote:
> >> > >>>
> >> > >>>> I meant this to say "a few really large values"
> >> > >>>>
> >> > >>>> On Sun, Jan 6, 2013 at 12:49 PM, Andrew Purtell <apurtell@apache.org> wrote:
> >> > >>>>
> >> > >>>>> Consider if the split threshold is 2 GB but your one row contains 10 GB
> >> > >>>>> as really large value.
> >> > >>>>
> >> > >>>>
> >> > >>>>
> >> > >>>>
> >> > >>>> --
> >> > >>>> Best regards,
> >> > >>>>
> >> > >>>>   - Andy
> >> > >>>>
> >> > >>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> >> > >>>> (via Tom White)
> >> > >>>>
> >> > >>
> >> > >>
> >> >
> >> >
> >>
>
