From: "Nitin Gupta"
Subject: RE: Help needed - Adding HBase to architecture
Date: Mon, 8 Jun 2009 11:13:56 +0530

Jonathan,

Thanks for the detailed explanation. It was very helpful.

As far as file size is concerned, we may even be required to save videos in the future, so we will definitely go above the HBase size limit at some point. Is there any other solution or key-value database that you can recommend for our case?

I am not very knowledgeable about HDFS either. I think that if we go with pure HDFS, all the required DB operations would have to be custom developed on top of it. For our needs, do you think HDFS already has enough support that we will not need any major custom development? We are just saving the files/attachments and retrieving them with some basic search.

Regards,
Nitin

-----Original Message-----
From: Jonathan Gray [mailto:jlist@streamy.com]
Sent: Sunday, June 07, 2009 9:30 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Help needed - Adding HBase to architecture

Nitin,

HBase stores arbitrary binary values (row keys, column qualifiers, and column values), so it is certainly capable of storing and serving files and images.

My only real question before I would give you a +1 on your idea is what you expect the range of file sizes to be. While HBase allows you to store values up to length Integer.MAX_VALUE, that is not recommended and in past versions has led to memory issues (OOME and such). Images, text, Word/Excel docs, etc. should be no problem. But I don't recommend storing things in the upper 10s or 100s of MB, though it's probably possible with a little work adjusting some configuration parameters. In general, if you are approaching HDFS block size, then you really just want HDFS and not HBase :)

We are not currently running this in production, but we have had an experimental version of our media server that runs on top of HBase rather than the file system. It has a series of Python scripts (connected to HBase through our custom interface; you could use Java directly or Thrift/REST/etc.) that are responsible for generating various thumbnail sizes. The originals are stored in HBase, and then a special query is run to grab the thumbnail of a certain size. If it exists in HBase already, it is just fetched and returned. Otherwise, it is generated (via PIL, the Python Imaging Library, and some other custom tools), stored in HBase, and then returned to the client.

As far as HBase on Windows goes... It's currently not possible, but there has been some effort from Powerset/Microsoft to make it happen. I will yield to those more familiar with it.

Personally, I run Windows on my primary work desktop and spend a good chunk of my time on HBase development.
When I've wanted to spin up pseudo-distributed local clusters, I usually use a cheap Linux node or a local virtual machine. In both cases, I use a Windows X Server and redirect output to my local Windows machine so I can run Eclipse and unit tests from my Windows GUI. Others have used Cygwin with some success, I believe.

Hope that sheds some light for you. You are almost certainly right about not wanting to store this in an RDBMS. And a hybrid approach seems to make sense, especially as a first step.

Jonathan Gray

On Sun, June 7, 2009 6:44 am, Nitin Gupta wrote:
> Hi All,
> I am working on an application which is kind of a social network on mobile
> WAP. Recently, we have incorporated file and attachment support in
> our application. Right now, since we are not in production yet, we are
> keeping all the files in the RDBMS which our application is using. But I
> am more than convinced that this is not going to work once we are in
> production mode.
>
> I got to know about HBase and I am trying to convince myself of its
> suitability for the file storage, search and retrieval operations. I would
> like my opinion to be endorsed by expert HBase users/developers. Just for
> clarification, here is what I am planning to do:
>
> Make use of an RDBMS for relational data in the application.
> All the files/blob data to be saved in HBase.
> When required, my application can query app data from the RDBMS and the
> files can be retrieved from the HBase data store.
> I will keep the metadata of the files in my RDBMS so that files can be
> associated with my app's entities.
>
> Please help me decide if this is the right approach. My app is supposed
> to provide support for images as well, so please advise whether HBase is
> the right solution for me, in conjunction with an imaging tool.
>
> Since my team is predominantly Windows based, I would like to know whether
> it is possible to run HBase on a Windows machine in standalone and in
> clustered mode.
>
> Thanks for all your help.
>
>
> nitin
>
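
The thumbnail flow Jonathan describes above (return the thumbnail if it is already stored, otherwise generate it, store it, and return it) might look roughly like the sketch below. Everything here is an assumption for illustration: the store is a plain Python dict standing in for an HBase table, the key layout ("original", "thumb:<size>") is made up, and fake_resize is a placeholder for a real imaging call such as PIL. It is not the code of the media server mentioned in the thread.

```python
from typing import Callable, Dict, Tuple

# Hypothetical in-memory stand-in for an HBase table: maps
# (row key, column qualifier) -> raw bytes. Not a real HBase client.
Store = Dict[Tuple[str, str], bytes]


def get_thumbnail(store: Store, image_id: str, size: int,
                  resize: Callable[[bytes, int], bytes]) -> bytes:
    """Return the thumbnail of the given size, generating it on a miss."""
    qualifier = f"thumb:{size}"  # assumed naming scheme, for illustration
    cached = store.get((image_id, qualifier))
    if cached is not None:
        return cached  # already stored: just fetch and return
    original = store[(image_id, "original")]  # KeyError if never uploaded
    thumb = resize(original, size)
    store[(image_id, qualifier)] = thumb  # keep it for later requests
    return thumb


def fake_resize(data: bytes, size: int) -> bytes:
    # Placeholder for a real imaging call (the thread mentions PIL);
    # it only truncates the bytes so the sketch stays stdlib-only.
    return data[:size]
```

Passing the resize function in as a parameter keeps the caching logic separate from the imaging library, so the same function could be pointed at a real client and a real resizer without changing the fetch-or-generate logic.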