From Dan Washusen <...@reactive.org>
Subject Re: Support for MultiGet / SQL In clause -- error in patch HBASE-1845
Date Mon, 25 Jan 2010 22:12:33 GMT
Yes, that's roughly what I was thinking...

2010/1/24 Sriram Muthuswamy Chittathoor <sriramc@ivycomptech.com>

> So for the reporting tables I will have to store rows by the keys I want
> to look up by, for example:
>
> 1.  One reporting table keyed by gameid
>
> 2.  Another one keyed by some other column, like tournamentid
>
> So basically I create a reporting table for each way I want to query;
> each reporting table will be queried by its row key (which is native) and
> the column values will be what I want
>
> Etc.  Is that right?
>
>
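A minimal sketch of the layout described above, with in-memory sorted maps standing in for HBase tables. The class name, the `gameid/user/date` key format and the sample values are assumptions made for illustration, not HBase API; in a real deployment a MapReduce job would populate the reporting table instead of the `put` method shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for a main table plus one reporting table.
class ReportingTableSketch {
    // Main table: row key is user + date, the value is the gameid column.
    final TreeMap<String, String> mainTable = new TreeMap<>();
    // Reporting table: row key starts with gameid, so rows for one game sort together.
    final TreeMap<String, String> byGameId = new TreeMap<>();

    // Write the row to the main table and duplicate it under a gameid-first key.
    void put(String user, String date, String gameId) {
        String mainKey = user + "/" + date;
        mainTable.put(mainKey, gameId);
        byGameId.put(gameId + "/" + mainKey, mainKey);
    }

    // Find the main-table keys for one game with a single range scan on the row key.
    List<String> rowsForGame(String gameId) {
        String start = gameId + "/";
        String stop = gameId + "0"; // '0' is the character immediately after '/'
        return new ArrayList<>(byGameId.subMap(start, stop).values());
    }

    public static void main(String[] args) {
        ReportingTableSketch tables = new ReportingTableSketch();
        tables.put("alice", "2010-01-20", "g1");
        tables.put("bob", "2010-01-21", "g1");
        tables.put("carol", "2010-01-21", "g10");
        System.out.println(tables.rowsForGame("g1"));
    }
}
```

A second reporting table keyed by tournamentid would follow the same pattern with a different key prefix.
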
>
> -----Original Message-----
> From: Daniel Washusen [mailto:dan@reactive.org]
> Sent: Sunday, January 24, 2010 2:00 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Support for MultiGet / SQL In clause -- error in patch
> HBASE-1845
>
> Sounds like it's some sort of reporting system. Have you considered
> duplicating data into reporting tables?
>
> Write all the game details into the main table then map reduce into
> your reporting tables...
>
> On 24/01/2010, at 7:07 PM, "Sriram Muthuswamy Chittathoor"
> <sriramc@ivycomptech.com> wrote:
>
> > However, I'd only recommend using secondary index as a last resort.
> > First I'd try doing everything I can to work with the index I get for
> > free. The row key.  It sounds like you have done this already...
> > --
> >
> > The only reason why this is important to me is because of the
> > following:
> >
> > 1.  I am storing at a minimum 1 year's worth of data (small rows -- 10
> > billion)
> >
> > 2.  Row key is user + date (columns -- gameid, opponent etc.)
> >
> > 3.  Queries may be something like "give me details for a particular
> > gameid"
> >
> > 4.  To do step 3 I am assuming I need something like a secondary
> > index; otherwise, given my row key, how else can I do it?
> >
> >
> >
> > -----Original Message-----
> > From: Daniel Washusen [mailto:dan@reactive.org]
> > Sent: Sunday, January 24, 2010 3:16 AM
> > To: hbase-user@hadoop.apache.org
> > Subject: Re: Support for MultiGet / SQL In clause -- error in patch
> > HBASE-1845
> >
> > Well, it CAN be a RAM hog ;-). It depends what you're indexing. Each
> > unique value in the indexed column resides in memory. If you index a
> > column that contains 1 million random 1KB values then the index will
> > require at least 1GB of memory. Also it *can* slow down writes,
> > especially when bulk loading sequential keys.
> >
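The arithmetic behind that estimate, as a small sketch. It assumes only what the paragraph states (each unique indexed value is held in memory once) and ignores any per-entry overhead, so it gives a lower bound; the class and method names are hypothetical.

```java
// Lower-bound memory estimate for an in-memory index (hypothetical helper).
class IndexMemoryEstimate {
    // Every unique indexed value is kept in memory once; overhead is ignored.
    static long minIndexBytes(long uniqueValues, long bytesPerValue) {
        return uniqueValues * bytesPerValue;
    }

    public static void main(String[] args) {
        // 1 million unique values of 1KB (1024 bytes) each.
        long bytes = minIndexBytes(1_000_000L, 1024L);
        System.out.println(bytes + " bytes, roughly " + (bytes / 1_000_000_000.0) + " GB");
    }
}
```
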
> > On the up side, it can make scans dramatically faster.
> >
> > However, I'd only recommend using secondary index as a last resort.
> > First I'd try doing everything I can to work with the index I get for
> > free. The row key.  It sounds like you have done this already...
> >
> > Cheers,
> > Dan
> >
> > On 24/01/2010, at 7:02 AM, Stack <stack@duboce.net> wrote:
> >
> >> On Sat, Jan 23, 2010 at 2:52 AM, Sriram Muthuswamy Chittathoor
> >> <sriramc@ivycomptech.com> wrote:
> >>> Thanks all.  I messed it up when I was trying to upgrade to
> >>> 0.20.3.  I deleted the data directory and formatted it thinking it
> >>> will reset the whole cluster.
> >>>
> >>> I started fresh by deleting the data directory on all the nodes and
> >>> then everything worked.  I was also able to create the indexed
> >>> table using the 0.20.3 patch.  Let me run some tests on a few
> >>> million rows and see how it holds up.
> >>>
> >>> BTW --  what would be the right way when I moved versions.  Do I
> >>> run migrate scripts to migrate the data to newer versions ?
> >>>
> >> Just install the new binaries everywhere and restart, or perform a rolling
> >> restart -- see http://wiki.apache.org/hadoop/Hbase/RollingRestart --
> >> if you would avoid taking down your cluster during the upgrade.
> >>
> >> You'll be flagged on start if you need to run a migration, but the general
> >> rule is that there (should) never be need of a migration between patch
> >> releases, e.g. from 0.20.2 to 0.20.3.  There may be need of
> >> migrations moving between minor numbers, e.g. from 0.19 to 0.20.
> >>
> >> Let us know how IHBase (indexed hbase) works out for you.  It's a RAM
> >> hog but the speed improvement finding matching cells can be
> >> startling.
> >>
> >> St.Ack
> >>
> >>> -----Original Message-----
> >>> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
> >>> Stack
> >>> Sent: Saturday, January 23, 2010 5:00 AM
> >>> To: hbase-user@hadoop.apache.org
> >>> Subject: Re: Support for MultiGet / SQL In clause -- error in patch
> >>> HBASE-1845
> >>>
> >>> Check your master log.  Something is seriously off if you do not have
> >>> a reachable .META. table.
> >>> St.Ack
> >>>
> >>> On Fri, Jan 22, 2010 at 1:09 PM, Sriram Muthuswamy Chittathoor
> >>> <sriramc@ivycomptech.com> wrote:
> >>>> I applied the hbase-0.20.3 version / hadoop 0.20.1.  But after starting
> >>>> hbase I keep getting the error below when I go to the hbase shell
> >>>>
> >>>> [ppoker@karisimbivir1 hbase-0.20.3]$ ./bin/hbase shell
> >>>> HBase Shell; enter 'help<RETURN>' for list of supported commands.
> >>>> Version: 0.20.3, r900041, Sat Jan 16 17:20:21 PST 2010
> >>>> hbase(main):001:0> list
> >>>> NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server null for region , row '', but failed after 7 attempts.
> >>>> Exceptions:
> >>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
> >>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
> >>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
> >>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
> >>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
> >>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
> >>>> org.apache.hadoop.hbase.TableNotFoundException: .META.
> >>>>
> >>>>
> >>>>
> >>>> Also when I try to create a table programmatically I get this --
> >>>> 10/01/22 15:48:23 INFO zookeeper.ClientCnxn: Attempting connection to server localhost/127.0.0.1:2181
> >>>> 10/01/22 15:48:23 INFO zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/127.0.0.1:43775 remote=localhost/127.0.0.1:2181]
> >>>> 10/01/22 15:48:23 INFO zookeeper.ClientCnxn: Server connection successful
> >>>> Exception in thread "main" org.apache.hadoop.hbase.TableNotFoundException: .META.
> >>>>       at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:684)
> >>>>       at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:634)
> >>>>       at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:601)
> >>>>       at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:675)
> >>>>       at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:638)
> >>>>       at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:601)
> >>>>       at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:128)
> >>>>       at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:106)
> >>>>       at test.CreateTable.main(CreateTable.java:36)
> >>>>
> >>>>
> >>>>
> >>>> Any clues ?
> >>>>
> >>>>
> >>>>
> >>>> -----Original Message-----
> >>>> From: Dan Washusen [mailto:dan@reactive.org]
> >>>> Sent: Friday, January 22, 2010 4:53 AM
> >>>> To: hbase-user@hadoop.apache.org
> >>>> Subject: Re: Support for MultiGet / SQL In clause -- error in patch
> >>>> HBASE-1845
> >>>>
> >>>> If you want to give the "indexed" contrib package a try you'll need to
> >>>> do the following:
> >>>>
> >>>>  1. Include the contrib jars (export HBASE_CLASSPATH=$(find
> >>>>  /path/to/hbase/hbase-0.20.3/contrib/indexed -name '*jar' | tr -s "\n" ":"))
> >>>>  2. Set the 'hbase.hregion.impl' property to
> >>>>  'org.apache.hadoop.hbase.regionserver.IdxRegion' in your hbase-site.xml
> >>>>
> >>>> Once you've done that you can create a table with an index using:
> >>>>
> >>>>>    // define which qualifiers need an index (choosing the correct type)
> >>>>>    IdxColumnDescriptor columnDescriptor =
> >>>>>        new IdxColumnDescriptor("columnFamily");
> >>>>>    columnDescriptor.addIndexDescriptor(
> >>>>>      new IdxIndexDescriptor("qualifier", IdxQualifierType.BYTE_ARRAY)
> >>>>>    );
> >>>>>
> >>>>>    HTableDescriptor tableDescriptor = new HTableDescriptor("table");
> >>>>>    tableDescriptor.addFamily(columnDescriptor);
> >>>>>
> >>>>
> >>>> Then when you want to perform a scan with an index hint:
> >>>>
> >>>>>    Scan scan = new IdxScan(
> >>>>>          new Comparison("columnFamily", "qualifier",
> >>>>>              Comparison.Operator.EQ, Bytes.toBytes("foo"))
> >>>>>      );
> >>>>>
> >>>>
> >>>> You have to keep in mind that the index hint is only a hint.  It
> >>>> guarantees that your scan will get all rows that match the hint, but
> >>>> you'll more than likely receive rows that don't.  For this reason I'd
> >>>> suggest that you also include a filter along with the scan:
> >>>>
> >>>>>      Scan scan = new IdxScan(
> >>>>>          new Comparison("columnFamily", "qualifier",
> >>>>>              Comparison.Operator.EQ, Bytes.toBytes("foo"))
> >>>>>      );
> >>>>>      // re-check each returned row, since the hint may let extra rows through
> >>>>>      scan.setFilter(
> >>>>>          new SingleColumnValueFilter(
> >>>>>              Bytes.toBytes("columnFamily"), Bytes.toBytes("qualifier"),
> >>>>>              CompareFilter.CompareOp.EQUAL,
> >>>>>              new BinaryComparator(Bytes.toBytes("foo"))
> >>>>>          )
> >>>>>      );
> >>>>>
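The effect of pairing the hint with a filter can be sketched without HBase: treat the hinted scan's output as a candidate set that contains every matching row but may also contain rows that do not match, then re-check each row. This is only an illustration of that contract; the class, method and sample rows are hypothetical and say nothing about how the indexed contrib works internally.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: candidates from a hint, narrowed by an exact filter.
class HintThenFilterSketch {
    // Re-check every candidate row; keep only rows whose value equals the wanted one.
    static List<String> applyFilter(Map<String, String> candidates, String wanted) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> row : candidates.entrySet()) {
            if (wanted.equals(row.getValue())) {
                matches.add(row.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Row key -> qualifier value, as a hinted scan might hand them back.
        Map<String, String> candidates = new LinkedHashMap<>();
        candidates.put("row1", "foo");
        candidates.put("row2", "bar"); // an extra row the hint let through
        candidates.put("row3", "foo");
        System.out.println(applyFilter(candidates, "foo"));
    }
}
```
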
> >>>>
> >>>> Cheers,
> >>>> Dan
> >>>>
> >>>>
> >>>> 2010/1/22 stack <stack@duboce.net>
> >>>>
> >>>>> http://people.apache.org/~jdcryans/hbase-0.20.3-candidate-2/
> >>>>>
> >>>>> There is a bit of documentation if you look at javadoc for the
> >>>>> 'indexed' contrib (This is what hbase-2037 is called on commit).
> >>>>>
> >>>>> St.Ack
> >>>>>
> >>>>> P.S. We had a thread going named "HBase bulk load".  You got all the
> >>>>> answers you need on that one?
> >>>>>
> >>>>> On Thu, Jan 21, 2010 at 11:19 AM, Sriram Muthuswamy Chittathoor
> >>>>> <sriramc@ivycomptech.com> wrote:
> >>>>>>
> >>>>>> Great.  Can I migrate to 0.20.3RC2 easily?  I am on 0.20.2.  Can you
> >>>>>> pass me the link?
> >>>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of stack
> >>>>>> Sent: Friday, January 22, 2010 12:42 AM
> >>>>>> To: hbase-user@hadoop.apache.org
> >>>>>> Subject: Re: Support for MultiGet / SQL In clause -- error in patch
> >>>>>> HBASE-1845
> >>>>>>
> >>>>>> IIRC, hbase-1845 was a sketch only and not yet complete.  It's probably
> >>>>>> rotted since anyway.
> >>>>>>
> >>>>>> Have you looked at hbase-2037, since committed and available in
> >>>>>> 0.20.3RC2?
> >>>>>> Would this help you with your original problem?
> >>>>>>
> >>>>>> St.Ack
> >>>>>>
> >>>>>> On Thu, Jan 21, 2010 at 9:10 AM, Sriram Muthuswamy Chittathoor
> >>>>>> <sriramc@ivycomptech.com> wrote:
> >>>>>>
> >>>>>>> I tried applying the patch to the hbase 0.20.2 source code and I
> >>>>>>> get the errors below.  Do you know if this needs to be applied to a
> >>>>>>> specific hbase version?  Is there a version which works with 0.20.2 or
> >>>>>>> later?
> >>>>>>> Basically, patching HRegionServer and HTable fails.
> >>>>>>>
> >>>>>>>
> >>>>>>> Thanks for the help
> >>>>>>>
> >>>>>>> patch -p0 -i batch.patch
> >>>>>>>
> >>>>>>> patching file src/java/org/apache/hadoop/hbase/client/Get.java
> >>>>>>> Hunk #1 succeeded at 61 (offset 2 lines).
> >>>>>>> Hunk #2 succeeded at 347 (offset 31 lines).
> >>>>>>> patching file src/java/org/apache/hadoop/hbase/client/HConnection.java
> >>>>>>> patching file src/java/org/apache/hadoop/hbase/client/HConnectionManager.java
> >>>>>>> Hunk #3 succeeded at 1244 (offset 6 lines).
> >>>>>>> patching file src/java/org/apache/hadoop/hbase/client/HTable.java
> >>>>>>> Hunk #2 succeeded at 73 (offset 8 lines).
> >>>>>>> Hunk #4 FAILED at 405.
> >>>>>>> Hunk #5 succeeded at 671 with fuzz 2 (offset 26 lines).
> >>>>>>> 1 out of 5 hunks FAILED -- saving rejects to file src/java/org/apache/hadoop/hbase/client/HTable.java.rej
> >>>>>>> patching file src/java/org/apache/hadoop/hbase/client/Multi.java
> >>>>>>> patching file src/java/org/apache/hadoop/hbase/client/MultiCallable.java
> >>>>>>> patching file src/java/org/apache/hadoop/hbase/client/MultiResult.java
> >>>>>>> patching file src/java/org/apache/hadoop/hbase/client/Row.java
> >>>>>>> patching file src/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java
> >>>>>>> Hunk #2 succeeded at 156 with fuzz 1 (offset 3 lines).
> >>>>>>> patching file src/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
> >>>>>>> Hunk #2 succeeded at 247 (offset 2 lines).
> >>>>>>> patching file src/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
> >>>>>>> Hunk #1 succeeded at 78 (offset -1 lines).
> >>>>>>> Hunk #2 FAILED at 2515.
> >>>>>>> 1 out of 2 hunks FAILED -- saving rejects to file src/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java.rej
> >>>>>>> patching file src/test/org/apache/hadoop/hbase/client/TestHTable.java
> >>>>>>> Hunk #2 FAILED at 333.
> >>>>>>> 1 out of 2 hunks FAILED -- saving rejects to file src/test/org/apache/hadoop/hbase/client/TestHTable.java.rej
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Marc Limotte [mailto:mslimotte@gmail.com]
> >>>>>>> Sent: Tuesday, January 19, 2010 10:26 PM
> >>>>>>> To: hbase-user@hadoop.apache.org
> >>>>>>> Subject: Re: Support for MultiGet / SQL In clause
> >>>>>>>
> >>>>>>> Sriram,
> >>>>>>>
> >>>>>>> Would a secondary index help you:
> >>>>>>>
> >>>>>>> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/tableindexed/package-summary.html#package_description
> >>>>>>>
> >>>>>>> The index is stored in a separate table, but the index is managed for
> >>>>>>> you.
> >>>>>>>
> >>>>>>> I don't think you can do an arbitrary "in" query, though.  If the keys
> >>>>>>> that you want to include in the "in" are reasonably close neighbors, you
> >>>>>>> could do a scan and skip ones that are uninteresting.  You could also
> >>>>>>> try a batch Get by applying a separate patch, see
> >>>>>>> http://issues.apache.org/jira/browse/HBASE-1845.
> >>>>>>>
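The scan-and-skip idea can be sketched with a sorted in-memory map standing in for a table: run one range scan from the smallest wanted key to the largest and drop every row that is not in the wanted set. The class name, method name and sample keys are hypothetical; the approach only pays off when the wanted keys sit close together, since every row in between is still read.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical emulation of an "in" query by one range scan plus skipping.
class ScanAndSkipSketch {
    static Map<String, String> inQuery(NavigableMap<String, String> table,
                                       SortedSet<String> wanted) {
        Map<String, String> found = new LinkedHashMap<>();
        if (wanted.isEmpty()) {
            return found;
        }
        // One scan covering the smallest through the largest wanted key, inclusive.
        NavigableMap<String, String> range =
            table.subMap(wanted.first(), true, wanted.last(), true);
        for (Map.Entry<String, String> row : range.entrySet()) {
            if (wanted.contains(row.getKey())) {
                found.put(row.getKey(), row.getValue()); // keep wanted rows
            } // everything else in the range is skipped
        }
        return found;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 1; i <= 9; i++) {
            table.put("key0" + i, "value" + i);
        }
        SortedSet<String> wanted = new TreeSet<>();
        wanted.add("key03");
        wanted.add("key05");
        wanted.add("key06");
        System.out.println(inQuery(table, wanted));
    }
}
```
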
> >>>>>>> Marc Limotte
> >>>>>>>
> >>>>>>> On Tue, Jan 19, 2010 at 8:45 AM, Sriram Muthuswamy Chittathoor
> >>>>>>> <sriramc@ivycomptech.com> wrote:
> >>>>>>>
> >>>>>>>> Is there any support for this?  I want to do this:
> >>>>>>>>
> >>>>>>>> 1.  Create a second table to maintain a mapping between the secondary
> >>>>>>>> column and the rowids of the primary table
> >>>>>>>>
> >>>>>>>> 2.  Use this second table to get the rowids to look up from the
> >>>>>>>> primary table using a SQL IN-like clause
> >>>>>>>>
> >>>>>>>> Basically I am doing this to speed up querying by non-row-key
> >>>>>>>> columns.
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>>
> >>>>>>>> Sriram C
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> This email is sent for and on behalf of Ivy Comptech Private
> >>>>>>>> Limited. Ivy Comptech Private Limited is a limited liability
> >>>>>>>> company.
> >>>>>>>>
> >>>>>>>> This email and any attachments are confidential, and may be legally
> >>>>>>>> privileged and protected by copyright. If you are not the intended
> >>>>>>>> recipient dissemination or copying of this email is prohibited. If
> >>>>>>>> you have received this in error, please notify the sender by
> >>>>>>>> replying by email and then delete the email completely from your
> >>>>>>>> system.
> >>>>>>>> Any views or opinions are solely those of the sender.  This
> >>>>>>>> communication is not intended to form a binding contract on behalf
> >>>>>>>> of Ivy Comptech Private Limited unless expressly indicated to the
> >>>>>>>> contrary and properly authorised. Any actions taken on the basis of
> >>>>>>>> this email are at the recipient's own risk.
> >>>>>>>>
> >>>>>>>> Registered office:
> >>>>>>>> Ivy Comptech Private Limited, Cyber Spazio, Road No. 2, Banjara
> >>>>>>>> Hills, Hyderabad 500 033, Andhra Pradesh, India. Registered number:
> >>>>>>>> 37994. Registered in India. A list of members' names is available
> >>>>>>>> for inspection at the registered office.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>>
>