hbase-user mailing list archives

From Dan Washusen <...@reactive.org>
Subject Re: [Indexed HBase] Can I add index in an existing table?
Date Thu, 18 Mar 2010 21:23:18 GMT
Interesting... it looks like it's because you are using the CHAR_ARRAY index
type but the data is not in the expected format.  All the index types expect
that the data in the cells has been generated using
the org.apache.hadoop.hbase.util.Bytes#toBytes methods.  The CHAR_ARRAY type
expects that the byte array in the cell has been generated using
the org.apache.hadoop.hbase.util.Bytes#toBytes(char[]) method.  It's failing
because the org.apache.hadoop.hbase.util.Bytes#toChars(byte[]) method is
returning null.  The Bytes#toChars(byte[]) method is returning null because
the provided byte array does not evenly divide in two (the format defined
by Bytes#toBytes(char[])).
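
To make the failure concrete, here is a minimal, self-contained sketch of the two-bytes-per-char layout described above. It is not the HBase implementation: the class name CharArrayCodecSketch and the big-endian byte order are assumptions made for illustration. Only the "return null when the length does not divide by two" behaviour is taken from the description above.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for the char[] <-> byte[] round trip; not HBase code. */
final class CharArrayCodecSketch {

    /** Encodes each char as two bytes (byte order here is an assumption). */
    static byte[] toBytes(char[] chars) {
        byte[] out = new byte[chars.length * 2];
        for (int i = 0; i < chars.length; i++) {
            out[2 * i] = (byte) (chars[i] >>> 8);
            out[2 * i + 1] = (byte) chars[i];
        }
        return out;
    }

    /** Decodes two bytes per char; returns null when the length is odd. */
    static char[] toChars(byte[] bytes) {
        if (bytes == null || bytes.length % 2 != 0) {
            return null;
        }
        char[] out = new char[bytes.length / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (char) (((bytes[2 * i] & 0xFF) << 8) | (bytes[2 * i + 1] & 0xFF));
        }
        return out;
    }

    public static void main(String[] args) {
        // Written the way a CHAR_ARRAY index expects: 3 chars -> 6 bytes.
        byte[] asChars = toBytes("4.5".toCharArray());
        // Written as a plain UTF-8 string: 3 bytes, which is an odd length.
        byte[] asString = "4.5".getBytes(StandardCharsets.UTF_8);

        System.out.println("char[] encoding decodes: " + (toChars(asChars) != null));
        System.out.println("string encoding decodes: " + (toChars(asString) != null));
    }
}
```

Under these assumptions, a value written as string bytes with an odd length decodes to null, which would match the null 'needle' in the stack trace. A string value with an even byte length would decode without error but to the wrong characters, so it would probably fail less visibly.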

It's obviously an issue that you could get your table into its current
state, but I'm not really sure how it should be 'fixed'.  In the meantime
you should be able to use the BYTE_ARRAY type without any issues...

Cheers,
Dan

p.s. During your testing watch your memory footprint carefully.  The index
stores each unique cell value in memory...

2010/3/18 Vukasin Toroman <vukasin@toroman.name>

> Hi Dan,
>
> the logs are mostly filled with compaction messages (and are fairly large)
> so I made a small list of "highlights"
>
>
> here  is the regionserver error I keep seeing:
>
> http://pastebin.com/xiFza45B
>
> It keeps repeating on each regionserver in a "round-robin" fashion as you
> will see in the master log here:
>
> http://pastebin.com/eKsHUynK
>
> The master is starting up and trying to assign the region but keeps getting
> errors from all regionservers. It might be important to say that the
> existing data in that table is scarce and that the cell which is used for
> indexing does not exist in each row (could that be the reason for the
> "needle" being null?). Here is the code I used to add the indexing to the
> table (it is a shell add-on):
>
> http://pastebin.com/9K9V3ARW
>
>
> Hope this helps.
>
> Thx,
> Vukasin
>
>
>
> On Mar 16, 2010, at 21:02 , Dan Washusen wrote:
>
> > Hi Vukasin,
> > Would you be able to find the region server log that contains that error
> > message and post it up on pastebin (or email it to me)?  Preferably the
> > entire log file with debug logging turned on...
> >
> > Cheers,
> > Dan
> >
> > On 17 March 2010 02:09, Vukasin Toroman <vukasin@toroman.name> wrote:
> >
> >> Hi,
> >>
> >> could you post the output of "describe" for that table? I am having a very
> >> similar problem (added an index to an existing table and getting the
> >> NoServerForRegionException). This is what I am getting when typing
> >> "describe <TableName>" in the shell:
> >>
> >> {NAME => 'Records', FAMILIES => [{NAME => 'data', VERSIONS => '5',
> >> COMPRESSION => 'GZ', INDEX_DESC =>
> >> '5org.apache.hadoop.hbase.client.idx.IdxIndexDescriptoramazon.de:rating
> >> CHAR_ARRAY', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY =>
> >> 'false', BLOCKCACHE => 'true'}, {NAME => 'permissions', VERSIONS => '1',
> >> COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536',
> >> IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
> >>
> >> I am seeing the following exception on startup in the master log:
> >> Processing MSG_REPORT_CLOSE: Records,,1267632504185:
> >> java.lang.IllegalArgumentException: Argument 'needle' cannot be null
> >>       at org.apache.hadoop.hbase.regionserver.idx.support.arrays.BinarySearch.search(BinarySearch.java:58)
> >>       at org.apache.hadoop.hbase.regionserver.idx.support.arrays.BinarySearch.search(BinarySearch.java:108)
> >>       at org.apache.hadoop.hbase.regionserver.CompleteIndexBuilder.addKeyValue(CompleteIndexBuilder.java:119)
> >>       at org.apache.hadoop.hbase.regionserver.IdxRegionIndexManager.fillIndex(IdxRegionIndexManager.java:198)
> >>       at org.apache.hadoop.hbase.regionserver.IdxRegionIndexManager.rebuildIndexes(IdxRegionIndexManager.java:112)
> >>       at org.apache.hadoop.hbase.regionserver.IdxRegion.rebuildIndexes(IdxRegion.java:126)
> >>       at org.apache.hadoop.hbase.regionserver.IdxRegion.initialize(IdxRegion.java:108)
> >>       at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateRegion(HRegionServer.java:1564)
> >>       at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1531)
> >>       at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1451)
> >>       at java.lang.Thread.run(Thread.java:619)
> >>
> >>
> >> the region Records,,1267632504185 is the one being reported in the
> >> NoServerForRegionException. Right now I am unable to disable the table so
> >> I'm kinda stuck :-(
> >>
> >> greetz,
> >> Vukasin
> >>
> >>
> >>> Also, could you try rolling back to your original configuration and
> >>> check that everything goes back to normal?
> >>>
> >>> 2010/3/5 Dan Washusen <dan@reactive.org>:
> >>>> Then something is not right...  Could you post up a region server log in
> >>>> pastebin?
> >>>>
> >>>> 2010/3/5 Ted Yu <yuzhihong@gmail.com>
> >>>>
> >>>>> I searched all existing hbase logs but didn't see 'Filled indices for
> >>>>> region'
> >>>>>
> >>>>> 2010/3/4 Dan Washusen <dan@reactive.org>
> >>>>>
> >>>>>> Hi Ted,
> >>>>>> Did you verify that all the regions came back online after re-enabling
> >>>>>> the table?  Depending on the size of the table it may take some time...
> >>>>>>
> >>>>>> You should see something like the following logged in the region server
> >>>>>> logs for each region:
> >>>>>>
> >>>>>>> Filled indices for region: 'ruletable,,1267641828807' with 55555555
> >>>>>>> entries in 00:05:99
> >>>>>>
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Dan
> >>>>>>
> >>>>>> 2010/3/5 Ted Yu <yuzhihong@gmail.com>
> >>>>>>
> >>>>>>> 2010/3/3 Ted Yu <yuzhihong@gmail.com>
> >>>>>>>
> >>>>>>>> Hi,
> >>>>>>>> I wrote a utility to add an index to my table.
> >>>>>>>> After running it, I couldn't see the rows in that table I saw before.
> >>>>>>>>
> >>>>>>>> hbase(main):007:0> count 'ruletable'
> >>>>>>>> NativeException:
> >>>>>>>> org.apache.hadoop.hbase.client.NoServerForRegionException:
> >>>>>>>> No server address listed in .META. for region ruletable,,1267641828807
> >>>>>>>>
> >>>>>>>> My code follows. How do I determine the correct IdxQualifierType?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>    HBaseAdmin admin = new HBaseAdmin(conf);
> >>>>>>>>    admin.disableTable(tableName);
> >>>>>>>>    System.out.println(tableName + " disabled");
> >>>>>>>>
> >>>>>>>>    for (int i = 1; i < otherArgs.length; i+=2)
> >>>>>>>>    {
> >>>>>>>>        String colFam = otherArgs[i];
> >>>>>>>>        byte[] familyName = Bytes.toBytes(colFam);
> >>>>>>>>        byte[] qualifier = Bytes.toBytes(otherArgs[i+1]);
> >>>>>>>>
> >>>>>>>>        IdxColumnDescriptor idxColumnDescriptor =
> >>>>>>>>            new IdxColumnDescriptor(familyName);
> >>>>>>>>        IdxIndexDescriptor indexDescriptor =
> >>>>>>>>            new IdxIndexDescriptor(qualifier, IdxQualifierType.CHAR_ARRAY);
> >>>>>>>>        idxColumnDescriptor.addIndexDescriptor(indexDescriptor);
> >>>>>>>>
> >>>>>>>>        admin.modifyColumn(tableName, colFam, idxColumnDescriptor);
> >>>>>>>>        System.out.println(colFam + ":" + otherArgs[i+1] + " indexed");
> >>>>>>>>    }
> >>>>>>>>    admin.enableTable(tableName);
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> 2010/2/26 Ram Kulbak <ram.kulbak@gmail.com>
> >>>>>>>>
> >>>>>>>>> You will need to use an IdxColumnDescriptor:
> >>>>>>>>>
> >>>>>>>>> Here's a code example for creating a table with a byte array index:
> >>>>>>>>>
> >>>>>>>>>   HTableDescriptor tableDescriptor = new HTableDescriptor(TABLE_NAME);
> >>>>>>>>>   IdxColumnDescriptor idxColumnFamilyDescriptor =
> >>>>>>>>>       new IdxColumnDescriptor(FAMILY_NAME);
> >>>>>>>>>   try {
> >>>>>>>>>     idxColumnFamilyDescriptor.addIndexDescriptor(
> >>>>>>>>>       new IdxIndexDescriptor(QUALIFIER_NAME, IdxQualifierType.BYTE_ARRAY)
> >>>>>>>>>     );
> >>>>>>>>>   } catch (IOException e) {
> >>>>>>>>>     throw new IllegalStateException(e);
> >>>>>>>>>   }
> >>>>>>>>>   tableDescriptor.addFamily(idxColumnFamilyDescriptor);
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> You can add several index descriptors to the same column family and
> >>>>>>>>> you can put indexes on more than one column family. You should use
> >>>>>>>>> IdxScan with an org.apache.hadoop.hbase.client.idx.exp.Expression set
> >>>>>>>>> to match your query criteria. The expression may cross columns from
> >>>>>>>>> the same or different families using ANDs and ORs.
> >>>>>>>>>
> >>>>>>>>> Note that several index types are supported. Current types include all
> >>>>>>>>> basic types and BigDecimals. Char arrays are also supported. Types
> >>>>>>>>> allow for correct range checking (for example you can quickly evaluate
> >>>>>>>>> a scan getting all rows for which a given column has values between 42
> >>>>>>>>> and 314). You should make sure that columns which are indexed with a
> >>>>>>>>> given qualifier type are actually populated with bytes matching their
> >>>>>>>>> type, e.g. if you use IdxQualifierType.LONG make sure that you
> >>>>>>>>> actually put values which are 8-byte arrays produced by a method
> >>>>>>>>> similar to Bytes.toBytes(long).
> >>>>>>>>>
> >>>>>>>>> Yoram
> >>>>>>>>>
> >>>>>>>>> 2010/2/27 Ted Yu <yuzhihong@gmail.com>:
> >>>>>>>>>> Ram:
> >>>>>>>>>> How do I specify an index in the HColumnDescriptor that is passed to
> >>>>>>>>>> modifyColumn()?
> >>>>>>>>>>
> >>>>>>>>>> Thanks
> >>>>>>>>>>
> >>>>>>>>>> 2010/2/26 Ram Kulbak <ram.kulbak@gmail.com>
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Shen,
> >>>>>>>>>>>
> >>>>>>>>>>> The first thing you need to verify is that you can switch to the
> >>>>>>>>>>> IdxRegion implementation without problems. I've just checked that the
> >>>>>>>>>>> following steps work on the PerformanceEvaluation tables. I would
> >>>>>>>>>>> suggest you back up your hbase production instance before attempting
> >>>>>>>>>>> this (or create and try it out on a sandbox instance).
> >>>>>>>>>>>
> >>>>>>>>>>> * Stop hbase
> >>>>>>>>>>> * Edit the conf/hbase-env.sh file and add IHBASE to your classpath.
> >>>>>>>>>>> Here's an example which assumes you don't need to add anything else to
> >>>>>>>>>>> your classpath. Make sure HBASE_HOME is defined or simply
> >>>>>>>>>>> substitute it with the full path of the hbase installation directory:
> >>>>>>>>>>>  export HBASE_CLASSPATH=(`find $HBASE_HOME/contrib/indexed -name '*jar' | tr -s "\n" ":"`)
> >>>>>>>>>>>
> >>>>>>>>>>> * Edit conf/hbase-site.xml and set IdxRegion to be the region
> >>>>>>>>>>> implementation:
> >>>>>>>>>>>
> >>>>>>>>>>> <property>
> >>>>>>>>>>>    <name>hbase.hregion.impl</name>
> >>>>>>>>>>>    <value>org.apache.hadoop.hbase.regionserver.IdxRegion</value>
> >>>>>>>>>>> </property>
> >>>>>>>>>>>
> >>>>>>>>>>> * Propagate the configuration to all slaves
> >>>>>>>>>>> * Start HBASE
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Next, modify the table you want to index using code similar to this:
> >>>>>>>>>>>
> >>>>>>>>>>>   HBaseConfiguration conf = new HBaseConfiguration();
> >>>>>>>>>>>
> >>>>>>>>>>>   HBaseAdmin admin = new HBaseAdmin(conf);
> >>>>>>>>>>>   admin.disableTable(TABLE_NAME);
> >>>>>>>>>>>   admin.modifyColumn(TABLE_NAME, FAMILY_NAME1, IDX_COLUMN_DESCRIPTOR1);
> >>>>>>>>>>>         ...
> >>>>>>>>>>>   admin.modifyColumn(TABLE_NAME, FAMILY_NAMEN, IDX_COLUMN_DESCRIPTORN);
> >>>>>>>>>>>   admin.enableTable(TABLE_NAME);
> >>>>>>>>>>>
> >>>>>>>>>>> Wait for the table to get indexed. This may take a few minutes. Check
> >>>>>>>>>>> the master web page and verify your index definitions appear correctly
> >>>>>>>>>>> in the table description.
> >>>>>>>>>>>
> >>>>>>>>>>> This is it. Please let me know how it goes.
> >>>>>>>>>>>
> >>>>>>>>>>> Yoram
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> 2010/2/26 ChingShen <chingshenchen@gmail.com>:
> >>>>>>>>>>>> Thanks, but I think I need the indexed HBase rather than the
> >>>>>>>>>>>> transactional HBase.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Shen
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2010/2/26 <y_823910@tsmc.com>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> You can try my code to create an index in the existing table.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> public void AddIdx2ExistingTable(String tablename, String columnfamily,
> >>>>>>>>>>>>>         String idx_column) throws IOException {
> >>>>>>>>>>>>>     IndexedTableAdmin admin = new IndexedTableAdmin(config);
> >>>>>>>>>>>>>     admin.addIndex(Bytes.toBytes(tablename),
> >>>>>>>>>>>>>         new IndexSpecification(idx_column,
> >>>>>>>>>>>>>             Bytes.toBytes(columnfamily + ":" + idx_column)));
> >>>>>>>>>>>>> }
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Fleming Chiu(邱宏明)
> >>>>>>>>>>>>> 707-6128
> >>>>>>>>>>>>> y_823910@tsmc.com
> >>>>>>>>>>>>> Meatless Monday: eat vegetarian, save the Earth (Meat Free Monday Taiwan)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> From: ChingShen <chingshenchen@gmail.com>
> >>>>>>>>>>>>> To: hbase-user <hbase-user@hadoop.apache.org>
> >>>>>>>>>>>>> cc: (bcc: Y_823910/TSMC)
> >>>>>>>>>>>>> Subject: [Indexed HBase] Can I add index in an existing table?
> >>>>>>>>>>>>> Date: 2010/02/26 10:18 AM
> >>>>>>>>>>>>> Please respond to hbase-user
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I got http://issues.apache.org/jira/browse/HBASE-2037 that can create a
> >>>>>>>>>>>>> new table with index, but can I add index in an existing table?
> >>>>>>>>>>>>> Any code examples?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Shen
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> *****************************************************
> >>>>>>>>>>>> Ching-Shen Chen
> >>>>>>>>>>>> Advanced Technology Center,
> >>>>>>>>>>>> Information & Communications Research Lab.
> >>>>>>>>>>>> E-mail: chenchingshen@itri.org.tw
> >>>>>>>>>>>> Tel:+886-3-5915542
> >>>>>>>>>>>> *****************************************************
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >>
>
>
