hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Secondary index
Date Mon, 21 Mar 2011 20:37:04 GMT
Hey Wade!

It's great that you take some time to write a blog post about that,
I'm sure it's going to be useful to others too!

Rest of my answer is inline.

J-D

> I am playing with htable.batch for multi get to see if I can remove my
> external hbase indexes. This is what I am trying to do.
>
> #1 What is the best model for a column family that is just used as an index?
>
> Currently I am using a column family _idx_ with:
> column:<row>
> value:<timestamp>
>
> This allows me to have a new column for each index with a value of when it
> was added. That allows me to purge the column family by the value greater
> than some time in Map reduce.

So I guess you're not using TTL because you don't want to remove the
most recent cell in a qualifier?

Recently I started recommending short family names, since the family
name is stored along with every value in memory and on disk. In your
case you could save a few bytes per value by having a shorter name than _idx_.
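
To put a rough number on the family-name point, here is a minimal sketch of the arithmetic. It assumes the KeyValue layout of that era (fixed-width length fields, then row, family, qualifier, timestamp, type, and value) and ignores compression; the row key, qualifier, and the short family name "i" are made-up examples, not taken from Wade's schema.

```java
import java.nio.charset.StandardCharsets;

class FamilyNameOverhead {
    // Assumed fixed bytes per cell: key length (4) + value length (4)
    // + row length (2) + family length (1) + timestamp (8) + type (1).
    static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

    // Byte length of a string once encoded as UTF-8.
    static int len(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Approximate serialized size of one cell under the assumed layout.
    static long cellSize(String row, String family, String qualifier, int valueLength) {
        return FIXED_OVERHEAD + len(row) + len(family) + len(qualifier) + valueLength;
    }

    public static void main(String[] args) {
        // Hypothetical index cell: one qualifier per indexed row, 8-byte timestamp value.
        long longName = cellSize("company-42", "_idx_", "txn-0001", 8);
        long shortName = cellSize("company-42", "i", "txn-0001", 8);
        System.out.println("bytes saved per cell: " + (longName - shortName));
    }
}
```

The difference is just the difference in family name length, but it is paid once per cell, so an index family with one column per indexed row pays it for every entry.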

>
> #2 What is the most performant way to get this back into a get object? This
> is what I am doing so far but want to validate my thoughts.

- So I guess that in your code base, as opposed to this standalone code,
you reuse the config object? If not, do reuse it.

- Same comment regarding HTables, they should be reused. Not kidding.

- Why are you creating a HTD each time? Can't you just create the
HTable directly?

- I wonder why you're doing a getFamilyMap() call. If all you want is
those keys, then call result.raw() and iterate through that. Like the
javadoc says: "This API is faster than using getFamilyMap() and
getMap()"

- You should delay creating the list of Gets until you know how many
objects you need, so that you can create the list directly with the
right size.
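
The last point can be sketched without the HBase client at all. In the snippet below, RowRequest is a hypothetical stand-in for Get, and the byte[][] argument stands for the qualifiers read back from the index row; both are assumptions made only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

class PresizedBatch {
    // Hypothetical stand-in for an HBase Get: it only carries the target row key.
    static final class RowRequest {
        final byte[] row;

        RowRequest(byte[] row) {
            this.row = row;
        }
    }

    // Build one request per index qualifier. The list is created only once the
    // number of qualifiers is known, so its backing array never has to grow.
    static List<RowRequest> buildRequests(byte[][] indexQualifiers) {
        List<RowRequest> requests = new ArrayList<RowRequest>(indexQualifiers.length);
        for (byte[] qualifier : indexQualifiers) {
            requests.add(new RowRequest(qualifier));
        }
        return requests;
    }
}
```

With the real client the shape would be the same: read the index row first, then allocate the list using the number of cells in that Result.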

>
>        Configuration config = HBaseConfiguration.create();
>        HTableDescriptor transactionsbycompany_descriptor =
>            new HTableDescriptor(table);
>        HTableDescriptor transactions_descriptor =
>            new HTableDescriptor("transactions");
>        HTable transactionsbycompany_table =
>            new HTable(config, transactionsbycompany_descriptor.getName());
>        HTable transactions_table =
>            new HTable(config, transactions_descriptor.getName());
>
>        List<Row> gets = new ArrayList<Row>();
>        Get g = new Get(Bytes.toBytes(key));
>        Result result = transactionsbycompany_table.get(g);
>        NavigableMap<byte[], byte[]> nmap =
>            result.getFamilyMap(Bytes.toBytes(colfam_index));
>
>        Set<byte[]> keySet = nmap.keySet();
>        Iterator<byte[]> iter = keySet.iterator();
>
>        while (iter.hasNext()) {
>            byte[] idx_key = iter.next();
>            Get get = new Get(idx_key);
>            get.addColumn(Bytes.toBytes("details"), Bytes.toBytes("amount"));
>            gets.add(get);
>        }
>        Result[] multiRes = new Result[gets.size()];
>        try {
>            transactions_table.batch(gets, multiRes);
>        } catch (InterruptedException e) {
>            // TODO Auto-generated catch block
>            e.printStackTrace();
>        }
>
>
>
> With appreciation;
> Wade Arnold
>
>
