hbase-user mailing list archives

From Something Something <mailinglist...@gmail.com>
Subject Re: Fastest way to read only the keys of a HTable?
Date Thu, 03 Feb 2011 23:09:55 GMT
Awesome!  It's instantaneous now.  Thanks a bunch.  Any such tricks for code
that looks like this...

      // Look up the row for this code and read its "Keys" family
      Get get = new Get(Bytes.toBytes(code));
      Result result = table.get(get);
      NavigableMap<byte[], byte[]> map = result.getFamilyMap(Bytes.toBytes("Keys"));
      if (map != null) {
        // Each cell value is a row key in the second table: one Get per key
        for (Map.Entry<byte[], byte[]> entry : map.entrySet()) {
          String key = Bytes.toString(entry.getValue());
          Get get1 = new Get(Bytes.toBytes(key));
          Result imp = table2.get(get1);
          // Do something with the result...
        }
      }

Basically, I read the first table by a key (code).  The "Keys" family
contains row keys of another table, so I take each key from that family and
retrieve the corresponding row from the other table.
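One common way to speed up this pattern is to batch the lookups instead of issuing one Get per key. The toy model below only counts round trips; `RowStore`, `get` and `getAll` are hypothetical stand-ins for a remote table, not HBase classes. In the HBase client the analogous call would be `HTable.get(List<Get>)`, which returns a `Result[]`, assuming your client version provides it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of "one Get per key" versus "one batched request" (hypothetical,
// not the HBase API): every call to get() or getAll() counts as a round trip.
public class BatchGetSketch {

    static class RowStore {
        private final Map<String, String> rows = new HashMap<>();
        int roundTrips = 0;

        void put(String key, String value) {
            rows.put(key, value);
        }

        // One round trip per key, like calling table2.get(get1) in a loop.
        String get(String key) {
            roundTrips++;
            return rows.get(key);
        }

        // One round trip for the whole batch; missing keys come back as null.
        List<String> getAll(List<String> keys) {
            roundTrips++;
            List<String> out = new ArrayList<>();
            for (String key : keys) {
                out.add(rows.get(key));
            }
            return out;
        }
    }

    public static void main(String[] args) {
        RowStore table2 = new RowStore();
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 50; i++) {
            table2.put("row" + i, "value" + i);
            keys.add("row" + i);
        }

        // N lookups, N round trips.
        for (String key : keys) {
            table2.get(key);
        }
        System.out.println("one-by-one: " + table2.roundTrips + " round trips");

        // Same lookups, collected first and sent together.
        table2.roundTrips = 0;
        List<String> values = table2.getAll(keys);
        System.out.println("batched: " + table2.roundTrips + " round trip, "
                + values.size() + " rows");
    }
}
```

With 50 referenced keys this prints 50 round trips for the one-by-one loop and 1 for the batch. A real batched get is usually split up by region server, so expect roughly one RPC per server touched rather than exactly one.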

Thanks again.

On Thu, Feb 3, 2011 at 2:17 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> On the scan, you can call setCaching() with the number of rows you want to
> pre-fetch per RPC.  Setting it to 2 is already 2x better than the
> default.
>
> J-D
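J-D's point can be checked with back-of-the-envelope arithmetic. The sketch below assumes each scanner `next` RPC returns up to `caching` rows, and it ignores the scanner open/close calls and any extra call needed to detect the end of the scan. The class and method names are illustrative only; on a real `Scan` the knob is `setCaching(int)`.

```java
// Back-of-the-envelope model of scanner round trips (illustrative names).
// Assumes each "next" RPC returns up to `caching` rows; ignores the scanner
// open/close calls and any extra call needed to detect the end of the scan.
public class ScanRpcEstimate {

    // Ceiling of rows / caching: how many batches it takes to ship all rows.
    static int nextRpcs(int rows, int caching) {
        if (caching < 1) {
            throw new IllegalArgumentException("caching must be at least 1");
        }
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        int rows = 43; // the row count reported in this thread
        for (int caching : new int[] {1, 2, 10, 100}) {
            System.out.println("caching=" + caching + " -> "
                    + nextRpcs(rows, caching) + " round trips");
        }
    }
}
```

For the 43-row scan it prints 43, 22, 5 and 1 round trips for caching values of 1, 2, 10 and 100. If the default really is one row per RPC, as the "2x" remark implies, a caching value of 2 roughly halves the trips; larger values trade fewer round trips for more rows buffered on the client per batch.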
>
> On Thu, Feb 3, 2011 at 1:35 PM, Something Something
> <mailinglists19@gmail.com> wrote:
> > After adding the following line:
> >
> > scan.addFamily(Bytes.toBytes("Info"));
> >
> > performance improved dramatically (Thank you both!).  But now I want it to
> > perform even faster, if possible -:)  Reading 43 rows takes 2 seconds.
> > Eventually, the 'partner' table may have over 500 entries.  I guess I will
> > try moving the recently added family to a different table.  Do you think
> > that might help?
> >
> > Thanks again.
> >
> >
> > On Thu, Feb 3, 2011 at 12:15 PM, Jonathan Gray <jgray@fb.com> wrote:
> >
> >> If you only need to consider a single column family, use Scan.addFamily()
> >> on your scanner.  Then the other column families will not affect the scan.
> >>
> >> > -----Original Message-----
> >> > From: Something Something [mailto:mailinglists19@gmail.com]
> >> > Sent: Thursday, February 03, 2011 11:28 AM
> >> > To: user@hbase.apache.org
> >> > Subject: Re: Fastest way to read only the keys of a HTable?
> >> >
> >> > Hmm... performance hasn't improved at all.  Do you see anything wrong with
> >> > the following code:
> >> >
> >> >
> >> >     public List<Partner> getPartners() {
> >> >       ArrayList<Partner> partners = new ArrayList<Partner>();
> >> >
> >> >       try {
> >> >           HTable table = new HTable("partner");
> >> >           Scan scan = new Scan();
> >> >           scan.setFilter(new FirstKeyOnlyFilter());
> >> >           ResultScanner scanner = table.getScanner(scan);
> >> >           Result result = scanner.next();
> >> >           while (result != null) {
> >> >               Partner partner = new Partner(Bytes.toString(result.getRow()));
> >> >               partners.add(partner);
> >> >               result = scanner.next();
> >> >           }
> >> >       } catch (IOException e) {
> >> >           throw new RuntimeException(e);
> >> >       }
> >> >       return partners;
> >> >   }
> >> >
> >> > Maybe I shouldn't use more than one "column family" in an HTable, but the
> >> > BigTable paper recommends that, doesn't it?  Please advise, and thanks for
> >> > your help.
> >> >
> >> >
> >> >
> >> >
> >> > On Wed, Feb 2, 2011 at 10:55 PM, Stack <stack@duboce.net> wrote:
> >> >
> >> > > I don't see a getKey on Result.  Use getRow():
> >> > > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html#getRow()
> >> > >
> >> > > Here is how it's used in the shell's table.rb class:
> >> > >
> >> > >    # Count rows in a table
> >> > >    def count(interval = 1000, caching_rows = 10)
> >> > >      # We can safely set scanner caching with the first key only filter
> >> > >      scan = org.apache.hadoop.hbase.client.Scan.new
> >> > >      scan.cache_blocks = false
> >> > >      scan.caching = caching_rows
> >> > >      scan.setFilter(org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter.new)
> >> > >
> >> > >      # Run the scanner
> >> > >      scanner = @table.getScanner(scan)
> >> > >      count = 0
> >> > >      iter = scanner.iterator
> >> > >
> >> > >      # Iterate results
> >> > >      while iter.hasNext
> >> > >        row = iter.next
> >> > >        count += 1
> >> > >        next unless (block_given? && count % interval == 0)
> >> > >        # Allow command modules to visualize counting process
> >> > >        yield(count, String.from_java_bytes(row.getRow))
> >> > >      end
> >> > >
> >> > >      # Return the counter
> >> > >      return count
> >> > >    end
> >> > >
> >> > >
> >> > > St.Ack
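To see why the shell can raise scanner caching "safely" once `FirstKeyOnlyFilter` is set (presumably because each filtered row is tiny), here is a toy in-memory model: each row keeps only its first cell in sorted column order, which is still enough to recover the row key, the value `Result.getRow()` returns. The `table` layout, `RowResult` and `scan` are hypothetical and only mimic the filter's effect; they do not call HBase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of FirstKeyOnlyFilter's effect (hypothetical, not the HBase API):
// a "table" is row key -> ("family:qualifier" -> value), both levels sorted.
public class FirstKeyOnlySketch {

    // One scanned row: its key plus the cells the "server" sent back.
    static class RowResult {
        final String row;
        final SortedMap<String, String> cells;

        RowResult(String row, SortedMap<String, String> cells) {
            this.row = row;
            this.cells = cells;
        }
    }

    // Scans every row; with firstKeyOnly, only each row's first cell is returned.
    // Assumes every row has at least one cell (a row without cells does not exist).
    static List<RowResult> scan(TreeMap<String, TreeMap<String, String>> table,
                                boolean firstKeyOnly) {
        List<RowResult> results = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, String>> e : table.entrySet()) {
            TreeMap<String, String> cols = e.getValue();
            SortedMap<String, String> sent = firstKeyOnly
                    ? cols.headMap(cols.firstKey(), true) // just the first cell
                    : cols;                               // every cell in the row
            results.add(new RowResult(e.getKey(), sent));
        }
        return results;
    }

    public static void main(String[] args) {
        TreeMap<String, TreeMap<String, String>> table = new TreeMap<>();
        for (int i = 0; i < 3; i++) {
            TreeMap<String, String> cols = new TreeMap<>();
            cols.put("Info:name", "partner" + i);
            // A wide second family, like the one that slowed down the scan.
            for (int j = 0; j < 1000; j++) {
                cols.put("Keys:k" + j, "v" + j);
            }
            table.put("row" + i, cols);
        }

        for (boolean filter : new boolean[] {false, true}) {
            int cells = 0;
            List<String> keys = new ArrayList<>();
            for (RowResult r : scan(table, filter)) {
                cells += r.cells.size();
                keys.add(r.row); // the row key, like Result.getRow()
            }
            System.out.println("firstKeyOnly=" + filter + ": keys=" + keys
                    + ", cells=" + cells);
        }
    }
}
```

With three rows that each carry one "Info" cell plus 1,000 cells in a wide "Keys" family, the unfiltered scan returns 3,003 cells and the filtered one returns 3, while both yield the same three row keys.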
> >> > >
> >> > > On Thu, Feb 3, 2011 at 6:47 AM, Something Something
> >> > > <mailinglists19@gmail.com> wrote:
> >> > > > Thanks.  So I will add this...
> >> > > >
> >> > > >   scan.setFilter(new FirstKeyOnlyFilter());
> >> > > >
> >> > > > But after I do this...
> >> > > >
> >> > > >   Result result = scanner.next();
> >> > > >
> >> > > > There's no result.getKey(), so what method would give me the row
> >> > > > key?
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Wed, Feb 2, 2011 at 10:20 PM, Stack <stack@duboce.net> wrote:
> >> > > >
> >> > > >> See
> >> > > >> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html
> >> > > >> St.Ack
> >> > > >>
> >> > > >> On Thu, Feb 3, 2011 at 6:01 AM, Something Something
> >> > > >> <mailinglists19@gmail.com> wrote:
> >> > > >> > I want to read only the keys in a table. I tried this...
> >> > > >> >
> >> > > >> >    try {
> >> > > >> >      HTable table = new HTable("myTable");
> >> > > >> >      Scan scan = new Scan();
> >> > > >> >      scan.addFamily(Bytes.toBytes("Info"));
> >> > > >> >      ResultScanner scanner = table.getScanner(scan);
> >> > > >> >      Result result = scanner.next();
> >> > > >> >      while (result != null) {
> >> > > >> >
> >> > > >> > & so on...
> >> > > >> >
> >> > > >> > This was performing fairly well until I added another family that
> >> > > >> > contains lots of key/value pairs.  My understanding was that adding
> >> > > >> > another family wouldn't affect the performance of this code because
> >> > > >> > I am explicitly using "Info", but it does.
> >> > > >> >
> >> > > >> > Anyway, in this particular use case, I only care about the row key.
> >> > > >> > I don't need any values from any of the families.  What's the best
> >> > > >> > way to do this?
> >> > > >> >
> >> > > >> > Please let me know.  Thanks.
> >> > > >> >
> >> > > >>
> >> > > >
> >> > >
> >>
> >
>
