hbase-user mailing list archives

From "Jaeyun Noh" <metal...@gmail.com>
Subject Re: map reduce range of records from hbase table
Date Thu, 09 Oct 2008 21:32:17 GMT
Another question:

Will HBase support a so-called "multi-get" function, which takes a set of
row keys as input and returns a set of RowResults as output?
It would be useful when we have a distinct set of row keys, similar to the
IN (...) condition in SQL.
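
To make the "multi-get" semantics concrete, here is a minimal sketch run against an in-memory sorted map. It is not HBase client code: the class name MultiGetSketch, the method multiGet, and the String rows standing in for RowResults are all hypothetical, chosen only to illustrate the IN (...) style lookup described above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from HBase.
public class MultiGetSketch {

    // Looks up each requested row key in the table (a sorted map standing in
    // for an HBase table) and returns the rows that exist, in request order.
    // Keys that are not present in the table are silently skipped.
    public static Map<String, String> multiGet(NavigableMap<String, String> table,
                                               List<String> rowKeys) {
        Map<String, String> results = new LinkedHashMap<>();
        for (String key : rowKeys) {
            String row = table.get(key);
            if (row != null) {
                results.put(key, row);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("row1", "a");
        table.put("row2", "b");
        table.put("row3", "c");

        // Comparable to: SELECT ... WHERE rowkey IN ('row3', 'row1', 'row9')
        Map<String, String> found = multiGet(table, Arrays.asList("row3", "row1", "row9"));
        System.out.println(found);
    }
}
```

In a real cluster the interesting part would be grouping the keys by region so that each server is contacted once; the sketch leaves that out and only shows the input and output shape.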

2008/10/9 Jaeyun Noh <metalain@gmail.com>

> I wonder whether a network RPC is involved every time we call next() on the
> scanner class.
> Also, does the scanner send requests to the HRegions in parallel and fetch
> the results into a temporary cache on the HBase client?
> If so, we're happy to live with that.
>
> Is the following hbase parameter related to my question?
>
> <property>
>     <name>hbase.client.scanner.caching</name>
>     <value>30</value>
>     <description>Number of rows that will be fetched when calling next
>     on a scanner if it is not served from memory. Higher caching values
>     will enable faster scanners but will eat up more memory and some
>     calls of next may take longer and longer times when the cache is empty.
>     </description>
> </property>
>
> Regards, Jaeyun Noh.
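
The description quoted above suggests that next() is served from a client-side cache and only goes to the server when that cache is empty. A minimal simulation of that reading, assuming nothing about the real client internals: the class CachingScannerSketch, its fields, and the in-memory list standing in for the region servers are hypothetical, and one "RPC" here is just a counter increment.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical simulation; this is not the HBase ClientScanner.
public class CachingScannerSketch {
    private final List<String> serverRows;  // stands in for rows held on the servers
    private final int caching;              // rows fetched per simulated RPC
    private final Deque<String> cache = new ArrayDeque<>();
    private int position = 0;               // index of the next server row to fetch
    private int rpcCount = 0;               // number of simulated round trips so far

    public CachingScannerSketch(List<String> serverRows, int caching) {
        if (caching < 1) {
            throw new IllegalArgumentException("caching must be >= 1");
        }
        this.serverRows = serverRows;
        this.caching = caching;
    }

    // Returns the next row, or null when the scan is exhausted. A simulated
    // RPC happens only when the local cache is empty and rows remain.
    public String next() {
        if (cache.isEmpty() && position < serverRows.size()) {
            int end = Math.min(position + caching, serverRows.size());
            cache.addAll(serverRows.subList(position, end));
            position = end;
            rpcCount++;
        }
        return cache.poll();
    }

    public int rpcCount() {
        return rpcCount;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            rows.add("row" + i);
        }
        for (int caching : new int[] {1, 30}) {
            CachingScannerSketch scanner = new CachingScannerSketch(rows, caching);
            while (scanner.next() != null) {
                // drain the scanner
            }
            System.out.println("caching=" + caching + " rpcs=" + scanner.rpcCount());
        }
    }
}
```

Under this model, scanning 100 rows costs 100 simulated round trips with a caching value of 1 and 4 with a caching value of 30, at the price of holding up to 30 rows in client memory. Whether the real scanner also fetches from several regions in parallel is exactly the open question; the sketch fetches sequentially.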
>
>
> On Wed, Oct 8, 2008 at 10:10 PM, stack <stack@duboce.net> wrote:
>
>> On Wed, Oct 8, 2008 at 9:01 PM, Jaeyun Noh <metalain@gmail.com> wrote:
>>
>> > Thx.
>> >
>> > BTW, it seems that the output format (subclass of
>> > org.apache.hadoop.mapred.OutputFormat) of MR job can only be a file. Can we
>> > define our own file format which hbase clients can access?
>>
>>
>> No.  You can output to anything as long as you make it implement
>> OutputFormat.  To output to hbase, subclass TableReduce or see
>> TableOutputFormat.
>>
>>
>> >
>> >
>> > My goal is to implement a filter-enabled table scanner which is run by
>> > multi-process clients using MR. I'm trying to leverage MR since the
>> > ClientScanner class of HTable accesses HRegions sequentially and thus
>> > involves multiple round trips between servers and clients.
>>
>>
>> I'm not sure I follow.  Perhaps start simple, then see where the bottlenecks
>> are and optimize there.  Regarding roundtrips between client and server, what
>> do you want? A scanner that returns batches rather than a row at a time?
>>
>> St.Ack
>>
>>
>>
>>
>>
>> >
>> >
>> > On Wed, Oct 8, 2008 at 4:30 PM, stack <stack@duboce.net> wrote:
>> >
>> > > Jaeyun Noh wrote:
>> > >
>> > >> Hi,
>> > >>
>> > >> May I ask another question?
>> > >>
>> > >> I'm running HBase/Hadoop on linux server, and implementing business
>> > >> application with java, which runs on a different windows machine.
>> > >> It looks like the MapReduce job runs on a server node. Can I run a
>> > >> MapReduce job built on the windows client against the existing linux
>> > >> server? How can we get the result produced by the MapReduce job at
>> > >> the server?
>> > >>
>> > >>
>> > >
>> > > You should be able to, yes.  Make sure you use the same java on both
>> > > machines.  This page might help some:
>> > > http://wiki.apache.org/hadoop/Hbase/MapReduce.
>> > > St.Ack
>> > >
>> >
>>
>
>
