hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Christopher Dorner <christopher.dor...@gmail.com>
Subject Re: Best way to write to multiple tables in one map-only job
Date Tue, 04 Oct 2011 14:20:25 GMT
Thank you for the hint.

What about autoflush then? Is that also something i can set using the 
config on job setup? Or does it onyl work with an HTable instance? 
Somehow i can't really find the right information :)

Regards,
Christopher

Am 03.10.2011 19:20, schrieb Jean-Daniel Cryans:
> Option a) and b) are the same since MultiTableOutputFormat internally
> uses multiple HTables. See for yourself:
>
> https://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableOutputFormat.java
>
> Also you can set the write buffer but setting
> hbase.client.write.buffer on the configuration that your pass in the
> job setup.
>
> Using HTablePool in a single threaded application doesn't offer more
> than just storage for your HTables.
>
> Hope that helps,
>
> J-D
>
> On Sat, Oct 1, 2011 at 4:05 AM, Christopher Dorner
> <christopher.dorner@gmail.com>  wrote:
>> Hallo,
>>
>> i am building a RDF Store using HBase and experimenting with different index
>> tables and Schema Designs.
>>
>> For the input, i have a File where each line is a RDF triple in N3 Format.
>>
>> I need to write to multiple Tables since i need to build several index
>> tables. For the sake of reducing IO and not reading the file a few times i
>> want to do that in one Map-Only Job. Later the file will contain a few
>> million triples.
>>
>> I am experimenting in Pseudo-Distributed-Mode so far but will be able to run
>> it on our cluster soon.
>> Storing the data in the Tables does not need to be speed-optimized at any
>> cost, but i just want to do it as simple and fast as possible.
>>
>>
>> What is the best way to write to more than 1 table in one Map-Task?
>>
>> a)
>> I can either use "MultiTableOutputFormat.class" and write in map() using:
>> Put put = new Put(key);
>> put.add(kv);
>> context.write(tableName, put);
>>
>> Can i write to e.g. 6 Tables in this way by creating a new Put for each
>> table?
>>
>> But how can i turn off autoFlush and set writeBufferSize in this case?
>> Because i think autoflush is not that good in this case of putting lots of
>> values.
>>
>>
>> b)
>> I can use an instance of HTable in the Mapper class. Then i can set
>> autoFlush and writeBufferSize and write to the table using:
>> HTable table = new HTable(config, tableName);
>> table.put(put);
>>
>> But it is recommended to use only one instance of HTable, so i would need to
>> do
>> table = new Table();
>> for each table i want to write to. Is that still fine with 6 tables?
>> I stumbled upon HTablePool. Is this for these scenarios?
>>
>>
>> Thank You and Regards,
>> Christopher
>>


Mime
View raw message