hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Crawling Using HBase as a back end --Issue
Date Mon, 20 Apr 2009 11:14:41 GMT
Ninad,

Regarding the timeouts, I recently gave a tip in the thread "Tip when
scanning and spending a lot of time on each row" which should solve
your problem.
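
In short, the region servers expire a scanner's lease when the client
does not call next() within "hbase.regionserver.lease.period"
milliseconds, so spending a long time on each row in the map kills the
scanner. A rough Java sketch (the property name is real; the value is
an arbitrary example, and in practice the property is read by the
region servers, so it belongs in hbase-site.xml on the cluster rather
than in client code):

  import org.apache.hadoop.hbase.HBaseConfiguration;

  public class ScannerLeaseSketch {
    public static void main(String[] args) {
      // Scanner leases expire if next() is not called within the lease
      // period; a longer lease gives each map more time per row.
      // NOTE: shown in code only to name the property and an example
      // value -- set it in hbase-site.xml on the region servers.
      HBaseConfiguration conf = new HBaseConfiguration();
      conf.setLong("hbase.regionserver.lease.period", 600000L); // 10 min
      System.out.println(conf.get("hbase.regionserver.lease.period"));
    }
  }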

Regarding your table, you should split it. In the shell, type the
command "tools" to see how to use the "split" command. Issue a couple
of them, waiting a bit between each call.
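
If you would rather script the splits than type them in the shell,
here is a minimal Java sketch (assuming the client-side HBaseAdmin in
your version exposes the same split operation as the shell tool; the
pause length is an arbitrary example):

  import java.io.IOException;

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  public class SplitSketch {
    public static void main(String[] args)
        throws IOException, InterruptedException {
      HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
      // Ask the master to split the table's regions, then wait a bit
      // before asking again, as advised above.
      admin.split("PermalinkTable");
      Thread.sleep(30 * 1000L);
      admin.split("PermalinkTable");
    }
  }

Each round of splits roughly doubles the region count, and with
TableInputFormat you get one map per region, so a couple of rounds
should be enough to spread the maps across your machines.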

J-D

On Mon, Apr 20, 2009 at 5:49 AM, Ninad Raut <hbase.user.ninad@gmail.com> wrote:
> Hi,
>
> I have been trying to crawl data using MapReduce on HBase. Here is the scenario:
>
> 1) I have a fetch list which has all the permalinks to be fetched.
> They are stored in a PermalinkTable.
>
> 2) A MapReduce job scans over each permalink, fetches the data, and
> dumps it into the ContentTable.
>
> Here are the issues I face:
>
> The PermalinkTable is not split, so I have just one map task running
> on a single machine, which defeats the purpose of using MapReduce.
>
> The MapReduce job keeps giving scanner timeout exceptions, causing
> task failures and further delays.
>
>
> If anyone can give me tips for this use case, it would really help me.
>
