nutch-dev mailing list archives

From David Stuart <david.stu...@progressivealliance.co.uk>
Subject Re: Nutch 2.0 Help
Date Sun, 05 Sep 2010 12:56:49 GMT
Hi All,

I have done as described below and can create a table from within the hbase shell. I also found
what looks like the appropriate create-table method:

 bin/nutch org.apache.nutch.storage.WebTableCreator webtable

but it only returns null.

Any help would be great

Regards

Dave


On 2 Sep 2010, at 13:12, Julien Nioche wrote:

> Hi David,
> 
> I haven't used the Hbase backend with GORA for quite some time, but from what I can remember you'll need the following things:
> 
> * conf/hbase-site.xml => this should correspond to your local configuration
> * conf/gora-hbase-mapping.xml => see below
> * conf/gora.properties => I don't think there is anything you need to specify for Hbase
> 
> * in nutch-site.xml
> 
> <property>
>   <name>storage.data.store.class</name>
>   <value>org.gora.hbase.store.HbaseStore</value>
>   <description>Default class for storing data</description>
> </property>
> 
> and of course all the necessary Hbase jars in the /lib dir - it is probably easier to modify ivy/ivy.xml so that it includes Hbase
> 
> gora-hbase-mapping.xml  : not sure this is the latest version though 
> 
> <?xml version="1.0" encoding="UTF-8"?>
> 
> <gora-orm>
> 
> <table name="webtable">
>   <family name="p"/> <!-- This can also have params like compression, bloom filters -->
>   <family name="f"/>
>   <family name="s"/>
>   <family name="il"/>
>   <family name="ol"/>
>   <family name="h"/>
>   <family name="mtdt"/>
>   <family name="mk"/>
> </table>
> 
> <class table="webtable" keyClass="java.lang.String" name="org.apache.nutch.storage.WebPage">
>   <!-- fetch fields                                       -->
>   <field name="baseUrl" family="f" qualifier="bas"/>    
>   <field name="status" family="f" qualifier="st"/>
>   <field name="prevFetchTime" family="f" qualifier="pts"/>
>   <field name="fetchTime" family="f" qualifier="ts"/>
>   <field name="fetchInterval" family="f" qualifier="fi"/>
>   <field name="retriesSinceFetch" family="f" qualifier="rsf"/>
>   <field name="reprUrl" family="f" qualifier="rpr"/>
>   <field name="content" family="f" qualifier="cnt"/>
>   <field name="contentType" family="f" qualifier="typ"/>    
>   <field name="protocolStatus" family="f" qualifier="prot"/>
>   <field name="modifiedTime" family="f" qualifier="mod"/>
> 
>   <!-- parse fields                                       -->
>   <field name="title" family="p" qualifier="t"/>
>   <field name="text" family="p" qualifier="c"/>
>   <field name="parseStatus" family="p" qualifier="st"/>
>   <field name="signature" family="p" qualifier="sig"/>
>   <field name="prevSignature" family="p" qualifier="psig"/>
> 
>   <!-- score fields                                       -->
>   <field name="score" family="s" qualifier="s"/>
> 
>   <field name="headers" family="h"/>
> 
>   <field name="inlinks" family="il"/>
> 
>   <field name="outlinks" family="ol"/>
> 
>   <field name="metadata" family="mtdt"/>
> 
>   <field name="markers" family="mk"/>
> 
> </class>
> 
> </gora-orm>
> 
> 
> HTH
> 
> Good luck!
> 
> Julien
> 
> -- 
> 
> Open Source Solutions for Text Engineering
> 
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> 
> On 2 September 2010 12:58, David Stuart <david.stuart@progressivealliance.co.uk> wrote:
> Hey All,
> 
> I have set up the latest version of Nutch from trunk and am running into a few issues with HBase and injecting URLs. When I run the command
> 
> runtime/local/bin/nutch inject runtime/local/seed/
> 
> I get
> InjectorJob: java.lang.RuntimeException: Could not create datastore
>        at org.apache.nutch.storage.StorageUtils.initMapperJob(StorageUtils.java:70)
>        at org.apache.nutch.storage.StorageUtils.initMapperJob(StorageUtils.java:50)
>        at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:233)
>        at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:246)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:256)
> 
> Under the gora properties it should be pointing at localhost/nutchtest, and I created that store manually in HBase. Is that right?
> 
> I have found a few tutorials around nutchbase, but the API seems to have changed since the merge with Nutch trunk.
> 
> Any help would be appreciated, and I will try to do a how-to writeup.
> 
> Regards,
> 
> Dave
> 
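
One thing that can be checked offline in a mapping file like the one quoted above is whether every <field> refers to a column family that its <table> actually declares. The following is only a rough sketch in Python, not part of Gora or Nutch: the function name undeclared_families and the shortened SAMPLE_MAPPING are made up for illustration, and it assumes nothing beyond the element and attribute names visible in the quoted gora-hbase-mapping.xml. It may well not be related to the "Could not create datastore" error.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical mapping in the same shape as the quoted file.
# The "s" family is deliberately left out of <table> to show a mismatch.
SAMPLE_MAPPING = """
<gora-orm>
  <table name="webtable">
    <family name="p"/>
    <family name="f"/>
  </table>
  <class table="webtable" keyClass="java.lang.String"
         name="org.apache.nutch.storage.WebPage">
    <field name="baseUrl" family="f" qualifier="bas"/>
    <field name="title" family="p" qualifier="t"/>
    <field name="score" family="s" qualifier="s"/>
  </class>
</gora-orm>
"""


def undeclared_families(mapping_xml):
    """Return (field name, family) pairs whose family is not declared
    on the table that the enclosing <class> points at."""
    root = ET.fromstring(mapping_xml.strip())

    # Collect the declared family names for each table.
    declared = {}
    for table in root.findall("table"):
        names = {family.get("name") for family in table.findall("family")}
        declared[table.get("name")] = names

    # Compare every field's family against its table's declared families.
    problems = []
    for cls in root.findall("class"):
        families = declared.get(cls.get("table"), set())
        for field in cls.findall("field"):
            if field.get("family") not in families:
                problems.append((field.get("name"), field.get("family")))
    return problems


if __name__ == "__main__":
    for name, family in undeclared_families(SAMPLE_MAPPING):
        print(f"field {name!r} uses undeclared family {family!r}")
```

To run it against a real file, read conf/gora-hbase-mapping.xml into a string and pass it to the function; an empty list means every field's family is declared.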

