nutch-dev mailing list archives

From Doğacan Güney (JIRA) <>
Subject [jira] Commented: (NUTCH-650) Hbase Integration
Date Sun, 16 Aug 2009 22:28:15 GMT


Doğacan Güney commented on NUTCH-650:

I just committed code to the nutchbase branch. The scoring API did not turn out as clean as I
expected, but I decided to commit what I have. Also, I made some changes so that web UI also

I am leaving this issue open because I will add documentation tomorrow. Meanwhile,

To download: 

  svn co


After starting HBase 0.20 (check out rev. 804408 from the HBase 0.20 branch), create a webtable:

  bin/nutch createtable webtable

After that, usage is similar.

  bin/nutch inject webtable url_dir # inject urls

Then repeat, for as many cycles as you want:
    bin/nutch generate webtable # -topN N works
    bin/nutch fetch webtable # -threads N works
    bin/nutch parse webtable
    bin/nutch updatetable webtable

  bin/nutch index <index> webtable
  bin/nutch solrindex <solr url> webtable
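
The inject step and the generate/fetch/parse/updatetable cycle above can be wrapped in a small shell function. This is only a sketch: the `run_crawl` function and the `NUTCH` variable are hypothetical names introduced here for illustration, not part of Nutch, and stopping at the first failing command assumes that bin/nutch returns a non-zero exit status on error.

```shell
#!/bin/sh
# Sketch of the crawl loop described above.
# NUTCH and run_crawl are illustrative names, not part of Nutch.
NUTCH="${NUTCH:-bin/nutch}"   # path to the nutch launcher script

run_crawl() {
    table="$1"     # table name, e.g. webtable
    url_dir="$2"   # directory containing the seed url files
    cycles="$3"    # number of generate/fetch/parse/updatetable rounds

    # Seed the table once before the first cycle.
    "$NUTCH" inject "$table" "$url_dir" || return 1

    i=1
    while [ "$i" -le "$cycles" ]; do
        # Stop at the first failing step (assumes a non-zero exit on error).
        "$NUTCH" generate "$table" || return 1
        "$NUTCH" fetch "$table" || return 1
        "$NUTCH" parse "$table" || return 1
        "$NUTCH" updatetable "$table" || return 1
        i=$((i + 1))
    done
}
```

For example, `run_crawl webtable url_dir 3` would inject once and then run three cycles. Indexing is left out so that it can be run separately once the crawl is done.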

To use solr, use this schema file

Again, a note of warning: This is extremely new code. I hope people will test and use it but
there is no guarantee that it will work :)

> Hbase Integration
> -----------------
>                 Key: NUTCH-650
>                 URL:
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 1.0.0
>            Reporter: Doğacan Güney
>            Assignee: Doğacan Güney
>             Fix For: 1.1
>         Attachments: hbase-integration_v1.patch, hbase_v2.patch, malformedurl.patch, meta.patch, meta2.patch, nofollow-hbase.patch, nutch-habase.patch, searching.diff, slash.patch
> This issue will track nutch/hbase integration

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
