nutch-dev mailing list archives

From Doğacan Güney (JIRA) <j...@apache.org>
Subject [jira] Commented: (NUTCH-650) Hbase Integration
Date Sun, 16 Aug 2009 22:28:15 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743919#action_12743919 ]

Doğacan Güney commented on NUTCH-650:
-------------------------------------

I just committed code to the nutchbase branch. The scoring API did not turn out as clean as I
expected, but I decided to put in what I have. I also made some changes so that the web UI
works.

I am leaving this issue open because I will add documentation tomorrow. Meanwhile,

To download: 

  svn co http://svn.apache.org/repos/asf/lucene/nutch/branches/nutchbase

Usage:

After starting HBase 0.20 (check out rev. 804408 from the HBase 0.20 branch), create a webtable
with

  bin/nutch createtable webtable

After that, usage is similar to the regular Nutch commands, with the webtable as the target.

  bin/nutch inject webtable url_dir # inject urls

Then, for as many cycles as you want:
    bin/nutch generate webtable # -topN N works
    bin/nutch fetch webtable # -threads N works
    bin/nutch parse webtable
    bin/nutch updatetable webtable

  bin/nutch index <index> webtable
or
  bin/nutch solrindex <solr url> webtable
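
The generate/fetch/parse/updatetable cycle above could be wrapped in a small shell loop. This is only a sketch: the variable names NUTCH, TABLE, CYCLES and DRY_RUN and the helper functions are made up for illustration and are not part of Nutch. It assumes it is run from the checkout directory, after createtable and inject.

```shell
#!/bin/sh
# Sketch of the crawl cycle described above.
# NUTCH, TABLE, CYCLES and DRY_RUN are hypothetical names, not Nutch options.
NUTCH="${NUTCH:-bin/nutch}"
TABLE="${TABLE:-webtable}"
CYCLES="${CYCLES:-3}"

# Run one command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Repeat generate -> fetch -> parse -> updatetable CYCLES times,
# stopping at the first step that fails.
crawl_cycles() {
  i=1
  while [ "$i" -le "$CYCLES" ]; do
    for step in generate fetch parse updatetable; do
      run "$NUTCH" "$step" "$TABLE" || return 1
    done
    i=$((i + 1))
  done
}
```

Calling `DRY_RUN=1 CYCLES=2 crawl_cycles` prints the eight commands without running them, which is a cheap way to check the order before pointing it at a real table.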

To use Solr, use this schema file:
http://www.ceng.metu.edu.tr/~e1345172/schema.xml


Again, a note of warning: This is extremely new code. I hope people will test and use it but
there is no guarantee that it will work :)


> Hbase Integration
> -----------------
>
>                 Key: NUTCH-650
>                 URL: https://issues.apache.org/jira/browse/NUTCH-650
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 1.0.0
>            Reporter: Doğacan Güney
>            Assignee: Doğacan Güney
>             Fix For: 1.1
>
>         Attachments: hbase-integration_v1.patch, hbase_v2.patch, malformedurl.patch, meta.patch, meta2.patch, nofollow-hbase.patch, nutch-habase.patch, searching.diff, slash.patch
>
>
> This issue will track nutch/hbase integration

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

