nutch-user mailing list archives

From Katsuki FUJISAWA <katsuki.fujisawa...@gmail.com>
Subject Re: The index file made by executing main method of org.apache.nutch.crawl.Crawl can not be read from Luke.
Date Mon, 07 Sep 2009 05:15:53 GMT
I am using the libraries below.

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.search.Hit;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Sort;
import org.apache.lucene.store.FSDirectory;

Fujisawa

On Mon, Sep 7, 2009 at 1:13 PM, Katsuki FUJISAWA <katsuki.fujisawa999@gmail.com> wrote:
> Hi,
>
> I am new to Nutch.
> I am trying to run a crawl from a Java servlet program without using
> the bin/nutch command.
> With Nutch 0.9, the index created by the main method of the
> org.apache.nutch.crawl.Crawl class can be read from my program.
> But with Nutch 1.0, the index created by the main method of the
> org.apache.nutch.crawl.Crawl class cannot be read from my program.
>
>
> Whether Luke can read each index is shown below.
>
> Index created with Nutch 0.9:
>   by bin/nutch command             readable
>   by main method of Crawl class    readable
>
> Index created with Nutch 1.0:
>   by bin/nutch command             readable
>   by main method of Crawl class    unreadable
>
>
> Does anybody know the reason why?
> Any information would be appreciated.
>
> My sample program code is below.
>
> *************************************************************
> FSDirectory indexDir = null;
>
> indexDir = FSDirectory.getDirectory( "C:\\nutch-1.0\\crawl\\index", false );
> IndexSearcher indexSearcher = new IndexSearcher( indexDir );
>
> List<DisplayBean> displayBeanList = new ArrayList<DisplayBean>();
>
> Hits hits = indexSearcher.search( new MatchAllDocsQuery());
>
> // Collect only the first three hits.
> Iterator<Hit> i = hits.iterator();
> int cnt = 0;
> while (i.hasNext()){
>        if(cnt > 2) break;
>
>        Hit hit = i.next();
>        DisplayBean displayBean = new DisplayBean();
>        displayBean.setUrl(hit.get("url"));
>        displayBean.setTitle(hit.get("title"));
>        displayBean.setTstamp(hit.get("tstamp"));
>
>        displayBeanList.add(displayBean);
>
>        cnt++;
> }
>
> indexSearcher.close();
>
