nutch-user mailing list archives

From Katsuki FUJISAWA <katsuki.fujisawa...@gmail.com>
Subject The index file made by executing main method of org.apache.nutch.crawl.Crawl can not be read from Luke.
Date Mon, 07 Sep 2009 04:13:46 GMT
Hi,

I am new to Nutch.
I am trying to run a crawl from a Java servlet program without using the
bin/nutch command.
With Nutch 0.9, the index created by the main method of the
org.apache.nutch.crawl.Crawl class can be read from my program.
With Nutch 1.0, however, the index created by the main method of the
org.apache.nutch.crawl.Crawl class cannot be read from my program.


Whether each index can be opened with Luke is summarized below.

Index created with Nutch 0.9
by bin/nutch command:          readable
by main method of Crawl class: readable

Index created with Nutch 1.0
by bin/nutch command:          readable
by main method of Crawl class: unreadable


Does anybody know the reason why?
Any information would be appreciated.

A sample of the code I use to read the index is below.

*************************************************************
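import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.lucene.search.Hit;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.store.FSDirectory;

// Open the existing index written by the crawl (create = false)
FSDirectory indexDir = FSDirectory.getDirectory( "C:\\nutch-1.0\\crawl\\index", false );
IndexSearcher indexSearcher = new IndexSearcher( indexDir );

List<DisplayBean> displayBeanList = new ArrayList<DisplayBean>();

// Match every document in the index
Hits hits = indexSearcher.search( new MatchAllDocsQuery());

// Copy the first three hits into display beans
Iterator<Hit> i = hits.iterator();
int cnt = 0;
while (i.hasNext()){
	if(cnt > 2) break;

	Hit hit = i.next();
	DisplayBean displayBean = new DisplayBean();
	displayBean.setUrl(hit.get("url"));
	displayBean.setTitle(hit.get("title"));
	displayBean.setTstamp(hit.get("tstamp"));

	displayBeanList.add(displayBean);

	cnt++;
}

indexSearcher.close();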
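
One way to narrow this down might be to compare what is actually on disk in a readable index directory and in the unreadable one. The following is only a minimal sketch using the Java standard library, not a Lucene or Nutch API: the class name IndexDirCheck and its methods listFileNames and hasSegmentsFile are hypothetical names made up for this illustration. It relies on the fact that a Lucene 2.x index directory normally contains a segments_N file and a segments.gen file. It does not validate the index contents, so a positive result only means the directory looks like an index at the file level.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Lucene or Nutch): inspects an index directory on disk.
class IndexDirCheck {

    // Returns the sorted file names in dir, or an empty list if dir
    // does not exist or is not a directory.
    static List<String> listFileNames(File dir) {
        String[] names = dir.list(); // null when dir is missing or not a directory
        if (names == null) {
            return new ArrayList<String>();
        }
        Arrays.sort(names);
        return new ArrayList<String>(Arrays.asList(names));
    }

    // True if dir contains at least one entry whose name starts with "segments"
    // (for example segments_2 or segments.gen). This is a file-level check only.
    static boolean hasSegmentsFile(File dir) {
        for (String name : listFileNames(dir)) {
            if (name.startsWith("segments")) {
                return true;
            }
        }
        return false;
    }

    // Prints a short report for every directory path given on the command line.
    public static void main(String[] args) {
        for (String path : args) {
            File dir = new File(path);
            System.out.println(path + " -> segments file present: " + hasSegmentsFile(dir));
            for (String name : listFileNames(dir)) {
                System.out.println("  " + name + " (" + new File(dir, name).length() + " bytes)");
            }
        }
    }
}
```

Running it with both index paths as arguments (for example the index made by bin/nutch and the one made by the Crawl main method) prints the two file listings one after the other, which makes differences such as a missing segments file or an unexpected subdirectory easier to spot.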
