lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Guy Gavriely <...@sightix.com>
Subject Terms with different boosts
Date Thu, 11 Sep 2008 14:28:34 GMT
Hi,

I have to index terms with different boosts, meaning that if the word A appears in two documents
one document will be ranked higher.

I've tried to index them by putting them in different fields and give the fields different
boost but i ran into too many files (caused by too many fields I guess) problem.
Below is the stack trace:

java.io.FileNotFoundException: /tmp/nutch/crawl/indexes/part-00000/_0.nrm (Too many open files)

 at java.io.FileInputStream.open(Native Method)
 at java.io.FileInputStream.<init>(FileInputStream.java:106)
 at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:87)
 at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:142)
 at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.<init>(ChecksumFileSystem.java:110)
 at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:330)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:245)
 at org.apache.nutch.indexer.FsDirectory$DfsIndexInput$Descriptor.<init>(FsDirectory.java:160)
 at org.apache.nutch.indexer.FsDirectory$DfsIndexInput.<init>(FsDirectory.java:169)
 at org.apache.nutch.indexer.FsDirectory.openInput(FsDirectory.java:115)
 at org.apache.lucene.index.SegmentReader.openNorms(SegmentReader.java:522)
 at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:185)
 at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:140)
 at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:126)
 at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:155)
 at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:579)
 at org.apache.lucene.index.IndexReader.open(IndexReader.java:179)
 at org.apache.lucene.index.IndexReader.open(IndexReader.java:142)
 at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.<init>(DeleteDuplicates.java:167)
 at org.apache.nutch.indexer.DeleteDuplicates$InputFormat.getRecordReader(DeleteDuplicates.java:240)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:139)
 at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)
Exception in thread "main" java.io.IOException: Job failed!
 at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
 at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
 at com.sightix.nutch.crawl.Crawl.main(Crawl.java:139)

Can you suggest another way to index terms in different boosts?
Thanks,
Guy
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message