lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: keywords in a document
Date Tue, 10 Apr 2007 00:16:09 GMT
Well, a couple of things....

The IndexReader is opened on an already existing index. You need to
have built an index to open a reader on it. So there isn't really a default
reader.

So usually, one uses Lucene to create an index out of some
number of documents, and use that index for searching. One of
the major parts of creating an index is exactly reading in the
raw text and deciding what needs to go into the index in various
fields.

This makes what you're trying to do a bit clumsy, at least it
would have a release or two ago...

However, you're in great luck, because some kind soul created
MemoryIndex, which is a simplified in-memory-only index
intended to be used to create indexes on a single document for
some processing and then throw out. Which may be just what you
need. See the jar in C:\lucene-2.1.0\contrib\memory

But I'm still not sure this is what you need, because you only get
relevancy rankings in Lucene based upon the query you are trying
to submit. If you're doing simple term frequency counting, perhaps
you can get creative with TermFreqVector and other Term* classes...

Perhaps others will be able to chime in here....

Best
Erick


On 4/9/07, Ted Chen <nehcdet@yahoo.com> wrote:
>
> Hi,
>     I'm new to Lucene, but I did spend quite some time trying to find an
> answer to the problem before turning to you gurus for help.
>     I'm given a text file.  My task is to extract some top keywords from
> the file so that these words can describe this document.  Ideally, terms
> with the highest tfidf should be returned.   I noticed lucene provides a
> method called QueryTermExtractor.getIdfWeightedTerms(), but to use this
> method, I need to provide a query and an IndexReader.  In my case, I only
> have a text file, and don't have a handle on any IndexReader.  Is there any
> way to indicate I want to use the "default IndexReader", where the default
> reader represents a typical collection of documents.  Also, do I need to
> construct a Query out of the text file?  If so, is the best choice a
> multi-term query?
>     Any suggestions?
>
> Thanks,
> Ted
>
>
>
>
> ____________________________________________________________________________________
> Sucker-punch spam with award-winning protection.
> Try the free Yahoo! Mail Beta.
> http://advision.webevents.yahoo.com/mailbeta/features_spam.html

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message