lucene-java-user mailing list archives

From "Joshua O'Madadhain" <jmad...@ics.uci.edu>
Subject Re: How to store document meta information
Date Mon, 28 Apr 2003 18:38:34 GMT
On Mon, 28 Apr 2003, Stephane Vaucher wrote:

> I've got a document that I run through an information extraction engine
> that returns a list of concepts associated with the document, each with a
> relevancy factor (for example, with a news article, it might return
> sport=100%, literature=84% and politics=10%).

It's unclear what the semantics of your relevancy measure are.  Is it
something like a fuzzy set measure ('this article is 100% in the set of
documents about sports, 84% in ... literature, and 10% in politics')?

> I would like to index these concepts with an indication of their relevancy
> levels. Is there a recommended way of doing this? Searching the FAQs, I
> found none, but from my knowledge of lucene, I gather I could do it the
> following ways:
>
> 1) If all concepts were to be stored in a single field (as I would
> prefer), I don't think I can use field boosting, so I would probably have
> to hold multiple instances of each concept (e.g. I could have 100
> "sport", 84 "literature" and 10 "politics") in my field.
>
> 2) I could use multiple fields with varying boost factors. But I would be
> forced to determine ahead of time how many concepts I'll have, so that I
> can perform searches on all of the appropriate fields. This could probably
> affect the performance of the app (I say this with no numbers, simple
> intuition, so correct me if I'm wrong).

How do you intend to use these concepts in the search process?  That is,
how will these concepts be used by (a) the user in specifying a query, (b)
the indexer in storing the associated documents, (c) the searcher in
retrieving documents, and (d) the presentation of the results to the user?
Without knowing these things, it's hard to answer your question (at least
for me).
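
For concreteness, option (1) in the quoted message could be sketched roughly
as below. This is only an illustration under assumptions: the class and
method names (ConceptFieldSketch, repeatConcepts) are hypothetical and not
part of Lucene, and it only builds the field text, leaving the actual
indexing call out. The idea is that repeating a term raises its term
frequency in the field, which Lucene's default scoring takes into account,
though not necessarily in proportion to the number of repeats.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: builds the text for a single
// "concepts" field by repeating each concept once per relevancy point.
class ConceptFieldSketch {

    // Returns the concepts separated by single spaces, each one repeated
    // as many times as its relevancy value (e.g. 84 for 84%).
    static String repeatConcepts(Map<String, Integer> relevancyByConcept) {
        StringBuilder text = new StringBuilder();
        for (Map.Entry<String, Integer> entry : relevancyByConcept.entrySet()) {
            for (int i = 0; i < entry.getValue(); i++) {
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(entry.getKey());
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Values taken from the example in the quoted message.
        Map<String, Integer> concepts = new LinkedHashMap<>();
        concepts.put("sport", 100);
        concepts.put("literature", 84);
        concepts.put("politics", 10);

        String fieldText = repeatConcepts(concepts);
        System.out.println(fieldText.split(" ").length + " tokens");
    }
}
```

The resulting string would then be indexed as the tokenized contents of the
single concepts field. Whether this is a good fit still depends on the
answers to (a) through (d) above.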

Regards,

Joshua O'Madadhain

 jmadden@ics.uci.edu...Obscurium Per Obscurius...www.ics.uci.edu/~jmadden
  Joshua O'Madadhain: Information Scientist, Musician, Philosopher-At-Tall
 It's that moment of dawning comprehension that I live for--Bill Watterson
My opinions are too rational and insightful to be those of any organization.
