lucene-lucene-net-user mailing list archives
From Matt Honeycutt <mbhoneyc...@gmail.com>
Subject Re: FieldLookup for field with multiple values
Date Thu, 12 Nov 2009 06:17:11 GMT
Well, let me preface what I'm about to describe by saying that I know that
I'm doing something with Lucene that it wasn't meant to do.  This is for a
"proof of concept" system that I'm helping put together on a tight schedule
with very limited resources, and we're trying to get to a mostly-working
state as quickly as possible.

That said, we are basically storing reports in Lucene.  The reports are
fairly standard documents for the most part: they have a title, body,
abstract, etc., all of which we index and search with Lucene.  However, they
also have a few fields that aren't standard, including a list of involved
organizations as well as a dollar amount for each report.  The organizations
are stored as IDs, and we add the org ID field multiple times, once for each
organization involved in the report.  The funding is also stored as a
non-indexed field on the Lucene document.

What I'm trying to do is build a quick-and-dirty org-by-dollar report off of
the reports that match the user's query.  So, a query for "aerospace" might
match 50,000 documents, and I want to show the user the top 5 organizations
in terms of dollars.  Again, I know reporting like this isn't what Lucene
was meant for, and we do have some ideas on how to handle it long-term, but
for now, I'm trying to get it working as well as I can using Lucene alone,
and Lucene does do a great job of finding the relevant set of documents to
build a report from.
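Here is a minimal Java sketch of that org-by-dollar aggregation. It assumes the org IDs and dollar amounts have already been loaded into memory as arrays indexed by Lucene document number, and that the matching document numbers come from the query's `ScoreDoc` results. The names `OrgDollarReport`, `topOrgs`, `orgIdsByDoc` and `dollarsByDoc` are hypothetical and are not Lucene APIs. The sketch credits every involved organization with the report's full dollar amount. Splitting the amount between co-involved organizations would be an equally reasonable policy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class OrgDollarReport {

    /** One report row: an organization ID and its summed dollar amount. */
    public static final class OrgTotal {
        public final int orgId;
        public final long dollars;

        OrgTotal(int orgId, long dollars) {
            this.orgId = orgId;
            this.dollars = dollars;
        }
    }

    /**
     * Sums dollars per organization over the matching documents and returns
     * the top n organizations, largest total first.
     * Assumption: each organization on a report gets the report's full amount.
     */
    public static List<OrgTotal> topOrgs(int[] matchingDocs, int[][] orgIdsByDoc,
                                         long[] dollarsByDoc, int n) {
        Map<Integer, Long> totals = new HashMap<>();
        for (int doc : matchingDocs) {
            long amount = dollarsByDoc[doc];
            for (int org : orgIdsByDoc[doc]) {
                totals.merge(org, amount, Long::sum);
            }
        }

        // Min-heap bounded to n entries: the smallest of the current top n is at the head.
        PriorityQueue<OrgTotal> heap =
                new PriorityQueue<>((a, b) -> Long.compare(a.dollars, b.dollars));
        for (Map.Entry<Integer, Long> e : totals.entrySet()) {
            heap.offer(new OrgTotal(e.getKey(), e.getValue()));
            if (heap.size() > n) {
                heap.poll(); // evict the smallest so only n remain
            }
        }

        // Return the survivors ordered from largest to smallest total.
        List<OrgTotal> result = new ArrayList<>(heap);
        result.sort((a, b) -> Long.compare(b.dollars, a.dollars));
        return result;
    }

    public static void main(String[] args) {
        int[][] orgIdsByDoc = { {1, 2}, {2}, {3, 1}, {4} };
        long[] dollarsByDoc = { 100, 50, 30, 500 };
        int[] matches = { 0, 1, 2 }; // document 3 did not match the query
        for (OrgTotal t : topOrgs(matches, orgIdsByDoc, dollarsByDoc, 2)) {
            System.out.println(t.orgId + " " + t.dollars);
        }
    }
}
```

With the sample data in `main`, organization 2 totals 150 and organization 1 totals 130. Document 3 is ignored because it did not match. The bounded heap avoids sorting every organization's total when only the top few are needed.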

On Wed, Nov 11, 2009 at 8:56 PM, Michael Garski <mgarski@myspace-inc.com> wrote:

> Matt,
>
> StringIndex is for use when a field has only one value in it for the
> purposes of sorting results, not for tokenized fields with multiple
> values.  TermVectors might be a better approach, but for 50K docs,
> you'll encounter an IO hit on reading them.
>
> I'm curious why you are looking to grab all of the terms for a
> ScoreDoc...  can you shed some light on that?
>
> Michael
>
> -----Original Message-----
> From: Matt Honeycutt [mailto:mbhoneycutt@gmail.com]
> Sent: Wednesday, November 11, 2009 4:57 PM
> To: lucene-net-user@incubator.apache.org
> Subject: FieldLookup for field with multiple values
>
> It seems that the StringIndex returned by
> FieldCache.Fields.Default.GetStringIndex() only indexes one value for a
> document even when the document has multiple values for the field.  Is
> there a performant way to get all the values for a particular field in a
> ScoreDoc?  I'm having to do this across the entire result set of
> ScoreDocs (up to 50,000), and retrieving the values through
> LuceneDocument.GetFields is not going to cut it.
>
>
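The quoted question asks for every value of a multi-valued field for each document. A single-valued StringIndex cannot provide that. One common alternative, sketched here under stated assumptions, is to "uninvert" the field once per index reader. The approach walks every term of the field and its postings, then appends the term to each document that contains it. The result is an in-memory array of values per document that can be read without loading stored fields. In this sketch the inverted index is simulated as a hypothetical `Map` from term to sorted document numbers. Against a real Lucene 2.x index, the same loop would roughly correspond to enumerating the field's terms and their `TermDocs` from the `IndexReader`. `MultiValueUninverter` and `uninvert` are illustrative names only. The sketch does not handle deleted documents.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class MultiValueUninverter {

    /**
     * Builds, for each document number, an array of all values of one
     * multi-valued field. postingsByTerm is a hypothetical stand-in for the
     * field's inverted index: term -> doc numbers containing that term.
     */
    public static String[][] uninvert(Map<String, int[]> postingsByTerm, int maxDoc) {
        // Pass 1: count values per document so each array is allocated exactly once.
        int[] counts = new int[maxDoc];
        for (int[] postings : postingsByTerm.values()) {
            for (int doc : postings) {
                counts[doc]++;
            }
        }
        String[][] valuesByDoc = new String[maxDoc][];
        for (int doc = 0; doc < maxDoc; doc++) {
            valuesByDoc[doc] = new String[counts[doc]];
        }

        // Pass 2: write each term into every document listed in its postings.
        int[] fill = new int[maxDoc];
        for (Map.Entry<String, int[]> e : postingsByTerm.entrySet()) {
            for (int doc : e.getValue()) {
                valuesByDoc[doc][fill[doc]++] = e.getKey();
            }
        }
        return valuesByDoc;
    }

    public static void main(String[] args) {
        Map<String, int[]> orgPostings = new TreeMap<>();
        orgPostings.put("org1", new int[] {0, 2});
        orgPostings.put("org2", new int[] {0, 1});
        orgPostings.put("org3", new int[] {2});
        String[][] orgsByDoc = uninvert(orgPostings, 3);
        for (int doc = 0; doc < orgsByDoc.length; doc++) {
            System.out.println(doc + " " + Arrays.toString(orgsByDoc[doc]));
        }
    }
}
```

The two-pass layout trades one extra scan of the postings for compact arrays with no resizing. Because the structure is built once and reused across queries, the per-query cost for 50,000 hits becomes a few array lookups per document.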
