lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ophir Cohen <oph...@gmail.com>
Subject Fwd: Payloads API and support
Date Tue, 01 Feb 2011 10:18:36 GMT
Hi Guys,

I've been using Lucene for more than 5 years and it is a great tool - 
great job! Thanks for everything...


Lately I encountered the new payloads support and it looks its a great 
solution for my project.


*The problem:*

The use case is as follows:

I need to support a way to calculate statistics on web pages.

Each page has few metrics that comes with it (how many user saw it, what 
was the average time on page etc...).


The requirement is to support query such as:

How many users saw pages contains the tokens 'house' and 'white'.

Or

What was the average time on pages contains tokens 'horse' and 'pony'.


*First solution:*

Add pages to Lucene, index the words and store the metrics.

*The problem: performance.*

Not as regular search, I need to provide results for all matched 
documents and those I need to iterate on all results and load the 
document data.
This method take to much time.


*Better solution:*

Store the metrics as payloads and calculate the needed data without 
access to the storage - a huge performance boost.


The problem is (unless I miss something) that I can't get the payloads 
from anything except TermPositions and it isn't good enough as I want to 
use complex queries.

Is there is any other way to access it?

One option can be to get the payload with the document id in the collector.


Any ideas/comments/suggests?

-- 
Thanks in advance,
Ophir Cohen


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message