lucene-java-user mailing list archives

From Doron Cohen <>
Subject Re: Fine Tuning Lucene implementation
Date Wed, 25 Jul 2007 18:30:27 GMT
"Askar Zaidi" wrote:

> ... Here's what I am trying to accomplish:
> 1. Iterate over itemID (unique) in the database using one SQL query.
> 2. For every itemID found, run 4 searches on Lucene Index.
> 3. doTagSearch(itemID....) ; collect score
> 4. doTitleSearch(itemID...) ; collect score
> 5. doSummarySearch(itemID...) ; collect score
> 6. doBodySearch(itemID....) ; collect score
> These scores are then added and I get a total score for each
> unique item in the database.
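Here is a rough sketch of the loop as quoted above. It is hypothetical: FieldSearch stands in for the doTagSearch()/doTitleSearch()/doSummarySearch()/doBodySearch() methods and is not a Lucene API. Its only purpose is to make explicit that this design runs one index search per item per field, i.e. 4 x (number of items) searches, before summing the scores.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the per-item loop from the quoted message.
// FieldSearch is a stand-in for the doXyzSearch() methods, not a Lucene type.
class PerItemScoring {

    interface FieldSearch {
        float score(int itemId);
    }

    // Step 1: iterate over the itemIDs (as returned by the SQL query).
    // Steps 2-6: run one search per field and add up the scores.
    static Map<Integer, Float> totalScores(List<Integer> itemIds, List<FieldSearch> searches) {
        Map<Integer, Float> totals = new LinkedHashMap<>();
        for (int id : itemIds) {
            float sum = 0f;
            for (FieldSearch s : searches) {
                sum += s.score(id);
            }
            totals.put(id, sum);
        }
        return totals;
    }

    // Number of index searches this design issues: one per (item, field) pair.
    static long searchCount(int numItems, int numFields) {
        return (long) numItems * numFields;
    }
}
```

With four fields, an index of 10,000 items means 40,000 separate searches.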

Joining this late, I might be missing something. Still, I
would like to understand better *what* you are trying to do
here (before going into the *how*).

By your description above, my understanding is this:

1. Assume one table in the DB, with textual
   columns: ItemID(unique), Title, Summary, Body, Tags.
2. The ItemID column is a unique key in the table.
3. Assume entries in the ItemID column look
   like this: itemID=127, itemID=75, etc.
4. Some of the other columns (not the ItemID column)
   can contain IDs as well.
5. You are iterating over the ItemID column, and,
   for each value (each ID), ranking all the documents
   in the index (all the rows in that table) for
   occurrences of that ID.

Is that so?

If so, you are actually trying to find for each row (doc),
which (other) rows (docs) "refer" to it most. Right?
Is this really a textual search problem?
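If that interpretation is right, the task can be computed directly, without any text ranking: for every row, count the references it makes to other rows. The sketch below assumes (hypothetically) that references appear in the text as tokens of the form "itemID=<number>" and that Row mirrors the table columns listed above; neither is taken from your actual schema.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch, assuming references look like "itemID=<number>" in the text.
// Row is a hypothetical stand-in for one DB row / one indexed document.
class ReferenceCounter {

    static final class Row {
        final int itemId;
        final String title, summary, body, tags;

        Row(int itemId, String title, String summary, String body, String tags) {
            this.itemId = itemId;
            this.title = title;
            this.summary = summary;
            this.body = body;
            this.tags = tags;
        }

        String allText() {
            return String.join(" ", title, summary, body, tags);
        }
    }

    private static final Pattern REF = Pattern.compile("itemID=(\\d+)");

    // Returns referencedId -> (referringRowId -> number of references).
    static Map<Integer, Map<Integer, Integer>> countReferences(List<Row> rows) {
        Map<Integer, Map<Integer, Integer>> refs = new HashMap<>();
        for (Row row : rows) {
            Matcher m = REF.matcher(row.allText());
            while (m.find()) {
                int target = Integer.parseInt(m.group(1));
                if (target == row.itemId) {
                    continue; // only count references from *other* rows
                }
                refs.computeIfAbsent(target, k -> new HashMap<>())
                    .merge(row.itemId, 1, Integer::sum);
            }
        }
        return refs;
    }
}
```

This is one pass over the table, and "refers to it most" becomes a plain count rather than a relevance score.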

For instance, if row X has N references to row Z,
and row Y has N+1 references to row Z, but the text
of row Y is much longer than that of row X,
would you expect row X to rank higher, because it is
shorter (what Lucene is likely to do), or row Y
to rank higher, because it has slightly more
references to row Z?
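To illustrate the "what Lucene is likely to do" part: Lucene's classic DefaultSimilarity uses tf = sqrt(freq) and lengthNorm = 1/sqrt(numTerms). The sketch below is simplified: it ignores idf (the same for both rows on the same term), boosts, queryNorm, and the lossy one-byte encoding of norms. The values N = 3 and the text lengths of 50 and 500 terms are made-up examples.

```java
// Simplified sketch of the length effect in Lucene's classic DefaultSimilarity:
// tf = sqrt(freq), lengthNorm = 1/sqrt(numTerms). Ignores idf, boosts,
// queryNorm and the one-byte norm encoding.
class LengthNormDemo {

    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Relative score of one document for a single-term query.
    static double relativeScore(int freq, int numTerms) {
        return tf(freq) * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        int n = 3;                             // hypothetical N
        double x = relativeScore(n, 50);       // row X: N references, short text
        double y = relativeScore(n + 1, 500);  // row Y: N+1 references, long text
        System.out.println("X=" + x + " Y=" + y);
    }
}
```

Here row X scores about 0.245 and row Y about 0.089, so the shorter row wins even though it has fewer references to row Z.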

In another email you have this:

> Can I just add:
> +contents:Harvard +contents:Business +contents: Review +itemID=77
> That query would just return one document.

This is different from the above: it has a textual
part, not only an ID. Are you interested here in all docs
(rows) that reference itemID=77, or do you only want to check
whether the specific row whose ID is itemID=77 satisfies
the textual part of this query?
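The two readings give different answers, which is why the distinction matters. This hypothetical sketch works on plain Java objects rather than a Lucene index. Doc is a stand-in for an indexed row, and "matches the textual part" is simplified to "the contents contain every required word as a whole token, case-insensitively".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch contrasting the two readings of the query.
class TwoReadings {

    static final class Doc {
        final int itemId;
        final String contents;

        Doc(int itemId, String contents) {
            this.itemId = itemId;
            this.contents = contents;
        }
    }

    // Simplified stand-in for "+contents:Harvard +contents:Business +contents:Review".
    static boolean hasAllWords(Doc d, List<String> words) {
        Set<String> tokens = new HashSet<>(Arrays.asList(d.contents.toLowerCase().split("\\W+")));
        for (String w : words) {
            if (!tokens.contains(w.toLowerCase())) {
                return false;
            }
        }
        return true;
    }

    // True if the text mentions "itemID=<id>" as a whole token (so 7 does not match 77).
    static boolean refersTo(Doc d, int id) {
        return Pattern.compile("\\bitemID=" + id + "\\b").matcher(d.contents).find();
    }

    // Reading 1: all docs that match the words AND reference itemID=<id>.
    static List<Integer> docsReferencing(List<Doc> docs, List<String> words, int id) {
        List<Integer> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (hasAllWords(d, words) && refersTo(d, id)) {
                hits.add(d.itemId);
            }
        }
        return hits;
    }

    // Reading 2: does the single row whose own ID is <id> match the words?
    static boolean rowMatches(List<Doc> docs, List<String> words, int id) {
        for (Doc d : docs) {
            if (d.itemId == id) {
                return hasAllWords(d, words);
            }
        }
        return false; // no row with that ID
    }
}
```

Reading 1 can return many rows (or none), while reading 2 is a yes/no check on exactly one row.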

This brings us back to the starting point: perhaps it would
help if you defined once more the task/problem you
are trying to solve? Forget about loops and doXyzSearch()
methods; just define the input, the output, and the logic.

