lucene-java-user mailing list archives

From "Askar Zaidi" <askar.za...@gmail.com>
Subject Re: Fine Tuning Lucene implementation
Date Wed, 25 Jul 2007 18:44:47 GMT
Hey Guys,

Thanks for all the responses. I finally got it working with some query
modification.

The idea was to pick an itemID from the database and for that itemID in the
Index, get the scores across 4 fields; add them up and ta-da !
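
A minimal sketch of that idea, kept independent of Lucene so it runs on its own. The FieldScorer interface and the constant scores are hypothetical stand-ins for the four per-field searches (tags, title, summary, body); in a real setup each one would presumably run a Lucene query restricted to the given itemID and return that hit's score, or 0 when the item does not match.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CombinedScore {

    // Hypothetical stand-in for one per-field search restricted to a single itemID.
    // An implementation should return 0 when the item does not match in that field.
    interface FieldScorer {
        float score(String itemId, String queryText);
    }

    // Adds up the per-field scores for one item.
    static float totalScore(String itemId, String queryText,
                            Map<String, FieldScorer> scorers) {
        float total = 0f;
        for (FieldScorer scorer : scorers.values()) {
            total += scorer.score(itemId, queryText);
        }
        return total;
    }

    public static void main(String[] args) {
        // Toy scorers with made-up constant scores, for illustration only.
        Map<String, FieldScorer> scorers = new LinkedHashMap<>();
        scorers.put("tags", (id, q) -> 0.5f);
        scorers.put("title", (id, q) -> 0.25f);
        scorers.put("summary", (id, q) -> 0f);   // no match in this field
        scorers.put("body", (id, q) -> 0.125f);

        float total = totalScore("77", "Harvard Business Review", scorers);
        System.out.println("itemID=77 total score: " + total);
    }
}
```

Raw scores from separate searches are not necessarily on a comparable scale, so a plain sum like this is worth checking against a few hand-verified items.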

I still have to verify my scores.

Thanks a ton, I'll be active on this list from now on and try and answer
questions to which I was seeking answers.

later,
Askar

On 7/25/07, Doron Cohen <DORONC@il.ibm.com> wrote:
>
> "Askar Zaidi" wrote:
>
> > ... Here's what I am trying to accomplish:
> >
> > 1. Iterate over itemID (unique) in the database using one SQL query.
> > 2. For every itemID found, run 4 searches on Lucene Index.
> > 3. doTagSearch(itemID....) ; collect score
> > 4. doTitleSearch(itemID...) ; collect score
> > 5. doSummarySearch(itemID...) ; collect score
> > 6. doBodySearch(itemID....) ; collect score
> >
> > These scores are then added and I get a total score for each
> > unique item in the database.
>
> Joining this late I might be missing something. Still I
> would like to understand better *what* you are trying to do
> here (before going into the *how*).
>
> By your description above, my understanding is this:
>
> 1. Assume one table in the DB, with textual
>    columns: ItemID(unique), Title, Summary, Body, Tags.
> 2. The ItemID column is a unique key in the table.
> 3. Assume entries in the ItemID column look like
>    this: itemID=127, itemID=75, etc.
> 4. Some of the other columns (not the ItemID column)
>    can contain IDs as well.
> 5. You are iterating over the ItemID column, and,
>    for each value, (each ID), ranking all the documents
>    in the index (all the rows in that table) for
>    occurrences of that ID.
>
> Is that so?
>
> If so, you are actually trying to find for each row (doc),
> which (other) rows (docs) "refer" to it most. Right?
> Is this really a textual search problem?
>
> For instance, if row X has N references to row Z,
> and row Y has N+1 references to row Z, but the length
> of the text in row Y is much more than that of row X,
> would you expect row X to rank higher, because it is
> shorter (what Lucene is likely to do) or that row Y
> will rank higher, because it has slightly more
> references to row Z?
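
The length effect raised above can be illustrated with a rough calculation. This sketch assumes the tf and lengthNorm formulas of Lucene's classic DefaultSimilarity (tf = sqrt(freq), lengthNorm = 1/sqrt(number of terms in the field)) and deliberately ignores idf, boosts, query normalization, and the precision lost when norms are stored. The counts and field lengths are made-up numbers.

```java
public class LengthNormSketch {

    // Simplified per-field weight: tf * lengthNorm, following the classic
    // DefaultSimilarity formulas. idf, boosts, and queryNorm are omitted.
    static double fieldWeight(int termFreq, int fieldLengthInTerms) {
        double tf = Math.sqrt(termFreq);
        double lengthNorm = 1.0 / Math.sqrt(fieldLengthInTerms);
        return tf * lengthNorm;
    }

    public static void main(String[] args) {
        int n = 4; // hypothetical number of references to row Z

        // Row X: N references in a short text of 100 terms.
        double rowX = fieldWeight(n, 100);
        // Row Y: N+1 references in a longer text of 400 terms.
        double rowY = fieldWeight(n + 1, 400);

        System.out.println("row X weight: " + rowX);
        System.out.println("row Y weight: " + rowY);
        System.out.println("row X ranks higher: " + (rowX > rowY));
    }
}
```

Under these assumptions the shorter row X comes out ahead (0.2 against roughly 0.112) even though row Y has one more reference, which is the behavior the question anticipates.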
>
> In another email you have this:
>
> > Can I just add:
> >
> > +contents:Harvard +contents:Business +contents: Review +itemID=77 ??
> >
> > That query would just return one document.
>
> Which is different than the above - it has a textual
> task, not only ID. Are you interested here in all docs
> (rows) that reference itemID=77 or only want to check
> if the specific row whose ID is itemID=77, satisfies
> the textual part of this query?
>
> This brings us back to the starting point: perhaps it would
> help more if you once again defined the task/problem you
> are trying to solve. Forget about loops and doXyzSearch()
> methods - just define the input, the output, and the logic.
>
> Regards,
> Doron
