lucene-java-user mailing list archives

From Dror Matalon <d...@zapatec.com>
Subject Re: merged search of document
Date Wed, 07 Jan 2004 18:00:02 GMT
The solution is simple, but you need to think of it conceptually in a
different way. Instead of "all documents with the same DocID are the same
document," think "fetch all the documents where DocID is XYZ."
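
To make that shift concrete, here is a minimal plain-Java sketch. It does not
use the Lucene API; the class and method names (MergedDocSearch, Page,
termsFor, matchesAll) are made up for illustration, and whitespace splitting
stands in for a real analyzer. It fetches every page that carries a given
DocID and checks the search terms against the group as a whole.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MergedDocSearch {
    // One indexed unit (for example a page); docId names the logical document it belongs to.
    static final class Page {
        final String docId;
        final String contents;

        Page(String docId, String contents) {
            this.docId = docId;
            this.contents = contents;
        }
    }

    // "Fetch all the documents where DocID is XYZ" and pool their terms.
    // Assumption: lowercasing and whitespace splitting stand in for an analyzer.
    static Set<String> termsFor(String docId, List<Page> pages) {
        Set<String> terms = new HashSet<>();
        for (Page p : pages) {
            if (p.docId.equals(docId)) {
                terms.addAll(Arrays.asList(p.contents.toLowerCase().split("\\s+")));
            }
        }
        return terms;
    }

    // The logical document matches when every required term occurs on at least one of its pages.
    static boolean matchesAll(String docId, List<Page> pages, String... required) {
        return termsFor(docId, pages).containsAll(Arrays.asList(required));
    }

    public static void main(String[] args) {
        // The four documents from the original question.
        List<Page> pages = Arrays.asList(
            new Page("XYZ", "foo"),
            new Page("XYZ", "bar"),
            new Page("ABC", "foo bar"),
            new Page("GHJ", "foo"));
        for (String id : Arrays.asList("XYZ", "ABC", "GHJ")) {
            System.out.println(id + " -> " + matchesAll(id, pages, "foo", "bar"));
        }
    }
}
```

With the sample data from the original question this reports XYZ and ABC as
matches and GHJ as a non-match, which is the result Thomas asked for.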

Assuming the contents are in a field called contents, you do:
+(DocID:XYZ) (contents:foo) (contents:bar)

For that matter, you can use a standard analyzer on the query and use a
boolean query to tie it to the specific document set.
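
As a rough sketch of how such a query string could be assembled in code, the
helper below builds the same form as the example above from a DocID and a
list of terms. DocQueryBuilder and build are hypothetical names, the field
names DocID and contents are taken from this thread, and the sketch assumes
the inputs contain no characters that are special in the query syntax (it
does no escaping).

```java
import java.util.Arrays;
import java.util.List;

public class DocQueryBuilder {
    // Builds a query string of the form: +(DocID:XYZ) (contents:foo) (contents:bar)
    // The leading '+' marks the DocID clause as required; the term clauses carry no prefix.
    // Assumption: docId and terms need no escaping.
    static String build(String docId, List<String> terms) {
        StringBuilder sb = new StringBuilder("+(DocID:").append(docId).append(")");
        for (String term : terms) {
            sb.append(" (contents:").append(term).append(")");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: +(DocID:XYZ) (contents:foo) (contents:bar)
        System.out.println(build("XYZ", Arrays.asList("foo", "bar")));
    }
}
```

The resulting string would then be handed to Lucene's query parser together
with the analyzer of your choice.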

This is how we do searching on a specific channel at fastbuzz.com.

Dror


On Wed, Jan 07, 2004 at 05:21:43PM +0100, Thomas Scheffler wrote:
> 
> Jamie Stallwood wrote:
> > +(DocID:XYZ DocID:ABC) +(foo bar)
> >
> > will find a document that (MUST have (xyz OR abc)) AND (MUST have (foo OR
> > bar)).
> 
> This is just the solution for the example. In the real world I really don't
> have any documents containing "foo" or "bar". What I meant was: make
> Lucene think that all Documents with the same DocID are ONE Document.
> Imagine you have a big book, say 1000 pages. Instead of putting the whole
> book in the index, you split it up into single pages and index them. Now
> it's faster to update your index if a page changes or is deleted, instead
> of doing it over and over again for all 1000 pages. So your problem starts
> when you're searching on the book. You search for (foo bar); foo is on
> page 345 while bar is on page 435. You want to get a hit for the book. So I
> need a solution matching this more generic example.
> 
> >
> > -----Original Message-----
> > From: Thomas Scheffler [mailto:thomas.scheffler@uni-jena.de]
> > Sent: 07 January 2004 11:23
> > To: lucene-user@jakarta.apache.org
> > Subject: merged search of document
> >
> > Hi,
> >
> > I need a tip for implementation. I have several documents, all of them with
> > a field named DocID. DocID identifies not a single Lucene Document but a
> > collection of them. When I start a search, it should handle the
> > search as if these Lucene documents were one.
> >
> > example:
> >
> > Document 1: DocID:XYZ
> >
> > containing: foo
> >
> > Document 2: DocID:XYZ
> >
> > containing: bar
> >
> > Document 3: DocID:ABC
> >
> > containing: foo bar
> >
> > Document 4: DocID:GHJ
> >
> > containing: foo
> >
> > As you already guessed, when I'm searching for "+foo +bar" I want the
> > hits to contain Document 1, Document 2 and Document 3, not Document 4. Is
> > it clear what I want? How do I implement such a monster? Is that
> > possible with Lucene? The content is not stored within Lucene; it's just
> > tokenized and indexed.
> >
> > Any help?
> >
> > Thanks in advance!
> >
> > Thomas Scheffler
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com