lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dror Matalon <d...@zapatec.com>
Subject Re: merged search of document
Date Wed, 07 Jan 2004 19:10:25 GMT
On Wed, Jan 07, 2004 at 07:58:52PM +0100, Thomas Scheffler wrote:
> Am Mit, den 07.01.2004 schrieb Dror Matalon um 19:00:
> > The solution is simple, but you need to think of it conceptually in a
> > different way. Instead of "all documents with the same DocID are the same
> > document" think "fetch all the document where DocId is XYZ."
> > 
> > Assuming the contents are in a field called contents
> > you do 
> > +(DocID:XYZ) (contents:foo) (contents:bar)
> 
> I allready was on that way but think of a search like (foo -bar). With
> your solution it will result in a hit because on page 345 (to keep my
> example) is the word "foo" and no "bar". Of cause I want with my model,
> that the book don't get a hit for that query. You see how hard it is to
> handle, isn't it? 

I think, I'm starting to understand. So you want to treat several
documents as one, and if the hit fails for one of the documents, it
should fail for all the documents with the same id. OK. This begs the
question. Why don't you make all these document with the same id one
document, and index them together?

> 
> > 
> > For that matter, you can use a standard analyzer on the query and use a
> > boolean to tie it to the specific document set.
> > 
> > This is how we do searching on a specific channel at fastbuzz.com.
> > 
> > Dror
> > 
> > 
> > On Wed, Jan 07, 2004 at 05:21:43PM +0100, Thomas Scheffler wrote:
> > > 
> > > Jamie Stallwood sagte:
> > > > +(DocID:XYZ DocID:ABC) +(foo bar)
> > > >
> > > > will find a document that (MUST have (xyz OR abc)) AND (MUST have (foo
OR
> > > > bar)).
> > > 
> > > This is just the solution for the example in real world I really don't
> > > have noc documents containing "foo" or "bar". What I meant was: Make
> > > Lucene think, that all Documents with the same DocID are ONE Document.
> > > Imagine you have a big book, say 1000 pages. Instead of putting the whole
> > > book in the index, you split it up in single pages and index them. Now
> > > it's faster if a page changes or is deleted to update your index instead
> > > of doing it over and over again for all 1000 pages. So you problem starts
> > > when you're searching on the book. You search for (foo bar), foo is on
> > > site 345 while bar ist on 435. You want to get a hit for the book. So I
> > > need a solution matching this more generic example.
> > > 
> > > >
> > > > -----Original Message-----
> > > > From: Thomas Scheffler [mailto:thomas.scheffler@uni-jena.de]
> > > > Sent: 07 January 2004 11:23
> > > > To: lucene-user@jakarta.apache.org
> > > > Subject: merged search of document
> > > >
> > > > Hi,
> > > >
> > > > I need a tip for implementation. I have several documents all of them
with
> > > > a field named DocID. DocID identifies not a single Lucene Document but
a
> > > > collection of them. When I wan't to start a seach it should handle the
> > > > search in that way, as these lucene documents where one.
> > > >
> > > > example:
> > > >
> > > > Document 1: DocID:XYZ
> > > >
> > > > containing: foo
> > > >
> > > > Document 2: DocID:XYZ
> > > >
> > > > containing: bar
> > > >
> > > > Document 3: DocID:ABC
> > > >
> > > > containing: foo bar
> > > >
> > > > Document 4: GHJ
> > > >
> > > > containing: foo
> > > >
> > > > As you already guesses, when I'm searching for "+foo +bar" I wan't the
> > > > hits to contain Document 1, Document 2 and Document 3, not Document 4.
Is
> > > > that clear what I want? How do I implement such a monster? Is that
> > > > possible with lucene? The content is not stored within lucene it's just
> > > > tokenized and indexed.
> > > >
> > > > Any help?
> > > >
> > > > Thanks in advance!
> > > >
> > > > Thomas Scheffler
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > > > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> > > >
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > > > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> > > >
> > > >
> > > 
> > > 
> > > -- 
> > > 
> > > 
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> > > 
> --
> Fachbegriffe der Informatik - Einfach erklärt
> =============================================
> N° 37 -- Fehlertolerant :
> 
> Das Programm erlaubt keine Benutzereingaben. 
> 



-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message