lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Leo Galambos <galam...@com-os2.ms.mff.cuni.cz>
Subject Re: Regarding Setup Lucine for my site
Date Thu, 06 Mar 2003 20:19:06 GMT
> > 1. 2 threads per request may improve speed up to 50%
> Hmm? Could you clarify? During indexing, multithreading may speed things
> up (splitting docs to index in 2 or more sets, indexing separately, combining
> indexing). But... isn't that a good thing? Or are you saying that it'd be good 
> to have multi-threaded search functionality for single search? (in my 
> experience searching is seldom the slow part)

you may improve indexing and searching. Indexing, because the merge
operation will lock just one thread and smaller part of an index while
other threads are still working;  searching, because you can distribute
the query to more barrels. In both cases you save up to 50% of time (I
assume mergefactor=2).

> > 2. Merger is hard coded
> 
> In a way that is bad because... ?
> (ie. what is the specific problem... I assume you mean index merging
> functionality?)

Because you cannot process local and/or remote barrels -- all must be
local in Lucene object model. That is the serious bug IMHO.

> > 4. you cannot implement dissemination + wrappers for internet servers
> > which would serve as static barrels.
> Could you explain this bit more thoroughly (or pointers on longer 
> explanation)?

Read more about dissemination, metasearch engines (i.e. Savvysearch),
dDIRs (i.e. Harvest). BTW, let's go to a pub and we can talk til morning
:) (it is a serious offer, because I would like to know more about IR).

This example is about metasearch (the simplest case of dDIRs): Can Lucene
allow that a barrel (index segment?) is static and a query is solved via
wrapper, that sends the query ${QUERY} to
http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=${QUERY} and then
reads the HTML output as a result?

> > 5. Document metadata cannot be stored as a programmer wants, he must
> > translate the object to a set of fields
> Yes? I'd think that possibility of doing separate fields is a good thing; 
> after all, all a plain text search engine needs to provide (to be considered 
> one) is indexing of plain text data, right?

I talked about metadata. When metadata object knows how to achieve its 
persistence, why would one translate anything to fields and then back?
Why would you touch the users metadata at all? You need flat fields for
indexing, and what's around -- it is not your problem :). Lucene is
something between CMS and CIS, you say that it's closer to CIS, but when
you need metadata in fields, you are closer to CMS IMHO.

> > 6. Lucene cannot implement your own dynamization
> 
> (sorry, I must sound real thick here).
> Could you elaborate on this... what do you mean by dynamization?

Read more about "Dynamization of Decomposable Searching Problems".

-g-


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message