lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Emmanuel Bernard <emman...@hibernate.org>
Subject Re: Modelling relational data in Lucene Index?
Date Mon, 06 Nov 2006 15:13:23 GMT
I had a quick look at SOLR and DBSight. They seem to achieve a different 
goal than Hibernate Lucene.
The formers belong to the project box category: you set up a server that 
will handle the search for you. The application will then delegate the 
work to those servers.
The latter belongs to the framework category: you use it inside your 
Hibernate/EJB 3.0 application to enable an index based search feature.
To a certain extend, it is the same difference between a Google box and 
Lucene.

You can write some code based on the latter to covers the formers 
features esp the platform abstraction (PHP, .net), but it is probably a 
lot of work and that is not really the point.
You can write some code based on the formers to enable indexing and 
search of your persistent domain model (persisted through Hibernate), 
but that is probably more work.

Really it is a matter of easing the pain from one side of the problem or 
the other side. I don't see much competition between the 2 approaches, 
they cover different goals.

To specifically answer some of your remarks:
 - yes, you need to write some code to recreate an index. Literally, 6 
lines of code.
 - no, I do not currently cache the searcher because, Hibernate is 
transactional by nature and protect yourself as much as possible from 
read uncommited and other data inconsistencies. I guess I could 
implement some caching capabilities using reader.isCurrent() or 
something equivalent.
 - the ability to split searchers servers from indexers servers is on my 
todo list.

Cheers

Emmanuel


Chris Lu wrote:
> I personally like your effort, but technically I would  disagree.
>
> The SOLR project, and the project I am working on, DBSight, have an
> detached approach which is implementation agnostic, no matter if it's
> java, ruby, php, .net. The return results can be a rendered HTML,
> JSON, XML. I don't think you can be more flexible than that. If
> creating an new index takes 5 minutes without any coding, you can
> create something more creative.
>
>> From business side, you don't need to worry about indexing when
> designing a system. New requirement may come. It's very hard trying to
> anticipate all the needs.
>
> Technically, detached approach gives more flexible on resources like
> CPU, memory, hard drive. For example, if your index grows large, say
> 1G, indexing can take hours with merging, I am not sure how compass or
> hibernate/lucene handles it. Need to re-write code at that time? I
> actually feel it's a dangerous trap.
>
>> I've introduced a session.index() which forces the (re)indexing of the
>> document
> So does it mean you need to write some code to fix the index if it's 
> crashed?
>
>> What do you mean by multithread safe? The indexing?
>> the indexing is multithread safe in the Hibernate Lucene integration
> The indexing can be threadsafe. But will it affect the searching? With
> many files changing and merging, if you cache the searcher. the
> searching will have "read passed EOF" exceptions. If you don't cache
> the searcher, you will loose the built-in caching, FieldCacheImpl, in
> Lucene.
>
>>
>> The query process?
>> the query doesn't have to since you query on a give session (aka user
>> conversation), so no multithread threat here.
> So you are not caching searcher.
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message