lucene-solr-user mailing list archives

From dirk <>
Subject Re: Building an enterprise quality search engine using Apache Solr
Date Fri, 19 Oct 2012 07:05:42 GMT
Your question is not easy to answer. It depends on so many things that
there is no standard way to build an enterprise solution, and the time
planning depends on just as many factors.

I can give you some brief notes about our solution, though it differs from
yours in target group and data sources. I am technically responsible for
the system disco (a research and discovery system) at the library of the
University of Münster. (Excuse me, I don't want to make a promotion tour
here, I earn no money with such activities :-).) In this search engine,
which is based on Lucene, we search about 200 million articles, books,
journals and so on. Our data sources therefore differ in structure and also
in the way they are delivered. At the beginning we thought: let's buy a
solution in order to avoid most of our own development work. So we bought a
commercial search engine that runs on a Lucene core, with proprietary
business logic to talk to that core. So far so good - or not so good. At
that time I was the only person working on this project, and I needed nearly
one and a half years full-time to fulfill most of the features and
requirements. And the reason it took so long is not that I had no experience
(I hope). I have worked in this area for nearly 15 years in different
companies, always as a J2EE developer. (That's rare today, because every
experienced developer wants to work as a "leader" or manager; that sounds
better, and fewer project leaders are outsourced. OK, other topic.) And
other universities (customers) who built a comparable search engine in that
environment took as long or longer. So I am hopefully...

In Germany we say "der Teufel steckt im Detail" (literally: the devil is
hidden in the detail), which means that you start work and, in parallel,
the requirements change, sadly in most cases after development has already
finished the software basis. For example, we needed a lot of time for
fine-tuning the ranking and for building a fully automatic mechanism to
update the data sources. And it was one thing to get the search working in
development and run a first developer test; it is a completely different
thing to make the system fit for 24/7 service and run a production system
without problems.

Most of our time went into data pre-processing because of the "shit in -
shit out" problem. Work on data quality is expensive, but you get no
appreciation for it, because everybody is occupied with search features.
This requirement showed us that it is mostly impossible to avoid your own
development work. The next thing is the user interface: not every feature a
customer knows from good old database-backed systems is easy to realize in a
search engine, because of its more or less flat data structure. So we had to
develop one service after another in order to read additional information.
In our case, for example, the holdings information of our library, fetched
at runtime.

To summarize: if you want to estimate a concrete duration for building a
complete production enterprise search solution, you should talk to some
people with similar solutions, think through your own requirements in
detail, and then multiply your estimate by 2. Then perhaps you have a
realistic estimate.
