lucene-dev mailing list archives

From "Simon Willnauer" <>
Subject Re: Gdata - Indexing feeds and entries
Date Mon, 17 Jul 2006 22:40:20 GMT
On 7/17/06, Yonik Seeley <> wrote:
> Hi Simon, welcome back, hope your exams went well!
Thanks for asking, everything went very well. The compiler I wrote
alongside SoC was especially good preparation for the exam on that
topic. Let me tell ya, an LR parser can drive you mad with SR / RR
conflicts.

> On 7/15/06, Simon Willnauer <> wrote:
> > My first and main problem is pretty well known on this mailing list.
> > I found lots of questions and suggestions via Google, but those
> > discussions are quite old. I was wondering if there are any new
> > insights into distributed searching / indexing. The server
> > should be able to run in clusters / server farms, so indexed data must
> > be available on each server / machine. I thought about this for a
> > while and all my ideas seem to be problematic in a certain way.
> > I found this thread on the mailing list
> >
> Will every server contain all the data, or are you planning on having
> federated search?

Basically I was looking for some ideas; I know about the problems with
distributed searching / indexing. We built a search application a year
ago at work, using SOAP services to interact with the
indexing/searching master servers. We index every document into each
index, which results in quite big indexes on each master. Each master
handles both searching and indexing. The so-called search front end
sends the query to the server, and the server returns the results via
SOAP. That approach does the job for that purpose, but there might be
other approaches around that are more suitable for GData.

Searching GData entries returns just the id and the score of each
entry. To render the response, I fetch the entries from the storage in
the order of the search result; otherwise I would have to store the
entire entry inside the search index, which does not make sense, at
least if you use a storage other than the simple Lucene one I use for
development. By the way, I did set up a distributed storage using
DB4o, so there are already two storages.

I did read some mails on the mailing list about the distributed thing
and realized all the problems I would have missed (not too many :). So
I'm quite aware of the problems, just looking for some new ideas or
solutions, as I already said.
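
To illustrate the point above about the index returning only ids and scores, here is a minimal sketch of the rendering step. It is not the actual GData server code: `HitRenderingSketch`, `Hit`, `Storage` and `fetchEntry` are hypothetical names, and the storage is assumed to be a simple id-to-entry lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; this is not the GData server API.
class HitRenderingSketch {

    /** What the search index returns: only the entry id and its score. */
    static final class Hit {
        final String entryId;
        final float score;

        Hit(String entryId, float score) {
            this.entryId = entryId;
            this.score = score;
        }
    }

    /** Stand-in for any storage backend (the Lucene-based one, DB4o, ...). */
    interface Storage {
        /** Returns the stored entry, or null if the id is unknown. */
        String fetchEntry(String entryId);
    }

    /** Fetches entries in hit order and skips ids the storage no longer has. */
    static List<String> render(List<Hit> hits, Storage storage) {
        List<String> entries = new ArrayList<>();
        for (Hit hit : hits) {
            String entry = storage.fetchEntry(hit.entryId);
            if (entry != null) {
                entries.add(entry);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // An in-memory map plays the role of the storage in this sketch.
        Map<String, String> backing = new LinkedHashMap<>();
        backing.put("e1", "<entry>one</entry>");
        backing.put("e2", "<entry>two</entry>");
        Storage storage = backing::get;

        // Hits arrive ordered by score; the response keeps that order.
        List<Hit> hits = Arrays.asList(new Hit("e2", 0.9f), new Hit("e1", 0.4f));
        System.out.println(render(hits, storage));
    }
}
```

The index stays small because it never holds the entry body, and the storage behind `Storage` can be swapped without touching the search side.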
> Federated search is difficult enough on its own... throw in
> distributed updates, and it gets even tougher.
I totally agree.
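
As a rough illustration of why the federated case is harder, here is a minimal sketch of the merge step a front end would need if each server held only part of the data. `FederatedMergeSketch`, `Hit` and `mergeTopK` are hypothetical names. The sketch also assumes that scores from different servers are directly comparable, which is generally not true for independent Lucene indexes because each index scores with its own term statistics.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical names; a sketch of the merge step only, not a full federated search.
class FederatedMergeSketch {

    /** One hit as returned by a single server: entry id plus score. */
    static final class Hit {
        final String entryId;
        final float score;

        Hit(String entryId, float score) {
            this.entryId = entryId;
            this.score = score;
        }
    }

    /**
     * Merges the per-server hit lists and keeps the k best hits by score.
     * Assumes scores are comparable across servers (see the caveat above).
     */
    static List<Hit> mergeTopK(List<List<Hit>> perServerHits, int k) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perServerHits) {
            all.addAll(hits);
        }
        // Highest score first; List.sort is stable, so ties keep server order.
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }
}
```

With full replication, as in the setup described earlier, this merge step disappears because any single server can answer the whole query.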

> If you could use a storage component that handled distributed updates,
> that would make things quite a bit easier.

Could you give me more details on that?

> I'd still be inclined to try and get everything working on a single
> server first, while keeping in mind the goal of super-scalability for
> the future.
Well, I guess that might be a good idea. After SoC, some other
experienced guys can contribute and help to build such a solid system.
I will think about that :)

> -Yonik

regards simon
