lucene-dev mailing list archives

From "Simon Willnauer" <simon.willna...@googlemail.com>
Subject Re: Lucene Gdata -- the best way to store the feeds / entries
Date Sun, 28 May 2006 20:36:52 GMT
I agree that including Subversion in the project would be overkill.
I will provide an interface to this component so that the
implementation can be changed and/or more than one alternative can be
provided.
For simplicity, I will store the feeds and entries in the index, or
create two indexes, using one for searching and the other for storing
the actual data.

Integrating BDB looks pretty good to me so we might provide a BDB
implementation as well.
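
A minimal sketch of what such a storage interface could look like, in
Java. All names here (EntryStorage, InMemoryEntryStorage, the method
signatures) are hypothetical and chosen for illustration only; they are
not taken from the GData server code. The in-memory class is just a
stand-in for a Lucene-, flat-file-, or BDB-backed implementation.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical storage abstraction for feed entries (illustrative names only). */
interface EntryStorage {
    /** Stores or overwrites the serialized entry under the given feed. */
    void store(String feedId, String entryId, String entryXml);

    /** Returns the serialized entry, or an empty Optional if it is unknown. */
    Optional<String> load(String feedId, String entryId);

    /** Removes the entry; returns true only if something was removed. */
    boolean delete(String feedId, String entryId);
}

/** In-memory stand-in; a BDB- or Lucene-backed class would implement the same interface. */
final class InMemoryEntryStorage implements EntryStorage {
    // feedId -> (entryId -> serialized entry)
    private final Map<String, Map<String, String>> feeds = new ConcurrentHashMap<>();

    @Override
    public void store(String feedId, String entryId, String entryXml) {
        feeds.computeIfAbsent(feedId, id -> new ConcurrentHashMap<>())
             .put(entryId, entryXml);
    }

    @Override
    public Optional<String> load(String feedId, String entryId) {
        Map<String, String> feed =
                feeds.getOrDefault(feedId, Collections.<String, String>emptyMap());
        return Optional.ofNullable(feed.get(entryId));
    }

    @Override
    public boolean delete(String feedId, String entryId) {
        Map<String, String> feed = feeds.get(feedId);
        return feed != null && feed.remove(entryId) != null;
    }
}
```

Callers would depend only on EntryStorage, so swapping the in-memory
class for a persistent implementation should not require changes on
the calling side.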

simon

On 5/28/06, Grant Ingersoll <gsingers@syr.edu> wrote:
> If the lazy field loading gets applied (which it should soon), you would
> see less of a performance hit for storing items in Lucene, at least when
> just getting hits.  And you could compress the feeds too.
>
> Also, maybe Subversion could act as your repository?  I don't know if it
> is a viable solution given licensing and all that, but it supports
> versioning, etc. and is pretty easy to work with, but it may be overkill
> and may complicate your architecture too much.  Perhaps the best way is
> to define an Interface to this component and one or two implementations
> of it (maybe flat file and BDB) and then other people can write their own.
>
> -Grant
>
> Simon Willnauer wrote:
> > Yes and no :) Well, the problem with the versioning system is still
> > not solved, but I have contacted the Google developers to work with
> > them on this problem.
> > I will definitely have a look at BDB JE and will keep it in mind.
> > I had a quick look at it and it sounds quite suitable for storing
> > feeds.
> > Thank you Otis!
> >
> > simon
> >
> > On 5/28/06, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
> >> Not sure if Berkeley DB is an option, but it sounds like you just
> >> need a "storage" component for feeds, and BDB JE might be a good
> >> fit.  I just used it recently for one such system and was quite happy
> >> with performance and ease of use.
> >>
> >> Otis
> >>
> >> ----- Original Message ----
> >> From: Simon Willnauer <simon.willnauer@googlemail.com>
> >> To: java-dev@lucene.apache.org
> >> Sent: Saturday, May 27, 2006 7:33:28 PM
> >> Subject: Lucene Gdata -- the best way to store the feeds / entries
> >>
> >> For those who haven't heard about the GData project, please check
> >> today's mailing list.
> >> The Lucene indexer is supposed to be used as the search component of
> >> this implementation. GData is an extension to the Atom/RSS format
> >> that adds search and a kind of versioning. This project is a
> >> server-side implementation of the protocol. So what's the problem?
> >> The incoming feed entries and their updates have to be stored
> >> somewhere in persistent storage. The easiest approach would be
> >> flat-file storage, which is not sufficient in my eyes. I thought
> >> about using an approach similar to the Nutch distributed file
> >> system: indexing the incoming entries in a "searchable" index and
> >> storing the whole entry in an associated index, to prevent the
> >> searchable index from growing too fast.
> >> To keep the index small, I would create a separate index for each
> >> feed instance, organized in the local file system.
> >> I would be interested to hear whether anybody has experience with
> >> retrieving large data, such as whole feed entries, out of a
> >> "storage" Lucene index. Should I expect any performance problems
> >> with this approach?
> >> As far as I know, Lucene doesn't support any versioning, or did that
> >> change by any chance? Well, the protocol description doesn't say
> >> anything about retrieving old versions (the documentation only
> >> covers optimistic locking / updating versions).
> >>
> >> regards Simon
> >>
> >>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-dev-help@lucene.apache.org
> >>
> >>
>
> --
>
> Grant Ingersoll
> Sr. Software Engineer
> Center for Natural Language Processing
> Syracuse University
> School of Information Studies
> 335 Hinds Hall
> Syracuse, NY 13244
>
> http://www.cnlp.org
> Voice:  315-443-5484
> Fax: 315-443-6886
>
>