lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Implementing filtering based on multiple fields
Date Fri, 08 Jan 2010 00:08:41 GMT
Ah, well, masking it didn't help.  Yes, ignore Bixo, Nutch, and Droids then.
Consider Solr's DataImportHandler, or wait a bit for the Lucene Connectors Framework to materialize.
Or use LuSql, DbSight, or Sematext's Database Indexer.

Yes, I was suggesting a separate index for each user.  That's what Simpy uses: it has some
200K indices on one box and, if I remember correctly, handles dozens of QPS without any caching.
Load is under 1.0.
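
A minimal sketch of what an index-per-user layout could look like on disk, not a description of how Simpy does it. The class name, the bucket count of 256, and the "indices" root are all assumptions made for illustration. It only computes the directory that a given user's index would live in; that directory would then be opened with Lucene's FSDirectory.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: maps a user id to that user's own index directory. */
final class PerUserIndexLayout {
    // Assumption: 256 buckets, so that no single parent directory
    // has to hold all of the (possibly ~200K) per-user directories.
    private static final int BUCKETS = 256;

    private PerUserIndexLayout() {}

    /** Returns root/<two-hex-digit bucket>/<userId>. */
    static Path indexDirFor(Path root, String userId) {
        // Reject ids that could escape the root (e.g. "../x") or be empty.
        if (userId == null || !userId.matches("[A-Za-z0-9_-]+")) {
            throw new IllegalArgumentException("unsafe user id: " + userId);
        }
        // floorMod keeps the bucket non-negative even for negative hash codes.
        int bucket = Math.floorMod(userId.hashCode(), BUCKETS);
        return root.resolve(String.format("%02x", bucket)).resolve(userId);
    }

    public static void main(String[] args) {
        Path root = Paths.get("indices"); // hypothetical root directory
        for (String user : new String[] {"alice", "bob"}) {
            System.out.println(user + " -> " + indexDirFor(root, user));
        }
    }
}
```

With a layout like this, a search for one user touches only that user's directory, so no per-query filter on a "site" or "owner" field is needed. The trade-off is many small indices to open and close.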

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



----- Original Message ----
> From: Yaniv Ben Yosef <yanivby@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Thu, January 7, 2010 6:55:18 PM
> Subject: Re: Implementing filtering based on multiple fields
> 
> Thanks Otis.
> 
> If I understand correctly - Bixo, Nutch and Droids are technologies to use
> for crawling the web and building an index. My project is actually about
> indexing a large database, where you can think of every row as a web page,
> and a particular column is the equivalent of a web site. (I didn't mention
> that in the previous post because I didn't want to complicate my question,
> and it seems equivalent to Google CSE given that Lucene can use virtually
> any input for indexing, AFAIK)
> Therefore I'm not sure if the frameworks you've mentioned are applicable to
> my project as they seem to be related to web page indexing, but perhaps I'm
> missing something.
> Also, what did you mean about isolating users and their data/indices. Did
> you mean that I should create a separate index per user?
> 
> Thanks again!
> 
> On Fri, Jan 8, 2010 at 12:35 AM, Otis Gospodnetic <
> otis_gospodnetic@yahoo.com> wrote:
> 
> > For something like CSE, I think you want to isolate users and their
> > data/indices.
> >
> > I'd look at Bixo or Nutch or Droids ==> Lucene or Solr
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
> >
> >
> >
> > ----- Original Message ----
> > > From: Yaniv Ben Yosef 
> > > To: java-user@lucene.apache.org
> > > Sent: Thu, January 7, 2010 3:54:22 PM
> > > Subject: Implementing filtering based on multiple fields
> > >
> > > Hi,
> > >
> > > I'm very new to Lucene. In fact, I'm at the beginning of an evaluation
> > > phase, trying to figure whether Lucene is the right fit for my needs.
> > > > The project I'm involved in requires something similar to the Google Custom
> > > > Search Engine (CSE). In CSE, each user can
> > > define a set (could be a large set) of websites, and limit the search to
> > > only those websites. So for example, I can create a CSE that searches all
> > > web pages on cnn.com, msnbc.com and nytimes.com only.
> > > I am trying to understand whether and how I can do something similar in
> > > Lucene.
> > >
> > > The FAQ hints about this possibility
> > > here,
> > > > but it mentions a class that no longer exists in 3.0 (QueryFilter), and is
> > > > very laconic about the suggested options. Also I'm not sure how well it will
> > > > perform in my use case (or even if it fits at all).
> > > > I thought about creating a separate index for each user or CSE. However, my
> > > > system should be able to handle tens of thousands of concurrent users. I
> > > haven't done any analysis yet on how this will affect CPU, RAM, I/O and
> > > storage size, but was wondering if any of you experienced Lucene
> > > users/developers think it's a good direction.
> > > If that's not a good idea, what would be a good strategy here?
> > >
> > > Any help will be much appreciated,
> > > Yaniv
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >



