lucene-java-user mailing list archives

From Yaniv Ben Yosef <yani...@gmail.com>
Subject Re: Implementing filtering based on multiple fields
Date Thu, 07 Jan 2010 23:55:18 GMT
Thanks Otis.

If I understand correctly - Bixo, Nutch and Droids are technologies to use
for crawling the web and building an index. My project is actually about
indexing a large database, where you can think of every row as a web page,
and a particular column is the equivalent of a web site. (I didn't mention
that in the previous post because I didn't want to complicate my question,
and it seems equivalent to Google CSE given that Lucene can use virtually
any input for indexing, AFAIK.)
Therefore I'm not sure if the frameworks you've mentioned are applicable to
my project as they seem to be related to web page indexing, but perhaps I'm
missing something.
Also, what did you mean by isolating users and their data/indices? Did
you mean that I should create a separate index per user?
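
To make the alternative to one index per user concrete, here is a toy sketch of a single shared index where every row carries a source value and each search is restricted to the user's allowed set of sources. This is plain Python and not Lucene's API; the names (`ToyIndex`, `add_row`, `search`, the source values) are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


class ToyIndex:
    """Toy inverted index: one shared index, filtered per user at query time."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of row ids
        self.source_of = {}               # row id -> source ("web site") value

    def add_row(self, row_id, source, text):
        # Record the row's source and index each whitespace-separated term.
        self.source_of[row_id] = source
        for term in text.lower().split():
            self.postings[term].add(row_id)

    def search(self, term, allowed_sources):
        # Look up the term, then keep only rows whose source is allowed.
        allowed = set(allowed_sources)
        hits = self.postings.get(term.lower(), set())
        return sorted(r for r in hits if self.source_of[r] in allowed)


index = ToyIndex()
index.add_row(1, "cnn.com", "Election results announced")
index.add_row(2, "msnbc.com", "Election coverage continues")
index.add_row(3, "example.org", "Election blog post")

# Hypothetical per-user set of sources, analogous to a CSE definition.
user_sources = {"cnn.com", "msnbc.com", "nytimes.com"}
results = index.search("election", user_sources)
print(results)
```

In this sketch the per-user state is only the set of allowed sources, so the number of users does not multiply the number of indexes. Whether that trade-off holds in Lucene at my scale is exactly what I am asking about.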

Thanks again!

On Fri, Jan 8, 2010 at 12:35 AM, Otis Gospodnetic <
otis_gospodnetic@yahoo.com> wrote:

> For something like CSE, I think you want to isolate users and their
> data/indices.
>
> I'd look at Bixo or Nutch or Droids ==> Lucene or Solr
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
>
>
>
> ----- Original Message ----
> > From: Yaniv Ben Yosef <yanivby@gmail.com>
> > To: java-user@lucene.apache.org
> > Sent: Thu, January 7, 2010 3:54:22 PM
> > Subject: Implementing filtering based on multiple fields
> >
> > Hi,
> >
> > I'm very new to Lucene. In fact, I'm at the beginning of an evaluation
> > phase, trying to figure out whether Lucene is the right fit for my needs.
> > The project I'm involved in requires something similar to the Google
> > Custom Search Engine (CSE). In CSE, each user can define a set (could
> > be a large set) of websites, and limit the search to
> > only those websites. So for example, I can create a CSE that searches all
> > web pages on cnn.com, msnbc.com and nytimes.com only.
> > I am trying to understand whether and how I can do something similar in
> > Lucene.
> >
> > The FAQ hints at this possibility here, but it mentions a class that no
> > longer exists in 3.0 (QueryFilter), and is very laconic about the
> > suggested options. Also I'm not sure how well it will perform in my use
> > case (or even if it fits at all).
> > I thought about creating a separate index for each user or CSE. However,
> > my system should be able to handle tens of thousands of concurrent users.
> > I haven't done any analysis yet on how this will affect CPU, RAM, I/O and
> > storage size, but was wondering if any of you experienced Lucene
> > users/developers think it's a good direction.
> > If that's not a good idea, what would be a good strategy here?
> >
> > Any help will be much appreciated,
> > Yaniv
