lucene-solr-user mailing list archives

From Joel Bernstein <joels...@gmail.com>
Subject Re: Contrib module for Document Clustering
Date Thu, 07 Apr 2016 15:33:13 GMT
My gut instinct is that it's a hard path you're considering. There are the
logistics of sharding by document similarity on both the indexing side and
the query side. Even if you pull that off, it would be extremely difficult to
know whether you're getting good results, and really hard to fix if you're
not.
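
To make those logistics concrete, here is a rough sketch of what both sides
would have to do. It is only an illustration under assumptions: the shard
names, the bag-of-words vectors, and the fixed centroids are all hypothetical,
and nothing here is an existing Solr or Mahout API. A real system would also
have to learn the centroids (for example by clustering) and keep them stable
as the crawl grows.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts; a stand-in for a real analyzer (assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_shards(text, centroids):
    """Shard names ordered from most to least similar to the text."""
    vec = vectorize(text)
    return sorted(centroids, key=lambda s: cosine(vec, centroids[s]), reverse=True)


def route_document(text, centroids):
    """Indexing side: send the document to the single most similar shard."""
    return rank_shards(text, centroids)[0]


def shards_for_query(query, centroids, top_k=1):
    """Query side: search only the top_k most similar shards."""
    return rank_shards(query, centroids)[:top_k]


# Hypothetical centroids, one per shard; in practice these would be learned.
CENTROIDS = {
    "shard_sports": vectorize("football match score team league goal"),
    "shard_tech": vectorize("software code server database search index"),
}
```

The shard list returned on the query side could then be passed in Solr's
`shards` request parameter. Any query whose best documents sit in a shard
outside the top_k is silently missed, which is the part that is hard to
detect and hard to fix.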

I would check the search performance you're getting on each shard. It may
very well be that you just need to speed up the searches on the shards
themselves, rather than trying to limit the search to a subset of shards.
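
One way to check this, sketched below under assumptions: send the same query
to each shard's core with `distrib=false` so it is not fanned out, and compare
the `QTime` value (milliseconds) in each response header. The host and core
names in the usage note are hypothetical, and the helper functions are
illustrative, not part of any Solr client library.

```python
import json
from urllib.parse import urlencode


def shard_query_url(core_url, q, rows=10):
    """Build a /select URL that runs the query on one core only."""
    params = {"q": q, "rows": rows, "distrib": "false", "wt": "json"}
    return f"{core_url.rstrip('/')}/select?{urlencode(params)}"


def qtime_ms(response_body):
    """Extract QTime (milliseconds) from a Solr JSON response body."""
    return json.loads(response_body)["responseHeader"]["QTime"]


def slowest_first(timings):
    """Sort a {shard: qtime_ms} mapping so the slowest shard comes first."""
    return sorted(timings.items(), key=lambda kv: kv[1], reverse=True)
```

For example, fetch `shard_query_url("http://host1:8983/solr/web_shard1_replica1", "title:solr")`
for each core with any HTTP client, pass each body to `qtime_ms`, and feed the
results to `slowest_first` to see which shards dominate the response time.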



Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Apr 7, 2016 at 12:10 AM, davidphilip cherian <
davidphilipcherian@gmail.com> wrote:

> Hi Joel,
>
> Right now, we are (web) crawling almost 85 million documents, and this
> could double. The collection is simply divided into shards, so every
> search runs across all shards.
> If a system could distribute documents into shards based on document
> similarity and, at search time, analyze the query and search only the
> relevant shards, it could improve search performance and reduce
> resource utilization as well. Let me know your thoughts. Use case: since
> this is web-search data, some false positives and false negatives
> should be fine.
>
>
>
> On Wed, Apr 6, 2016 at 11:18 PM, Joel Bernstein <joelsolr@gmail.com>
> wrote:
>
> > I don't know of any contrib or module that does this. Can you describe
> > why you'd want to route documents to shards based on similarity? What
> > advantages would you get by using this approach?
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Wed, Apr 6, 2016 at 1:36 PM, davidphilip cherian <
> > davidphilipcherian@gmail.com> wrote:
> >
> > > Any thoughts?
> > >
> > >
> > > On Tue, Apr 5, 2016 at 9:05 PM, davidphilip cherian <
> > > davidphilipcherian@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > > Is there any contribution (open source contrib module) that routes
> > > > > documents to shards based on a document-similarity technique? Or any
> > > > > suggestions for integrating Mahout with Solr for this use case?
> > > > >
> > > > > From what I know, there are currently two document routing
> > > > > strategies, as explained here:
> > > > > https://lucidworks.com/blog/2013/06/13/solr-cloud-document-routing/
> > > > > But is there anything else that I'm missing?
> > > >
> > > >
> > > >
> > > >
> > > > Thanks.
> > > >
> > > >
> > > >
> > >
> >
>
