lucene-solr-user mailing list archives

From Prateek Jain J <prateek.j.j...@ericsson.com>
Subject DataImportHandler | Query | performance
Date Fri, 23 Dec 2016 12:15:35 GMT

Hi All,

We need some advice on the way we push our documents into Solr (4.8.1). Here are the
requirements:


1. Documents range from 5 to 100 KB in size.

2. 10-50 users actively query Solr for different kinds of data.

3. New data arrives frequently (streaming) and must be pushed to Solr. It must be available
for querying within 15 seconds.

Current scenario:
      We dump data to a JSON file, and a cron job (written in Java, run each time a new file
is created) reads this file periodically and sends it to Solr using SolrJ (over HTTP). This
file is massive and can reach gigabytes in some cases (soft and hard Solr commits are
configured appropriately).
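
To make the current scenario concrete, here is a minimal sketch of reading such a file in
fixed-size batches rather than loading it whole. It assumes the file holds one JSON document
per line, which is an assumption and not something stated above. The class name
BatchingReader, the method forEachBatch, and the sink callback are hypothetical names used
only for illustration; in a real job the sink would be the place where a single, reused
SolrJ client sends each batch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: streams a line-delimited JSON file in bounded batches,
// so memory use depends on batchSize and not on the size of the file.
class BatchingReader {

    // Reads one document per non-blank line (assumed format) and passes each
    // full batch to `sink`. Returns the number of batches delivered.
    static int forEachBatch(BufferedReader in, int batchSize,
                            Consumer<List<String>> sink) throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> batch = new ArrayList<>(batchSize);
        int batches = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines between documents
            }
            batch.add(line);
            if (batch.size() == batchSize) {
                sink.accept(batch);               // e.g. send this batch to Solr
                batch = new ArrayList<>(batchSize); // start a fresh batch
                batches++;
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // deliver the final, partially filled batch
            batches++;
        }
        return batches;
    }
}
```

The caller opens the file once, wraps it in a BufferedReader, and closes it when the method
returns, so only one file handle is held per import run.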

Issue:

1. Multiple cores exist in this Solr instance, and they follow a similar pattern.

2. This causes Solr to hang and, in some cases, to run out of memory (OOM) because too many
file descriptors are open (and sometimes because of other issues).

We would like to know whether using DataImportHandler would give us any advantage. I took a
quick look at the Solr Wiki, but it is not clear whether it offers any performance advantage
in this scenario.


Regards,
Prateek Jain

