lucene-solr-user mailing list archives

From Francis Yakin <fya...@liquid.com>
Subject RE: SolrJ and Solr web simultaneously?
Date Wed, 26 Aug 2009 21:07:56 GMT
No, we don't want to put it on the same box as the database.

Agreed, indexing/committing/merging/optimizing is the bottleneck.

I think it is worth trying SolrJ with the CommonsHttpSolrServer option for now, and
we'll see what happens when we load 3 million docs.

Thanks

Francis

-----Original Message-----
From: Fuad Efendi [mailto:fuad@efendi.ca]
Sent: Wednesday, August 26, 2009 1:34 PM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ and Solr web simultaneously?

With this configuration, the preferred method is probably to run a standalone
Java application on the same box as the DB, or very close to the DB (in the
same network segment).

HTTP is not a bottleneck; the main bottleneck is
indexing/committing/merging/optimizing in SOLR...

Just as an example: if you submit a batch of large documents to SOLR, expect
a 5-55 second response time (even with EmbeddedSolr or pure Lucene), and that
has nothing to do with network latency or firewalls... Uploading 1 MB over a
100 Mbps network takes less than 0.1 seconds, but indexing it may take > 0.5
seconds...
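
The transfer-time figure above can be checked with a few lines of Java. This
is only a back-of-the-envelope sketch: the class and method names are
hypothetical, 1 MB is taken as 1,000,000 bytes, and protocol overhead is
ignored.

```java
final class TransferTime {

    /** Seconds needed to move the given number of bytes over a link, ignoring protocol overhead. */
    static double transferSeconds(long bytes, double bitsPerSecond) {
        return bytes * 8.0 / bitsPerSecond;
    }

    public static void main(String[] args) {
        // Assumed figures from the message: 1 MB payload, 100 Mbps link.
        double seconds = transferSeconds(1_000_000L, 100_000_000.0);
        System.out.println("1 MB over 100 Mbps takes about " + seconds + " s");
    }
}
```

Under these assumptions the transfer takes roughly 0.08 seconds, which is
consistent with the "less than 0.1 seconds" estimate.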

A standalone application with SolrJ is also good because you can schedule
batch updates etc. and automate them...


P.S.
In theory, if you are using Oracle, you could even try to implement triggers
written in Java that cause a SOLR update on each row update (transactional);
but I haven't heard of anyone using stored procs in Java: too risky and slow,
with specific dependencies...




-----Original Message-----
From: Francis Yakin [mailto:fyakin@liquid.com]
Sent: August-26-09 4:18 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: SolrJ and Solr web simultaneously?

We already opened port 80 from Solr to the DB, so that's not the issue, but
httpd (port 80) is very flaky if there is a firewall between Solr and the DB.
We have a Solr master/slaves environment; clients access search through the
slaves (the master only accepts the new index from the DB, and the slaves
pull the new indexes from the Solr master).

We have someone on the development team who knows Java and can implement JDBC.

We don't share the Solr master and the DB on the same box; they are separate
boxes on separate networks, with port 80 opened between them.

It looks like CommonsHttpSolrServer is a better approach than
EmbeddedSolrServer, since we want the Solr master to act as a Solr server as
well.
I'm just worried that HTTP will be a bottleneck; that's why I prefer the JDBC
connection method.

Francis

-----Original Message-----
From: Fuad Efendi [mailto:fuad@efendi.ca]
Sent: Wednesday, August 26, 2009 11:56 AM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ and Solr web simultaneously?

Do you have a firewall between the DB and the possible SOLR-Master instance?
Do you have a firewall between the client application and the DB? Such a
configuration is strange... By default firewalls allow access to port 80; try
setting port 80 for SOLR-Tomcat and/or configuring AJP mapping for the
front-end HTTPD which you might have. BTW, Apache HTTPD with SOLR supports
HTTP caching for SOLR-slaves...

1. SolrJ does not provide multithreading, but an instance of
CommonsHttpSolrServer is thread-safe. Developers need to implement the
multithreaded application themselves.
2. SolrJ does not use JDBC; developers need to implement that as well...

It requires some Java coding; it is not an out-of-the-box Data Import
Handler (DIH).

Suppose you have 2 quad-cores: why use a single thread if we can use
8... or why wait 5 seconds for a response from SOLR if we can have an
additional 32 threads working with the DB at the same time... and why share
I/O between SOLR and DB?
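
A minimal sketch of the multithreaded loader described above, using only the
JDK. The DocSink interface, the indexAll method, and the batch and thread
counts are all hypothetical stand-ins; in a real loader the sink would
presumably wrap a single shared CommonsHttpSolrServer instance and forward
each batch to its add(Collection<SolrInputDocument>) method, and the
documents would come from JDBC queries rather than a list of strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

final class ParallelIndexer {

    /** Hypothetical stand-in for a thread-safe Solr client shared by all workers. */
    interface DocSink {
        void add(List<String> batch) throws Exception;
    }

    /** Splits docs into batches, sends them from a fixed thread pool, and returns the number sent. */
    static int indexAll(List<String> docs, int batchSize, int threads, DocSink sink) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (int start = 0; start < docs.size(); start += batchSize) {
                int end = Math.min(start + batchSize, docs.size());
                List<String> batch = new ArrayList<>(docs.subList(start, end));
                // Each task sends one batch; a failure surfaces later through Future.get().
                pending.add(pool.submit(() -> {
                    sink.add(batch);
                    return batch.size();
                }));
            }
            int sent = 0;
            for (Future<Integer> f : pending) {
                sent += f.get();
            }
            return sent;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            docs.add("doc-" + i);
        }
        AtomicInteger received = new AtomicInteger();
        // The sink here only counts documents; it does not talk to Solr.
        int sent = indexAll(docs, 100, 8, batch -> received.addAndGet(batch.size()));
        System.out.println(sent + " sent, " + received.get() + " received");
    }
}
```

The pool size is the tuning knob the message refers to: worker threads keep
reading from the DB and submitting batches while other requests are still
waiting on SOLR.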

Diversify and lower risks; having SOLR and DB on the same box is extremely
unsafe...

-Fuad


-----Original Message-----
From: Francis Yakin [mailto:fyakin@liquid.com]
Sent: August-26-09 2:25 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: SolrJ and Solr web simultaneously?

Thanks.

The issue we actually have is more likely a firewall issue than network
latency; that's why we are trying to avoid using an HTTP connection.
Fixing the firewall is not an option right now.
We have around 3 million docs to load from the DB to the Solr master (initial
load only), and after the initial load we will actively add new docs to Solr.
We prefer to use a JDBC connection, so if SolrJ uses a JDBC connection that
might be useful. I also like the multi-threading option from SolrJ. So, since
we want the Solr master to run as a server as well, EmbeddedSolrServer is not
a good approach for this?

Francis





-----Original Message-----
From: Fuad Efendi [mailto:fuad@efendi.ca]
Sent: Wednesday, August 26, 2009 10:56 AM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ and Solr web simultaneously?

> I don't want to use (or am trying not to use) an HTTP connection from the
> database to the Solr master because of network latency (very slow).

"Network latency" does not play any role here; throughput is more important.
With a separate SOLR instance on a separate box, and with a separate Java
application (SOLR-bridge) querying the database and using SolrJ, latency will
be 1 second (for instance), but you can fine-tune performance by allocating
the necessary number of threads (depending on the latency of SOLR and Oracle,
average doc size, etc.), JDBC connections, etc., and you can reach a
throughput of thousands of docs per second. DIHs only simplify some stuff for
total beginners...
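
The latency-versus-throughput point can be illustrated with an idealized
estimate. Everything in this sketch is an assumption made for illustration:
the class and method names, the 32 threads, and the 100 docs per request are
hypothetical, and the formula assumes every thread completes one batch per
round trip with no contention on the SOLR side, which a real server will not
fully deliver.

```java
final class ThroughputEstimate {

    /** Idealized docs per second when each thread completes one batch per round trip. */
    static double docsPerSecond(int threads, int docsPerBatch, double latencySeconds) {
        return threads * docsPerBatch / latencySeconds;
    }

    public static void main(String[] args) {
        // Hypothetical figures: 32 threads, 100 docs per request, 1 s latency per request.
        double rate = docsPerSecond(32, 100, 1.0);
        System.out.println("docs per second: " + rate);
        System.out.println("seconds to load 3 million docs: " + 3_000_000 / rate);
    }
}
```

With these made-up figures the estimate is about 3,200 docs per second even
though each individual request takes a full second, which is why the number
of concurrent requests matters more here than per-request latency.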

In addition, you will have the nice Admin screen of the standalone SOLR-master.

-Fuad
http://www.tokenizer.org



-----Original Message-----
From: Francis Yakin [mailto:fyakin@liquid.com]
Sent: August-26-09 1:41 PM
To: 'solr-user@lucene.apache.org'; Paul Tomblin
Subject: RE: SolrJ and Solr web simultaneously?

I have the same situation now.

If I don't want to use an HTTP connection, then I need to use
EmbeddedSolrServer; is that correct?
We have master/slave Solr; the applications use the slaves for search. The
master only takes the new index from the database, and the slaves pull the
new index using snappuller/snapinstaller.

I don't want to use (or am trying not to use) an HTTP connection from the
database to the Solr master because of network latency (very slow).

Any suggestions?

Francis

-----Original Message-----
From: Smiley, David W. [mailto:dsmiley@mitre.org]
Sent: Wednesday, August 26, 2009 10:23 AM
To: solr; Paul Tomblin
Subject: Re: SolrJ and Solr web simultaneously?

Once a commit occurs, all data added before it (by any & all clients)
becomes visible to all searches henceforth.

The "web interface" has direct access to Solr, and SolrJ remotely accesses
that Solr.

EmbeddedSolrServer is something that few people should actually use.
It's mostly for embedding Solr without running Solr as a server, which is a
somewhat rare need.

~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server



On 8/26/09 1:14 PM, "Paul Tomblin" <ptomblin@xcski.com> wrote:

Is Solr like an RDBMS in that I can have multiple programs querying and
updating the index at once, and everybody else will see the updates
after a commit, or do I have to do something explicit to see others'
updates?  Does it matter whether they're using the web interface,
SolrJ with a CommonsHttpSolrServer or SolrJ with an EmbeddedSolrServer?


--
http://www.linkedin.com/in/paultomblin







