lucene-solr-user mailing list archives

From Russell Taylor <Russell.Tay...@interactivedata.com>
Subject RE: Solr Join between two indexes taking too long.
Date Wed, 09 Sep 2015 11:10:37 GMT
Hi Mikhail,



- Is it possible to keep both types of data in the same core? Why not?

We have two separate feeds populating what is mostly distinct data at different times, hence
the two indexes. IndexB is also used by other products which don’t need any data from indexA.

- can you manually shard both indices by those longValues?

I’m not sure what you mean by this. Can you show an example?

- It seems like you query plenty of data; don't you have another query/filter to intersect
that join result with?

I hoped this was the answer too, but sadly no: the only common field is this long value.



Such a long time for "universe of 5 docs" seems really strange

Yes, I would have thought the filter on indexB would speed things up, but it makes no difference.
The original join field was alphanumeric; when we used that, the query took an extra 15 seconds
to process.



We are running 4.10.3. Can we use {!join ... score=none} with that version?



Do you have a link to your talk at Berlin Buzzwords?



I’ve asked the developer to describe what he’s trying to do; hopefully this will help explain
what we are trying to achieve.

#######################################################################################################

We have a massive index describing the contents of our client portfolios.  Each client portfolio
can contain between 1 and 1 million securities.  We have approximately 8000 portfolios thus
the index has approximately 250 million documents.  And, as would be expected, a particular
security can be held in many client portfolios.  Each document contains the portfolio id and
security id.



We have another large index holding security information, containing about 30 million entries.
 Each document has the security id.



We are trying, via Solr, to run the following query, expressed for convenience in SQL:



select * from 'security information' where portfolio id = 'X'.



Of course, this is a simple idea in SQL: one simply joins the two indexes on security id.
However, when we perform a Solr join, our response time is approximately 50 seconds, and changing
the 'start' position causes another 50-second query (there seems to be no caching of the work
performed in the initial query).
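
As a rough illustration of the join described above, the request could be built as follows. This is only a sketch: the core names (securities, portfolios) and field names (security_id, portfolio_id) are placeholders, not the real schema. Only the {!join from=... to=... fromIndex=...} local-params syntax and the host/port are taken from the query quoted further down in this thread.

```python
from urllib.parse import urlencode

# Host and port as in the query quoted in this thread.
SOLR_BASE = "http://127.0.0.1:8080/solr"


def build_join_url(portfolio_id, start=0, rows=10):
    """Return a select URL on the security core, joined to the portfolio core."""
    # Hypothetical names: cores "securities" and "portfolios",
    # fields "security_id" and "portfolio_id".
    join_query = "{!join from=%s to=%s fromIndex=%s}%s:%s" % (
        "security_id", "security_id", "portfolios", "portfolio_id", portfolio_id)
    params = [("q", join_query), ("start", start), ("rows", rows), ("wt", "json")]
    # urlencode percent-encodes the braces, '!' and '=' inside the local params.
    return "%s/securities/select?%s" % (SOLR_BASE, urlencode(params))


if __name__ == "__main__":
    print(build_join_url("X"))
```

The inner query (portfolio_id:X) runs against the "from" core, and the join returns the documents in the "to" core that share a security_id with any of its matches.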



How can we speed this query up significantly?



How can we force Solr to cache the initial expensive query so that subsequent changes to the
start parameter are fast?
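
One possibility, offered as an untested sketch rather than a confirmed fix: Solr caches the document set of each fq filter query in its filterCache (when that cache is enabled in solrconfig.xml), so moving the join from q into fq might let later pages reuse the join result instead of recomputing it. The core and field names below are placeholders, and whether this helps on 4.10.3 with indexes of this size would need to be measured.

```python
from urllib.parse import urlencode

# Host and port as in the query quoted in this thread.
SOLR_BASE = "http://127.0.0.1:8080/solr"


def build_page_url(portfolio_id, start, rows=10):
    """Return a select URL that applies the join as a filter query (fq)."""
    # Hypothetical names: cores "securities" and "portfolios",
    # fields "security_id" and "portfolio_id".
    join_filter = (
        "{!join from=security_id to=security_id fromIndex=portfolios}"
        "portfolio_id:%s" % portfolio_id)
    # q matches everything; the join is carried by fq, so its document set
    # is eligible for the filterCache and identical across pages.
    params = [("q", "*:*"), ("fq", join_filter),
              ("start", start), ("rows", rows), ("wt", "json")]
    return "%s/securities/select?%s" % (SOLR_BASE, urlencode(params))


def page_urls(portfolio_id, pages, rows=10):
    """Return one URL per page; only the start offset differs between them."""
    return [build_page_url(portfolio_id, page * rows, rows) for page in range(pages)]


if __name__ == "__main__":
    for url in page_urls("X", 3):
        print(url)
```

Because the fq string is byte-for-byte identical on every page, the first request would pay for the join and subsequent requests with a different start could, in principle, be served from the cached filter.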





Thanks





Russ.



-----Original Message-----
From: Mikhail Khludnev [mailto:mkhludnev@griddynamics.com]
Sent: 08 September 2015 23:08
To: solr-user
Subject: Re: Solr Join between two indexes taking too long.



Hello Russ,



It's an interesting case! Can you give some brief context?

- Is it possible to keep both types of data in the same core? Why not?

- can you manually shard both indices by those longValues?

- It seems like you query plenty of data; don't you have another query/filter to intersect
that join result with?



Such a long time for "universe of 5 docs" seems really strange. Can you open the index with
Solr 5.3 and run the same query with the number of results in universe:universeValue, but adding
the local param {!join ... score=none}? That triggers an alternative algorithm.



Also, profiler snapshots always help, you know. I gave a brief intro to join algorithms and
their problems in Solr at the recent Berlin Buzzwords; feel free to have a look if you are interested.



On Tue, Sep 8, 2015 at 3:09 PM, Russell Taylor <Russell.Taylor@interactivedata.com> wrote:



> Hi,
>  I hope somebody can help.
>
> We have two indexes, one which holds the descriptive data and the
> other one which holds lists of docs which are of a certain type
> (called universes in our world). They need to be joined together to
> show a list of data from indexA where a filtered indexB (by
> universe:value) has matching longs (The join field).
>
> At the moment the query is taking 55 seconds we need to get it under a
> second, any help most appreciated.
>
> INDEXES:
>
> Index a (primary index)
> 31 million docs with a converted alphanumeric to a long value with a
> possible 10 million unique values.
>
> Index B (the joined index)
> 250 million documents with a converted alphanumeric to a long value
> with a possible 10 million unique values.
> IndexB is filtered by universe which could be between 1 and 500,000 docs.
>
> QUERY:
>
> http://127.0.0.1:8080/solr/indexA/select?q={!join+from=longValue+to=longValue+fromIndex=IndexB}universe:universeValue
>
> Qtime is 55 seconds for either a universe of 5 docs or 500,000 docs.
>
> Thanks
>
> Russ.
>
> *******************************************************
> This message (including any files transmitted with it) may contain
> confidential and/or proprietary information, is the property of
> Interactive Data Corporation and/or its subsidiaries, and is directed
> only to the addressee(s). If you are not the designated recipient or
> have reason to believe you received this message in error, please
> delete this message from your system and notify the sender
> immediately. An unintended recipient's disclosure, copying,
> distribution, or use of this message or any attachments is prohibited and may be unlawful.
> *******************************************************
>







--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhludnev@griddynamics.com>

