cassandra-user mailing list archives

From Artur Siekielski <a...@vhex.net>
Subject Re: Tag filtering data model
Date Sat, 19 Sep 2015 10:34:57 GMT
I came to a similar conclusion: if you have more than a few
tags, the problem is no longer simple "tagging" but rather regular
"document search" with indexed words. There are too many subsets of words
to precompute the matching documents for each one, so you need to index
documents individually and compute intersections dynamically. For
acceptable performance the indexes must be stored fully in memory, in data
structures that allow intersections to be computed quickly. This is not
something regular databases implement (although they can be used as
backing storage for indexes that are loaded into memory).
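A minimal sketch of the "index documents individually, intersect dynamically" idea, assuming a plain in-memory inverted index (tag -> set of document ids). The class name `TagIndex` and its methods are hypothetical and only illustrate the technique; a real search engine uses compressed posting lists rather than Python sets.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory inverted index: tag -> set of document ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, tags):
        # Index each document individually under every tag it carries.
        for tag in tags:
            self._postings[tag].add(doc_id)

    def search(self, tags):
        """Return ids of documents carrying ALL given tags (empty query -> empty set)."""
        tags = list(tags)
        if not tags:
            return set()
        # .get() does not create entries; an unknown tag yields an empty set.
        sets = [self._postings.get(t, set()) for t in tags]
        # Intersect starting from the smallest set so the work stays small.
        sets.sort(key=len)
        result = set(sets[0])
        for s in sets[1:]:
            result &= s
            if not result:
                break
        return result


idx = TagIndex()
idx.add("d1", ["red", "big", "new"])
idx.add("d2", ["red", "small"])
idx.add("d3", ["red", "big"])
print(idx.search(["red", "big"]))
```

Nothing is precomputed per tag combination: storage grows with the total number of (tag, doc) pairs, and each query pays for the intersection at read time.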

So the solution is either to limit the number of tags to 3-4 and do full
denormalization (a document with n tags is stored under every one of its
2^n tag subsets, i.e. a duplication factor of up to 8-16) or to use a
search engine.
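A sketch of the full-denormalization option, assuming each sorted tag subset would become a partition key of a query table (modelled here as a Python dict; the function names and the table layout are hypothetical, not a prescribed Cassandra schema). Counting the empty subset, which serves the unfiltered query, a document with n tags is written 2^n times.

```python
from collections import defaultdict
from itertools import combinations


def tag_subsets(tags):
    """Yield every subset of a document's tags (including the empty one) as a sorted tuple."""
    unique = sorted(set(tags))
    for size in range(0, len(unique) + 1):
        for combo in combinations(unique, size):
            yield combo


def denormalize(docs):
    """Precompute the hypothetical query table: tag-subset key -> set of doc ids."""
    table = defaultdict(set)
    for doc_id, tags in docs.items():
        for key in tag_subsets(tags):
            table[key].add(doc_id)
    return table


def lookup(table, query_tags):
    # A query is a single key lookup once the tags are put in canonical (sorted) order.
    return table.get(tuple(sorted(set(query_tags))), set())


docs = {"d1": ["a", "b", "c"], "d2": ["a", "b"]}
table = denormalize(docs)
print(lookup(table, ["b", "a"]))
```

Reads become a single lookup, but the write cost doubles with every extra tag, which is why this only stays practical for a handful of tags.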

On 09/16/2015 11:29 AM, Naresh Yadav wrote:
> We had a similar use case. After a lot of trials with Cassandra, we
> finally created a Solr schema with doc_id (unique key) and tags (indexed)
> in Apache Solr to answer the search query "Get me the docs matching any
> given number of tags", and that solved our use case. We had millions
> of docs, and a single doc can have hundreds of tags.
>
> Please share your final conclusion if you crack this problem within
> Cassandra alone; we would be interested to know your solution.
>

