cassandra-user mailing list archives

From DuyHai Doan <doanduy...@gmail.com>
Subject Re: Cassandra row ordering best practice Modeling
Date Thu, 22 Jan 2015 20:49:54 GMT
You get it :D

 This is the real issue. However, it's quite an extreme case. If you can
guarantee a minimum of X articles per day and per country, the number of
requests needed to fetch 100 articles is bounded (roughly 100/X).

 Furthermore, do not forget that a SELECT statement on a partition key
leverages bloom filters, so in the case of a true negative (no article for a
day) Cassandra will not touch the disk.
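
 The walk back through the day buckets can be sketched as below. This is only
an illustration under assumptions, not code from any driver: `fetch_bucket` is
a hypothetical callable standing in for a SELECT ... LIMIT n on one partition
key, and `max_days_back` is an assumed safety bound so the loop cannot run
back in time forever.

```python
from datetime import date, timedelta
from typing import Callable, List


def bucket_key(country: str, day: date) -> str:
    """Build a partition key in the XX-YYYYMMDD form described in the thread."""
    return f"{country}-{day:%Y%m%d}"


def newest_articles(
    fetch_bucket: Callable[[str, int], List[str]],
    country: str,
    today: date,
    limit: int = 100,
    max_days_back: int = 30,
) -> List[str]:
    """Collect up to `limit` article IDs, newest first, one day bucket at a time.

    `fetch_bucket(key, n)` is a hypothetical callable (an assumption, not a
    real driver API) that returns at most `n` article IDs from one partition,
    newest first.
    """
    results: List[str] = []
    # Hard bound on the number of buckets visited: empty days cannot cause
    # an endless walk back in time.
    for offset in range(max_days_back):
        remaining = limit - len(results)
        if remaining <= 0:
            break
        key = bucket_key(country, today - timedelta(days=offset))
        # Slice defensively in case the callable returns more than requested.
        results.extend(fetch_bucket(key, remaining)[:remaining])
    return results
```

 With at most 1 article per day, this issues one lookup per day until either
`limit` or `max_days_back` is reached, which is the worst case discussed in
this thread.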

On Thu, Jan 22, 2015 at 9:30 PM, SEGALIS Morgan <msegalis@gmail.com> wrote:

> Oh yeah, I thought about it, and even raised the point in the first mail,
>
> "Let's say I want to show only 100 of the newer articles, I'll get the
> today's articles, and if it does not fill the request (too few articles),
> I'll check the day before that, etc..."
>
> but your answer raised another issue I had not thought of before:
> - going back over previous days, let's say I want the 100 newest articles
> - If there is at most 1 article per day, and some days have 0, I will have
> to do 100+ queries to get all the posts. Won't that be a little too much?
>
> 2015-01-22 20:47 GMT+01:00 DuyHai Doan <doanduyhai@gmail.com>:
>
>> Well, if the current day bucket does not contain enough articles, you may
>> need to search back in the previous day. If the previous day does not have
>> any articles, you may need to go back another day ... and so on ...
>>
>>  Of course it's a corner case but I've seen some code that misses this
>> scenario and ends up in an infinite loop back in time ...
>>
>> On Thu, Jan 22, 2015 at 8:41 PM, SEGALIS Morgan <msegalis@gmail.com>
>> wrote:
>>
>>> Hi DuyHai,
>>>
>>> if there are 0 articles, the row will obviously not exist, I guess (only
>>> an article insertion will create the row).
>>> What is bugging you exactly?
>>>
>>> 2015-01-22 20:33 GMT+01:00 DuyHai Doan <doanduyhai@gmail.com>:
>>>
>>>> Hello Morgan
>>>>
>>>>  The data model looks reasonable. Bucketing by day will help you to
>>>> scale. The only concern I can see is how to go back in time to fetch
>>>> articles from previous buckets (previous days). Is it possible to have
>>>> 0 articles for a country on a given day?
>>>>
>>>>
>>>> On Thu, Jan 22, 2015 at 8:23 PM, SEGALIS Morgan <msegalis@gmail.com>
>>>> wrote:
>>>>
>>>>> Sorry, I copied/pasted the question from another platform where you
>>>>> don't generally say hello,
>>>>>
>>>>> So : Hello everyone,
>>>>>
>>>>>
>>>>> 2015-01-22 20:19 GMT+01:00 SEGALIS Morgan <msegalis@gmail.com>:
>>>>>
>>>>>> I have a column family that stores articles. I'll need to get those
>>>>>> articles from the most recent to the oldest, filtered by country, and
>>>>>> of course with the ability to limit the number of fetched articles.
>>>>>>
>>>>>> I thought about another ColumnFamily "ArticlesByDateAndCountry" with
>>>>>> dynamic columns.
>>>>>>
>>>>>> The key would be a mix of the 2-char country code (ISO 3166-1) and
>>>>>> the article's day date, so something like: US-20150118 or FR-20141230
>>>>>> (XX-YYYYMMDD)
>>>>>>
>>>>>> In those rows, the column name would be the timeuuid of the article,
>>>>>> and the value would be the article's ID.
>>>>>>
>>>>>> There would probably be about a thousand articles per day for each country.
>>>>>>
>>>>>> Let's say I want to show only the 100 newest articles: I'll get
>>>>>> today's articles, and if that does not fill the request (too few
>>>>>> articles), I'll check the day before that, etc...
>>>>>>
>>>>>> Is that the best practice, or does someone have a better idea for
>>>>>> this purpose?
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Morgan SEGALIS
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Morgan SEGALIS
>>>
>>
>>
>
>
> --
> Morgan SEGALIS
>
