couchdb-dev mailing list archives

From Robert Dionne <dio...@dionne-associates.com>
Subject Re: multiview on github
Date Tue, 21 Sep 2010 01:22:57 GMT
Norman,

  Actually ontylog is GPL, and I wouldn't wish that code on anyone just yet. Think of it as
the contents of my /etc directory.

  The indexer I'm chipping away at is just a proof of concept hacked up from Joe Armstrong's
Erlang book (with his permission). Anyone is welcome to use it as they see fit, though
it does have restrictions from Armstrong press. It's been great for me to learn Erlang and
explore the couch internals. It's also nice to have something light running in couch.

  My thoughts about plugins have nothing to do with licenses. I like the fact that couchdb
is simple and lean, and I'd like it to be more rock solid. I'm not sure multiview, geocouch, fti, or any other
indexers belong in the core. With multiview I think there's perhaps something more general
that might be part of core but I haven't given it a lot of thought yet.

Cheers,

Bob




On Sep 20, 2010, at 7:02 PM, Norman Barker wrote:

> Bob,
> 
> I can see why plugins might work for you since your ontology /
> indexing code is GPL, however I am more than happy for the multiview
> to be apache licensed and would like to see it in trunk.
> 
> I like the concept of plugins as it creates a stable API for third
> parties, but I think a multiview is a core feature of CouchDB.
> 
> Norman
> 
> On Mon, Sep 20, 2010 at 4:19 AM, Robert Dionne
> <dionne@dionne-associates.com> wrote:
>> I see, neat.
>> 
>> I ask because you might treat disjunction and conjunction differently in terms of
>> whether you run around the ring or broadcast to all the nodes. For conjunctions you need all
>> to succeed so broadcast might fare better whereas for disjunctions only one need succeed.
>> I suppose it would depend largely on the number of views and the amount of each computation.
>> 
>> Anyway I guess I have mixed feelings about seeing this in core. I see a lot of folks
>> already struggling to get their arms around working with map/reduce. It would make a good
>> plugin for advanced users. Actually the ability to have plugins is almost there now. I have
>> an indexer that only requires some ini file mods and getting the code on the classpath. I
>> think all that's needed at this point is:
>> 
>> 1. conventions for a plugins directory
>> 
>> 2. a way of specifying gen_servers in order to supervise them
>> 
>> 3. some apis around some of the internals.
>> 
>> I'm oversimplifying it for sure, the devil's in the details and it's the kind of thing
>> programmers love to argue about ad nauseam but no one wants to do it (myself included :)
>> 
>> Best,
>> 
>> Bob
>> 
>> 
>> 
>> On Sep 19, 2010, at 10:22 AM, Norman Barker wrote:
>> 
>>> Bob,
>>> 
>>> it is just checking that a given id participates in a view, if it
>>> makes it around the ring then it wins and gets streamed to the client,
>>> adding disjoints would be fairly simple. Currently the only way I can
>>> check if an id is in a view is to loop over the results of each view,
>>> hence each node in the ring is in its own process to keep things
>>> moving.
>>> 
>>> A use case is two views, one that emits datetime (numeric) and another
>>> view that emits values, e.g. A, B, C ..., the query would then be to
>>> find the all documents with value A between start time and end time.
>>> 
>>> Norman
>>> 
>>> On Sun, Sep 19, 2010 at 5:21 AM, Robert Dionne
>>> <dionne@dionne-associates.com> wrote:
>>>> I took another peek at this and I'm curious as to what it's doing. Is it
>>>> just checking that a given id participates in a view? So if it makes it around the ring it
>>>> wins? Or is it actually computing the result of passing the doc thru all the views?
>>>> 
>>>> If the answer is the former then would disjunction also be something one
>>>> might want? I'm just curious, I don't have a use case and I forget the original discussion
>>>> around this. I sort of think of views as a functional mapping from the database to some subset.
>>>> That's not entirely accurate given there's this reduce phase also. So I could imagine composing
>>>> views in a functional way, but the same thing can be had with just a different map function
>>>> that is the composition.
>>>> 
>>>> Anyway if you have a brief description of this, with a use case, it would
>>>> help.
>>>> 
>>>> Cheers,
>>>> 
>>>> Bob
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Sep 17, 2010, at 11:32 PM, Norman Barker wrote:
>>>> 
>>>>> Chris, James
>>>>> 
>>>>> thanks for bumping this, we are using this internally at 'scale'
>>>>> (million+ keys). I want this to work for couchdb as we want to give
>>>>> back for such a great product and support this going forward, so any
>>>>> suggestions welcomed and we will test and add them to the local github
>>>>> account with the aim of getting this into trunk.
>>>>> 
>>>>> Norman
>>>>> 
>>>>> On Fri, Sep 17, 2010 at 7:00 PM, James Hayton <theboss@purplebulldog.com> wrote:
>>>>>> I want to use it!  I just haven't gotten around to it.  I was going to try
>>>>>> and test it out this weekend and if I am able, I will certainly report back
>>>>>> what I find.
>>>>>> 
>>>>>> James
>>>>>> 
>>>>>> On Fri, Sep 17, 2010 at 5:55 PM, Chris Anderson <jchris@apache.org> wrote:
>>>>>> 
>>>>>>> On Mon, Aug 30, 2010 at 10:58 AM, Norman Barker <norman.barker@gmail.com>
>>>>>>> wrote:
>>>>>>>> Bob,
>>>>>>>> 
>>>>>>>> I can and have been testing the multiview at this scale, it is ok
>>>>>>>> (fast enough), but I think being able to test inclusion of a document
>>>>>>>> id in a view without having to loop would be a considerable speed
>>>>>>>> improvement. If you have any ideas let me know.
>>>>>>>> 
>>>>>>> 
>>>>>>> I just want to bump this thread, as I think this is a useful feature.
>>>>>>> I don't expect to be able to test it in the coming weeks, but if I did
>>>>>>> I would. Is anyone besides Norman using this? Has anyone used it at
>>>>>>> scale?
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> Chris
>>>>>>> 
>>>>>>>> thanks,
>>>>>>>> 
>>>>>>>> Norman
>>>>>>>> 
>>>>>>>> On Mon, Aug 30, 2010 at 10:49 AM, Robert Newson <robert.newson@gmail.com>
>>>>>>> wrote:
>>>>>>>>> I'm sorry, I've had no time to play with this at scale.
>>>>>>>>> 
>>>>>>>>> On Mon, Aug 30, 2010 at 5:35 PM, Norman Barker <norman.barker@gmail.com>
>>>>>>> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> are there any more comments on this, if not can you describe the
>>>>>>>>>> process (in particular how to obtain a wiki and jira account for
>>>>>>>>>> couchdb which I have been unable to do) and I will start documenting
>>>>>>>>>> this so we can put this into the trunk.
>>>>>>>>>> 
>>>>>>>>>> Bob, were you able to do any more testing with large views, are there
>>>>>>>>>> any suggestions on how to speed up the document id inclusion test as
>>>>>>>>>> described below?
>>>>>>>>>> 
>>>>>>>>>> thanks,
>>>>>>>>>> 
>>>>>>>>>> Norman
>>>>>>>>>> 
>>>>>>>>>> On Mon, Aug 23, 2010 at 9:22 AM, Norman Barker <
>>>>>>> norman.barker@gmail.com> wrote:
>>>>>>>>>>> Bob,
>>>>>>>>>>> 
>>>>>>>>>>> thanks for the feedback and for taking a look at the code. Guidelines
>>>>>>>>>>> on when to use a supervisor within couchdb with a gen_server would be
>>>>>>>>>>> appreciated, currently I have a supervisor and a gen_server, but if
>>>>>>>>>>> couchdb has a supervision process I could remove that layer.
>>>>>>>>>>> 
>>>>>>>>>>> I think plugins is a great idea, however intersection of views is such
>>>>>>>>>>> a common request, perhaps there needs to be a plugin system and if a
>>>>>>>>>>> plugin is rated enough it goes into trunk as a core feature.
>>>>>>>>>>> 
>>>>>>>>>>> the four (or slightly more) line summary is here
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> http://github.com/normanb/couchdb/raw/trunk/src/couchdb/couch_query_ring.erl
>>>>>>>>>>> 
>>>>>>>>>>> %
>>>>>>>>>>> % send an id from the start list to the next node in the ring, if the
>>>>>>>>>>> % id is in the adjacent node then this node sends it to the next ring node
>>>>>>>>>>> % ....
>>>>>>>>>>> % if the id gets all round the ring and back to the start node then it
>>>>>>>>>>> % has intersected all queries and should be included. The nodes in the ring
>>>>>>>>>>> % should be sorted in size from small to large for this to be effective
>>>>>>>>>>> %
>>>>>>>>>>> % In addition send the initial id list round in parallel
>>>>>>>>>>> 
>>>>>>>>>>> it really needs some eyes from the core couchdb coders to see how to
>>>>>>>>>>> speed up the inclusion testing, looping is bad even if it is done in
>>>>>>>>>>> parallel.
>>>>>>>>>>> 
>>>>>>>>>>> Multiview is usable, I am using it with some pretty big mega-views (as
>>>>>>>>>>> per the raindrop model), I am also available to add features to this
>>>>>>>>>>> as this is a core part of our work and we want to give it to couch as a
>>>>>>>>>>> contribution.
>>>>>>>>>>> 
>>>>>>>>>>> thanks,
>>>>>>>>>>> 
>>>>>>>>>>> Norman
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, Aug 23, 2010 at 5:05 AM, Robert Dionne
>>>>>>>>>>> <dionne@dionne-associates.com> wrote:
>>>>>>>>>>>> Hi Norman,
>>>>>>>>>>>> 
>>>>>>>>>>>>  I took a peek at multiview. I haven't followed this too closely on
>>>>>>>>>>>> the mailing list but this is *view intersection*? Is there a 5 line summary
>>>>>>>>>>>> of what this does somewhere?
>>>>>>>>>>>> 
>>>>>>>>>>>>  I'm curious as to why the daemon needs to be a supervisor, most if
>>>>>>>>>>>> not all of the other daemons are gen_servers. OTP allows this but I think
>>>>>>>>>>>> this is a good area where some CouchDB guidelines on plugins would apply.
>>>>>>>>>>>> 
>>>>>>>>>>>>  It strikes me that views, the use of map/reduce, etc. are one of the
>>>>>>>>>>>> trickier aspects of using CouchDB, particularly for new users coming from
>>>>>>>>>>>> the SQL world. People are also reporting issues with performance of views, I
>>>>>>>>>>>> guess often because reduce functions go out of control.
>>>>>>>>>>>> 
>>>>>>>>>>>>  I think the project would be better served if features like this
>>>>>>>>>>>> were available as plugins. I would put GeoCouch in the same category. It's
>>>>>>>>>>>> very neat and timely (given everyone wants to know where everyone else is
>>>>>>>>>>>> using their telephone but without talking other than asynchronously), but a
>>>>>>>>>>>> server plugin architecture that would allow this to be done cleanly should
>>>>>>>>>>>> come first.
>>>>>>>>>>>> 
>>>>>>>>>>>>  This is just my opinion. I'd love to see some of the project
>>>>>>>>>>>> founders and committers weigh in on this and set some direction.
>>>>>>>>>>>> 
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> 
>>>>>>>>>>>> Bob
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Aug 22, 2010, at 5:45 PM, Norman Barker wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> I would like to take this multiview code and have it added to trunk if
>>>>>>>>>>>>> possible, what are the next steps?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Norman
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Aug 18, 2010 at 11:44 AM, Norman Barker <norman.barker@gmail.com> wrote:
>>>>>>>>>>>>>> I have made
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> http://github.com/normanb/couchdb
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> which is a fork of the latest couchdb trunk with the multiview code
>>>>>>>>>>>>>> and tests added.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> If geocouch is available then it can still be used.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> There are a couple of questions about the multiview on the user / dev
>>>>>>>>>>>>>> list so I will be adding some more test cases during today.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Norman
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, Aug 17, 2010 at 9:23 PM, Norman Barker <norman.barker@gmail.com> wrote:
>>>>>>>>>>>>>>> this is possible, I forked geocouch since I use it, but I have already
>>>>>>>>>>>>>>> separated the geocouch dependencies from the trunk.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I can do this tomorrow, certainly be interested in any feedback.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Norman
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Tue, Aug 17, 2010 at 7:49 PM, Volker Mische <volker.mische@gmail.com> wrote:
>>>>>>>>>>>>>>>> On 08/18/2010 03:26 AM, J Chris Anderson wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Aug 16, 2010, at 4:38 PM, Norman Barker wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I have made the changes as recommended, adding a test case
>>>>>>>>>>>>>>>>>> multiview.js and also adding the userCtx to open the db.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I have also forked geocouch and this is available here
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> this patch seems important (especially as people are already asking for
>>>>>>>>>>>>>>>>> help using it on user@)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> to get it committed, it either must remove the dependency on GeoCouch, or
>>>>>>>>>>>>>>>>> become part of CouchDB when (and if) GeoCouch becomes part of CouchDB.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Is it possible / useful to make a version that doesn't use GeoCouch? And
>>>>>>>>>>>>>>>>> then to make the GeoCouch capabilities part of GeoCouch for now?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi Norman,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> if the patch is ready for trunk, I'd be happy to move the GeoCouch bits to
>>>>>>>>>>>>>>>> GeoCouch itself (as GeoCouch isn't ready for trunk yet).
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Lately I haven't been that responsive when it comes to GeoCouch, but that
>>>>>>>>>>>>>>>> will change (in about a month) after holidays and FOSS4G.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>  Volker
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Chris Anderson
>>>>>>> http://jchrisa.net
>>>>>>> http://couch.io
>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>> 

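The ring idea described in the thread (send each id from a start list around a ring of views, keep it only if every view contains it, and order the ring from the smallest view to the largest) can be sketched roughly as follows. This is a minimal illustration in Python, not the code in couch_query_ring.erl: the function names, the view names, and the sample doc ids are all hypothetical, the ring is walked sequentially rather than with one process per node, and membership is tested with Python sets instead of looping over view results as Norman says his code currently does.

```python
from typing import Dict, Iterable, List, Set


def ring_intersection(views: Dict[str, Iterable[str]]) -> List[str]:
    """Conjunction: return the doc ids that appear in every view.

    `views` maps a (hypothetical) view name to the doc ids it emitted.
    """
    if not views:
        return []
    # Order the ring from the smallest view to the largest, as the quoted
    # comment suggests, so that most ids are rejected early.
    ring: List[Set[str]] = sorted((set(ids) for ids in views.values()), key=len)
    start, rest = ring[0], ring[1:]
    survivors = []
    for doc_id in sorted(start):
        # Pass the id from node to node; all() stops at the first node
        # that does not contain it, so the id drops out of the ring.
        if all(doc_id in node for node in rest):
            survivors.append(doc_id)
    return survivors


def ring_union(views: Dict[str, Iterable[str]]) -> List[str]:
    """Disjunction: an id is included if any one view contains it."""
    seen: Set[str] = set()
    for ids in views.values():
        seen.update(ids)
    return sorted(seen)


# Use case from the thread: one view selected by time range, one by value.
# The ids below are made-up sample data.
views = {
    "by_time_in_range": ["doc1", "doc2", "doc3", "doc5"],
    "by_value_A": ["doc2", "doc5", "doc9"],
}
print(ring_intersection(views))  # ['doc2', 'doc5']
print(ring_union(views))
```

With the sample data, only the ids present in both views survive the trip around the ring, which matches the "value A between start time and end time" query described above. The disjunction helper shows why Bob's question matters: for a union no id needs to visit every node, so a ring pass buys nothing there.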
