drill-dev mailing list archives

From Wes McKinney <...@cloudera.com>
Subject Re: Naming the new ValueVector Initiative
Date Tue, 15 Dec 2015 22:38:25 GMT
For now I have presumptuously moved my C++ prototype to

https://github.com/arrow-data/arrow

I may have some cycles for this over the next few weeks -- it would be
great to develop a draft of the IPC protocol for transmitting table /
row batch metadata and data headers. I am going to be working on
building up enough tools and scaffolding to start assembling a
pandas.DataFrame-like Python wrapper layer which will keep me busy for
a fair while.

Let's decide soon whether we want 1 repo or multiple repos for the
reference implementations (C/C++ and Java). 1 repo might be easier for
integration testing.

I can convert the Google doc spec floating around to Markdown and
perhaps we can discuss specific details in GitHub issues? I'll use a
separate repo for the format docs.

best,
Wes

On Mon, Dec 14, 2015 at 9:43 AM, Wes McKinney <wes@cloudera.com> wrote:
> hi folks,
>
> In the interim I created a new public GitHub organization to host code
> for this effort so we can organize ourselves in advance of more
> progress in the ASF:
>
> https://github.com/arrow-data
>
> I have a partial C++ implementation of the Arrow spec that I can move
> there, along with a to-be-Markdown-ified version of a specification
> subject to more iteration. The more pressing short term matter will be
> making some progress on the metadata / data headers / IPC protocol
> (e.g. using Flatbuffers or the like).
>
> Thoughts on git repo structure?
>
> 1) Avro-style — "one repo to rule them all"
> 2) Parquet-style — arrow-format, arrow-cpp, arrow-java, etc.
>
> (I'm personally more in the latter camp, though integration tests may
> be more tedious that way)
>
> Thanks
>
> On Thu, Dec 3, 2015 at 4:18 PM, Jacques Nadeau <jacques@dremio.com> wrote:
>> I've opened a name search for our top vote getter.
>>
>> https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-92
>>
>>
>> I also just realized that my previous email dropped other recipients.
>> Here it is below.
>>
>> ----
>> I think we can call the voting closed. Top vote getters:
>>
>> Apache Arrow (17)
>> Apache Herringbone (9)
>> Apache Joist (8)
>> Apache Colbuf (8)
>>
>> I'll open a PODLINGNAMESEARCH-* shortly for Arrow.
>>
>> ---
>>
>>
>>
>>
>>
>>
>> --
>> Jacques Nadeau
>> CTO and Co-Founder, Dremio
>>
>> On Thu, Dec 3, 2015 at 1:23 AM, Marcel Kornacker <marcel@cloudera.com>
>> wrote:
>>>
>>> Just added my vote.
>>>
>>> On Thu, Dec 3, 2015 at 12:51 PM, Wes McKinney <wes@cloudera.com> wrote:
>>> > Shall we call the voting closed? Any last stragglers?
>>> >
>>> > On Tue, Dec 1, 2015 at 5:39 PM, Ted Dunning <ted.dunning@gmail.com>
>>> > wrote:
>>> >>
>>> >> Apache can handle this if we set the groundwork in place.
>>> >>
>>> >> Also, Twitter's lawyers work for Twitter, not for Apache. As such, their
>>> >> opinions can't be taken by Apache as legal advice. There are issues of
>>> >> privilege, conflict of interest and so on.
>>> >>
>>> >>
>>> >>
>>> >> On Wed, Dec 2, 2015 at 7:51 AM, Alex Levenson
>>> >> <alexlevenson@twitter.com>
>>> >> wrote:
>>> >>>
>>> >>> I can ask about whether Twitter's lawyers can help out -- is that
>>> >>> something we need? Or is that something Apache helps out with in the
>>> >>> next step?
>>> >>>
>>> >>> On Mon, Nov 30, 2015 at 9:32 PM, Julian Hyde <jhyde@apache.org> wrote:
>>> >>>>
>>> >>>> +1 to have a vote tomorrow.
>>> >>>>
>>> >>>> Assuming that Vector is out of play, I just did a quick search for the
>>> >>>> top 4 remaining (“arrow”, “honeycomb”, “herringbone”, “joist”) at
>>> >>>> SourceForge, Open Hub, Trademarkia, and on Google. There are no
>>> >>>> trademarks for these in similar subject areas. There is a moderately
>>> >>>> active project called “joist” [1].
>>> >>>>
>>> >>>> I will point out that “Apache Arrow” has native-american connotations
>>> >>>> that we may or may not want to live with (just ask the Washington
>>> >>>> Redskins how they feel about their name).
>>> >>>>
>>> >>>> If someone would like to vet other names, use the links on
>>> >>>> https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-90, and fill out
>>> >>>> column C in the spreadsheet.
>>> >>>>
>>> >>>> Julian
>>> >>>>
>>> >>>> [1] https://github.com/stephenh/joist
>>> >>>>
>>> >>>>
>>> >>>> On Nov 30, 2015, at 7:01 PM, Jacques Nadeau <jacques@dremio.com>
>>> >>>> wrote:
>>> >>>>
>>> >>>> +1
>>> >>>>
>>> >>>> --
>>> >>>> Jacques Nadeau
>>> >>>> CTO and Co-Founder, Dremio
>>> >>>>
>>> >>>> On Mon, Nov 30, 2015 at 6:34 PM, Wes McKinney <wes@cloudera.com>
>>> >>>> wrote:
>>> >>>>
>>> >>>> Should we have a last call for votes, closing EOD tomorrow (Tuesday)?
>>> >>>> I
>>> >>>> missed this for a few days last week with holiday travel.
>>> >>>>
>>> >>>> On Thu, Nov 26, 2015 at 3:04 PM, Julian Hyde <julian@hydromatic.net>
>>> >>>> wrote:
>>> >>>>
>>> >>>> Consulting a lawyer is part of the Apache branding process but the first
>>> >>>> stage is to gather a list of potential conflicts -
>>> >>>> https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-90 is an example.
>>> >>>>
>>> >>>> The other part, frankly, is to pick your battles.
>>> >>>>
>>> >>>> A year or so ago Actian re-branded Vectorwise as Vector.
>>> >>>>
>>> >>>>
>>> >>>> http://www.zdnet.com/article/actian-consolidates-its-analytics-portfolio/.
>>> >>>> Given that it is an analytic database in the Hadoop space I think that is
>>> >>>> as close to a “direct hit” as it gets. I don’t think we need a lawyer to
>>> >>>> tell us that. Certainly it makes sense to look for conflicts for the other
>>> >>>> alternatives before consulting lawyers.
>>> >>>>
>>> >>>> Julian
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> On Nov 25, 2015, at 9:42 PM, Marcel Kornacker <marcel@cloudera.com>
>>> >>>> wrote:
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> On Tue, Nov 24, 2015 at 3:25 PM, Jacques Nadeau <jacques@dremio.com>
>>> >>>> wrote:
>>> >>>>
>>> >>>> Ok guys,
>>> >>>>
>>> >>>> I don't think anyone is doing a thorough analysis of viability. I did a
>>> >>>> quick glance and the top one (Vector) seems like it would have an issue
>>> >>>> with conflict of an Actian product. The may be fine. Let's do a second
>>> >>>> phase vote.
>>> >>>>
>>> >>>>
>>> >>>> I'm assuming you mean Vectorwise?
>>> >>>>
>>> >>>> Before we do anything else, could we have a lawyer look into this? Last
>>> >>>> time around that I remember (Parquet), Twitter's lawyers did a good job of
>>> >>>> weeding out the potential trademark violations.
>>> >>>>
>>> >>>> Alex, could Twitter get involved this time around as well?
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> Pick your top 3 (1,2,3 with 3 being top preference)
>>> >>>>
>>> >>>> Let's get this done by Friday and then we can do a podling name
>>> >>>> search
>>> >>>> starting with the top one.
>>> >>>>
>>> >>>> Link again:
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> https://docs.google.com/spreadsheets/d/1q6UqluW6SLuMKRwW2TBGBzHfYLlXYm37eKJlIxWQGQM/edit#gid=304381532&vpid=A1
>>> >>>>
>>> >>>> thanks
>>> >>>>
>>> >>>>
>>> >>>> --
>>> >>>> Jacques Nadeau
>>> >>>> CTO and Co-Founder, Dremio
>>> >>>>
>>> >>>> On Fri, Nov 20, 2015 at 9:24 AM, Jacques Nadeau <jacques@dremio.com>
>>> >>>> wrote:
>>> >>>>
>>> >>>> Ok, it looks like we have a candidate list (we actually got 11 since
>>> >>>> there was a three-way tie for ninth place):
>>> >>>>
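>>> >>>> - Vector
>>> >>>> - Arrow
>>> >>>> - honeycomb
>>> >>>> - Herringbone
>>> >>>> - joist
>>> >>>> - V2
>>> >>>> - Piet
>>> >>>> - colbuf
>>> >>>> - baton
>>> >>>> - impulse
>>> >>>> - victor
>>> >>>>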
>>> >>>> Next we need to do trademark searches on each of these to see whether
>>> >>>> we're likely to have success. I've moved candidates to a second tab:
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> https://docs.google.com/spreadsheets/d/1q6UqluW6SLuMKRwW2TBGBzHfYLlXYm37eKJlIxWQGQM/edit#gid=304381532
>>> >>>>
>>> >>>> Anybody want to give a hand in analyzing potential conflicts?
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> --
>>> >>>> Jacques Nadeau
>>> >>>> CTO and Co-Founder, Dremio
>>> >>>>
>>> >>>> On Mon, Nov 16, 2015 at 12:10 PM, Jacques Nadeau <jacques@dremio.com>
>>> >>>> wrote:
>>> >>>>
>>> >>>> Everybody should pick their ten favorites using the numbers 1 to 10.
>>> >>>>
>>> >>>> 10 is most preferred
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> --
>>> >>>> Jacques Nadeau
>>> >>>> CTO and Co-Founder, Dremio
>>> >>>>
>>> >>>> On Mon, Nov 16, 2015 at 10:17 AM, Ted Dunning <ted.dunning@gmail.com>
>>> >>>> wrote:
>>> >>>>
>>> >>>>
>>> >>>> Single vote for most preferred?
>>> >>>>
>>> >>>> Single transferable vote?
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> On Tue, Nov 17, 2015 at 2:50 AM, Jacques Nadeau <jacques@dremio.com>
>>> >>>> wrote:
>>> >>>>
>>> >>>> Given that a bunch of people added names to the sheet, I'll take
>>> >>>> that as tacit agreement to the proposed process.
>>> >>>>
>>> >>>> Let's move to the first vote phase. I've added a column for
>>> >>>> everybody's votes. Let's try to wrap up the vote by 10am on
>>> >>>> Wednesday.
>>> >>>>
>>> >>>> thanks!
>>> >>>> Jacques
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> --
>>> >>>> Jacques Nadeau
>>> >>>> CTO and Co-Founder, Dremio
>>> >>>>
>>> >>>> On Thu, Nov 12, 2015 at 12:03 PM, Jacques Nadeau <jacques@apache.org>
>>> >>>> wrote:
>>> >>>>
>>> >>>>
>>> >>>> Hey Guys,
>>> >>>>
>>> >>>> It sounds like we need to do a little more work on the Vector proposal
>>> >>>> before the board would like to consider it. The main point of contention
>>> >>>> right now is the name of the project. We need to decide on a name and get
>>> >>>> it signed off through PODLINGNAMESEARCH.
>>> >>>>
>>> >>>> Naming is extremely subjective so I'd like to propose a process for
>>> >>>> selection that minimizes pain. This is an initial proposal and
>>> >>>>
>>> >>>> We do the naming in the following steps
>>> >>>> - 1: Collect a set of names to be considered
>>> >>>> - 2: Run a vote for 2 days where each member ranks their top 10 options
>>> >>>> 1..10
>>> >>>> - 3: Take the top ten vote getters and do a basic analysis of whether we
>>> >>>> think that any have legal issues. Keep dropping names that have this until
>>> >>>> we are left with 10 reasonably solid candidate names
>>> >>>> - 5: Take the top ten names and give people 48 hours to rank their top 3
>>> >>>> names
>>> >>>> - 6: Start a PODLINGNAMESEARCH on the top-ranked one; if that doesn't work,
>>> >>>> try the second and third options.
>>> >>>>
>>> >>>> I suggest we take name suggestions for step 1 from everyone but then
>>> >>>> constrain the voting to the newly proposed project [1]. We could just do
>>> >>>> this in a private email thread but I think doing it on Drill dev is better
>>> >>>> in the interest of transparency. This isn't the perfect place for that but
>>> >>>> I'm not sure a better place exists.
>>> >>>>
>>> >>>> I'm up for changing any or all of this depending on what others
>>> >>>> think. Just
>>> >>>> wanted to get the ball rolling on a proposed process.
>>> >>>>
>>> >>>> If this works, I've posted a doc at [2] that we can use for step 1.
>>> >>>>
>>> >>>> Thanks,
>>> >>>> Jacques
>>> >>>>
>>> >>>> [1] List of proposed new project members/voters: Todd Lipcon, Ted Dunning,
>>> >>>> Michael Stack, P. Taylor Goetz, Julian Hyde, Julien Le Dem, Jacques Nadeau,
>>> >>>> James Taylor, Jake Luciani, Parth Chandra, Alex Levenson, Marcel Kornacker,
>>> >>>> Steven Phillips, Hanifi Gunes, Wes McKinney, Jason Altekruse, David Alves,
>>> >>>> Zain Asgar, Ippokratis Pandis, Abdel Hakim Deneche, Reynold Xin.
>>> >>>> [2] https://docs.google.com/spreadsheets/d/1q6UqluW6SLuMKRwW2TBGBzHfYLlXYm37eKJlIxWQGQM/edit#gid=0
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Alex Levenson
>>> >>> @THISWILLWORK
>>> >>
>>> >>
>>
>>
