incubator-ooo-dev mailing list archives

From Rob Weir <robw...@apache.org>
Subject Re: Word cloud for ooo-dev post subjects
Date Fri, 29 Jun 2012 16:55:40 GMT
On Thu, Jun 28, 2012 at 11:46 PM, Kevin Grignon
<kevingrignon.oo@gmail.com> wrote:
> KG01 - See comments inline.
>
> On Thu, Jun 28, 2012 at 6:08 AM, Rob Weir <robweir@apache.org> wrote:
>
>> On Wed, Jun 27, 2012 at 5:36 PM, Donald Whytock <dwhytock@gmail.com>
>> wrote:
>> > On Wed, Jun 27, 2012 at 4:54 PM, Rob Weir <robweir@apache.org> wrote:
>> >> http://people.apache.org/~robweir/ooo-dev-cloud.png
>> >>
>> >> This looks at the top 1000 terms used in ooo-dev post subjects since
>> >> this project moved to Apache in June 2011.  The only thing I removed
>> >> was "Re:", since that would have dominated the cloud and is
>> >> machine-written, not user-written.
>> >>
>>
>
> KG01 - Great stuff, Rob. These simple analytics are really interesting. *Would
> you be open to harvesting the post titles for both ooo-dev and ooo-users
> and sending them along to me?* The Python bit is unfamiliar to me. Text
> files, like the ones you made for the Twitter feeds, would be great. Then I
> can go into Wordle and tweak, using a variety of filters. I am also
> exploring other analytic tools to parse the data. Thanks.
>

Post titles on the users list tend to be short and not very
informative.  We see a lot that just say "Bug", or "OpenOffice
problem", or "Help".

I can certainly provide the subjects.  But I wonder what would happen
if we took the entire text of each root post (skipping responses)?
That might give greater context.
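A minimal sketch of the root-post idea, assuming that a reply can be recognized by an In-Reply-To or References header (most mail clients set one, but not every client does, so a few replies may slip through). Only the standard-library `mailbox` module is real here; `root_posts` and `plain_text` are hypothetical helper names used for illustration.

```python
import mailbox


def plain_text(message):
    """Return the first text/plain part of a message as str ('' if none)."""
    for part in message.walk():
        if part.get_content_type() != 'text/plain':
            continue
        payload = part.get_payload(decode=True)  # bytes; None for multipart containers
        if payload is None:
            continue
        charset = part.get_content_charset() or 'utf-8'
        try:
            return payload.decode(charset, errors='replace')
        except LookupError:  # the header names a charset Python does not know
            return payload.decode('utf-8', errors='replace')
    return ''


def root_posts(messages):
    """Yield (subject, body) for messages that start a thread.

    Assumption: responses carry an In-Reply-To or References header,
    so a message with neither is treated as a root post.
    """
    for message in messages:
        if message['In-Reply-To'] or message['References']:
            continue  # a response, skip it
        yield message['Subject'], plain_text(message)
```

For example, `root_posts(mailbox.mbox('ooo-dev.mbox'))` would yield the subject and body of each thread-starting post (the file name is hypothetical). Writing the bodies to one text file would give Wordle the fuller input suggested above.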

Tweets are somewhere in between: longer than a typical post title, but
shorter than a post.

-Rob

>
>
>> >> In this particular cloud, I used all posts, including responses.  So
>> >> if a term was used in a thread that had many responses, it would have
>> >> additional weight in this chart.
>> >>
>> >> Technologies used:
>> >>
>> >> Python's mailbox API to extract the post titles.  I could have done this
>> >> with any number of command-line text tools, but it is trivial in
>> >> Python:
>> >>
>> >> import mailbox
>> >>
>> >> # fileName is the path to the list's mbox archive
>> >> box = mailbox.mbox(fileName)
>> >>
>> >> for message in box:
>> >>     print(message['Subject'])
>> >>
>> >>
>> >> Then I used Wordle.net to generate the graphic.
>> >>
>> >> Based on the reaction to the previous word cloud, I know that
>> >> some list subscribers are curious to see how often we write about
>> >> LibreOffice.  So I'll help you find it in this graphic.  Look for the
>> >> big "AOO", then under that see the "COMMIT".  Under COMMIT you can
>> >> make out LIBREOFFICE, to the left of USERS.
>> >>
>> >> Regards,
>> >>
>> >> -Rob
>> >
>> > Somehow not as stylish in this font.
>> >
>> > "Bug" is visible in this one.  No one tweets about bugs?
>> >
>>
>> I think this is a user/developer difference.  Users talk in more
>> direct terms, about how bugs impact them, so very few mention a
>> "bug".  But there were 18 mentions on Twitter of some form of
>> crash/crashed/crashing.  On the ooo-dev list we call these "bugs" or
>> "issues".  Users "lose all their work".  We "debug an exception".  The
>> army "pacifies the village", etc.
>>
>> It is good to remember the difference in impact our work (good or bad)
>> has on others, even though we use more clinical terms on this list.
>>
>> -Rob
>>
>>
>> > Don
>>
