esme-dev mailing list archives

From Vassil Dichev <vdic...@apache.org>
Subject Re: ESME-267 - Pooled links in popular links list
Date Fri, 03 Sep 2010 05:11:03 GMT
Well, it's obvious that if the XML parser couldn't parse the document, then it's
not valid XML. Add to this the fact that it contains a "meta" tag and
it's clear that what's at the other end of the link was actually HTML.
If you go to http://www.nytimes.com/rss, you'll notice this yourself.

The real feed can be found by following the browser feed icon in the
URL bar. It takes you to what the page describes as a feed in the tag
<link rel="alternate" type="application/rss+xml" ... />. If you follow
it, you will get to this URL:

http://feeds.nytimes.com/nyt/rss/HomePage

Try it, it should work.
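
As a rough illustration of that discovery step (this is not ESME code, which is written in Scala), here is a minimal Python sketch that scans an HTML page for <link rel="alternate"> tags advertising an RSS or Atom feed. The names FeedLinkFinder, discover_feeds and SAMPLE_PAGE are hypothetical, and the sample page is made up for the example, since the real tag's attributes are elided above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly used to advertise feeds in <link rel="alternate">.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects the href of every <link rel="alternate"> that advertises a feed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        is_feed = attr.get("type", "").lower() in FEED_TYPES
        if "alternate" in rels and is_feed and attr.get("href"):
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, attr["href"]))


def discover_feeds(html, base_url):
    """Return the feed URLs advertised by an HTML page (hypothetical helper)."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    return finder.feeds


# Made-up page for illustration only; it is not the real nytimes.com markup.
SAMPLE_PAGE = """<html><head>
<meta name="description" content="news">
<link rel="alternate" type="application/rss+xml"
      href="http://feeds.nytimes.com/nyt/rss/HomePage" />
</head><body></body></html>"""
```

Calling discover_feeds(SAMPLE_PAGE, "http://www.nytimes.com/rss") returns the advertised feed URL, which is the one to put after the rss: prefix in the action. The unclosed "meta" tag in the sample is fine for an HTML parser, but it is exactly what trips up an XML parser.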

Vassil


On Fri, Sep 3, 2010 at 2:35 AM, Imtiaz Ahmed H E <in.imtiaz@gmail.com> wrote:
> I have an action
>
> every 1 mins   rss:http://www.nytimes.com/rss
>
> I got the feed link by right-clicking the RSS icon at the bottom of
> nytimes.com, choosing "copy link location" in the Windows context menu, and
> pasting it into ESME with rss: prepended.
>
> and,
>
> every 1 mins atom:http://twitter.com/statuses/user_timeline/esjewett.atom
>
> and I get in the Tomcat Window,
>
> WARN - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
> :37:31: expected closing tag of meta
>               <a href="http://www.nytimes.com/pages/sports/index.html">Sports</a>
>                             ^
>
>
> and, apparently for the twitter feed,
>
> WARN - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
> :37:31: expected closing tag of meta
>               <a href="http://www.nytimes.com/pages/sports/index.html">Sports</a>
>                             ^
> WARN - Cookie rejected: "$Version=0; k=122.167.31.233.1283470309076286; $Path=/; $Domain=.twitter.com". Illegal domain attribute ".twitter.com". Domain of origin: "twitter.com"
> WARN - Cookie rejected: "$Version=0; _twitter_sess=BAh7CDoPY3JlYXRlZF9hdGwrCNq2ytQqAToHaWQiJWRjMTNkMjQ2ZGRiYjA2%250AOTU1ZGZjMTc1NjMxMTZhN2I4IgpmbGFzaElDOidBY3Rpb25Db250cm9sbGVy%250AOjpGbGFzaDo6Rmxhc2hIYXNoewAGOgpAdXNlZHsA--f65405470eedc4a64defa69a0e78d22bd676cc0c; $Path=/; $Domain=.twitter.com". Illegal domain attribute ".twitter.com". Domain of origin: "twitter.com"
>
>
>
> No feed updates in Esme.
>
> Vassil, do you want to fix this, or shall I look into it?
>
> Imtiaz
>
> ----- Original Message ----- From: "Vassil Dichev" <vdichev@apache.org>
> To: <esme-dev@incubator.apache.org>
> Sent: Friday, September 03, 2010 1:29 AM
> Subject: Re: ESME-267 - Pooled links in popular links list
>
>
>> Fixed. Now if you post the same link to a pool and to the public, you
>> will notice that the href attribute points to the internal shortened
>> URL in the former case and to the target URL in the latter case. This
>> means that popularity statistics will only be gathered when links on
>> public messages are clicked.
>>
>> A unique ID is still generated for all URLs, but for links in pooled
>> messages it is not visible.
>>
>> This should fix the problem. Does someone want to verify that we
>> indeed have the correct behavior?
>>
>> Vassil
>>
>>
>> On Wed, Sep 1, 2010 at 8:18 AM, Richard Hirsch <hirsch.dick@gmail.com>
>> wrote:
>>>
>>> Sounds like a good idea.
>>>
>>> D.
>>>
>>> On 8/31/10, Vassil Dichev <vdichev@apache.org> wrote:
>>>>
>>>> Right, we just don't generate and store a unique ID for links in pools
>>>> and will generate a different object on parsing. This way links which
>>>> come from pools will point directly to the target URL and links from
>>>> public messages will be redirected through the internal shortened URL,
>>>> which will allow statistics to be collected. This won't break any
>>>> functionality and I think it could be done fairly easily.
>>>>
>>>> I will assign ESME-267 to myself if nobody objects to the proposed solution.
>>>>
>>>> Vassil
>>>>
>>>>
>>>> On Tue, Aug 31, 2010 at 9:20 PM, Richard Hirsch <hirsch.dick@gmail.com>
>>>> wrote:
>>>>>
>>>>> Leave original link but just don't add it to PopularLinks.
>>>>>
>>>>> On 8/31/10, Ethan Jewett <esjewett@gmail.com> wrote:
>>>>>>
>>>>>> Oh, I see. Yes, that would make sense. So we would just leave the
>>>>>> original link in there, right?
>>>>>>
>>>>>> Ethan
>>>>>>
>>>>>> On Tue, Aug 31, 2010 at 8:12 PM, Richard Hirsch
>>>>>> <hirsch.dick@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> I agree with the solution of just removing those links that
>>>>>>> originate in pools.
>>>>>>>
>>>>>>> D.
>>>>>>>
>>>>>>> On 8/31/10, Vassil Dichev <vdichev@apache.org> wrote:
>>>>>>>>
>>>>>>>> OK, I think this is a worse example, because there are many ways to
>>>>>>>> find a list of URLs in a wiki (which were generally just not designed
>>>>>>>> with privacy/security in mind).
>>>>>>>>
>>>>>>>> If you're willing to sacrifice convenience for security, the easiest
>>>>>>>> change is not to parse URLs in messages in pools; a URL will then
>>>>>>>> appear as normal text, not as a hyperlink. The next thing we can do is
>>>>>>>> set up a different type of URL which doesn't take you to the shortened
>>>>>>>> URL, but directly to the target URL.
>>>>>>>>
>>>>>>>> If one really insists on shortening URLs in pools, then there must be
>>>>>>>> one set of shortened URLs per pool. I don't think anyone will claim
>>>>>>>> that this idea makes sense.
>>>>>>>>
>>>>>>>> Vassil
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Aug 31, 2010 at 11:35 AM, Ethan Jewett <esjewett@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> I agree in theory with your assessment of the google docs situation,
>>>>>>>>> but I still think we're violating the expectation of security around
>>>>>>>>> pools.
>>>>>>>>>
>>>>>>>>> Take another example: An HR department is using a secure wiki to
>>>>>>>>> discuss and organize an upcoming layoff. The wiki page is titled
>>>>>>>>> "October layoff planning" and the URL is
>>>>>>>>> https://hrwiki.corp.internal/October-layoff-planning. Someone posts
>>>>>>>>> this URL to the layoff-planning pool on esme (the same group of
>>>>>>>>> people with access to the wiki page) and a bunch of people in the
>>>>>>>>> pool click on it. Suddenly, the upcoming layoff has been announced
>>>>>>>>> to every esme user in the corporation. Whoops!
>>>>>>>>>
>>>>>>>>> The point is, maybe that private information shouldn't be in the URL,
>>>>>>>>> but a lot of applications do this whether or not it is a good idea. I
>>>>>>>>> think we need to take that reality into account and change the way
>>>>>>>>> this works to avoid the possibility of these scenarios.
>>>>>>>>>
>>>>>>>>> Ethan
>>>>>>>>>
>>>>>>>>> On Tuesday, August 31, 2010, Vassil Dichev <vdichev@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Ethan, this defeats the purpose of having a URL shortener and it
>>>>>>>>>> only gives you a false sense of security. Read my previous mail.
>>>>>>>>>>
>>>>>>>>>> Links have no notion of a pool. A link could come from messages in
>>>>>>>>>> different pools or it might not be clicked "inside a message" at
>>>>>>>>>> all.
>>>>>>>>>>
>>>>>>>>>> Let me know what you think.
>>>>>>>>>>
>>>>>>>>>> Vassil
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 31, 2010 at 9:44 AM, Ethan Jewett <esjewett@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> [Changed subject to start a new thread. Was: "New issues - a couple
>>>>>>>>>>> of blockers for 1.1 release"]
>>>>>>>>>>>
>>>>>>>>>>> That's correct. The "Popular messages" functionality just keeps a
>>>>>>>>>>> counter of how many times a message has been resent. If you look at
>>>>>>>>>>> UserActor.scala, lines 197 & 198, you'll see that the statistic
>>>>>>>>>>> "ResendStat" is incremented when a message is resent, but only if
>>>>>>>>>>> the message is not in a pool. Then when we want to find out what the
>>>>>>>>>>> most popular messages are, we ask the PopStatsActor - for example in
>>>>>>>>>>> the "popular" method of UserSnip.scala - line 213.
>>>>>>>>>>>
>>>>>>>>>>> On the other hand, the "LinkClicked" statistic is incremented in
>>>>>>>>>>> UrlStore.scala - line 40. Here there is never a check to see if the
>>>>>>>>>>> link came from a message in a pool. (This counter is used in the
>>>>>>>>>>> "links" method in UserSnip.scala, after the "popular" method.)
>>>>>>>>>>>
>>>>>>>>>>> I think we need to check if a link came from a pool before
>>>>>>>>>>> incrementing the counter, but in order to do this we need to record
>>>>>>>>>>> what pool a link belonged to, so I think we need to make pool part
>>>>>>>>>>> of the key of the UrlStore object and then populate this field when
>>>>>>>>>>> a new link is created.
>>>>>>>>>>>
>>>>>>>>>>> Ethan
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Aug 31, 2010 at 8:11 AM, Imtiaz Ahmed H E
>>>>>>>>>>> <in.imtiaz@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> In the home when I type in a message sharing it with one pool and
>>>>>>>>>>>> click resend it does not show up in Popular Messages. But if the
>>>>>>>>>>>> message is public it shows up on resend in Popular Messages.
>>>>>>>>>>>>
>>>>>>>>>>>> Can you explain? Haven't gotten to Popular Links yet.
>>>>>>>>>>>>
>>>>>>>>>>>> Imtiaz
>>>>>>>>>>>>
>>>>>>>>>>>> ----- Original Message ----- From: "Ethan Jewett" <esjewett@gmail.com>
>>>>>>>>>>>> To: <esme-dev@incubator.apache.org>
>>>>>>>>>>>> Sent: Tuesday, August 31, 2010 11:37 AM
>>>>>>>>>>>> Subject: Re: New issues - a couple of blockers for 1.1 release
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> The issue doesn't happen with Popular Messages, only with Popular
>>>>>>>>>>>> Links.
>>>>>>>>>>>>
>>>>>>>>>>>> I need to look into the implementation, but I have a feeling the
>>>>>>>>>>>> Popular Links issue is going to be a headache. I believe that for a
>>>>>>>>>>>> given link there is no way to tell what message it shows up in,
>>>>>>>>>>>> which would make it impossible to tell if it is a link from a
>>>>>>>>>>>> pooled message or not. We may have to modify the data model for
>>>>>>>>>>>> storing links to flag the ones that started out in a pooled
>>>>>>>>>>>> message...
>>>>>>>>>>>>
>>>>>>>>>>>> Regarding Pubsubhubbub, as Dick said, there's no hurry. I don't
>>>>>>>>>>>> think I'll be working on it over the next couple of weeks.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for all your efforts!
>>>>>>>>>>>>
>>>>>>>>>>>> Ethan
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Aug 31, 2010 at 4:20 AM, Imtiaz Ahmed H E
>>>>>>>>>>>> <in.imtiaz@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Re https://issues.apache.org/jira/browse/ESME-267
>>>>>>>>>>>>>
>>>>>>>>>>>>> I haven't tried this but plan to fix it right away.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Tell me, is it only the links showing up in 'Popular Links' or is
>>>>>>>>>>>>> that a problem with the message itself also showing up in
>>>>>>>>>>>>> 'PopularMessages'?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Looks like I'll never get going with pubsubhubbub! First there was
>>>>>>>>>>>>> Dick's Release Planning mail with the pending 1.1 issues and now
>>>>>>>>>>>>> here are some more. Plan to get going after all 1.1 pending issues
>>>>>>>>>>>>> are resolved.
>>>>>>>>>>>>>
>>>>>>>>>>>>> However, Ethan, it was your issue originally and if you feel you
>>>>>>>>>>>>> want to take it back again to push it to closure faster or
>>>>>>>>>>>>> something, please do; otherwise I'll re-start on it once 1.1 is
>>>>>>>>>>>>> done...
>>>>>>>>>>>>>
>>>>>>>>>>>>> Imtiaz
>>>>>>>>>>>>>
>>>>>>>>>>>>> ----- Original Message ----- From: "Richard Hirsch" <hirsch.dick@gmail.com>
>>>>>>>>>>>>> To: <
