drill-user mailing list archives

From "Kunal Khatua" <ku...@apache.org>
Subject Re: New UDF for processing Twitter text
Date Mon, 23 Jul 2018 04:47:17 GMT
Saw this on Twitter... looked interesting. We've been thinking of revamping the site a bit
to help folks leverage Drill in different applications... this could sit nicely
in a cyber-related domain.

As for the Ctrl+Enter / Meta-Enter shortcut... let's keep that as a separate PR. It's always easier to
trace back any bugs that might sneak in.
On 7/22/2018 3:08:13 PM, Bob Rudis <bob@rud.is> wrote:
Sir Givre!

I confess I hadn't (I thought it might be a bit niche), but I can drop a question about it in Jira to see
if the team is interested. We (i.e., Rapid7) have a few more UDFs coming over the next
few weeks as well (hoping for a Drill 1.14.0 release with an updated, non-ancient Guava JAR before
releasing them), most of which are "cyber-oriented", so they may be of particular interest to you.

I've got a PR coming to enable Cmd-Enter (Meta-Enter, I guess, in non-Mac-speak) for query
submission on the web admin query page, so I could bundle both together, too.


> On Jul 22, 2018, at 5:38 PM, Charles Givre wrote:
> Hey Bob,
> This looks pretty cool. Have you thought about submitting this as a PR for Drill? I’d
> be happy to help with that.
> — C
>> On Jul 22, 2018, at 17:36, Bob Rudis wrote:
>> This post -- https://rud.is/b/2018/07/22/new-apache-drill-udf-for-processing-twitter-tweet-text/
>> -- introduces a UDF package for Drill with 5 functions for extracting [meta]data from Twitter
>> tweet text, including:
>> - hashtag extraction
>> - URL extraction
>> - @-mentions extraction
>> - reply-to (if it's a reply)
>> - tweet metadata, including if the tweet is "valid"
>> Tested with Drill 1.13.0.
>> Get it via:
>> - GitLab: https://gitlab.com/hrbrmstr/drill-twitter-text or
>> - GitHub: https://github.com/hrbrmstr/drill-twitter-text
>> Hopefully it's useful for some folks and don't hesitate to file issues or PRs for
>> problems/new features.
>> -Bob
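The announcement describes the functions only by name. As a rough illustration of what the first four (hashtags, URLs, @-mentions, reply-to) return, here is a minimal Python sketch built on plain regular expressions. It is an approximation under stated assumptions, not the drill-twitter-text implementation (which runs as Java UDFs inside Drill), and every function and pattern name below is hypothetical. Real tweet parsing follows far more detailed rules, so treat these patterns as a sketch only.

```python
import re
from typing import List, Optional

# Hypothetical, simplified patterns; NOT the drill-twitter-text implementation.
# In Python 3, \w on str is Unicode-aware (letters, digits, underscore).
HASHTAG_RE = re.compile(r"(?<![\w#])#(\w+)")
# Screen names are assumed to be 1-15 word characters; the trailing \b
# rejects longer names instead of silently truncating them.
MENTION_RE = re.compile(r"(?<![\w@])@(\w{1,15})\b")
URL_RE = re.compile(r"https?://\S+")
# Assumption: a tweet is a reply if its first token is an @-mention.
REPLY_RE = re.compile(r"\s*@(\w{1,15})\b")


def extract_hashtags(text: str) -> List[str]:
    """Return hashtag bodies (without '#') in order of appearance."""
    return HASHTAG_RE.findall(text)


def extract_mentions(text: str) -> List[str]:
    """Return @-mentioned screen names; the lookbehind skips e-mail addresses."""
    return MENTION_RE.findall(text)


def extract_urls(text: str) -> List[str]:
    """Return http(s) URLs, trimming sentence punctuation glued to the end."""
    return [url.rstrip(".,!?;:)") for url in URL_RE.findall(text)]


def reply_to(text: str) -> Optional[str]:
    """Return the screen name a tweet replies to, or None if it is not a reply."""
    match = REPLY_RE.match(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    tweet = "@drill_user Check out #ApacheDrill and #UDFs: https://rud.is/b/ cc @hrbrmstr"
    print(extract_hashtags(tweet))  # ['ApacheDrill', 'UDFs']
    print(extract_mentions(tweet))  # ['drill_user', 'hrbrmstr']
    print(extract_urls(tweet))      # ['https://rud.is/b/']
    print(reply_to(tweet))          # drill_user
```

Validity checking (the fifth function) is deliberately left out: Twitter measures tweet length with weighted character counting rather than a plain string length, so a regex sketch would be misleading there.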
