hive-user mailing list archives

From JB Rawlings <jrawli...@societyconsulting.com>
Subject RE: Sessionize using Hive
Date Tue, 09 Feb 2016 19:54:29 GMT
Ryan,

Thank you!  Our Amper framework is built around Hive, so the Hive Windowing and Analytics functionality
fits very well.  I had used the T-SQL ROW_NUMBER function previously but had no idea of the full
power of SQL windowing functions before you pointed me to the Hive documentation.  We have
only one small gap left in implementing sessionization: the need to persist the
last SessionID assigned when we're stitching records together.  We are in the process
of testing a "Persist" UDF for that now.
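
The thread does not show the "Persist" UDF itself, so the following is only a sketch of the idea as described: carry the last SessionID assigned (and each user's latest record) from one batch to the next, so that a session spanning two batches keeps its ID. It is written in plain Python rather than Hive; the names SessionState and assign_sessions, the tuple layout, and the 30-minute gap are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

MAX_GAP_SECONDS = 1800  # assumed threshold; the thread does not give one


@dataclass
class SessionState:
    """State that would have to be persisted between batches (hypothetical)."""
    last_session_id: int = 0
    # user_id -> (end_ts, record_type, session_id) of that user's latest record
    last_record: dict = field(default_factory=dict)


def assign_sessions(batch, state, max_gap=MAX_GAP_SECONDS):
    """Append a session ID to each (user_id, start_ts, end_ts, record_type) tuple.

    Records are handled in (user_id, start_ts) order. `state` is updated in
    place, so the next batch continues from the last SessionID assigned.
    """
    out = []
    for user_id, start_ts, end_ts, record_type in sorted(
        batch, key=lambda r: (r[0], r[1])
    ):
        prev = state.last_record.get(user_id)
        same_session = (
            prev is not None
            and start_ts - prev[0] <= max_gap   # close enough to previous end
            and record_type == prev[1]          # same record type as previous
        )
        if same_session:
            session_id = prev[2]
        else:
            state.last_session_id += 1
            session_id = state.last_session_id
        state.last_record[user_id] = (end_ts, record_type, session_id)
        out.append((user_id, start_ts, end_ts, record_type, session_id))
    return out
```

Calling assign_sessions on a first batch and then on a second batch with the same SessionState object lets a record at the start of the second batch join the session that was open at the end of the first, instead of restarting the numbering.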

Again, thanks for the pointers, they were right on point!

J. B. Rawlings
Senior Consultant
C: 425.233.1315
www.societyconsulting.com

From: Ryan Harris [mailto:Ryan.Harris@zionsbancorp.com]
Sent: Friday, February 5, 2016 12:58 PM
To: user@hive.apache.org
Subject: RE: Sessionize using Hive

I don't have a textbook example to point you to, but you should be able to handle the problem
using any of:
a) a UDF
b) an external TRANSFORM script in a language of your choosing
c) Hive Windowing and Analytics functions (LEAD/LAG, OVER, etc.): https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics

Which one fits depends on the version of Hive you are using as well as your programming language preferences.
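
Option (c) can be sketched roughly as follows. The idea is to use LAG to look at the previous record per user, flag each record that starts a new session, and take a running SUM of those flags as a per-user session number. To keep the sketch self-contained it runs the SQL through Python's sqlite3 module (SQLite 3.25 or later is needed for window functions). LAG and SUM ... OVER are documented on the Hive wiki page linked above, but this query has not been run on Hive here and may need adjusting. The table name, column names, sample rows, and 30-minute gap are assumptions, not taken from the thread.

```python
import sqlite3

# Hypothetical sample rows: (user_id, start_ts, end_ts, record_type),
# with timestamps in epoch seconds.
ROWS = [
    ("u1", 0, 100, "A"),
    ("u1", 160, 300, "A"),    # 60 s after previous end, same type
    ("u1", 400, 500, "B"),    # record type changed
    ("u1", 5000, 5100, "B"),  # 4500 s gap exceeds the threshold
    ("u2", 50, 80, "A"),      # different user, numbered separately
]

MAX_GAP_SECONDS = 1800  # assumed threshold

QUERY = """
WITH with_prev AS (
    SELECT user_id, start_ts, end_ts, record_type,
           LAG(end_ts)      OVER (PARTITION BY user_id ORDER BY start_ts) AS prev_end,
           LAG(record_type) OVER (PARTITION BY user_id ORDER BY start_ts) AS prev_type
    FROM events
),
flagged AS (
    SELECT user_id, start_ts, end_ts, record_type,
           CASE WHEN prev_end IS NULL
                  OR start_ts - prev_end > :gap
                  OR record_type <> prev_type
                THEN 1 ELSE 0 END AS new_session
    FROM with_prev
)
SELECT user_id, start_ts, end_ts, record_type,
       SUM(new_session) OVER (PARTITION BY user_id ORDER BY start_ts
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
           AS session_seq
FROM flagged
ORDER BY user_id, start_ts
"""


def sessionize(rows, max_gap_seconds=MAX_GAP_SECONDS):
    """Load rows into an in-memory table and return them with session_seq added."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE events "
            "(user_id TEXT, start_ts INTEGER, end_ts INTEGER, record_type TEXT)"
        )
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY, {"gap": max_gap_seconds}).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in sessionize(ROWS):
        print(row)
```

Note that session_seq restarts at 1 for each user, so a unique session key in this sketch is the pair (user_id, session_seq). The :gap placeholder is sqlite3 parameter binding; on Hive the threshold would have to be supplied some other way.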

From: JB Rawlings [mailto:jrawlings@societyconsulting.com]
Sent: Friday, February 05, 2016 1:53 PM
To: user@hive.apache.org
Subject: RE: Sessionize using Hive

Ryan,

Can you perhaps point me to example(s) of how this is done in Hive?

Thanks,

J. B. Rawlings
Senior Consultant
C: 425.233.1315
www.societyconsulting.com

From: Ryan Harris [mailto:Ryan.Harris@zionsbancorp.com]
Sent: Monday, February 1, 2016 6:19 PM
To: user@hive.apache.org
Subject: RE: Sessionize using Hive

It can be done in Hive. Whether or not it is the "best choice" depends on whether
you have any other reason for your data to be in Hive.
If you are wondering whether Hive is the best tool for accomplishing this one task, it would
probably be easier to do in Pig.

From: JB Rawlings [mailto:jrawlings@societyconsulting.com]
Sent: Monday, February 01, 2016 7:11 PM
To: user@hive.apache.org
Subject: Sessionize using Hive

We are considering whether Hive is the best choice for "sessionizing" a set of data given
the following parameters:

* Input data set:  A series of records with userID, startTimestamp, endTimestamp, recordType,
etc.

* Output data set:  The same records (no aggregation) with an added SessionId based on the
time difference between the endTimestamp of the previous record and the startTimestamp of the
current record, plus other criteria of the form current.recordType = previous.recordType.
As long as a series of records meets the criteria for sessionization, they would all have the
same SessionId appended.

Briefly, based on my analysis, it appears that this problem would be better suited to MapReduce
using Java, but I would be interested in hearing from those with more experience in this area.

J. B. Rawlings

________________________________
THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS CONFIDENTIAL and may contain
information that is privileged and exempt from disclosure under applicable law. If you are
neither the intended recipient nor responsible for delivering the message to the intended
recipient, please note that any dissemination, distribution, copying or the taking of any
action in reliance upon the message is strictly prohibited. If you have received this communication
in error, please notify the sender immediately. Thank you.
