hive-user mailing list archives

From JB Rawlings <>
Subject Sessionize using Hive
Date Tue, 02 Feb 2016 02:10:34 GMT
We are considering whether Hive is the best choice for "sessionizing" a set of data given the
following parameters:

*         Input data set:  A series of records with userID, startTimestamp, endTimestamp, recordType.

*         Output data set:  The same records (no aggregation), each with an added SessionId based on
the time difference between the endTimestamp of the previous record and the startTimestamp of the
current record, plus other criteria of the type current.recordType = previous.recordType. As long
as a series of records meets the criteria for sessionization, they would all have the same
SessionId appended to each record.
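
To make the rule above concrete, here is a minimal Python sketch of the logic, not a Hive or MapReduce implementation. Several details are assumptions that the description does not specify: records are compared only within the same userID, timestamps are epoch seconds, the gap limit is 30 minutes, and the names sessionize, MAX_GAP_SECONDS and the "userID-N" SessionId format are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Assumption: no threshold is given in the description; 30 minutes is illustrative.
MAX_GAP_SECONDS = 30 * 60


def sessionize(records, max_gap=MAX_GAP_SECONDS):
    """Return a copy of every record with a 'sessionId' key added.

    Each record is a dict with userID, startTimestamp, endTimestamp
    (assumed epoch seconds) and recordType. No records are aggregated.
    """
    out = []
    # Assumption: sessions are per user, ordered by start time.
    ordered = sorted(records, key=itemgetter("userID", "startTimestamp"))
    for user_id, user_records in groupby(ordered, key=itemgetter("userID")):
        session_no = 0
        previous = None
        for rec in user_records:
            # A record continues the previous session only if the gap is
            # small enough AND the record type is unchanged.
            same_session = (
                previous is not None
                and rec["startTimestamp"] - previous["endTimestamp"] <= max_gap
                and rec["recordType"] == previous["recordType"]
            )
            if not same_session:
                session_no += 1
            out.append({**rec, "sessionId": f"{user_id}-{session_no}"})
            previous = rec
    return out


records = [
    {"userID": "u1", "startTimestamp": 0, "endTimestamp": 100, "recordType": "A"},
    {"userID": "u1", "startTimestamp": 200, "endTimestamp": 300, "recordType": "A"},
    {"userID": "u1", "startTimestamp": 400, "endTimestamp": 500, "recordType": "B"},
    {"userID": "u1", "startTimestamp": 5000, "endTimestamp": 5100, "recordType": "B"},
    {"userID": "u2", "startTimestamp": 50, "endTimestamp": 60, "recordType": "A"},
]

for row in sessionize(records):
    print(row["userID"], row["startTimestamp"], row["recordType"], row["sessionId"])
```

In this sample, the third record starts a new session because the recordType changes, and the fourth because the gap exceeds the assumed limit. The only state carried between records is the previous record for the same user, so whichever tool is chosen needs to express one ordered pass per user.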

Briefly, based on my analysis, it appears that this problem would be better suited to MapReduce
using Java, but I would be interested in hearing from those with more experience in this area.

J. B. Rawlings
