storm-user mailing list archives

From "唐思成" <>
Subject How to implement distinct count in a Trident topology?
Date Mon, 14 Jul 2014 08:14:04 GMT
The use case is simple: count the unique users within a sliding window. The common solution
I found on the Internet is to use a HashSet to filter out duplicate users, like this:

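import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import storm.trident.operation.BaseFilter; // org.apache.storm.trident.* from Storm 1.0 on
import storm.trident.tuple.TridentTuple;
public class Distinct extends BaseFilter {
    private static final long serialVersionUID = 1L;
    private Set<String> distincter = Collections.synchronizedSet(new HashSet<String>()); // every key seen so far
    @Override
    public boolean isKeep(TridentTuple tuple) { // true only the first time a key is seen
        String id = this.getId(tuple);
        return distincter.add(id);
    }
    public String getId(TridentTuple t) { // key = all fields of the tuple concatenated
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < t.size(); i++) {
            sb.append(t.getValue(i)); // loop body missing in the archived message; assumed
        }
        return sb.toString();
    }
}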

However, the HashSet is held in memory, so once the number of distinct users grows very
large I expect it to cause an OutOfMemoryError.
Is there a scalable solution?
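
To make the memory concern concrete: one commonly suggested direction for this kind of problem is an approximate cardinality estimator such as HyperLogLog, which uses a fixed amount of memory no matter how many users are seen. The following is only a minimal sketch, assuming an approximate count (roughly 1-2% error with 4096 registers) is acceptable. The class name ApproxDistinctCounter and the hash function are illustrative choices, not part of Storm or Trident, and wiring it into a Trident aggregator or state is left out.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a Storm class: a small HyperLogLog-style counter.
final class ApproxDistinctCounter {
    private static final int P = 12;          // index bits (assumed precision)
    private static final int M = 1 << P;      // 4096 registers, 4 KB in total
    private final byte[] registers = new byte[M];

    // 64-bit FNV-1a followed by a bit-mixing finalizer (illustrative hash).
    static long hash64(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // Record one id; adding the same id again never changes the registers.
    public void add(String id) {
        long h = hash64(id);
        int idx = (int) (h >>> (64 - P));     // top P bits select a register
        long rest = h << P;                   // remaining bits, left-aligned
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - P + 1);
        if (rank > registers[idx]) {
            registers[idx] = (byte) rank;
        }
    }

    // Estimate the number of distinct ids added so far.
    public long estimate() {
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += 1.0 / (1L << r);
            if (r == 0) {
                zeros++;
            }
        }
        double alpha = 0.7213 / (1.0 + 1.079 / M);
        double raw = alpha * M * (double) M / sum;
        if (raw <= 2.5 * M && zeros > 0) {
            // Small-range correction: fall back to linear counting.
            raw = M * Math.log((double) M / zeros);
        }
        return Math.round(raw);
    }
}
```

Unlike the HashSet filter, this only yields a count and cannot say whether a particular user was already seen, so it fits the "count unique users" part of the use case but not exact deduplication of tuples.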

