storm-user mailing list archives

From "唐思成" <jadetan...@qq.com>
Subject How to implement distinct count in a Trident topology?
Date Mon, 14 Jul 2014 08:14:04 GMT
The use case is simple: count unique users within a sliding window. The common solution I found
on the Internet is to use a HashSet to filter out duplicate users, like this:

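import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Storm 0.9.x package names; Storm 1.x and later use org.apache.storm.trident.*
import storm.trident.operation.BaseFilter;
import storm.trident.tuple.TridentTuple;

// Keeps only the first tuple seen for each distinct combination of field values.
public class Distinct extends BaseFilter {
    private static final long serialVersionUID = 1L;
    // Each task deserializes its own copy of this filter, so this set is per task,
    // and it is never cleared: it grows with every new id.
    private final Set<String> distincter = Collections.synchronizedSet(new HashSet<String>());

    @Override
    public boolean isKeep(TridentTuple tuple) {
        String id = this.getId(tuple);
        // add() returns false if the id is already present, which drops the duplicate.
        return distincter.add(id);
    }

    // Builds a key from all fields; the separator keeps ("ab","c") and ("a","bc") distinct.
    public String getId(TridentTuple t) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < t.size(); i++) {
            sb.append(t.getString(i)).append('\u0000');
        }
        return sb.toString();
    }
}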
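To see how this filter behaves without a running Storm cluster, here is a minimal plain-Java sketch of the same logic. The class name DistinctSketch is hypothetical, and a List<String> stands in for the TridentTuple. It shows that the first occurrence of an id is kept, later occurrences are dropped, and the set only ever grows.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical plain-Java stand-in for the Distinct filter: a List<String>
// plays the role of a TridentTuple so the logic can run without Storm.
public class DistinctSketch {
    private final Set<String> seen = Collections.synchronizedSet(new HashSet<String>());

    // Same contract as isKeep(): true the first time a key is seen, false afterwards.
    public boolean isKeep(List<String> fields) {
        return seen.add(getId(fields));
    }

    // Joins the fields with a separator so ("ab","c") and ("a","bc") give different keys.
    static String getId(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (String f : fields) {
            sb.append(f).append('\u0000');
        }
        return sb.toString();
    }

    // Number of distinct keys held in memory; nothing is ever evicted.
    public int size() {
        return seen.size();
    }

    public static void main(String[] args) {
        DistinctSketch d = new DistinctSketch();
        System.out.println(d.isKeep(Arrays.asList("user1"))); // true
        System.out.println(d.isKeep(Arrays.asList("user1"))); // false
        System.out.println(d.isKeep(Arrays.asList("user2"))); // true
        System.out.println(d.size());                         // 2
    }
}
```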

However, the HashSet lives entirely in memory and is never cleared, so when the data grows very
large, I think it will cause an OutOfMemoryError (OOM).
Is there a scalable solution?
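The sliding-window part of the use case can also be sketched in plain Java. This is a hypothetical single-process illustration (SlidingDistinct, add and distinctCount are made-up names), not a Storm or Trident API, and it assumes timestamps arrive in non-decreasing order. It keeps only each user's last-seen time and evicts users who have not appeared within the window, so memory is bounded by the number of distinct users in the current window rather than by every user ever seen. It is still in memory on one machine, so on its own it does not answer the scalability question.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: distinct user count over the window (now - windowMillis, now].
public class SlidingDistinct {
    private final long windowMillis;
    // id -> last-seen timestamp, kept in insertion order, i.e. ordered by last-seen time.
    private final LinkedHashMap<String, Long> lastSeen = new LinkedHashMap<>();

    public SlidingDistinct(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Records one event; assumes ts is never smaller than a previous ts.
    public void add(long ts, String id) {
        lastSeen.remove(id); // re-insert so this id moves to the newest end
        lastSeen.put(id, ts);
        evict(ts);
    }

    // Drops ids whose last event is at or before now - windowMillis (oldest first).
    private void evict(long now) {
        Iterator<Map.Entry<String, Long>> it = lastSeen.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> oldest = it.next();
            if (oldest.getValue() > now - windowMillis) {
                break; // every later entry is newer, so stop here
            }
            it.remove();
        }
    }

    // Number of distinct ids with at least one event inside the window ending at now.
    public int distinctCount(long now) {
        evict(now);
        return lastSeen.size();
    }

    public static void main(String[] args) {
        SlidingDistinct w = new SlidingDistinct(1000);
        w.add(0, "a");
        w.add(500, "b");
        w.add(600, "a");
        System.out.println(w.distinctCount(600));  // 2
        System.out.println(w.distinctCount(1500)); // 1
        System.out.println(w.distinctCount(1600)); // 0
    }
}
```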

2014-07-14 



唐思成 