hc-httpclient-users mailing list archives

From Oleg Kalnichevski <ol...@apache.org>
Subject Re: multiple sessions on single MultiThreadedHttpConnectionManager
Date Tue, 11 May 2010 20:46:18 GMT
On Tue, 2010-05-11 at 08:20 -0700, Miguel De Anda wrote:
> i have an app that needs to crawl for data from another server. the service 
> requires login, and it keeps track of the session via a cookie. i need the app 
> to be able to login as thousands of different users to crawl for data.
> the implementation is like this:
> thread 1:
>   add a crawl job to the queue (threadpool of size N)
> job in queue (runs in threadpool):
>   if not logged in as this user yet, login
>   fetch data for user
>   index data for search
> the dataset is huge. each user has thousands of records, and there are 
> thousands of users. i initially had a separate instance of httpclient per user, 
> each with its own MultiThreadedHttpConnectionManager, but as the number of users 
> grew, the number of open connections grew, and they stayed open. this eventually 
> caused "too many open files" errors. 
> option 1:
> a new non-threaded httpclient that i can terminate the connection to once i'm 
> done, manually manage cookies to prevent excessive logins. each job in the 
> threadpool will create a new instance and shut it down when it's done. i 
> implemented this, but couldn't figure out how to get it to shut down fast. the 
> only thing i found was setSoTimeout and i set it to 2000 but if the app goes 
> really fast, it still doesn't prevent the lsof count from getting really high. 
> i'm afraid to lower it even more as it might have other side effects that i'm 
> not aware of. i also imagine this method creates lots of overhead.
> option 2:
> create a global static MultiThreadedHttpConnectionManager and httpclient 
> instance and manually set the cookies to each httpget method that goes out. 
> this method seems to have the least overhead as it can continue to reuse the 
> connections and the maximum number of concurrent connections shouldn't really 
> go higher than the threadpool size. sounds perfect but i don't know how to 
> have it keep cookies separate. while using wireshark it seems that it's mixing 
> up the cookies from multiple users (the one it remembers on its own, and the 
> one i add).
> here is a code snippet of what i have. i'm using httpclient v3.1 and i'm 
> afraid to upgrade it as this system is already in production and i'm just 
> trying to fix the "too many open files" issues.
>     if (!loginCookies.isEmpty()) {
>         String cookieValue = "";
>         for (String cookie : loginCookies.keySet()) {
>             if (!"".equals(cookieValue))
>                 cookieValue += "; ";
>             cookieValue += cookie + "=" + loginCookies.get(cookie);
>         }
>         httpget.addRequestHeader("Cookie", cookieValue);
>     }
> it results in this:
> GET /index.aspx?class=Subscriber&proc=List&action=listbyname&format=xml&page=1&page_limit=200 HTTP/1.1
> User-Agent: mycustomuseragent
> Cookie: 
> TASEREVIDENCECOM=DxNZ9PZkP4xPwhwhKPenSSDjrRtyoSr/Jlyn3iBe6B06lQdGkt+aQ/VXgn0A9GcQVyZ2RscFOW486IJiRX9XH1+0B9k1dbKfQhQVYApXS6ATfuKyOS0l2Qv7OoM/DQrb6VIJveLFt+FkbVAV9BkmfW2IWlGu2g89NvBJFNn8OHNpsZXLUBf7ph/qsLyUTwOYENW7y3xGYD9jOoroiqgj/4joNhP5DYKr+hQKCIS3Gzflo2r+Nq1aTqZ2EMHOcqzIseqB+qtFWikp/UMcouM9ZPN/fOuiYGTm1vT3E/Kt1j9peUDvd1ZbUlJ+YxuZANBl9fwFzSEGOwSi3Bt5Ai8WybAHJiTg4oP5lmnkgc/E65CqQa5nqhhO0irS7bR6Jk1Jh+7h3WS3ytNyqUTRUZAnpWZG+lP6Efv0cPYz4Nc8Aupt8l8HxDeV/cHV2JpcL7HkUr/mNcT1lwaUPeBfO7Gc4S+AbpObj5I7y+sFQ1DU67O3Wj30UImB9M8i2RNTTYH9aQtRfR+a4ZbzMQfN2MV0le5/W5/7/QZbkLzUVJu1yzfgaaFfkjVgTbo3qywQgUJhcBJ+DgHOVYjYYZR81wJia/s06odo/mTuhtRA857ctwP2k+37J2Zf0NzWi0+3tEV6rg+2o2HtVDxMBmwWFVqLWrWb1bFC0XxBGVHZVtvsjgqyk7Q8YjqWZl0CtVJUa5Z2xEybWjolII1zFMRnTe7XZ3/qfq1txMJXkGlL0VJy9fVVzcMDZ3ZaYoO/+i5ly6fDm6fGO02/Wpk09gkFc7V45Q==
> Host:
> Cookie: $Version=0; 
> TASEREVIDENCECOM=DxNZ9PZkP4xPwhwhKPenSSDjrRtyoSr/Jlyn3iBe6B06lQdGkt+aQ/VXgn0A9GcQVyZ2RscFOW486IJiRX9XH1+0B9k1dbKfQhQVYApXS6ATfuKyOS0l2Qv7OoM/DQrb6VIJveLFt+FkbVAV9BkmfW2IWlGu2g89NvBJFNn8OHNpsZXLUBf7ph/qsLyUTwOYENW7y3xGYD9jOoroiqgj/4joNhP5DYKr+hQKCIS3Gzflo2r+Nq1aTqZ2EMHOcqzIseqB+qtFWikp/UMcouM9ZPN/fOuiYGTm1vT3E/Kt1j9peUDvd1ZbUlJ+YxuZANBl9fwFzSEGOwSi3Bt5Ai8WybAHJiTg4oP5lmnkgc/E65CqQa5nqhhO0irS7bR6Jk1Jh+7h3WS3ytNyqUTRUZAnpWZG+lP6Efv0cPYz4Nc8Aupt8l8HxDeV/cHV2JpcL7HkUr/mNcT1lwaUPeBfO7Gc4S+AbpObj5I7y+sFQ1DU67O3Wj30UImB9M8i2RNTTYH9aQtRfR+a4ZbzMQfN2MV0le5/W5/7/QZbkLzUVJu1yzfgaaFfkjVgTbo3qywQgUJhcBJ+DgHOVYjYYZR81wJia/s06odo/mTuhtRA857ctwP2k+37J2Zf0NzWi0+3tEV6rg+2o2HtVDxMBmwWFVqLWrWb1bFC0XxBGVHZVtvsjgqyk7Q8YjqWZl0CtVJUa5Z2xEybWjolII1zFMRnTe7XZ3/qfq1txMJXkGlL0VJy9fVVzcMDZ3ZaYoO/+i5ly6fDm6fGO02/Wpk09gkFc7V45Q==;

> $Path=/
> i set the first Cookie, the second is set by httpclient.
> thanks,
> miguel de anda


You should be using a separate HttpState per individual user and let
HttpClient manage cookies. 
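
To illustrate the idea of one cookie container per user, here is a minimal sketch that uses only the JDK. It is not HttpClient 3.1 code: `java.net.CookieManager` stands in for `HttpState`, and the class name `PerUserCookies`, its methods, the user ids, and the `example.invalid` host are all hypothetical names chosen for the example.

```java
import java.net.CookieManager;
import java.net.HttpCookie;
import java.net.URI;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical registry: one isolated cookie container per user, safe to share across worker threads. */
class PerUserCookies {
    // Stand-in for "one HttpState per user"; CookieManager is an assumption, not the HttpClient 3.1 type.
    private final ConcurrentMap<String, CookieManager> states = new ConcurrentHashMap<>();

    /** Returns the cookie container for this user, creating it atomically on first use. */
    CookieManager stateFor(String userId) {
        return states.computeIfAbsent(userId, id -> new CookieManager());
    }

    /** Stores a session cookie in this user's container only (for example, after that user's login). */
    void remember(String userId, URI site, String name, String value) {
        HttpCookie cookie = new HttpCookie(name, value);
        cookie.setPath("/");
        stateFor(userId).getCookieStore().add(site, cookie);
    }

    /** Cookies that would accompany a request to the site made on behalf of this user. */
    List<HttpCookie> cookiesFor(String userId, URI site) {
        return stateFor(userId).getCookieStore().get(site);
    }

    public static void main(String[] args) {
        PerUserCookies registry = new PerUserCookies();
        URI site = URI.create("http://example.invalid/");
        registry.remember("alice", site, "SESSION", "a1");
        registry.remember("bob", site, "SESSION", "b2");
        // Each user sees only the cookie stored in their own container.
        System.out.println("alice: " + registry.cookiesFor("alice", site));
        System.out.println("bob:   " + registry.cookiesFor("bob", site));
    }
}
```

In HttpClient 3.1 the per-user object would be an `HttpState`, passed to the three-argument `HttpClient.executeMethod(HostConfiguration, HttpMethod, HttpState)` on a single shared client and connection manager. The manually added `Cookie` header would then no longer be needed, which should also remove the duplicate header seen in the capture above.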


PS: 3.1 is effectively end of life. It will become more and more
difficult to get any help for it on this list.
