Subject: Re: multiple sessions on single MultiThreadedHttpConnectionManager
From: Oleg Kalnichevski
To: HttpClient User Discussion
Date: Tue, 11 May 2010 22:46:18 +0200

On Tue, 2010-05-11 at 08:20 -0700, Miguel De Anda wrote:
> i have an app that needs to crawl for data from another server. the service
> requires login, and it keeps track of the session via a cookie. i need the
> app to be able to login as thousands of different users to crawl for data.
>
> the implementation is like this:
>
> thread 1:
>     add a crawl job to the queue (threadpool of size N)
>
> job in queue (runs in threadpool):
>     if not logged in as this user yet, login
>     fetch data for user
>     index data for search
>
> the dataset is huge. each user has thousands of records, and there are
> thousands of users. i initially had a separate instance of httpclient with
> its own MultiThreadedHttpConnectionManager for each user, but as the number
> of users grew, the number of open connections grew, and they stayed open.
> this eventually caused "too many open files" errors.
>
> option 1:
> a new non-threaded httpclient that i can terminate the connection to once
> i'm done, manually managing cookies to prevent excessive logins. each job in
> the threadpool will create a new instance and shut it down when it's done. i
> implemented this, but couldn't figure out how to get it to shut down fast.
> the only thing i found was setSoTimeout and i set it to 2000 but if the app
> goes really fast, it still doesn't prevent the lsof count from getting
> really high. i'm afraid to lower it even more as it might have other side
> effects that i'm not aware of. i also imagine this method creates lots of
> overhead.
>
> option 2:
> create a global static MultiThreadedHttpConnectionManager and httpclient
> instance and manually set the cookies on each httpget method that goes out.
> this method seems to have the least overhead as it can continue to reuse the
> connections and the maximum number of concurrent connections shouldn't
> really go higher than the threadpool size. sounds perfect but i don't know
> how to have it keep cookies separate. while using wireshark it seems that
> it's mixing up the cookies from multiple users (the one it remembers on its
> own, and the one i add).
>
> here is a code snippet of what i have. i'm using httpclient v3.1 and i'm
> afraid to upgrade it as this system is already in production and i'm just
> trying to fix the "too many open files" issues.
>
>     if (!loginCookies.isEmpty()) {
>         String cookieValue = "";
>         for (String cookie : loginCookies.keySet()) {
>             if (!"".equals(cookieValue))
>                 cookieValue += "; ";
>             cookieValue += cookie + "=" + loginCookies.get(cookie);
>         }
>         httpget.addRequestHeader("Cookie", cookieValue);
>     }
>
> it results in this:
>
> GET /index.aspx?class=Subscriber&proc=List&action=listbyname&format=xml&page=1&page_limit=200 HTTP/1.1
> User-Agent: mycustomuseragent
> Cookie: TASEREVIDENCECOM=DxNZ9PZkP4xPwhwhKPenSSDjrRtyoSr/Jlyn3iBe6B06lQdGkt+aQ/VXgn0A9GcQVyZ2RscFOW486IJiRX9XH1+0B9k1dbKfQhQVYApXS6ATfuKyOS0l2Qv7OoM/DQrb6VIJveLFt+FkbVAV9BkmfW2IWlGu2g89NvBJFNn8OHNpsZXLUBf7ph/qsLyUTwOYENW7y3xGYD9jOoroiqgj/4joNhP5DYKr+hQKCIS3Gzflo2r+Nq1aTqZ2EMHOcqzIseqB+qtFWikp/UMcouM9ZPN/fOuiYGTm1vT3E/Kt1j9peUDvd1ZbUlJ+YxuZANBl9fwFzSEGOwSi3Bt5Ai8WybAHJiTg4oP5lmnkgc/E65CqQa5nqhhO0irS7bR6Jk1Jh+7h3WS3ytNyqUTRUZAnpWZG+lP6Efv0cPYz4Nc8Aupt8l8HxDeV/cHV2JpcL7HkUr/mNcT1lwaUPeBfO7Gc4S+AbpObj5I7y+sFQ1DU67O3Wj30UImB9M8i2RNTTYH9aQtRfR+a4ZbzMQfN2MV0le5/W5/7/QZbkLzUVJu1yzfgaaFfkjVgTbo3qywQgUJhcBJ+DgHOVYjYYZR81wJia/s06odo/mTuhtRA857ctwP2k+37J2Zf0NzWi0+3tEV6rg+2o2HtVDxMBmwWFVqLWrWb1bFC0XxBGVHZVtvsjgqyk7Q8YjqWZl0CtVJUa5Z2xEybWjolII1zFMRnTe7XZ3/qfq1txMJXkGlL0VJy9fVVzcMDZ3ZaYoO/+i5ly6fDm6fGO02/Wpk09gkFc7V45Q==
> Host: 172.22.1.55
> Cookie: $Version=0; TASEREVIDENCECOM=DxNZ9PZkP4xPwhwhKPenSSDjrRtyoSr/Jlyn3iBe6B06lQdGkt+aQ/VXgn0A9GcQVyZ2RscFOW486IJiRX9XH1+0B9k1dbKfQhQVYApXS6ATfuKyOS0l2Qv7OoM/DQrb6VIJveLFt+FkbVAV9BkmfW2IWlGu2g89NvBJFNn8OHNpsZXLUBf7ph/qsLyUTwOYENW7y3xGYD9jOoroiqgj/4joNhP5DYKr+hQKCIS3Gzflo2r+Nq1aTqZ2EMHOcqzIseqB+qtFWikp/UMcouM9ZPN/fOuiYGTm1vT3E/Kt1j9peUDvd1ZbUlJ+YxuZANBl9fwFzSEGOwSi3Bt5Ai8WybAHJiTg4oP5lmnkgc/E65CqQa5nqhhO0irS7bR6Jk1Jh+7h3WS3ytNyqUTRUZAnpWZG+lP6Efv0cPYz4Nc8Aupt8l8HxDeV/cHV2JpcL7HkUr/mNcT1lwaUPeBfO7Gc4S+AbpObj5I7y+sFQ1DU67O3Wj30UImB9M8i2RNTTYH9aQtRfR+a4ZbzMQfN2MV0le5/W5/7/QZbkLzUVJu1yzfgaaFfkjVgTbo3qywQgUJhcBJ+DgHOVYjYYZR81wJia/s06odo/mTuhtRA857ctwP2k+37J2Zf0NzWi0+3tEV6rg+2o2HtVDxMBmwWFVqLWrWb1bFC0XxBGVHZVtvsjgqyk7Q8YjqWZl0CtVJUa5Z2xEybWjolII1zFMRnTe7XZ3/qfq1txMJXkGlL0VJy9fVVzcMDZ3ZaYoO/+i5ly6fDm6fGO02/Wpk09gkFc7V45Q==; $Path=/
>
> i set the first Cookie, the second is set by httpclient.
>
> thanks,
> miguel de anda

Miguel,

You should be using a separate HttpState per individual user and let
HttpClient manage cookies.

Oleg

PS: 3.1 is effectively end of life. It will become more and more difficult
to get any help for it on this list.
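
The advice in the reply (one HttpState per user, one shared client and connection manager) could be wired up roughly as follows. This is only a sketch under stated assumptions: `PerUserSessions`, `forUser`, and `forget` are hypothetical names that do not come from HttpClient, the code assumes Java 8 or later, and `StringBuilder` stands in for `HttpState` so the example runs without the library on the classpath.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of HttpClient): keeps one session-state
 * object per user so that cookies from different logins are never mixed.
 */
final class PerUserSessions<S> {
    private final ConcurrentMap<String, S> states = new ConcurrentHashMap<>();
    private final Supplier<S> factory;

    PerUserSessions(Supplier<S> factory) {
        this.factory = factory;
    }

    /** Returns the state for this user, creating it on first use. */
    S forUser(String userId) {
        return states.computeIfAbsent(userId, id -> factory.get());
    }

    /** Drops a user's state, for example once that user's crawl is finished. */
    void forget(String userId) {
        states.remove(userId);
    }

    int size() {
        return states.size();
    }

    public static void main(String[] args) {
        // StringBuilder is a stand-in for HttpState in this runnable sketch.
        PerUserSessions<StringBuilder> sessions =
                new PerUserSessions<>(StringBuilder::new);
        sessions.forUser("alice").append("cookie-for-alice");
        sessions.forUser("bob").append("cookie-for-bob");
        System.out.println(sessions.forUser("alice"));
        System.out.println(sessions.forUser("bob"));

        // With commons-httpclient 3.1 available, the registry would hold
        // HttpState objects and each job would pass its user's state along:
        //   PerUserSessions<HttpState> states =
        //           new PerUserSessions<HttpState>(HttpState::new);
        //   client.executeMethod(null, httpget, states.forUser(userId));
    }
}
```

In HttpClient 3.1 the three-argument `executeMethod(HostConfiguration, HttpMethod, HttpState)` overload accepts the state per request, and a null host configuration falls back to the client's default. With that in place the manual `httpget.addRequestHeader("Cookie", ...)` call would presumably be dropped, since it appears to be what produces the duplicate Cookie header in the capture above.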