incubator-gora-dev mailing list archives

From lewis john mcgibbney <lewis.mcgibb...@gmail.com>
Subject Re: Gora CassandraStore is not thread safe?
Date Wed, 26 Oct 2011 09:25:46 GMT
Hi,

I think we're at a stage where you're right, Chris. Following Alexis'
commit, I feel that this has been bottomed out. In addition, we are now at
Cassandra version 0.8.1.
Are you happy with this, Alexis?

Thanks

On Sat, Oct 1, 2011 at 6:33 PM, Mattmann, Chris A (388J) <chris.a.mattmann@jpl.nasa.gov> wrote:

> Great work, thanks Alexis! Maybe it's time to close out GORA-22 then
> and leave any future things that crop up as new issues.
>
> Cheers,
> Chris
>
> On Oct 1, 2011, at 4:07 AM, Alexis wrote:
>
> > The latest revision, 1177960, should now fix the thread-safety issue:
> >
> >
> > http://svn.apache.org/viewvc/incubator/gora/trunk/gora-cassandra/src/main/java/org/apache/gora/cassandra/store/CassandraStore.java?r1=1177960&r2=1177959&pathrev=1177960
> >
> > Please comment on https://issues.apache.org/jira/browse/GORA-22 if
> > there is anything else.
> >
> > Alexis
> >
> > On Sun, Sep 4, 2011 at 10:43 AM, Alexis <alexis.detreglode@gmail.com> wrote:
> >> Hi,
> >>
> >> I submitted the patch for peer review by just attaching it to the
> >> issue: https://issues.apache.org/jira/browse/GORA-22
> >>
> >> See this article about concurrency and HashMap to read up on the topic:
> >> http://www.ibm.com/developerworks/java/library/j-jtp07233/index.html
> >>
> >> I ended up calling toArray on the key set to get around the
> >> ConcurrentModificationException that java.util.HashMap throws by
> >> default when the map is modified while iterating over its keys.
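A minimal sketch of the snapshot idea, assuming the buffer is switched to java.util.concurrent.ConcurrentHashMap (the class covered by the article above). With a plain HashMap, toArray still walks the map internally, so this sketch relies on ConcurrentHashMap's weakly consistent iteration. SnapshotFlushBuffer and its methods are hypothetical and are not the code from the GORA-22 patch: flush() copies the keys with toArray and removes each entry, so keys that other threads add mid-flush simply wait for the next flush.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical write buffer for illustration; not the actual CassandraStore code.
class SnapshotFlushBuffer<K, V> {
  // ConcurrentHashMap iterators are weakly consistent: they never throw
  // ConcurrentModificationException, unlike java.util.HashMap's fail-fast ones.
  private final Map<K, V> buffer = new ConcurrentHashMap<>();

  void put(K key, V value) {
    buffer.put(key, value);
  }

  // Snapshots the current keys into an array, then removes and returns each
  // entry. Keys added by other threads after the snapshot stay buffered.
  List<V> flush() {
    Object[] keys = buffer.keySet().toArray();
    List<V> flushed = new ArrayList<>();
    for (Object key : keys) {
      V value = buffer.remove(key);
      if (value != null) {
        flushed.add(value); // a real store would write this value out here
      }
    }
    return flushed;
  }

  int size() {
    return buffer.size();
  }
}
```

Because every removed entry is handed to exactly one flush, no write is lost or duplicated even when puts and flushes interleave.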
> >>
> >> On a few occasions I also hit Cassandra crashes and Hector
> >> exceptions (usually because of garbage collection in the Cassandra
> >> daemon?) on my poor 5-year-old laptop while running the Nutch parse
> >> command, which is very CPU- and IO-intensive. What worked for me, in
> >> mapred-site.xml (see the attached config), was keeping the read batch
> >> reasonably small (400 rows at a time) and making it different from the
> >> write batch (for example, 843 rows written per batch) so that reads and
> >> writes don't happen simultaneously.
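To illustrate why distinct batch sizes help, assume (hypothetically; the message does not say so) that the parse job writes about one row for each row it reads, so both counters advance together. A read flush and a write flush then fall due on the same row only at multiples of lcm(read, write). 400 and 843 share no common factor, so the first overlap is at row 337,200, whereas batches of 400 and 800 would overlap every 800 rows. BatchBoundaries is a hypothetical helper.

```java
// Hypothetical helper. Assumes one row is written per row read, so the read
// and write counters advance in step.
class BatchBoundaries {
  static long gcd(long a, long b) {
    return b == 0 ? a : gcd(b, a % b);
  }

  // First row count at which a read flush and a write flush are both due,
  // i.e. lcm(readBatch, writeBatch).
  static long firstCoincidence(long readBatch, long writeBatch) {
    return readBatch / gcd(readBatch, writeBatch) * writeBatch;
  }

  // Number of simultaneous flushes in a job of totalRows rows.
  static long coincidences(long totalRows, long readBatch, long writeBatch) {
    return totalRows / firstCoincidence(readBatch, writeBatch);
  }
}
```

Under that assumption, a one-million-row job gives coincidences(1_000_000, 400, 843) == 2, against 1250 for batches of 400 and 800.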
> >>
> >>
> >> Alexis
> >>
> >> On Tue, Aug 30, 2011 at 1:24 AM, Alexis <alexis.detreglode@gmail.com> wrote:
> >>> Hi Tom,
> >>>
> >>> Thanks for testing Nutch 2.0 & Cassandra and reporting the obvious
> >>> bug. I must admit that development and testing of Gora & Nutch are not
> >>> very active, but at least there is some.
> >>>
> >>>
> >>> 1. As regards your ConcurrentModificationException issue, it looks like
> >>> it happens when flushing the store. From your exception stack trace:
> >>> (Line 192 in org.apache.gora.cassandra.store.CassandraStore)
> >>>    for (K key: this.buffer.keySet()) {
> >>>
> >>> while other threads are adding new keys to the HashMap:
> >>>
> >>> (Line 266)
> >>>    this.buffer.put(key, p);
> >>>
> >>> "it is not generally permissible for one thread to modify a Collection
> >>> while another thread is iterating over it."
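The fail-fast check is easiest to see in a single thread: any structural modification (such as putting a new key) made while the key set is being iterated causes the iterator's next call to throw. With several fetcher threads the same check fires only on a best-effort basis, which is why the error shows up intermittently. FailFastDemo is a hypothetical class written for illustration.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo of HashMap's fail-fast iterator.
class FailFastDemo {
  // Returns true if adding a key during iteration triggers the
  // ConcurrentModificationException.
  static boolean putWhileIterating() {
    Map<String, Integer> buffer = new HashMap<>();
    buffer.put("row1", 1);
    buffer.put("row2", 2);
    try {
      for (String key : buffer.keySet()) {
        // Structural modification mid-iteration: the same pattern as flush()
        // iterating over the keys while put() adds a new one.
        buffer.put(key + "-new", 0);
      }
    } catch (ConcurrentModificationException e) {
      return true;
    }
    return false;
  }
}
```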
> >>>
> >>> Let me try to reproduce the bug and fix it with this in mind.
> >>> How about introducing a mutex / lock mechanism with
> >>> java.util.concurrent.locks.Lock or, more simply, using a thread-safe
> >>> implementation such as java.util.concurrent.ConcurrentHashMap?
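A minimal sketch of the Lock option, assuming a single java.util.concurrent.locks.ReentrantLock guards a plain HashMap buffer. LockedBuffer and drain() are hypothetical names. Rather than holding the lock for the whole write to Cassandra, drain() swaps in a fresh map under the lock and returns the old one, so writer threads are blocked only for the swap.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical lock-guarded buffer; names are illustrative only.
class LockedBuffer<K, V> {
  private final Lock lock = new ReentrantLock();
  // Only read or replaced while holding the lock.
  private Map<K, V> buffer = new HashMap<>();

  void put(K key, V value) {
    lock.lock();
    try {
      buffer.put(key, value);
    } finally {
      lock.unlock();
    }
  }

  // Swaps in an empty map under the lock; the caller can then write the
  // returned map to the store without blocking other threads.
  Map<K, V> drain() {
    Map<K, V> toFlush;
    lock.lock();
    try {
      toFlush = buffer;
      buffer = new HashMap<>();
    } finally {
      lock.unlock();
    }
    return toFlush;
  }
}
```

The trade-off compared with ConcurrentHashMap is that every put() now acquires the lock.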
> >>>
> >>>
> >>> 2. Regarding the OutOfMemory error, maybe try decreasing the flushing
> >>> frequency, as described here?
> >>>
> >>> http://techvineyard.blogspot.com/2011/02/gora-orm-framework-for-hadoop-jobs.html#I_O_Frequency
> >>>
> >>> I like to use the jvisualvm utility from the JDK, which monitors
> >>> memory usage and shows how it evolves while the class is
> >>> running.
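On the memory side, the trade-off behind the flushing frequency can be sketched as a buffer that flushes whenever it reaches a fixed number of entries. BoundedBuffer and writeLimit are hypothetical names, not Gora's actual configuration. The limit caps how many records sit on the heap between flushes: a lower limit means more frequent, smaller flushes and less memory held, a higher one means fewer round trips but more heap.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical buffer that flushes once it holds writeLimit entries, so at
// most writeLimit records are kept in memory between flushes.
class BoundedBuffer<K, V> {
  private final int writeLimit;
  private final Map<K, V> buffer = new LinkedHashMap<>();
  private final List<Integer> flushSizes = new ArrayList<>(); // for inspection only

  BoundedBuffer(int writeLimit) {
    if (writeLimit <= 0) throw new IllegalArgumentException("writeLimit must be positive");
    this.writeLimit = writeLimit;
  }

  synchronized void put(K key, V value) {
    buffer.put(key, value);
    if (buffer.size() >= writeLimit) {
      flush();
    }
  }

  synchronized void flush() {
    if (buffer.isEmpty()) return;
    flushSizes.add(buffer.size()); // a real store would write the entries here
    buffer.clear();
  }

  synchronized int buffered() {
    return buffer.size();
  }

  synchronized List<Integer> flushSizes() {
    return new ArrayList<>(flushSizes);
  }
}
```

For example, with a limit of 400, putting 1000 distinct keys triggers two flushes of 400 and leaves 200 entries buffered until the final flush().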
> >>>
> >>> Alexis
> >>>
> >>> On Mon, Aug 29, 2011 at 1:50 PM, Tom Davidson <tdavidson@covario.com> wrote:
> >>>> Hi Lewis,
> >>>>
> >>>> I was running Nutch deployed with a dedicated Cassandra cluster.
> >>>> Frankly, I have given up on using Nutch 2 for now, as it seems highly
> >>>> unstable and not really in active development. Your effort to address
> >>>> this is encouraging. Because Nutch uses multithreading in the fetchers,
> >>>> I was getting ConcurrentModification and OutOfMemory errors on a regular
> >>>> basis in the CassandraStore. As far as I recall, the caching/flushing
> >>>> implementation is just not thread-safe. If the CassandraStore caching
> >>>> were removed completely it might work, but it would probably not be very
> >>>> efficient. If I were to fix this class, I would try to rewrite it to use
> >>>> Hector batched mutations instead.
> >>>>
> >>>> Tom
> >>>>
> >>>> -----Original Message-----
> >>>> From: lewis john mcgibbney [mailto:lewis.mcgibbney@gmail.com]
> >>>> Sent: Monday, August 29, 2011 1:41 PM
> >>>> To: gora-dev@incubator.apache.org; dev@nutch.apache.org
> >>>> Subject: Re: Gora CassandraStore is not thread safe?
> >>>>
> >>>> Hi Tom,
> >>>>
> >>>> Apologies for cross-posting; this would not usually be the case, but
> >>>> I'm hoping that if any results come from the thread, both communities
> >>>> can benefit.
> >>>>
> >>>> I'm in the process of getting Cassandra 0.8.4 working with Nutch 2.0
> >>>> and Gora 0.2 myself and seem to be having some nasty problems.
> >>>>
> >>>> Some questions for you:
> >>>>
> >>>> 1) How are you running Nutch, local or deployed?
> >>>> 2) How are you running Cassandra, local or deployed in a cluster?
> >>>>
> >>>> The obvious thoughts are that this is a bug and that there are
> >>>> method(s)/object(s) which are not thread-safe.
> >>>>
> >>>> Have you gotten any further with this?
> >>>>
> >>>> Lewis
> >>>>
> >>>>
> >>>> On Wed, Aug 10, 2011 at 8:43 PM, Tom Davidson <tdavidson@covario.com> wrote:
> >>>>
> >>>>> Has anyone tested the CassandraStore in Gora 0.2 using multiple threads?
> >>>>> The Nutch 2 fetcher architecture has many threads writing to one
> >>>>> GoraRecordWriter, and I am getting concurrent modification errors like
> >>>>> the one below.
> >>>>>
> >>>>> Caused by: java.util.ConcurrentModificationException
> >>>>>         at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
> >>>>>         at java.util.HashMap$KeyIterator.next(HashMap.java:828)
> >>>>>         at org.apache.gora.cassandra.store.CassandraStore.flush(CassandraStore.java:192)
> >>>>>         at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:65)
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> *Lewis*
> >>>>
> >>>
> >>
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>


-- 
*Lewis*
