SDK code coverage with JaCoCo

>> Monday, March 14, 2011

Today, I had the pleasure of closing bug 241254, which has been open for over two years. It was a request to run a code coverage tool against the SDK during the build.

We used JaCoCo, an EPL-licensed code coverage library brought to you by the same team that created EclEmma. Why did we choose JaCoCo?

  1. Help from JaCoCo committers. At EclipseCon 2010, Marc Hoffmann (JaCoCo committer) indicated to Olivier Thomann that he was interested in helping us implement JaCoCo in our build.
  2. Performance. Other code coverage tools made the JUnit tests take significantly longer. Our tests take long enough to finish as it is! Olivier tested JaCoCo and found that the JDT Core tests took only about 2% longer to complete with JaCoCo enabled. Our 68,000 JUnit tests take about seven hours to run in parallel on three platforms. The JDT Core tests comprise 37,000 of these tests, so we estimated that running JaCoCo during the build would increase the test run by only about eight minutes.
  3. Classes are instrumented on the fly and remain untouched after the code coverage data is generated. Other code coverage tools require you to instrument the classes ahead of time. Given the huge code base of the SDK, this didn't seem like a feasible alternative.
  4. JaCoCo reports look awesome.

How did we implement JaCoCo in the build?
  1. Olivier tested an earlier version of JaCoCo and worked with the JaCoCo committers to address some issues. Once these issues were resolved, we opened IPZilla bugs to consume JaCoCo as part of the Indigo release.
  2. The Eclipse SDK JUnit tests are run with the assistance of a project called org.eclipse.test. An Ant script called library.xml is used to invoke the test suites for each test bundle. We added the JaCoCo agent and Ant jars to a new library directory of this project. These jars are then added to the extraVMargs property when the tests are run. The library.xml also specifies the output directory of the coverage data (*.exec files).
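To give a sense of the wiring, here is a minimal sketch of attaching the JaCoCo agent to the test JVM via a VM argument. The property and directory names are illustrative, not the actual library.xml contents; only the `-javaagent:jacocoagent.jar=destfile=...` option syntax is standard JaCoCo.

```xml
<!-- Illustrative sketch only: attach the JaCoCo agent to the test JVM.
     The agent writes coverage data for this test bundle to a *.exec file. -->
<property name="coverage-output" value="${eclipse-home}/coverage"/>
<property name="extraVMargs"
          value="-javaagent:${library-dir}/jacocoagent.jar=destfile=${coverage-output}/${testPlugin}.exec"/>
```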
  3. When the tests are run, corresponding exec files are created for each test bundle.
  4. We modified the test.xml that is used to run all the test suites to generate a code coverage report upon completion of the JUnit tests. All the *.exec files are merged into a single file, and a coverage report is generated for each bundle in the SDK (excluding Orbit and doc bundles, as well as fragments). The Ant for the report generation looks like this:
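The original snippet is not reproduced here, but with the JaCoCo Ant tasks the merge and per-bundle report generation take roughly this shape. The bundle names, paths, and properties below are illustrative assumptions; the task and element names (`jacoco:merge`, `jacoco:report`, `executiondata`, `structure`, `classfiles`, `sourcefiles`, `html`) are the real JaCoCo Ant API.

```xml
<!-- Illustrative sketch of report generation with the JaCoCo Ant tasks -->
<taskdef uri="antlib:org.jacoco.ant" resource="org/jacoco/ant/antlib.xml"
         classpath="${library-dir}/jacocoant.jar"/>

<!-- Merge the per-test-bundle *.exec files into a single file -->
<jacoco:merge destfile="${coverage-output}/merged.exec"
              xmlns:jacoco="antlib:org.jacoco.ant">
  <fileset dir="${coverage-output}" includes="*.exec"/>
</jacoco:merge>

<!-- Generate an HTML coverage report for one SDK bundle -->
<jacoco:report xmlns:jacoco="antlib:org.jacoco.ant">
  <executiondata>
    <file file="${coverage-output}/merged.exec"/>
  </executiondata>
  <structure name="org.eclipse.jdt.core">
    <classfiles>
      <fileset dir="${plugins-dir}/org.eclipse.jdt.core" includes="**/*.class"/>
    </classfiles>
    <sourcefiles encoding="UTF-8">
      <fileset dir="${src-dir}/org.eclipse.jdt.core"/>
    </sourcefiles>
  </structure>
  <html destdir="${coverage-output}/report/org.eclipse.jdt.core"/>
</jacoco:report>
```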

One problem that we encountered was that the JaCoCo report task was running out of heap space when the source files were passed to it as a zipfileset instead of a fileset. This problem occurred especially when attempting to generate source reports for large bundles such as jdt.core. We overcame this problem by unzipping the source bundles into a temporary location and passing the fileset.
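The workaround can be sketched like this (again illustrative; the actual temporary location, bundle names, and properties may differ). The `sourcefiles` element sits inside the report task's `structure` element:

```xml
<!-- Illustrative: unzip the source bundle into a temporary location first -->
<unzip src="${plugins-dir}/${bundle-id}.source_${bundle-version}.jar"
       dest="${tmp-dir}/src/${bundle-id}"/>

<!-- ...then hand the report task a plain fileset rather than a zipfileset,
     which avoided the heap-space failures on large bundles -->
<sourcefiles encoding="UTF-8">
  <fileset dir="${tmp-dir}/src/${bundle-id}"/>
</sourcefiles>
```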

In addition to code coverage percentages, the report generates HTML versions of the source files that highlight the code coverage within the code itself. For example:

All this information consumes about 1GB per build.  (Sorry Denis and Matt).  Thus I have only enabled it for our weekly integration builds, and the data will not be sent to the mirrors.

The code coverage reports for 3.7M6 are now available. Check out your favourite Eclipse SDK bundle!

JaCoCo project
JaCoCo mission and design goals
JaCoCo Ant tasks
Bug 241254 - run code coverage tool during build
Marc Hoffmann's JaCoCo talk at Eclipse Summit Europe 2010


Thoughts on Making Software

>> Tuesday, March 01, 2011

My husband (aka Mr. Releng) and I were driving home from his office party in mid-December and had the following conversation.

Me:  A publicist from O'Reilly emailed me today and asked if I would like a free copy of Making Software.

MR:  Huh.  Why did they offer you a free book?

Me:  I don't know. I mentioned to one of my Eclipse colleagues on Twitter  that I thought it would be an interesting book to read.  Also, one of the editors is Greg Wilson, who's also the editor of the upcoming The Architecture of Open Source Applications.  (I contributed a chapter on Eclipse to AOSA).

MR:  Next time, mention on Twitter that you'd like a free car. What's the book about?

Me:  Free car from Twitter?  Don't get your hopes up.  The book investigates the evidence that supports common software engineering practices. In medicine, a clinical trial is conducted to see if a new drug has a statistically significant effect compared to a placebo or an existing treatment. The same evidence-based principles can be applied to software engineering to determine if TDD or Agile methods are actually effective. I've always thought that there was a lot of shouting but scant evidence to support different software development practices.

MR:  Yeah, undergraduate computer science is engineering.  Graduate level computer science is math.

From xkcd

As you may have guessed, I'm married to a mathematician.

In addition to being a fan of open source, I also enjoy reading about different scientific disciplines.  I love reading scienceblogs and have read many great books over the past year about evidence based medicine. So when I heard that there was a book available that examined what software engineering practices actually work, I was intrigued.

In any case, I finished reading it recently.   It was very interesting and I learned a lot!  The book is split into two sections.  The first section dealt with different research methods.  For instance,

  • How to conduct a systematic literature review
  • Empirical methods that can be applied to software engineering studies, and why software engineering is inherently difficult to measure quantitatively.
  • How to decide on the papers to include in a meta-study. 

From xkcd

The second half of the book examines the evidence for different questions in software engineering.  For instance:

  • How do you measure programmer productivity?  Are some developers really an order of magnitude more productive than their team mates?
  • Does test driven development result in better software?
  • What's the cost of discovering a bug early in the development cycle versus after the product has shipped?
  • Which working environments (cubicles, offices, open concept, etc.) are the most productive for software developers?
  • Is there a measurable difference in software quality between open and closed source software?
  • What are the reasons for the low proportion of women working in the software industry?
  • Does the use of design patterns improve software quality?
  • How to mine data from open source repositories for your own studies (Chapter 27 uses the Eclipse project as an example :-)
All in all, I found it a very interesting book that examined the actual empirical evidence to support or refute some of the sacred cows in software engineering. I think this is a refreshing step forward for our profession. If there aren't numbers to support that the way we work is effective, shouldn't we alter our path to use better methods?

    Further reading
    Software Carpentry: Empirical results

    As an aside, some great books on evidence based medicine are
    Bad Science by Ben Goldacre
    Trick or Treatment by Simon Singh and Edzard Ernst
    Snake Oil Science by R. Barker Bausell

