November 22, 2010

Maven Tip: Finding Default Project Settings

Originally published 25 Nov 2009

Have you ever needed to override a default project setting in Maven but not known the exact setting?  A Google search could do the trick, but here I'll describe another way.

As an example, suppose you're converting an existing Ant build that doesn't follow the standard Maven project structure.  Maybe your project puts its source and test code right under src and test.

You start the conversion by creating a bare bones POM:

<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.mycompany</groupId>
  <artifactId>someproject</artifactId>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.7</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>

Then, you think: "Okay.  My project structure doesn't follow the defaults.  I'll want to change this eventually, but I want to see if this POM is okay.  I don't like to go long periods of time without seeing something working.  How do I tell Maven to change where it should look for source and test code?"

By running mvn help:effective-pom, you can find this quickly:

<project xmlns="http://maven.apache.org/POM/4.0.0"
  ...
  <build>
    <sourceDirectory>.../someproject/src/main/java</sourceDirectory>
    <testSourceDirectory>.../someproject/src/test/java</testSourceDirectory>
  ...

"Oh yeah.  There are the settings.  All I need to do is change this in my POM"

<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.mycompany</groupId>
  <artifactId>someproject</artifactId>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.7</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
  <build>
    <sourceDirectory>src</sourceDirectory>
    <testSourceDirectory>test</testSourceDirectory>
  </build>  
</project>

The help:effective-pom goal is a trick I use to quickly look up a project setting as well as to see all the settings together for a given project or module.

Java vs. Scala Ceremony

Originally published 17 Nov 2009

Man, every time I go to write some Java code these days, I just cringe at all the effort.

public class Person {
    private final String firstName;
    private final String lastName;
    private final int age;

    public Person(String firstName,
                  String lastName,
                  int age) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.age = age;
    }
    
    public String getFirstName() {
        return firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public int getAge() {
        return age;
    }
}

Sure, it didn't take me too long to write the Java code because of the handy dandy source code generation features in my IDE. But the real problem is the maintenance. I or someone on my team will have to read this cluttered code many more times than the one time I wrote it.

Take a look at the equivalent Scala code:

class Person(val firstName: String,
             val lastName: String,
             val age: Int)

Which version would you rather maintain?

97 Things Every Programmer Should Know

Originally published 8 Sep 2009

The web site for the 3rd book in the 97 Things series recently went public.  This one is targeted at programmers.  There are 88 contributions that have been edited (mine is #57) and the exact 97 entries that will go into the final book have yet to be identified.

Some of our industry's thought leaders have already made contributions.  If you'd like to contribute and possibly see your name listed alongside them, see How to Become a Contributor.

Risk Homing Metrics

Originally published 28 May 2009

Recently, I attended a talk by Neal Ford.  He talked about a couple of metrics you can combine to identify areas for refactoring: cyclomatic complexity and afferent coupling.  He used the ckjm tool to determine which classes were both complex and used by lots of other classes.  His recommendation: start refactoring those.

I immediately thought of Crap4j, another tool that combines a set of metrics for identifying the riskiest areas of a code base to maintain.  Crap4j implements the CRAP metric, which combines cyclomatic complexity and test coverage, but at a method level.  If a method is both complex and not very well tested, then it's risky to change.

This all led me to what might be the ultimate set of metrics to combine for homing in on the riskiest areas of a code base:
  1. Code coverage
  2. Cyclomatic complexity
  3. Code execution frequency in the real world
Complex code, executed very often, with low test coverage.

For practical purposes, I like sticking with the granularity of a method.  I can use tools like Cobertura to find the test coverage and JavaNCSS to find the cyclomatic complexity.  (Isn't cyclomatic complexity best applied to the method level anyway?)

That just leaves me with the which-methods-execute-the-most-in-production problem.  This is hard because I can run the other two as part of a continuous build, but I won't be able to identify the hot methods until I get to production and measure true usage.  So the static and dynamic metrics will always be out of sync at some level even if I could get estimated usage through continuous functional and higher level testing.

So what can I do?  I want this to run as part of a continuous build to get feedback as soon as possible that a method is getting a little risky (or with a legacy code base, is already risky).  So, I'll fall back on afferent coupling for practicality.  But afferent coupling is typically measured at a package level.  The finest granularity that I'm aware of with current tools is measuring at a class level with ckjm.  That's a good starting point for identifying highly used code.

So here's my plan.  Use the CRAP metric to find the methods and then factor in the afferent coupling of those methods' classes to give a prioritized list of methods to go clean up.  I'll see how this goes and consider factoring in method execution frequency from higher level testing runs.
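
To make that concrete, here's a rough sketch of the kind of scoring I have in mind.  The CRAP formula is, as I recall, complexity squared times the cube of the untested fraction, plus complexity; multiplying it by class-level afferent coupling to get a single priority score is purely my own heuristic, and the weighting is arbitrary.

// Sketch only: combining CRAP with afferent coupling is my own idea, not a standard metric.
public final class RiskScore {

    // CRAP(m) = comp(m)^2 * (1 - cov(m))^3 + comp(m), with coverage as a fraction from 0 to 1
    static double crap(int cyclomaticComplexity, double coverage) {
        double untested = 1.0 - coverage;
        return Math.pow(cyclomaticComplexity, 2) * Math.pow(untested, 3) + cyclomaticComplexity;
    }

    // Prioritize methods by CRAP, weighted by how widely the owning class is used.
    static double priority(int cyclomaticComplexity, double coverage, int afferentCoupling) {
        return crap(cyclomaticComplexity, coverage) * Math.max(1, afferentCoupling);
    }
}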

97 Things Every Software Architect Should Know

Originally published 2 Mar 2009

I recently received my copy of 97 Things Every Software Architect Should Know.  I'm honored to be included among some great minds in the software development field.  Richard Monson-Haefel, Neal Ford, Udi Dahan, Kevlin Henney, and Gregor Hohpe are just a few of the thought leaders who made contributions.

My contribution was about starting with a Walking Skeleton and building it out.  I was lucky enough to learn this early in my career while working alongside a gentleman named Bernie Thuman.  Bernie applied this technique as he led a team developing a 3-tiered distributed enterprise application.

Anyway, I'm glad to be a part of this project.  The book is filled with pearls of wisdom and is a must read for any software architect or really any professional looking to develop better software.

Git And Continuous Integration

Originally published 13 Nov 2008

Subversion is the de facto version control system of the day, but Git is the rising star.  More and more people are using Git, but I'm a bit concerned about its effects on Continuous Integration (CI).

First some background...

Subversion follows a centralized repository model.  So do CVS and many others.  There's one server that everybody commits to.  Git can follow suit or be configured in a distributed fashion.  In a fully distributed model, each developer has a private copy of the repository.  Mercurial is another example of a distributed version control system.

My concern is really with the distributed model.  One of the appeals of this model, and particularly of having your own private repository, is the ability to experiment and check in/commit at a much finer granularity than with a centralized model.  With a centralized model, you need to be more careful with your commits.  Otherwise, you could break the build and disrupt everybody else.  Consequently, you commit less frequently.

But what's better in the context of CI?  I have to be honest and admit that I've never used the distributed model on a project, so I'm being theoretical here.  My gut feel is that with a distributed model, integration will be less frequent.  I believe people will check in to private repositories more often, but will push those changes to the main integration branch less frequently than people committing straight to the main branch in a centralized model.  In a centralized model, integration is in your face.  You can't commit without thinking about it.  In a distributed model, you can get carried away in your own little world.  And we've learned in the agile community that we should be integrating early and often, haven't we?

I compare the distributed model with multiple repositories to a centralized model with multiple branches.  If you can accept this analogy, you can probably see how integration would be less frequent.  I fear with Git that people will adopt more of a distributed model and thus, CI will suffer.  This is pure speculation on my part, and I'm interested to see how things play out.

So my point here is to be mindful of using lots of repositories with Git and its potential negative consequences on CI.

For some more comments on using Git and CI, particularly on larger teams, see this.

Essence Over Ceremony in Unit Testing

Originally published 9 Aug 2008

There's been some talk recently about essence and ceremony, particularly regarding JVM programming languages. My first remembrance of this discussion was reading this blog entry from Stu Halloway.  Java is a ceremonious language because there's a lot of extra, required typing that blurs the essence of what you're trying to communicate in the code.

I want to talk about essence vs. ceremony in unit testing.

If I give you this:

OrderTest
    testCalculatePrice
        1 20.00
        2  5.00
          30.00

Can you tell what this is about? This is a specification of the calculatePrice() method on an Order object, where there are two LineItem objects and the expected price is $30.00. Did you figure that out without the explanation? This is essence.

Instead, what you often see is something like this:

public class OrderTest {
    @Test
    public void testCalculatePrice() {
        Order order = new Order();
        Product product1 = new Product(20.00);
        LineItem lineItem1 = new LineItem(1, product1);
        order.add(lineItem1);
        Product product2 = new Product(5.00);
        LineItem lineItem2 = new LineItem(2, product2);
        order.add(lineItem2);
        assertEquals(30.00, order.calculatePrice(), 0.001);
    }
}

Here, you're blinded with irrelevant details that obscure what this test is all about. The essence of the specification is buried in details. What you want is clarity at this level.

You can factor out the obscurity with something like this:

public class OrderTest {
    private Order order = new Order();

    @Test
    public void testCalculatePrice() {
        lineItem(1, 20.00);
        lineItem(2, 5.00);
        expectPrice(30.00);
    }
}

That gets us about as close as we can get in Java.

A consequence of this approach is that you have more methods overall, but I've always felt that clarity trumps everything else in specifications. Others have agreed.

What's interesting with these helper methods is that I can change how I implement them and not change the higher level, essence methods at all. For example, in lineItem() I can inject real production objects (as I did in the original example with LineItem and Product), or I can inject a mock or stub LineItem.
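
For completeness, here's one minimal way those helpers might be implemented (a sketch; lineItem() and expectPrice() are the names used above, and building real Product and LineItem objects inside them is just one option):

    private void lineItem(int quantity, double unitPrice) {
        // build a real line item behind the scenes; a stub or mock would work just as well
        order.add(new LineItem(quantity, new Product(unitPrice)));
    }

    private void expectPrice(double expectedPrice) {
        assertEquals(expectedPrice, order.calculatePrice(), 0.001);
    }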

So my suggestion is, when you're specifying these high level methods, to show only the essence of the test and factor out the ceremony.

Quotes from No Fluff Just Stuff

Originally published 28 Apr 2008

I just returned from this past weekend's No Fluff Just Stuff conference in Reston, Virginia.  As always (this was my fifth show), I had a great time conversing with my peers and the excellent speakers.

The following are some memorable quotes/paraphrases in chronological order.  Some disclaimers: 1) you should always take quotes in context and 2) I could have misinterpreted the intent of the speaker.  I'll try to provide some comments for additional context.

David Hussman: "You need to respect the pomodoro," quoting an Italian manager of a team who used a tomato timer.  They would work hard for 25 minutes, then take a 5 minute break.  When one developer wanted to keep working...

David Hussman: "Sometimes stand-up meetings turn into stand-there meetings."  This is when nobody is saying anything and we're just going through the motions.  I've definitely experienced this.

Venkat Subramaniam: In the context of Guice, when comparing annotation and xml configuration, "Both XML and annotations are evil.  Annotations are the lesser evil."  My opinion is that annotations are a good addition to the Java language, but can certainly be overused.  This may be what Venkat was talking about.

Andrew Glover: "Some people say, 'Behavior-Driven Development (BDD) is Test-Driven Development (TDD) done right', but I say BDD is Customer focused TDD."  BDD is closer to the customer's language.

Neal Ford: "If there were a book written today about real world software development, it would be called Accidental Complexity, Ceremony Over Essence, Ensuring Your Job Security written by the Enterprise Architecture Team."  Neal had tons of good quotes from his excellent keynote.

Venkat Subramaniam: "We constantly create whack-a-mole systems.  Fix the code in one place and this other, seemingly unrelated part of the code breaks."  Venkat was comparing software development with the children's game.

Brian Sam-Bodden: "Remember when you had to go outside your IDE to access your version control system?  And then, that functionality was integrated with Eclipse?  That's what Mylyn does for task management."  Less context switching keeps you focused.

Jared Richardson or Neal Ford: "Ted Neward is the Las Vegas of speakers.  You have to go see him at least once."  I always enjoy Ted's sessions.

Ted Neward: Sarcastically, "The code is perfect when it leaves my desk.  Something mystical happens afterwards that introduces bugs."

Ted Neward: "The Teddy Bear Technique has the added advantage of keeping people away, particularly when you're caught talking to it by a manager."  The Teddy Bear Technique is the act of explaining your problem to a stuffed animal, and then suddenly solving it as you start to question your assumptions.

Ted Neward: "Oh Great Debugger, tell me where the bug is."

Jared Richardson: "Take a shortcut here, another shortcut there - you get to a point where you're so busy paying interest, you don't have time to pay the principal."  This was in his talk about credit card software development (Technical Debt ).

Jared Richardson: Quoting Watts Humphrey, "Developers are caught in a victim's mentality."  We never think it's our fault, it's always somebody else's.

Expert Panel: When asked for two words about SOA, some of the phrases were, "WSDL sucks", "Consider REST", "Overly complex"

Expert Panel: When asked for two words about closures, one of the experts said, "Use Groovy"

Mark Richards: "Java has become over-bloated and way past its usefulness as a general purpose language."  I think he was the one who said, "Use Groovy."

Jeff Brown: "If we could start [Java] from scratch today, Java would look like Groovy.  Groovy is the preferred general purpose language."

Jay Zimmerman, Symposium Director: "If you want management approval to use Groovy, don't call it Groovy, call it Next Generation Java."  Managers freak out when they hear you want to use something called 'Groovy.'

Ted Neward: "Your peer group is more important than any tool or book we can recommend."  Great advice.

David Bock: "The existence of the system changes the requirements of the system."  Dave was talking about how seeing the system run changes the customer's mind of what he really wants.

David Bock: "Never believe someone who tells you he's 90% done."  Ever notice how that last 10% takes a long time?

I look forward to next time.

Hudson CI Game Plugin

Originally published 17 Apr 2008

redsolo has implemented a version of The Continuous Integration Build Game.  It's a Hudson plugin and is described here.  Way to go redsolo!

Spring Configuration Per Environment

Originally published 30 Jan 2008

It's common for configuration to change between environments (e.g., development, test, production).  For example, you'd definitely want a different DataSource URL depending on what environment you're running in.  Do you really want to hit the production database while developing?  Spring doesn't seem to support this per-environment configuration out of the box (at least I've never been able to find it).  There's been a request for this feature for some time now (SPR-1876), but it's still open.

The reality is that properties files are the simplest and easiest way to configure an application, particularly for non-developers.  What I'd like to see is a PropertyPlaceholderConfigurer that can configure an application per environment.  This is what SPR-1876 is all about.  However, it suggests there be a file per environment.  As an alternative, I suggest a single properties file.  A single file containing all the properties would be the simplest to maintain.  Think of it.  I've got a new property.  Do I want to add it to three files or one?  If I want to see all the possible values of a property, do I want to open and hunt through three files or one?

Grails supports the idea of grouping all the properties together, but it uses multiple configuration files (Config.groovy and DataSource.groovy) depending on the property.

Okay, so here's what I propose the properties file look like:

# this is the default value for all environments
my.prop=defaultValue
# but in the production environment, it is different
[production].my.prop=productionValue
 
# here's another property with no default
# (adapted example from the Grails user guide)
[development].dataSource.url=jdbc:hsqldb:mem:devDB
[test].dataSource.url=jdbc:hsqldb:mem:testDb
[production].dataSource.url=jdbc:hsqldb:file:prodDb;shutdown=true
 
The context definition simply uses the property name without the environment, and the custom per-environment PropertyPlaceholderConfigurer provides the appropriate value:

<bean id="dataSource"
      class="org.springframework.jdbc.datasource...">
  <property name="url" value="${dataSource.url}" />
</bean>
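
Here's a rough sketch of how such a configurer might look.  This is just my illustration: it assumes the active environment name arrives via a system property called env, and it overrides PropertyPlaceholderConfigurer's resolvePlaceholder() hook to try the environment-prefixed key first.

import java.util.Properties;
import org.springframework.beans.factory.config.PropertyPlaceholderConfigurer;

// Sketch: looks up "[env].name" first, then falls back to the plain "name".
public class PerEnvironmentPropertyPlaceholderConfigurer extends PropertyPlaceholderConfigurer {

    // the "env" system property name is an assumption for illustration
    private final String environment = System.getProperty("env", "development");

    protected String resolvePlaceholder(String placeholder, Properties props) {
        String environmentSpecific = props.getProperty("[" + environment + "]." + placeholder);
        return environmentSpecific != null ? environmentSpecific : props.getProperty(placeholder);
    }
}

Wired up as an ordinary bean definition, the dataSource definition above wouldn't need to change at all.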

So what are the downsides to this approach?  I suppose you can argue that a single properties file could be cluttered with details you don't care about.  For example, a production administrator may not care to see all the development and test values.

The other downside I can think of is forgetting to override a property.  That is, nothing reminds you that you have to override the DataSource URL for production.  It's content with using a default value.  The separate file approach would handle this because the property would be undefined.

So if I were reading this I'd say, "Spring is open source dude.  And those Spring guys are busy.  Go implement it and attach it to SPR-1876."  And if I get some time...

Law of Demeter and Unit Test Setup

Originally published 30 Nov 2007

Have you ever seen or even developed a long, complex setup method in JUnit?  Maybe it was because the production code was violating the Law of Demeter.  Here's another Robert Martin Craftsman style blog.

Apprentice:  Journeyman, I'm having trouble writing the fixtures for my JUnit tests.  They take too long.  I spend about 10 times longer writing tests than the production code.
Journeyman:  Really?  Unit test expert Gerard Meszaros says that tests should take only 10% to 20% of development time.
Apprentice:  Well, I'm experiencing the exact opposite.
Journeyman:  Let me take a look.
Apprentice:  Sure.
Author Note:  This is a trivial example for this blog entry.  I've seen some really obscure test setup methods.  The example is based on the sequence diagram chapter in Martin Fowler's UML Distilled book.
 public class OrderTest {
 
     private Order order;
 
     @Before
     public void setUp() throws Exception {
         Product product1 = new Product(11.00);
         OrderLine line1 = new OrderLine(2, product1);
         
         Product product2 = new Product(22.00);
         OrderLine line2 = new OrderLine(3, product2);
 
         OrderLine[] lines = new OrderLine[] {line1, line2};
         order = new Order(Arrays.asList(lines));
     }
     
     
     @Test
     public void priceShouldBe88() {
          assertEquals(88.00, order.getPrice(), 0.001);
     }
 }
  
Journeyman:  Okay, can I see the Order.getPrice() method?

     public double getPrice() {
         double price = 0.00;
         
         for (OrderLine orderLine : orderLines) {
             Product product = orderLine.getProduct();
             double productPrice = product.getPrice();
             int quantity = orderLine.getQuantity();
             price += productPrice * quantity;
         }
         
         return price;
     }
 
Journeyman:  Ah, I see the issue.  Have you ever heard of the Law of Demeter (LoD)?
Apprentice:  Huh?
Journeyman:  The LoD has a few different names and related principles, but basically it means that an object should deal with, and only with, its immediate collaborators.
Apprentice:  Huh?
Journeyman: You see how Order iterates through its OrderLine objects?
Apprentice:  Yeah.
Journeyman:  Well, that's fine.  But then you grab the Product and then grab its price.  You've violated the LoD.  Order should only know about OrderLines.  You've got more of a procedural design here, where you reach down into all the objects in the graph, grabbing all the data you need, and then do your thing.  A more object oriented design would be to distribute the work.  Do a little bit of work and delegate the rest to collaborating objects.
Apprentice:  Oh, I think I'm beginning to see.
Journeyman:  You should go read up on Craig Larman's GRASP principles.  He calls the LoD Don't Talk to Strangers.
Apprentice:  Okay, will do.
Journeyman:  Have you been following test driven development (TDD)?
Apprentice:  Uh..., well..., not really.  But I do write the tests afterwards.
Journeyman:  Complex test fixtures jump right out at you when you're doing TDD.  You realize something is wrong right away.  In fact, many times your testing style changes into more of an interaction based style.  But I'm getting ahead of myself.  Let's refactor this together.
Apprentice:  Okay, cool, pair programming.
Journeyman:  Right.  Remember, Order should only deal with OrderLines.
Now, the setUp() method looks like:

     @Before
     public void setUp() throws Exception {
         OrderLine line1 = stubOrderLineGetPriceToReturn(22.00);
         OrderLine line2 = stubOrderLineGetPriceToReturn(66.00);
 
         OrderLine[] lines = new OrderLine[] {line1, line2};
         order = new Order(Arrays.asList(lines));
     }
 
And the Order.getPrice() method looks like:

     public double getPrice() {
         double price = 0.00;
         
         for (OrderLine orderLine : orderLines) {
             price += orderLine.getPrice();
         }
         
         return price;
     }
 
Apprentice:  Okay, now I really see.  Order and its test are simpler, but don't we have the same amount of work anyway?  We had to write OrderLine.getPrice() and test that.
Journeyman:  Yes.  The work is distributed.  But I prefer lots of simpler objects with simpler test specifications than more complex objects and tests.
Apprentice:  What do you mean, more objects?  We didn't create any new objects.
Journeyman:  I meant generally speaking, not necessarily in this case.  Like I said, do a little work and pass the buck.  That might mean creating Data Clumps, aggregate objects for collections, objects that represent abstract data types, and so on, but again, I'm getting ahead of myself.
Apprentice:  Should I always follow the LoD?
Journeyman:  Well, there are of course consequences.  Classes tend to have larger APIs because sometimes they need to wrap the functionality of collaborating objects.  And you need to step through more objects to understand an algorithm as a whole because it's distributed.  But most of the time, I follow LoD because dependencies are reduced.
Apprentice:  All right, I'll give it a shot.  Thanks.
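
Author Note:  The stubOrderLineGetPriceToReturn() helper isn't shown in the dialogue.  One minimal way to write it, assuming a mocking library such as Mockito is on the test classpath, might be:

     import static org.mockito.Mockito.mock;
     import static org.mockito.Mockito.when;

     private OrderLine stubOrderLineGetPriceToReturn(double price) {
         // a hand-rolled stub would work just as well as a mocking framework here
         OrderLine line = mock(OrderLine.class);
         when(line.getPrice()).thenReturn(price);
         return line;
     }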

Parallel JUnit Ant Task

Originally published 16 Oct 2007

Over time, how do you maintain a maximum 10 minute build? This is an important agile practice for continuous integration.

It's inevitable. As more and more features are added to an application, the code base grows. The build has more to do: more compiling, more tests and metrics to run, more reports to generate. We've done numerous things to keep the build time down. In this entry, I'd like to describe one of them: running standard JUnit tests concurrently.

A colleague of mine came up with this idea. He first did a preliminary search to see if anything already existed.  Pretty much everything he found was intrusive, e.g., requiring extending a specialized TestCase.  He then attempted to combine Ant's Parallel and JUnit tasks. That is, he could simply nest N number of JUnit tasks inside a Parallel task. The problem here was coming up with a good way to divide the tests up into equal parts beforehand. My colleague quickly scrapped this idea in favor of a custom Ant task that wrapped the JUnit task.

The thought was to leverage the JUnit task as much as possible, but front it with the ability to run tests in parallel. The conceptual design looks like this:

[Diagram: a single TestProvider handing tests out, one at a time, to N TestExecutors, each running in its own JVM]
There are N TestExecutors. A TestExecutor runs in a single JVM. Thus, there are N JVMs. Say we have a dedicated, 4-processor integration machine running builds. We may choose to configure our custom task with 5 TestExecutors (we'll add one assuming we're not 100% compute bound). Note: if you're familiar with the JUnit task, you may be wondering how its fork/forkmode attributes fit in. The answer is that they are eliminated in favor of the custom task's jvm-count attribute.

There's a single TestProvider. His job is to gather all the tests and hand them out one at a time to whichever TestExecutor is ready. A simple protocol exists to let a TestExecutor tell the TestProvider that he's done and ready for another test.  This should maximize parallelism.
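
To make the handoff concrete, here's a rough sketch of the idea (my own illustration of the protocol described above, not the task's actual code; in the real task the executors run in separate JVMs, so the handoff happens over a socket or similar rather than an in-process queue):

import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: the provider hands out test class names one at a time;
// each executor asks for its next test as soon as it finishes the previous one.
public class TestProvider {

    private final BlockingQueue<String> tests = new LinkedBlockingQueue<String>();

    public TestProvider(List<String> testClassNames) {
        tests.addAll(testClassNames);
    }

    // Returns the next test to run, or null when there are none left.
    public String nextTest() {
        return tests.poll();
    }
}

Each TestExecutor then loops, asking for a test and handing it to the wrapped JUnit machinery until nextTest() returns null.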

The biggest negative with this approach is that since all tests run in a fixed number of JVMs, there's some loss of isolation here. Class level/global state set from a previous test could affect later tests. We're willing to trade this for increased performance.

The other possible downside is the use of a non-standard Ant task.

Using this approach we've successfully reduced our build time to usable levels.

The Continuous Integration Build Game

Originally published 14 Sep 2007

I was reading Alistair Cockburn's Agile Software Development book the other night. In it, he described a game developed by Darin Cummins to reinforce good development practices. The game is described here. This inspired me to create a fun game for continuous integration builds.

As I described in a previous blog entry, I've had problems in the past with people breaking the build. To help, I've used techniques like the rotating Build Nazi and the put-a-dollar-in-the-jar penalty for broken builds, but these are all negatively focused. How about something that rewards developers who don't break the build? How about rewarding developers for following the best practice of breaking their work into smaller chunks and checking in early and often?

So I'm thinking of a game where a developer gets, say, 1 point for getting his name on a successful build. The number of files checked in is ignored. We want to discourage big check-ins, so you get more points for smaller-grained check-ins since your name will show up on more successful builds. And on the other side, you get points taken away when you break the build. Now, we want to keep things simple, so we could probably stop right here, but I'm thinking that some failures are worse than others. So maybe we have something like:
 Description                                                        Reward Points
 Check-in and build passes                                          +1
 One or more unit tests failed                                      -10
 Compiler error (come on now)                                       -20
 Big check-in caused build to remain in a broken state for hours    -40
It's a game, so we need something for the winner. Bragging rights are too lame, so maybe lunch on the team, or some kind of trophy kept on the winner's desk to remind everybody of his champion status.

Now, there are some negatives. Perhaps not everybody would want to play. Particularly, notorious build breakers wouldn't want everybody (specifically management) to see their poor results. In that case, I suppose we could only publicly display the leaders, or top half, but that wouldn't be as fun.

People could easily cheat too. Maybe write a cron job that checks out a file every hour, changes a comment, and checks it back in. We'd have to look out for that kind of thing.

What about the analysis time required to keep score? I could easily see how a post-processing Ant task could be developed to update the points for developers on a successful build. But for a failure, I think you'd need human analysis. That's a negative because it requires time, so the job could be rotated. On the plus side, what I've noticed is that analyzing why the build failed brings awareness to issues. Issues like some people needing training, or a test requiring change because it's non-deterministic, or a hole in the pre-check-in process used to ensure a successful build.
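
For the automated half, the post-processing step wouldn't need to be much more than something like this (a sketch with made-up names; how the list of check-in authors is obtained from the CI system is left open):

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: award a point to every developer whose name appears on a successful build.
public class BuildGameScorer {

    private final Map<String, Integer> points = new HashMap<String, Integer>();

    public void recordSuccessfulBuild(List<String> authors) {
        for (String author : authors) {
            Integer current = points.get(author);
            points.put(author, (current == null ? 0 : current) + 1);
        }
    }

    public Map<String, Integer> standings() {
        return points;
    }
}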

To keep the game fresh and the developers motivated, we'd have to reset the points periodically. Iteration boundaries seem appropriate here.

Well, maybe I'll give it a shot and see what happens...

The Difference Between a Property, Field, Attribute

Originally published 3 Aug 2007

I’ve been asked a similar question at least three times this year.  It goes something like, “What’s the difference between a field and a property?”  Or, “Is there a difference between an attribute and a property?”  Given that many inexperienced developers automatically generate getters and setters for all their fields, I can understand the confusion.

I always give them the example of a circle.  Here’s some Groovy code:

class Circle {
   def radius
   
   def getDiameter() {
      2 * radius
   }
   
   def getArea() {
      Math.PI * Math.pow(radius, 2)
   }
   
   def getCircumference() {
      2 * radius * Math.PI
   }
}

radius is a field, also known as an instance variable, also known as a member variable in C++.  It’s an implementation detail of the Circle, i.e., it’s a private variable that gets stored as part of the Circle object.

Properties are more public aspects of an object.  In Java, following JavaBeans conventions for getters and setters allows you to expose properties.  Those properties don’t have to be backed up by fields, but in many cases are.
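
In Java terms, the distinction looks something like this (a quick illustrative snippet, not part of the original entry):

public class Circle {
    private double radius;            // field: an implementation detail

    public double getRadius() {       // radius is also a property, via its getter/setter
        return radius;
    }

    public void setRadius(double radius) {
        this.radius = radius;
    }

    public double getDiameter() {     // read-only, derived property with no backing field
        return 2 * radius;
    }
}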

Coming back to the Groovy Circle, what are its properties?  Well, you get radius for free as a read-write property.  But because we’ve got some getters that compute other values based on radius, we also have diameter, area, and circumference properties.  These happen to be read-only properties since we didn’t provide setters, but we could have (and a setter would update the radius field accordingly).  You can dump all of a Circle object’s properties in Groovy like this:


println myCircle.properties

The point is Circle could be implemented in different ways: by using a diameter field instead of radius, for instance, but it would have the same properties.

Now, let’s talk about attributes.  I typically think of UML when I hear attributes.  Here’s UML for Circle:

[UML class diagram: Circle with the radius attribute and the derived /diameter, /area, and /circumference attributes]
Notice that I’ve distinguished the derived diameter, area, and circumference attributes with a “/”.  This is standard UML.  radius is a regular, non-derived attribute.  We can distinguish fields from other computed properties like this.  However, because we’re modeling, and thinking in a higher level of abstraction, I like to think of attributes as more synonymous with properties.  I’d rather not make the assumption that every attribute is implemented with a field.

A similar discussion of these terms can be found here.

How to Refactor Many Arguments

Originally published 25 May 2007

The following conversation is based on a true story.  The roles are taken from Robert Martin’s Craftsman series.

Apprentice:  Hey, can I ask you something?
Journeyman:  Sure.
Apprentice:  I was looking at a big constructor on one of our domain objects and I saw some groupings and was wondering whether it would be good to create objects for those groupings.  Is there some kind of pattern for that?
Journeyman:  Well first, how many parameters are there?
Apprentice:  72.
Journeyman:  72!?  Wow.  Talk about a code smell.  PMD's ExcessiveParameterList rule will flip out on this one.  I guess the author mapped those parameters straight from the out-of-our-control XML schema, whose document instances get unmarshalled in order to create the object.  We don’t have to follow suit and create such a flat domain model that matches the schema.
Apprentice:  What do you mean?
Journeyman:  Well, what I really mean is that I'd much prefer distributing the logic among some richer objects.  That is, create more little objects with little methods that do a little of the work.  Not one big object that does it all.  There would be an impedance mismatch between XML and the domain model, but that’s typical because of the technology differences.  I suppose the only benefit of the big constructor approach is the simpler mapping... maybe, but the negatives are much too great.
Apprentice:  What are the negatives?
Journeyman:  Well, for one, trying to list all 72 parameters in the correct order would be a pain.  And trying to understand the object as a whole with all those fields would be difficult.  Here [grabbing his Refactoring book].  You were initially asking about the “Introduce Parameter Object” refactoring.  Read that refactoring.  It can probably give you better details.
Apprentice:  Martin Fowler.  Does he know what he’s talking about?
Journeyman:  Oh yeah.  He’s one of the Masters.  Let’s take a look at the class.  [Brings up the class in the IDE]

public ImportantDomainObject(
   ...
   int x, int y,
   ...
   double min, double max,
   ...)
Well, I can see a few data clumps already.  You see the x and y?
Apprentice:  Yeah.
Journeyman:  That’s probably a Point.  Also, that min and max, that looks like a Range.  What I would do is search for data clumps like these and create objects for them.  Then, see where they’re used in ImportantDomainObject and start moving behavior into the new objects.  Distribute the logic.  We can create a richer domain model this way.  You’ll probably find clumps of these clumps and can follow the same process.
Apprentice:  Okay.  This sounds great.  Thanks.  I’m on it.
Journeyman:  Thank you for recognizing this and taking the initiative to improve the code base.
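
A minimal sketch of the Introduce Parameter Object direction the Journeyman is pointing at (illustrative class names only, each in its own file; the real refactoring would also pull behavior into them over time):

public class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int getX() { return x; }
    public int getY() { return y; }
}

public class Range {
    private final double min;
    private final double max;

    public Range(double min, double max) {
        this.min = min;
        this.max = max;
    }

    // behavior migrates here from ImportantDomainObject over time
    public boolean contains(double value) {
        return value >= min && value <= max;
    }
}

The big constructor then takes a Point and a Range instead of four loose primitives, and the data clumps become homes for the logic that uses them.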

DBC Precondition/Postcondition Subclass Rules

Originally published 12 Apr 2007

I was recently in a discussion regarding the precondition and postcondition covariance and contravariance rules of design by contract (DBC) in an inheritance/implementation hierarchy. In order to obey the Liskov substitution principle (LSP), a subclass or interface implementation can only keep the base class's contract as-is or:
  • Weaken the preconditions of the base class, not strengthen them (contravariance).
  • Strengthen the postconditions of the base class, not weaken them (covariance).
There was some confusion as to why this was the case. So I came up with an analogy to explain. If you think in terms of the client requesting services of an object specifying the preconditions and postconditions, it all makes sense.

So the analogy? Have you ever had to request a help desk ticket? Maybe you need administrative rights or something. And you hope that a specific someone is assigned to that ticket because she's really good and can get the job done quickly and accurately. Let's use that as an example.

The general IT support group contract is:
  • Precondition: Submit a help desk ticket filling in all the forms on the web site.
  • Postcondition: We'll get back to you within two hours.
Now, suppose Suzy Support works in the IT support group. She's really good and efficient. You can think of her as a subclass of the IT support group base class. In the past, whenever I've had a problem and I just happened, purely by chance, to pass her in the hallway and mention it to her, she could solve it in minutes. From my perspective, as a client of the IT support group, that's acceptable.

Suzy's contract is:
  • Precondition: Just tell me the problem. You don't have to fill out a help desk ticket for me. This is weaker than the IT support group's precondition because I have less work to do.
  • Postcondition: I'll solve it in minutes. This is a stronger postcondition. I get better response time.
Make sense?
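
In code, the same idea might look something like this (a sketch; Java has no built-in contract checking, so the contracts live in the comments, and the class names are just illustrative):

// Base contract: requires a fully filled-out ticket; promises a response within two hours.
interface SupportGroup {
    // Precondition: every field of the ticket is filled in.
    // Postcondition: a response within two hours.
    void handle(Ticket ticket);
}

// Suzy weakens the precondition (a bare problem description is enough)
// and strengthens the postcondition (solved in minutes), so she still satisfies LSP.
class SuzySupport implements SupportGroup {
    public void handle(Ticket ticket) {
        // solve it right away
    }
}

class Ticket {
    // illustrative placeholder
}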

By the way, for all the IT support group managers out there, this is just an analogy. I know bypassing the help desk ticketing system messes up metrics.

Groovy or JRuby?

Originally published 24 Mar 2007

Which dynamic programming language should I introduce to a group of relatively young, inexperienced Java developers? The choice is between Groovy and JRuby.

Ruby, and particularly Ruby on Rails, has been hot for a while now. Ruby in the enterprise is becoming more accepted. As Martin Fowler pointed out nearly a year ago in this blog entry, some top minds in our field, including Bruce Tate, Justin Gehtland, and Stuart Halloway, have moved Beyond Java. Even one of my old mentors, Mike Gaffney, has repeatedly told me to bail on Java. But I agree with Scott Davis, who points out in this podcast that Java is not dead.

The growing acceptance of Ruby is a strong reason to consider JRuby. Being able to run Ruby on Rails apps on the JVM is very appealing. But remember the context of the decision here. We're talking about a group of developers whose only development language has been Java, and whose projects for the foreseeable future will run on the Java platform. Groovy wins in this context.

There are lots of reasons to get excited about Groovy. It too runs on the ever-improving JVM. In fact, it was the first scripting language officially approved by Sun to run on the JVM. It can be compiled to byte code or run in script form. The interoperability is fantastic. Developers can still use Java, its tools, and familiar libraries such as Spring and Hibernate, where appropriate, and mix in Groovy, where that is more appropriate.

But what really makes this decision easy is the similarity in language syntax. You can take a snippet of Java code, drop it into a Groovy file, and it will nearly work. Then, you can take that snippet and start making use of Groovy language improvements such as closures to make the code more readable and intention revealing.

So the decision is easy here. Take the seamless step forward and introduce Groovy.

Bad PMD Rules

Originally published 9 Mar 2007

PMD is an excellent tool for finding potential bugs and improving code quality, but it can generate a lot of false positives. Here's my list of the top 10 rules I turn off immediately, in alphabetical order, with a short comment explaining why. Descriptions for the rules can be found here.
  1. AtLeastOneConstructor: Why? Code is smaller without an empty, no-args constructor.
  2. AvoidInstantiatingObjectsInLoops: This one just seems to generate too many false positives. In addition, for short-lived objects, garbage collection is essentially free.
  3. CallSuperInConstructor: super() is called implicitly and requiring it just adds more lines of code.
  4. JUnitAssertionsShouldIncludeMessage: Most of the methods in Assert already generate good messages. I do include a message for those that don't (assertTrue(), assertFalse(), fail()).
  5. LocalVariableCouldBeFinal: final for variables is usually overkill and it adds line length.
  6. LongVariable: Clarity rules!
  7. MethodArgumentCouldBeFinal: Same as LocalVariableCouldBeFinal.
  8. OnlyOneReturn: I think it's clearer to exit early. It reduces the number of things to think about below the exit.
  9. PositionLiteralsFirstInComparisons: This really isn't that bad, but myString.equals("x") just reads better than "x".equals(myString).
  10. ShortVariable: There are just too many cases where it's acceptable to have short variables in small methods.
 These two just missed the cut:
  • ShortMethodName: I try to make my names expressive. I've never seen this rule fire.
  • SignatureDeclareThrowsException: This is an okay rule, but for JUnit tests, who cares?

The todo Test Category

Originally published 26 Jan 2007

Recently, Elliotte Rusty Harold blogged about committing test cases for bugs into the build before fixing the bugs themselves. He enumerated several good reasons for doing this, such as the fix being difficult and time consuming. His point was that breaking the build (where the build includes passing all the unit tests) is okay in some cases.

I disagree with this stance, but I think Harold has a really good idea there, which I'll get to in a moment. I think the build does include passing all the unit tests, and that should be part of the continuous integration process. I've always liked the idea of the tests serving as a safety net when I add new functionality or refactor existing code. The tests not only specify the desired behavior, but act as a regression suite.

So what's the idea? I first learned of the concept of test categorization from Cedric Beust's TestNG framework and the writings of disco king Andrew Glover. Essentially, the concept is to break tests into groups such as unit, component, and system and run them at different intervals. Unit tests are run many times a day as code is developed, component tests less frequently, and system tests the least frequently of all. You can categorize tests in several different ways such as by file naming conventions, placing them in different directories, or using Java annotations.

Harold brings up a new category - the todo category. (I was originally thinking the fixme category, but I thought todo could serve more purposes.) The todo category contains all tests that serve as a reminder of work to do. They obviously fail or there would be nothing to do. When it's finally time, the test moves out of the todo category and into the unit, component, or system category as appropriate. Although you could have a todo category for each of the original, regression categories (todoUnit, todoComponent, and todoSystem), I think that would be overkill. Notice that I slipped the word regression into "original categories" above. I'm advocating that the original categories should pass at all times - they should break the build if they fail (well, at a minimum the unit category). Once tests get into the regression categories, they never go back to todo.
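
If you're on a recent JUnit 4 (categories arrived in 4.8), one way to mark a todo test and keep it out of the regression run might look like this; TestNG groups or a naming convention would work just as well (the test and suite names here are made up):

import org.junit.Test;
import org.junit.experimental.categories.Categories;
import org.junit.experimental.categories.Category;
import org.junit.runner.RunWith;
import org.junit.runners.Suite;

// Marker interface for the todo category.
interface Todo {}

public class PriceCalculationTest {

    @Test
    @Category(Todo.class)
    public void shouldHandleNegativeQuantities() {
        // reminder of work to do; expected to fail until the bug is fixed
    }
}

// The regression suite runs everything except todo tests.
@RunWith(Categories.class)
@Categories.ExcludeCategory(Todo.class)
@Suite.SuiteClasses(PriceCalculationTest.class)
class RegressionSuite {}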

The obvious downside of this approach, and of test categorization in general, is the added complexity. It's more work to maintain categories and, in this case, move tests into different categories than a simple one-category-fits-all approach. Also, there's more work (at least initially) to set up the running of those categories. As always, I recommend weighing the options and deciding what's best for your particular situation. See Glover's writings for more benefits of test categorization.

Behavior Driven Development with JUnit 4

Originally published 14 Dec 2006

JUnit 4 makes Behavior Driven Development (BDD) style testing (or specification) easier.

Let's quickly look at an example, the idea stolen from one of David Chelimsky's blog entries about specifying the behavior of a stack. We're focused here on the specification of an empty stack:

// notice the class name specifies the context
public class EmptyStack {
    private Stack stack = null;
    
    @Before
    public void setUp() {
        // set up the context
        stack = new Stack();
    }

    // notice the name focuses on the context
    @Test
    public void shouldBeEmpty() {
        assertTrue("not empty", stack.isEmpty());
    }

    @Test(expected=EmptyStackException.class)
    public void shouldComplainOnPeek() {
        stack.peek();
    }
    
    
    // more specification focused methods
}

The point here is that you can get many of the benefits of BDD (a focus on specification rather than testing) using the familiar JUnit framework.

Now, if you're a hardcore BDD'er, then you might complain that you still have to use a test-centric vocabulary. You still need those Test annotations and method calls like assertEquals rather than Dave Astels' preferred shouldEquals calls.

Also, from the legacy side of the fence, you lose the convention of method names starting with testSomething and class names ending with Test. It's sometimes hard to let go of that if you've been writing tests for a long time and it's super clear to spot those test methods and test classes if you're following a naming convention. Furthermore, Ruby on Rails has taught us that convention is a good thing. The test method naming convention doesn't really bother me so much. The @Test annotation makes things clear enough and shouldSomething really gets us focused on specification. However, I haven't let go of ending the class name with Test. Maybe it's more because Ant can pick those tests up easier if there's a standard naming convention.

The last negative I can think of is that of consistency. Having a mix of old style test centric tests and BDD style tests could bother some.

So, let's be honest. What am I really doing? Well, I am incorporating more and more BDD into my work. However, I still shy away from creating new classes. The BDD style really lends itself to many test classes (contexts) per class under test. In the case of Stack, as David Chelimsky points out, you'd also need AlmostEmptyStack, AlmostFullStack, and FullStack classes to fully specify the behavior. I just can't commit myself to writing all those. But I am focusing more on the set up of a context and the specification methods of that context. I just may cheat a little and combine contexts into a single class. So perhaps in the Stack example, I'd combine the empty and almost empty stack contexts under one test class. You know, I'd set up an empty Stack in setUp and for the almost empty context, I'd do a little more set up (push something) in the appropriate shouldSomething methods to make it almost empty.

I also revert back to legacy style testing sometimes. This is usually when I'm modifying an existing, old style test. It's just simpler and faster, and typically the context isn't set up too well for the BDD style.

So my advice is to use the right tool for the job. Use BDD style when that makes things clearer and you want to focus on specification. Use test centric style when that's easier. Try to do a better job of focusing more on specification and less on verification, especially when you're adding new behavior.

Testing Private Methods

Originally published 18 Nov 2006

How should I test my private helper methods? Well, I used to just expose the method by making it accessible to the unit test (typically by making it package scoped).

So if I had a method like this:

   private void someHelperMethod() {
   }

I'd expose it like this:

   /** Exposed for testing purposes only. */
   void someHelperMethod() {
   }

I'd laugh at people who used the PrivateAccessor class from the JUnit-addons project or directly used reflection to get the method and call method.setAccessible(true). I'd think, "What's the big deal? Just change the access of the method. Then, you'll make things easier for your IDE when you want to do a method rename refactoring or just find references to this method."
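
For reference, the reflection route looks roughly like this (a sketch; SomeClass and someHelperMethod() are just stand-in names carried over from the snippet above):

import java.lang.reflect.Method;
import org.junit.Test;

public class SomeClassTest {

    @Test
    public void exercisesThePrivateHelper() throws Exception {
        SomeClass target = new SomeClass();
        Method helper = SomeClass.class.getDeclaredMethod("someHelperMethod");
        helper.setAccessible(true);    // bypass the private modifier for this one call
        helper.invoke(target);
        // assert on whatever state the helper is supposed to change
    }
}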

I've recently hopped the fence and I'm now campaigning for the other side.

Now, before I get into the why, I should point out that red flags should have gone up after you read the first question of this entry. Testing private methods? Why would you do that? Believe me, as I've mentioned here, I think unit tests should focus on black box testing as much as is practically possible and therefore, you shouldn't be testing private methods. I'm also familiar with the argument, "You could perhaps move those private methods to a helper class and therefore, they wouldn't be private." But I still believe there are cases where it makes sense to (have and) test private methods. And in this blog, I'm talking about these exceptional cases.

So what are some exceptional cases? Think of a high level method specifying the steps of an algorithm or a method on a mediator/controller-like class collaborating with multiple objects. You did break that method up into smaller, easier to understand, more communicative methods as described here, didn't you? So which is easier: setting up the object under test and its collaborating objects (the fixture) and calling the private helper method directly; or setting up the fixture and calling the public, higher level method that does some processing before calling the private helper method? In the latter case, you surely had to do more set up to navigate down to the same entry point in the private method. If you didn't or it wasn't much more work, then I'm talking about a different context.

All right, so why the switch? Well, first off, the real intent is to make them private. Private leaves no doubt that the method isn't meant to be used outside the class.

Along with that is encapsulation. As multiprocessor systems become more mainstream, exploiting concurrency by multi-threading adds complexity to software development. Encapsulating as much as possible can only simplify reasoning about thread safety.

One final reason is that making things private helps static analysis tools like FindBugs and PMD better identify dead code that can be eliminated. "Hey, this method is private and it's not being called!" Keeping the code base lean and clean is a key to maintainability.

The biggest downside, as I've hinted at above, is doing things like renaming/changing the signature of the method, or just finding references to it. The IDE won't make the appropriate changes to the test or find the reference in the test. But you'll find this pretty quickly when you run the test and it fails. Hint: Maybe you should have changed the test first anyway.

So join me on this side and remember that testing privates should be reserved for special cases.

Comments on The Danger of Mock Objects

Originally published 24 Sep 2006

Rarely do I disagree with Uncle Bob.  I’ve learned a lot from reading his writings over the years.  He recently blogged about mock objects in response to a blog from Cedric Beust.  Check it out, read my comments on Uncle Bob’s blog, and decide for yourself.

Break, Then Leave and Pre-Check In Checklist

Originally published 16 Sep 2006

My build/SCM blog series (see part 1 and part 2) ends with a look at one of the most annoying things a member can do to his team and a checklist to follow for reducing the chances of breaking a build.

So what can really irritate the rest of team? How about checking in a bunch of changes that break the build and then leaving for the day? This is bad enough for small teams, but consider bigger, distributed teams developing complex applications. You could have team members who start their day after you've left. If the team follows the rule of not checking in on a broken build, what's the team supposed to do? Well, to continue progress, somebody else has to step in and fix the build. This can really slow a team down.

A good practice is to wait for a successful build after your last check in before leaving. After all, you know the most about the changes that would have broken the build. This practice implies some prerequisites: a continuous integration system that runs often and builds that run quickly.

Now that I've ranted enough, what can you do to reduce the chances of breaking a build in the first place? Listed below are steps to follow. It starts at the point where you're ready to check in. Essentially, you mimic the continuous integration system on your development workstation.
  1. Update your view so you have the latest versions of all the files.
  2. If you use configuration management software that allows hijacking and reserving files (like Rational ClearCase), now is the time to "unhijack" and reserve existing files with changes. By existing files, I mean files the CM software knows about (not new). You only need to reserve files if your team follows that practice. If nobody on the team reserves files, then skip this part, but it's important to reserve at this point if your team follows this practice to ensure all of your changes can get checked in.
  3. If necessary, merge any existing files that have been updated and checked in by others since you checked out. You want your files to represent what will be checked in. Now, your files will contain your changes merged with the most recently checked in versions.
  4. Make note of any new files that need to be added to source control and any deleted files that must be removed from source control. Forgetting to do this can cause a build failure.
  5. Run the application to make sure your changes function and integrate well.
  6. Do a clean build, compile and run all the unit tests. Consider a single ant target that does this for you.
  7. Check in. Don't forget those new and deleted files!
Do I follow all of these steps on every check in? No way. That wouldn't be practical. You need to consider the size and impact of your changes. The bigger they are, the more rigorous you should be. If I made just a small change, I would probably just make sure the code compiled and unit tests for that change passed.

Well, that's it. Hope you enjoyed.

After the Build Breaks

Originally published 28 Aug 2006

The second part of the build/SCM blog series (see part 1) deals with what happens when a build breaks. This happens, hopefully infrequently, but it happens. Getting the build back on track should be one of the highest priorities - your build box teammate is down and you need to get him back on his feet.

Ideally, what should happen is the following:
  1. Notification of a build failure is distributed by e-mail to the team. This is part of the publishing functionality of CI systems like CruiseControl.
  2. Particularly, whoever checked in since the last successful build analyzes the results.
  3. The person who broke the build mans up and replies to all, “This is me. I’m on it.”
  4. Nobody checks in until the build is fixed unless the check in is for the purpose of fixing the build.
  5. Somebody fixes the build, obtaining any help necessary.
  6. (Optional) Once the build is believed to be fixed, the person from step 3 sends an e-mail saying the build should be fixed.
Let’s talk more about step 3. I know it could be embarrassing to break the build. The step is not to call anyone out or lay blame. The point is to communicate to the team, who may be large and distributed, that somebody is taking responsibility for fixing the build. Several others may be analyzing and trying to fix the problem and this communication is to reduce that wasted effort. I have been on teams where the build has been broken for hours and nobody knows if anybody is doing anything about it.

Sometimes nobody is doing anything about it. If this is a recurring problem, I recommend the use of a Build Nazi (in honor of the Seinfeld Soup Nazi). The job of the Build Nazi is to make sure somebody is responsible for fixing a broken build. The job is not to fix the build (unless help is needed). The Build Nazi job is not a fun one and therefore should be rotated. It is also a controversial role in an agile environment where the practice of collective code ownership is being followed. Ideally, the team is following the steps above and a Build Nazi is totally unnecessary. I’ve found I’ve had to resort to the role for short term periods in times of chaos when the build is breaking more than it is succeeding. Once things get back on track, the Build Nazi role typically becomes dormant.

The last step I’d like to elaborate on is step 4. One of the worst things to do is check in a bunch of changes that don’t have anything to do with fixing the build. Checking in causes the CI system to start another build that will result in another failure. Piling on check ins during a broken build prolongs the broken state of the build and therefore the feedback cycle that is all important in agile development. Another risk is that the fix necessary to correct the original problem must be applied to the check ins that occurred after the first failure, further prolonging the broken state.

I plan on finishing this series next time. Stay tuned.

Anti-Practice: Reserving Files

Originally published 20 Aug 2006

Here is the first blog in a series discussing some obsolete practices and just plain annoyances regarding builds and source code management. The context is an agile development environment following the best practices of continuous integration (CI) and collective code ownership. The team is of medium to large size, possibly distributed, and working different hours, possibly in different time zones.

The first obsolete practice is reserving (locking) a file by default when checking out the file. (I’m using ClearCase terms here). This is like pessimistic locking in the database world. The thought is, “I’m going to reserve this so that no one else will affect me. I’ll make my changes, then check in. If anybody else wants to modify the same file, they’ll have to wait for me to finish, then deal with merging in their changes.” That doesn’t sound like collective code ownership to me. Sharing the code base means sometimes having to deal with merge issues.

Reserving files blocks others who are ready to check in. In the worst case, suppose a person with reserved files leaves for the day (or for vacation). Others are likely working different hours, so what do they do? They try to get in touch with that person to see what’s up. They can’t get a hold of him, but they are ready to check in and move on. So after some time without hearing anything, do they unreserve the files? In some stricter environments, what if they don’t have rights? What if they can’t get in touch with the select few who have rights to unreserve files? (By the way, if you do unreserve a file, make sure you tell the person who reserved it.) The point is that reserving files slows down the team.

There are of course exceptions. An important change or a change that has a big impact to the system might merit an early reserve on a check out. In this blog, I’m talking about reserving files by default.

The biggest positive of reserving early is finding out that somebody else is working on the same file. For example, I try to check out a file and reserve it right away and find that I can’t because Joe’s got the file reserved. Depending on the circumstances, I may want to know exactly what Joe’s doing. I’m not familiar with another way of getting this notification as easily without getting spammed by unwanted notifications. In other words, I might want to know, upon check out of a file, who else has the file checked out.  For me, this advantage doesn't outweigh the disadvantages mentioned above.


Next time I’ll talk about an annoyance or two that occurs when a build breaks.

Checklist for Finding Test Cases

Originally published 4 Jun 2006

Suppose I'm focused on specifying the behavior of a particular method. How do I know if I've specified everything? I'd like to share a checklist of things I think about to help accomplish this. The goal is to thoroughly specify the behavior of the method by building up a collection of tests following test-driven development.

I'm going to base my example on Spring's BeanFactory interface. Suppose I'm developing a Map-backed BeanFactory and I am currently focused on implementing the following method of the BeanFactory interface:

Object getBean(String beanName) throws BeansException

Let's call the class I'm developing MapBackedBeanFactory. When getBean() is called, the implementation will look up the bean in its beanMap field.

When I think about test cases, I think about:
  1. The state of the object under test (each field), i.e., the fixture. In the example, this would be the beanMap.
  2. The state of the parameters of the method I'm calling on the object under test. In the example, there's only one: the beanName.
That was pretty obvious. Who doesn't think about these things? But what is the checklist? The checklist consists of thinking about the following for each of the above:
  1. A good value. This will execute the normal flow and is the most obvious test. For beanMap, this means the beanMap will have a good bean mapping. For the beanName, a good value is a name that will be found in the beanMap. There may be multiple good values, not really in this case, but think about the classic bowling game. You'd want tests for a non-mark frame, a spare, and a strike. I'll also consider multiple good values for things like collections, maps, and arrays. That is, maybe I'm dealing with a  List and I need a List of two good things. However, I find that most times, one good value in the collection, map, or array is sufficient.  This is because I write the iteration code for the single case and a multiple collection test case wouldn't make me write any new code.  If I find I need multiple good values, I'll start with the simple cases and work towards the harder.
  2. A negative case. In #1, I set up the fixture and parameters to get a good result.  Now, I want to think of the negative side.  So I found beans in #1, but now I want to not find a bean.  Pass in an unknown beanName.
  3. A null value. What if the beanMap were null? That's probably not realistic, so I'm going to skip that. What if the beanName were null? That's possible and according to the BeanFactory JavaDoc, it looks like the contract specifies to throw a NoSuchBeanDefinitionException. I better write that test. Many times, if you don't handle null, you'll get a NullPointerException and that might be acceptable. In those cases, I wouldn't write a test (since it wouldn't require any code).
  4. An empty value. The beanMap could have no mappings or the beanName could be "". I think #2 covered us here, so no test is required.
  5. An invalid value. What if beanMap were invalid? What if it were a Map of Integers to bean Objects? I'm going to assume this is unrealistic in this case. We would have handled such a situation when the map was loaded.  What if the beanName were "invalid"? Well, we covered that in #2.
  6. An exceptional case. I'm pretty much talking about handling exceptions here. Maybe we need to write a test case to see if we handle an exception properly. The example doesn't have such a case, but suppose the method under test were doing some I/O. Maybe we'd need to write a test to see if we handled that properly. In all honesty, even though I think about this, I don't usually write a test to handle exceptions. I won't get 100% test coverage, but I usually ignore writing the test because checked exceptions force me to write the necessary handling code.
That's a lot of stuff to think about for each object field/parameter. That's the biggest negative - thinking about all those things, all the possibilities, all the time it takes. Many times, as you saw in the example, most of those things just aren't realistic.
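
For getBean(), the checklist boils down to just a few tests.  Here's a rough sketch in JUnit 3 style (the MapBackedBeanFactory constructor taking the map is my own assumption, and I'm assuming the not-found and null cases both follow the BeanFactory contract by throwing NoSuchBeanDefinitionException):

   // checklist item 1: a good mapping in the beanMap and a beanName that will be found
   private Map beanMap = new HashMap();
   private MapBackedBeanFactory beanFactory = new MapBackedBeanFactory(beanMap);

   public void testGetBean() {
      Object expectedBean = new Object();
      beanMap.put("someBean", expectedBean);
      assertSame(expectedBean, beanFactory.getBean("someBean"));
   }

   // checklist item 2: the negative case, an unknown beanName
   public void testGetBeanNotFound() {
      try {
         beanFactory.getBean("unknownBean");
         fail("expected NoSuchBeanDefinitionException");
      } catch (NoSuchBeanDefinitionException expected) {
      }
   }

   // checklist item 3: a null beanName, per the BeanFactory contract
   public void testGetBeanNullName() {
      try {
         beanFactory.getBean(null);
         fail("expected NoSuchBeanDefinitionException");
      } catch (NoSuchBeanDefinitionException expected) {
      }
   }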

Sometimes I'll take a different approach. I'll start with the simple, good case. I'll write the test and get it to pass by writing the minimal code to do so.  In the example, I'd have:

return beanMap.get(beanName);

Then, I'll look at the method I just wrote word by word (or maybe token by token).  At each word, I'll think about special cases (things like in the checklist) and jot down test cases. Then, I'll go back to the test, write one of the jotted-down tests, get it to pass, and continue until I'm done. It's a different perspective that goes a little faster, at the risk of not catching quite as much as the first approach.  I like this second approach and have been doing it more and more lately.

How to Pick a Conference Session

Originally published 14 May 2006

I just recently attended a conference and had to make some tough choices among some concurrent sessions and tutorials.  To help, I came up with an objective scoring system.  The system consists of evaluating each session against three categories.  Each category is given a score from 1 to 3.  Then, the total is tallied to give each session a final score ranging from 3 to 9.  The session with the highest score wins.

Quickly, the three categories are:
  1. Current knowledge.  How much do I currently know about this?
  2. Needs.  Do I use this now or will I use this in the near future?
  3. Wants and other miscellaneous factors.  How much do I want to go to this?
The first category addresses breadth of knowledge.  The pragmatic programmers would call this diversifying your knowledge portfolio.  In Scott Ambler's terms, attending a session on a topic you don’t know much about might help you on your way to becoming a generalizing specialist.  It’s the factor that addresses learning something new.  Give yourself a score of 3 if you know little or nothing of the topic.  Give a score of 1 if you’re an expert or really proficient.  For example, suppose you’re a Java guy and don’t know much about .NET.  Give .NET-related classes a score of 3.

The second category addresses your immediate and near term needs.  Do you use this now?  Will you use this in the next six months?  This category addresses the depth factor.  Attending such a session may shed some light on something you never knew about something you currently use.  You may learn how to use a technology better, some best practices, or some tips and tricks.  Give a score of 3 if this is something you use or will use soon.  Give it a 1 if you don’t see it in your near term future.  Give it a 2 if you’re not sure.

The last category addresses everything else.  How much do you want to go?  How much do you want to learn this technology?  Maybe it’s AJAX or Ruby on Rails and you give it a score of 3 because you really want to learn that hot technology.  Maybe Uncle Bob is speaking and you really want to go see him perform.  Score it a 3.  Or maybe you’ll give the category a score of 1 because you think you won’t get much out of it.

Add the scores of the three categories to come up with a score for the session.  Then pick the session with the highest score for that time slot.  What if there’s a tie?  Well, you could look at the categories and pick the session with the highest score for the category most important to you.  I think conference sessions in general are better at the breadth category.  How much can you really get out of going to a 90 minute session anyway?  Longer tutorials are better at depth.  Consider that for helping to break a tie.
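
If you want it in code, the whole system is trivial (a throwaway sketch; the parameter names are just my labels for the three categories):

   // each argument is a score from 1 to 3, so the total ranges from 3 to 9
   public static int score(int currentKnowledge, int needs, int wantsAndMisc) {
      return currentKnowledge + needs + wantsAndMisc;
   }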

What I found is that many times the first two categories cancel each other out.  Logically, if you don’t know much about something (score 3), then you must not be using it and may not need to in the near term (score 1).  On the flip side, if you know a lot about something (score 1), it’s probably because you’re using it (score 3).  However, what’s interesting is when you don’t know anything about a technology and you’re going to use it real soon.  That’s 6 points right there.

The other negative is that scoring sessions is time consuming and we programmers are notoriously lazy.
 
However, learning is a career investment and investments should be carefully chosen.
I hope this scoring system helps you.  I know I was pretty happy with the results.

Collection Utility Methods on the Object

Originally published 16 Apr 2006

I’ve been experimenting lately with a technique that I’m still unsure about. It has a little smell to it, but it has some benefits. The technique is to add static collection-related utility methods to an object. That is, on the object itself, I’ll add a static method that takes a collection of that object.

Take the classic Shape example:

   import java.util.Collection;
   import java.util.Iterator;

   public abstract class Shape {
      // here's a typical instance method
      public abstract void draw();

      // here's what I'm talking about.
      public static void draw(Collection shapes) {
         for (Iterator it = shapes.iterator(); it.hasNext(); ) {
            Shape shape = (Shape) it.next();
            shape.draw();
         }
      }
   }

Now, I haven’t done this kind of thing much; maybe once or twice. The first time I did it was because I noticed some duplication. The exact looping code was shared by multiple clients. I wanted to remove the duplication, but I didn’t know where to put it, so I stuck it on the object itself. The clients became (in the context of this example):

   someShapeClientMethod() {
      Shape.draw(myShapes);
      ...
   }

I think this actually improves clarity (although I could have accomplished the same thing by creating a helper draw() method in the client). This also has the benefit of grouping the iteration logic close to its source, e.g., Shape here.

The real world example where I first used this was a query.  I had a collection of objects and I wanted to find a specific one. Say, I had a collection of Shapes and I wanted to know who was the biggest:

public static Shape getBiggest(Collection shapes);
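
Here's a minimal sketch of what that might look like, assuming a hypothetical getArea() method on Shape as the measure of "biggest":

   public static Shape getBiggest(Collection shapes) {
      Shape biggest = null;
      for (Iterator it = shapes.iterator(); it.hasNext(); ) {
         Shape shape = (Shape) it.next();
         // getArea() is a made-up method for this illustration
         if (biggest == null || shape.getArea() > biggest.getArea()) {
            biggest = shape;
         }
      }
      return biggest;
   }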
 
I think this technique is most applicable under the following circumstances:
  1. For good reason, multiple clients share the code and you want to remove the duplication.
  2. The number of such utility methods is very small. Otherwise, it would be better to group the methods in another, separate utility class (e.g., Shapes or ShapeUtils).
What bothers me most is that I can’t recall seeing this technique used anywhere. That could mean that there’s something really bad about it. One problem is that it clutters the object with additional methods. That’s why I think it should be used judiciously. Also, from a purist view, do these methods really belong on the object?

What do you think?

Test Smell: Test Breaks After Good Refactoring

Originally published  17 Mar 2006

Goal: Write a thorough unit test that doesn’t require any changes after mercilessly refactoring the object it’s testing.

I’ve been noticing that I sometimes feel a need to go back and modify my test to be more in line with the object it’s testing. And I’m not alone. David Chelimsky recently wrote a related blog. That blog, in the context of my goal, focuses more on changing tests because they start drifting away from the code they’re testing.

In this blog, I want to talk about changes required because the tests break as a result of good refactoring. This isn’t something that occurs often, but it does happen, and I think it’s a code smell. One of the main advantages of having a test suite is that you can refactor to make the code better or try a different algorithm without worrying whether the final result broke existing functionality. When the tests break as a result of correct refactoring, I lose that safety net kind of feeling.

So what does it mean when a test breaks as a result of good refactoring? Well, what jumps out is that the test is too coupled to the main object it’s testing. However, it is a unit test and unit tests are traditionally white box tests and white box tests imply knowledge of the code and that implies coupling.

What about a black box approach to unit testing to reduce coupling? Elliotte Rusty Harold recently posted a comment on one of his blogs Experimental Programming, “The class is a black box which is accessed solely through its public interface. The internal details are deliberately opaque, and thus can be changed as necessary to improve performance or other desirable characteristics without breaking client code.”

Yeah, that’s what I’m talking about, but I think a total black box approach would be too coarse grained, particularly for ensuring 100% test coverage. Specifically, I’m thinking that touching every conditional statement would require a lot of black box test code. I think the first tip then is to find a balance between black box and white box testing; the sub-goal being to find the appropriate coupling to get 100% coverage with the least amount of test code. I’m trying to keep this in mind when I write tests.

Another technique to help achieve the goal is to shy away from mock objects, particularly strict mock objects. Martin Fowler wrote an article called “Mocks Aren’t Stubs”. In it, he describes the difference between mock objects and stubs. He also describes two styles of testing: state-based and interaction-based. Interaction-based testing uses lots of mocks, which really couples tests to the implementation details. This is not what I want if I want to achieve my goal. So I prefer the state-based approach Martin talks about.

Now, having said that, I do use mocks in certain situations. They’re good at making sure your object made a call on a secondary object:

   public void testInitRegisters() {
      // setup a mock object that expects a register call
      // and inject it into the object under test
      objectUnderTest.init();
      // verify the mock
   }
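
For what it's worth, here's roughly how I'd flesh out that skeleton with EasyMock's static API; Registry, register(), and setRegistry() are made-up names for this illustration:

   // assumes: import static org.easymock.EasyMock.*;
   public void testInitRegisters() {
      // Registry is a hypothetical interface that init() is expected to call back into
      Registry registry = createMock(Registry.class);
      registry.register(objectUnderTest);   // record the expected register() call
      replay(registry);

      objectUnderTest.setRegistry(registry);
      objectUnderTest.init();

      verify(registry);   // fails the test if register() was never called
   }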

Typically, I find that I use mock objects when I’m testing void methods, just like the init() method above. In fact, thinking more about it, it might actually be when the method I need to call on the object I’m mocking is void, just like the register() method above. I’ll have to pay more attention, but that seems to make sense. The point is to use mock objects only when they are more practical than the state-based approach, preferring the state-based approach otherwise to reduce test coupling.

Stubs are great for tests whose primary object calls methods that return values from secondary objects. I use mock libraries to create stubs, particularly for Interfaces and expensive-to-create classes, since it’s so easy to do so. I reduce test coupling by creating lenient stub objects that either ignore unexpected method calls or return empty values (0, null, false) for unexpected method calls. In EasyMock terms, I’m talking about nice controls.

I also use the Object Mother pattern as a factory to create the stub objects. This is so that the stub creation code can be shared among many test classes. I pass the values I want returned into the object mother. The object mother creates the stub object with those values. In this way, the test has all the knowledge of the canned results. Then I use a state-based approach to verify everything. Here’s a quick example, putting this all together:

   private Order order = new Order();

   public void testGetTotal() {
      // LineItem is an interface.  LineItemMother uses a
      // mock library for creating a stub LineItem
      LineItem lineItem = LineItemMother.newItem(3, 2.50);
      order.add(lineItem);
      assertEquals(7.50, order.getTotal(), 0.001);
   }
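
And here's a rough sketch of what LineItemMother might look like, assuming EasyMock's static API and hypothetical getQuantity() and getPrice() methods on the LineItem interface:

   // assumes: import static org.easymock.EasyMock.*;
   public class LineItemMother {

      public static LineItem newItem(int quantity, double price) {
         // a nice (lenient) mock ignores unexpected calls instead of failing
         LineItem lineItem = createNiceMock(LineItem.class);
         expect(lineItem.getQuantity()).andStubReturn(quantity);
         expect(lineItem.getPrice()).andStubReturn(price);
         replay(lineItem);
         return lineItem;
      }
   }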

A Logical Ordering of Import Statements

Originally published 3 Mar 2006

Do you care about the order of your Java import statements?  Well, maybe not, but I do.  The reason is because I like to do a visual check of my dependencies.  According to the Acyclic Dependencies Principle, there should be no cycles in packages.  Robert Martin wrote here, "Remember that the prime motivation for using Object Oriented Design is to manage module dependencies."  One way I check my dependencies is by scanning the import statements of a class or interface to make sure it depends on other, appropriate classes and interfaces.

The rule I use is that you can go up a package hierarchy and over to the right, but you can’t go down or to the left.  (Props go to Michael Gaffney for working with me to come up with this rule years ago).  Over to the right?  I’m working in the context of a layered architecture and when I say “over to the right,” I’m talking about something like this:

presentation -> business -> persistence -> general utilities

So starting on the left, dependencies flow to the right.  You wouldn't want to go back to the left.  That would introduce a cycle.  Also, within a layer, you could go up the package hierarchy, but you wouldn't want to go down.

Here’s an example.  Suppose I’m developing a web application and I’m working on a class in the presentation layer.  I might have something like (contrived):

package com.mycompany.myapp.web.struts;
 
import java.util.List;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.commons.beanutils.PropertyUtils;
import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionForward;
import org.apache.struts.action.ActionMapping;

import com.mycompany.myapp.utils.SomeUtil;
import com.mycompany.myapp.domain.SomeDomainObject;
import com.mycompany.myapp.domain.SomeOtherDomainObject;
import com.mycompany.myapp.web.SomethingHigher;
import com.mycompany.myapp.web.utils.SomethingToTheRightWithinModule;

Now, ideally, I’d like to list those imports in the exact opposite order.  That way, I could start at the package statement and look down and make sure the imports flow in an acceptable, directed acyclic graph kind of order, following the rule above.  In fact, I used to list them in this order, but everybody else lists them in an order more like the above, and so, I accepted defeat and now follow suit.  I just start at the bottom and look up.
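
For illustration, that reversed listing is simply:

package com.mycompany.myapp.web.struts;

import com.mycompany.myapp.web.utils.SomethingToTheRightWithinModule;
import com.mycompany.myapp.web.SomethingHigher;
import com.mycompany.myapp.domain.SomeOtherDomainObject;
import com.mycompany.myapp.domain.SomeDomainObject;
import com.mycompany.myapp.utils.SomeUtil;

import org.apache.struts.action.ActionMapping;
import org.apache.struts.action.ActionForward;
import org.apache.struts.action.ActionForm;
import org.apache.commons.beanutils.PropertyUtils;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpServletRequest;
import java.util.List;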

I only care about ordering among packages here.  I don’t try to list the imports in dependency order among classes at the same package level.  That would be crazy.  However, I am strict within the layer of the class I’m working with.  In the example, that would be the web layer.  I’m looking at a class in ...web.struts and I want to make sure that I’m going to the right first, then up, all within com.mycompany.myapp.web in an appropriate package order.

Once I’m past the web layer, I loosen up.  I just want to make sure all the imports for the next layer are grouped together, and so on.  So in the example, all the domain imports would be together, then all the utils, and so on, all the way up to java.util.

You can set up this ordering standard in Eclipse under the Organize Imports preference.

How else could you check your dependencies?  There are useful tools out there like JDepend, and I use them.  But I still like this import ordering practice.  I typically do the manual scan right before check in time.

A Common Set of Test Refactorings

Originally published  19 Feb 2006

Today, I want to blog about what I think might be my most common set of low level Eclipse refactorings used together when I’m developing unit tests. I use this set to parameterize tests as discussed in Tabular Tests. Lately, I’ve been trying to use the Eclipse refactorings as much as possible, thinking that the automation would be less likely to break things than if I made the changes manually. It’s also interesting to make code changes without doing any typing. The set consists of:

  1. Extract Local Variable (possibly multiple times).
  2. Extract Method.
  3. Inline the local variables created in step 1.
So let’s suppose I’m about to start work on a new method. I like to start with a brain dump of most, if not all of the test cases. This is to get me thinking about the desired behavior and also because it clears my head, so that when I start writing the code, I can focus. Once I’m ready, I start with a simple case.

Let’s take the Range.equals() example from the previous blog. My brain dump would have resulted in a table like this:

Min        Max        Equals?
same       same       true
different  same       false
same       different  false

Yes, I would have jotted down other, exceptional cases for things like null and a non-Range object, but that's out of scope here. For this blog, I'm just concentrating on comparing Range objects. I start with the positive case and come up with something like:

   private static final int MIN = 1;
   private static final int MAX = 7;

   private Range range = new Range(MIN, MAX);


   public void testEqualsAnotherRange() {
      Range anotherRange = new Range(MIN, MAX);
      assertEquals(range, anotherRange);
   }



To be honest, I probably would have had the MIN and MAX constants hardcoded in there, but at some point, I would have performed Extract Constant to come up with the above.

After I get that test to pass (by returning true), I’m ready to write the next test. I want to pick something simple again (of course in this example they’re all simple). But before I write the next test, I do something that is probably considered cheating by the hardcore test-driven development crowd. I’m not too concerned because I know from experience that I’m going to end up with tabular tests. In fact, the table is specified above!  So here goes the set of refactorings:

I first Extract Local Variable on MIN and MAX:

   public void testEqualsAnotherRange() {
      int testMin = MIN;
      int testMax = MAX;

      Range anotherRange = new Range(testMin, testMax);
      assertEquals(range, anotherRange);
   }



Now I know from my table that I need a boolean for my expected equals. There's no simple Eclipse refactoring so I manually make the change:


   public void testEqualsAnotherRange() {
      int testMin = MIN;
      int testMax = MAX;
      boolean expectedEquals = true;
      Range anotherRange = new Range(testMin, testMax);
      assertEquals(expectedEquals, range.equals(anotherRange));
   }



I highlight the last two lines and perform Extract Method:

   public void testEqualsAnotherRange() {
       int testMin = MIN;
       int testMax = MAX;
       boolean expectedEquals = true;
       testEqualsAnotherRange(testMin, testMax, expectedEquals);
   }

   private void testEqualsAnotherRange(int testMin, int testMax,
           boolean expectedEquals) {
       Range anotherRange = new Range(testMin, testMax);
       assertEquals(expectedEquals, range.equals(anotherRange));
   }



I want to get back to what it looks like in the table, so I Inline each local variable:

   public void testEqualsAnotherRange() {
      testEqualsAnotherRange(MIN, MAX, true);
   }

   private void testEqualsAnotherRange(int testMin, int testMax,
         boolean expectedEquals) {
      Range anotherRange = new Range(testMin, testMax);
      assertEquals(expectedEquals, range.equals(anotherRange));
   }



I just created the first row of the table by applying the set of refactorings that is the topic of this blog. At this point, I usually look at the parameterized method and make sure I’m happy with the signature. I’ve already done most of this when I extracted the method, but Eclipse doesn’t let me make the method static, if that’s appropriate (in this case it’s not). Eclipse also lists all the checked exceptions, which, for tests, I’d rather generalize to Exception.  This doesn't apply in this case either.

During this set of refactorings, I would have run the tests after a change or two just to make sure I didn’t break anything. I also do that because it feels good to get that green bar.

This is a pretty trivial example. Most of the time much more is going on in the parameterized method, but this still shows the removal of duplication.

I could have made these changes manually, but like I mentioned above, I like the automation; I’m less likely to break things.

Okay, I can write the next test now:
   public void testEqualsAnotherRange() {
      testEqualsAnotherRange(MIN,     MAX, true);
      testEqualsAnotherRange(MIN - 1, MAX, false);
   }


I’ve got the red bar. Time to make it green.
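
A minimal sketch of the Range.equals() change that would turn the bar green, assuming Range keeps its bounds in min and max fields (and ignoring the null and non-Range cases jotted down earlier):

   public boolean equals(Object other) {
      // comparing min alone is enough to satisfy the two tests so far;
      // the "same min, different max" row of the table will force comparing max too
      return min == ((Range) other).min;
   }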