Feb 22, 2011

Running Neo4j on Azure

For those of you who don’t know what I’m talking about, Neo4j is a graph DB, which is great for dealing with certain types of data such as relationships between people (I find it ironic that relational databases aren’t great for this) and it’s perfect for the project I’m working on at the moment.

However this project also needs to be deployable on Azure, which means if I want a graph DB for storing data, I need something running on Azure.  I’d originally looked at Sones for doing this since it was the only one around that I knew could run on Azure, but my preference was to run Neo4j instead, because I know it a little better, and because I know that the community around it is quite helpful.

The great news is that as of just recently (a week ago at time of writing) Neo4j can now run on Azure! Happy days!! and “Perfect Timing”™ for my project.  I just had to try it out.  P.S. Feel free to go ahead and read the Neo4j blog post because it has a basic explanation of how to set it up, has some interesting notes on the implementation and also talks about the roadmap ahead.

Here’s a simple step-by-step on how to get Neo4j running on your local Azure development environment:

1. Download the Neo4j software from http://neo4j.org/download/.  I grabbed the 1.3.M02 (windows) version.

2. You’ll also need to download the Neo4j Azure project, which is linked to from the Neo4j blog post.

3. I’m assuming you have an up to date Java runtime locally, so go to your program files folder and zip up the jre6 folder.  Call it something creative like jre6.zip :-)

4. If you haven’t already done so I’d also suggest you get yourself a utility for managing Azure blob storage.  I’m using the Azure Blob Studio 2011 Visual Studio Extension.

5. Spin up Visual Studio and load the Neo4j Azure solution.  Make sure the Azure project is the startup project

image

6. Next, upload both the Neo4j and JRE zip files you have to Azure blob storage.  Note that by default the Azure project looks for a neo4j container, so it’s best to create that container first:

image

7. Now it’s time to spin this thing up!  Press the infamous F5 and wait for a while as Azure gets its act in order, deploys the app and starts everything up for you.  You’ll know things are up and running when a neo4j console app appears on your desktop.  You may also get asked about allowing a Windows Firewall exception, depending on your machine configuration.

Behind the scenes, the Neo4j Azure app will pull both the java runtime and neo4j zip files down from blob storage, unzip them and then call java to start things up.  As you can imagine, this can take some time.
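To make that start-up sequence a little more concrete, here is a minimal Python sketch of the unzip-and-launch part.  It is not the Neo4j Azure project’s actual code, and the download from blob storage is left out entirely.  The function names, the folder layout and the neo4j.jar file name are all assumptions made purely for illustration.

```python
import zipfile
from pathlib import Path


def unpack(archive, target_root):
    """Extract a zip archive into a folder named after the archive."""
    destination = Path(target_root) / Path(archive).stem
    destination.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as bundle:
        bundle.extractall(destination)
    return destination


def build_launch_command(jre_dir, neo4j_dir):
    """Build the command line that would start the server with the unpacked JRE.

    The bin/java location and the jar name are hypothetical placeholders.
    """
    java = Path(jre_dir) / "bin" / "java"
    return [str(java), "-jar", str(Path(neo4j_dir) / "neo4j.jar")]
```

In a real run the resulting list would be handed to something like subprocess.Popen, which is also where most of the waiting happens.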

If you have the local Compute Emulator running, then you should be able to see logging information indicating what’s happening, for example:

image

8. Once everything is up and running you should be able to start a browser up and point it to the administration interface.  On my machine that was http://localhost:5100.  You can check the port number to use by having a look at the top of the log for this:

image

You should then see something like this:

image

That chart is showing data being added to the database.  Just what we want to see!

So that’s it.  The server is up and running and it wasn’t too hard at all!  The Neo team is looking to make the whole process much simpler, but I think this is a great start and it is very welcome indeed!

If you do have problems, check the locations in the settings of the Neo4jServerHost in the Azure project to make sure they match your local values.  You might just need to adjust them to get it working.

P.S. If you have questions, then ask in the Neo4j forums.  I’m not an expert in any of this ;-)  I’m just really pleased to be able to see it running on Azure.

Feb 14, 2011

The Anti-Region Campaign

For the benefit of my twitter followers who know I have an ongoing issue with the abuse of #regions in the code I have to work with, let me take some time to explain why I dislike them so much.  In fact, let this be considered yet another step in my ongoing anti-#region campaign! (similar to the anti-if campaign)

This is not to say #regions aren’t useful, but they are badly overused, and until people can consume #regions responsibly, prohibition laws should be enacted.

Got an Opinion? O‘course You Do!

Even without reading this post in full I’m sure you already have an opinion and like most people on internet time you’re likely to skim the first paragraph or two and then skip the rest, so to save you time here’s how to leave feedback:

If You Agree with Richard: If you too dislike the abuse of #regions then leave a comment on this post. If there’s enough support I might think about starting a proper anti-region campaign.

If You Think Richard is a Raving Lunatic: If you think I’m over-reacting, slightly unhinged, don’t understand your special needs and why #regions make the world around you glow, or if you count yourself as a #region lover then I would ask that you leave your comments and feedback on this post here instead.

Why Do #Regions Even Exist?

So why in the name of all things good and holy, do #regions exist in the first place?  Don’t they kill kittens?  Aren’t they pure evil?

Let’s have a look at what the MSDN documentation says, shall we?

#region lets you specify a block of code that you can expand or collapse when using the outlining feature of the Visual Studio Code Editor. In longer code files, it is convenient to be able to collapse or hide one or more regions so that you can focus on the part of the file that you are currently working on.

OK.  That seems nice on the surface, but isn’t the code I’m working on a class file?  Don’t I need to see it in context? Hmm.  Let’s have a look at the reasons I don’t like #regions shall we:

Reason 1 – It Hides Code

So, according to the spec, it’s to hide chunks of code to help us focus more on the bits we’re changing.  Do you see the problem with that?  Why would I, as a developer, want to actually see code?  Are you out of your mind!  Code is the last thing I want to see!  I want to write code, not read it!

I should be able to look at a piece of a class and understand straight away what it does and how it fits in with the rest of the class.  #regions can prevent that from happening.

How can I actually tell what this code is doing? And apologies for the Hungarian notation VB code – I hope it doesn’t burn your eyes too badly!

image

I know there’s something going on in there (the line numbers give me a hint), but can I know at first glance what it is? No chance!  This is a bad thing.

P.S. One valid reason to allow #regions to exist is designer/code generated code.  However the availability of partial classes since Visual Studio 2005 means that we’ve been able to pull all that designer generated crap out and keep it in a separate file for a long time now.  There’s no need to have that junk cluttering up my own hand crafted junk! #regions are redundant in this case.

Reason 2 – It Hides What You Think Might Be Code

I just love seeing this sort of thing:

image

There’s gotta be something in that code there right?  Something important!

So let’s open the region and have a look?

image

Say what!? There’s nothing useful there at all?! It’s just commented out code?

So the #region is “helping” us by hiding commented out code!  Surely I don’t need a region to help hide that from me, do I?  NO! Just delete the code! Don’t comment it out and wrap it in a #region!  What a great way to waste my time!

Reason 3 – It Hides Nothing At All

I don’t know whether to blame horrible coding standards that mandate the presence of specially named regions for segmenting class files or not, but I’m sure you’ve never seen this little gem:

image

Only to open the #region to see what’s inside and realise there’s nothing there!

image

It’s like being given an empty box for your birthday.  All nicely wrapped up and making you wonder what special surprise is hiding inside only to open it and find that your kid brother has been playing a horrible, cruel joke on you.

What’s even better is when you see a class with all the mandatory regions in it (like the code we saw in Reason 1) and yet when you open all the regions up, there’s just a single static method in there and the rest is blank. Or even better, the whole class is empty!

Regions like this fail completely in their mission to help you focus on just the code you need to see, and instead add only noise and confusion to the understanding-what-the-class-does process, and make you doubt any other #regions you see elsewhere in the codebase.
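If you want to hunt down the regions from Reasons 2 and 3 in your own code base, something like the following rough Python sketch could help.  It is a simple line scanner I’ve assumed for illustration, not a real C# or VB parser: it recognises #region/#endregion (and the VB #Region/#End Region forms), treats lines starting with // or ' as comments, and ignores block comments altogether.

```python
import re

REGION_START = re.compile(r"^\s*#\s*region\b(.*)$", re.IGNORECASE)
REGION_END = re.compile(r"^\s*#\s*end\s*region\b", re.IGNORECASE)


def verdict(body):
    """Classify the lines found inside one region."""
    meaningful = [line.strip() for line in body if line.strip()]
    if not meaningful:
        return "empty"
    if all(line.startswith(("//", "'")) for line in meaningful):
        return "comments only"
    return "has code"


def classify_regions(source):
    """Return (name, verdict) pairs for each region, in the order they close."""
    results = []
    stack = []  # one [name, body_lines] entry per region currently open
    for line in source.splitlines():
        if REGION_END.match(line):
            if stack:
                name, body = stack.pop()
                results.append((name, verdict(body)))
            continue
        start = REGION_START.match(line)
        if start:
            stack.append([start.group(1).strip().strip('"'), []])
            continue
        # An ordinary line counts towards every region that is still open
        for entry in stack:
            entry[1].append(line)
    return results
```

Feed it the text of a source file and anything that comes back as "empty" or "comments only" is a candidate for deletion.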

Reason 4 – It Hides Just One Thing

What about this #region:

image

There’s going to be some really complex code in there, right?  Why else would I have a region there, especially since regions are meant to hide code that might distract me!

Open it up and what do I see?

image

Oh wow! Just as well we used a region to hide a single property!

Again, we’ve added noise to our code base for no reason.

Reason 5 – It Hides Butt-Ugly Code

So what about this situation?  You’re looking through a code base and see the method from hell and its associated demonic friends.  We don’t want to see that code do we?  No way! We want to hide that complexity, so let’s wrap it up in a region! Great thinking!

This is what we are really doing with our code: We’re sweeping all that ugly under the carpet.

As a general principle, classes should have a single responsibility and methods should be 20 lines of code or less so they stay readable.  If methods are longer than that then you’re likely working at multiple levels of abstraction in the one method or trying to do too much, and you should consider refactoring your code to improve its design and readability.

It’s better to keep the ugly visible so that you can deal with it.  Hiding bad code behind regions only serves to make bad code acceptable and it discourages efforts for improving it.

Reason 6 – It Hides Copy/Paste Efforts

The DRY principle means logic should exist in one place and one place only, not be repeated all over the place, because a change to the logic means finding all the spots that logic exists and fixing them all.  It’s much easier to make a fix once than it is to fix it in multiple places.

If I have a code base covered in regions I can’t tell if one section of code is similar to any other code I’ve seen before, because in all likelihood I’ve not seen it!  It’s probably been hidden away behind #region statements that mask all of the complexity and character of the code, instead making it a big block of beige.

Here’s what code looks like when #regions have been liberally applied to it:

It all looks the same.  And you’re not sure if it’s laughing at you or about to turn on you!

What About Grouping Your Code?

Let’s say you think #regions are great because they allow you to group your code and hide all those messy internal field and property declarations.  If you’ve only got a handful of them, do you really need the #region?  Probably not.  There’s not much to hide.

If you’ve got hundreds of the suckers do you need a #region?  Maybe – though at that point you should be considering the design of your system.  Do you really need a class with that many properties or fields?  Can you alter the design to improve things? Is your class doing too much?  That’s a judgment call you’ll have to make.

My gripe with #regions is not with the directive in and of itself, but with its abuse.  If a #region is being used as I’ve outlined in the reasons above, then it’s time to stop, think about your code and why you’re using a #region, and then do yourself a favour.  Remove the #region and start writing better code instead.

That will be all.

Feb 10, 2011

Presenting at CodeProject’s Agile Virtual Tech Summit

The title does give it away a little I suppose…  I’ll be presenting two sessions at the Agile Virtual Tech Summit run by the Code Project.  I’m quite honoured to be asked to present and for reference in my 2 sessions I’ll be keynoting the Agile track with a talk on Agile from a Developers Perspective, with the other session being about Lessons Learned from Agile Implementations.

You can register for the free event on the site and the summit will get underway on Feb 23rd, 2011 at 12pm ET (for those in Australia that’s a 4AM start on the 24th so don’t forget the coffee!)

Feb 5, 2011

Branch per Story Pattern and TFS

The TFS branching guide is a very detailed run through of the various situations and scenarios under which branching can be performed and the different strategies that exist, however because the guide covers so many scenarios it’s difficult for some people to know what approach they should follow, and when they look at some of the advanced diagrams they freak out at the complexity.

For this post I’m going to keep it simple and just show one approach, the one that I feel is best for agile teams working from user stories.  It’s effectively the feature branch approach, and I’ve been tempted not to write this post, however I find that the feature-branch approach can be a little confusing for some people.  “What is a feature?” “Is it multiple stories?” “Is it an epic?” “Is it a bug fix?” “Should a feature live across multiple sprints?” These are all questions I hear and it’s probably because the terminology is a little too generic and becomes open to interpretation and thus misunderstanding.  So, let’s make it easy and use Agile terminology.

Why Story Branches

So before we get into it, what exactly is wrong with working in trunk or using a branch per sprint approach? If you can make it work then there is nothing wrong with that approach, but it does come with some hidden dangers.

Consider the following situation.  A team accepts 3 product backlog items for a sprint and only completes 2 of those items.  The third story was in progress when the sprint concluded but was far from complete and the team now has a section of unfinished code in their code base.

What should the team do with that unfinished code? Should they go back and comment it out or delete it?  Maybe, but how will they be sure they got it all?  What if some of the changes they were making affected the system architecture and those changes had been used in the implementation of the other two stories? What if the code was also merged to another branch in preparation for a release?

What if they take a different approach and try wrapping all changes in if-blocks during development to try and isolate the changes?  Sure, this can work.  But again it has challenges.  The team needs to maintain a set of variables or pre-processor tokens related to each item that are used to decide if a feature is available or not.  There’s also going to be a large number of if statements that will add noise to the code and in a complex system you can easily imagine how out of control this could get.
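Here’s a small Python sketch of what that if-block approach tends to look like.  The flag names, the functions and the discount calculation are all made up for illustration; the point is simply that every unfinished item needs its own toggle and its own checks scattered through the code.

```python
# Hypothetical toggles, one per backlog item that might not be finished
FEATURE_FLAGS = {
    "story_2639_new_pricing": False,  # still in progress, so switched off
    "story_2640_export": True,        # finished and switched on
}


def is_enabled(flag):
    """Unknown flags count as switched off."""
    return FEATURE_FLAGS.get(flag, False)


def calculate_total(prices):
    total = sum(prices)
    if is_enabled("story_2639_new_pricing"):
        # Unfinished work still ships in the code base, hidden behind the toggle
        total = round(total * 0.9, 2)
    return total


def available_actions():
    actions = ["view", "edit"]
    if is_enabled("story_2640_export"):
        actions.append("export")
    return actions
```

Two stories and there are already two flags and two extra branches in the logic.  Multiply that by a few sprints and it’s easy to see where the noise comes from.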

What if they just ignore these challenges (or forget the code) and leave the unfinished code as is? At best they have code that is unused and simply adds bloat to the system, at worst it results in the system being released with broken functionality and open security attack vectors that have never been checked or tested.

If we also remind ourselves that sprints should produce production ready, potentially releasable increments of the system, then we really, really don’t want to have work in progress code in the system at sprint conclusion.

For all of these reasons we should ensure our development of a feature is isolated from the rest of the development team and only made generally available when complete and ready for integration testing.

There’s an interesting side effect to taking this approach.  If we develop features in isolation and consider Conway’s law which when paraphrased says that system architectures reflect the way the organisation communicates, then by nature the systems we develop using a branch per story model will likely be architected and designed as componentised and modular systems, mirroring the development approach we are using.

Downsides

What are the downsides, then? Surely the administration overhead goes up and the chances of screwing up merges increase?  Yes, there is an element of that, but it isn’t as bad as you think.

Administration time goes up because we now have to remember to create a branch, and we have to remember to switch branches when we work on different stories in the same sprint.  Is this really an issue though? Maybe it is.  However maybe it’s a blessing in disguise.  When working in an agile manner we want to minimise wasted effort and keep our work in progress as low as we can. We know that the cost of multi-tasking, working on different stories at the same time, is that we go slower overall than if we just work on one story at a time because of the mental context switch, and yet the temptation to multi-task is high, especially if the stories we work on are somewhat boring.

By following a branch per story strategy we discourage ourselves from multitasking since the simple annoyance of switching to a solution file on a different branch gives us a natural prevention mechanism.  Note that this annoyance doesn’t apply to DVCS systems like Git and Mercurial since branch switching there is very quick and easy, but in TFS, Subversion and similar systems branches are represented as folders and switching means swapping folders, and thus closing and re-opening the solution.

What about merging? Branching is easy but a great deal of pain can be had in the merging of branches.  This pain is most often encountered when the two branches have diverged significantly.  However, in a branch-per-story model most branches are short lived and thus merging is fairly straightforward.  It doesn’t mean merge conflicts can’t occur, just that the merges are likely to have few conflicts.

Visual Branching Model

So let’s have a look at what we’re doing with the branching model.  For each story we work on in the sprint we create a story branch by branching the main integration branch.

Let’s say the team picks up three stories for the sprint.  We create the story branches and the team commences work:

image

Let’s say Story 1 gets completed first.  The code for story 1 is merged back to the integration branch, the integration code is checked and verified and when all the tasks for the story are complete the branch is killed off.

Next, Story 2 is completed so the developers pull from the integration branch, merge the changes, confirm things are OK locally, then push their changes back to the integration branch.  Again, the developers check the code in the integration branch is OK, and when it is the story 2 branch is killed off.

Finally Story 3 is code complete.  Again, the developers pull the integration branch code down to their story branch, do the merge and make sure it’s OK.  When it looks good they push their changes back to the integration branch, make sure the integration branch is OK and kill off the Story 3 branch.

Pretty simple process, right?
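For anyone who prefers code to diagrams, here’s a toy Python model of that flow.  It’s a deliberately naive sketch: a branch is assumed to be nothing more than a name and a set of change labels, and real-world merge conflicts are ignored completely.  None of the names correspond to anything in TFS.

```python
class Branch:
    """A branch reduced to a name and the set of changes it contains."""

    def __init__(self, name, changes=()):
        self.name = name
        self.changes = set(changes)

    def branch(self, name):
        """Create a child branch that starts with the same changes."""
        return Branch(name, self.changes)

    def merge_from(self, other):
        """Bring across changes not yet seen here; return what came in."""
        incoming = other.changes - self.changes
        self.changes |= incoming
        return sorted(incoming)


def complete_story(story, integration):
    """Merge down first, then merge up, as described for each finished story."""
    pulled = story.merge_from(integration)  # catch up with the integration branch
    pushed = integration.merge_from(story)  # deliver the story's own work
    return pulled, pushed
```

Because the story branch always catches up with the integration branch first, the only thing left to go back up is the story’s own work, which is why the second merge tends to be the quiet one.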

Doing it with TFS

So, enough with the explanations let’s see how we do this with TFS.

Creating branches is easy enough, but how do you name the branches?  We don’t want to try and embed the story name in the branch name since that will increase folder length and make it far more likely that we’ll hit the file path length limit (yes, there is one and it’s smaller than you think).

The strategy I prefer is simple.  Because we are using TFS and stories are just TFS work items, I like to use the story number as the branch name.  Here’s an example:

image
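To get a feel for how much difference the branch name makes, here’s a quick Python sketch that works out how many characters are left before a path hits the limit.  The 260 character figure is an assumption (the classic Windows MAX_PATH value) and the limit your version of TFS actually enforces may differ, so treat it as a placeholder.  The workspace root and file path are hypothetical too.

```python
# Assumed limit: the classic Windows MAX_PATH value. The exact limit that
# TFS enforces may differ by version, so this is only a placeholder.
ASSUMED_MAX_PATH = 260


def branch_path(workspace_root, branch_name):
    """Local folder for a branch that sits directly under the workspace root."""
    return f"{workspace_root}\\{branch_name}"


def path_budget(workspace_root, branch_name, longest_relative_path):
    """Characters left before the longest file in the branch hits the limit."""
    full_path = f"{branch_path(workspace_root, branch_name)}\\{longest_relative_path}"
    return ASSUMED_MAX_PATH - len(full_path)
```

Calling path_budget with "2639" and again with "Make this system more awesome" as the branch name shows the story-number version leaving 25 more characters for the files underneath it.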

Now let’s do some work on story 2639 (aka “Make this system more awesome!”) and check it in. Now we need to make sure that we’re on par with the integration branch by doing a merge.  We simply right click the Integration branch and select Merge:

image

Then merge to the appropriate story branch

image

Select the latest version to make sure we’re current and hit the button

image

It appears there are no changes to merge.  Excellent, we’re current.

image

So now we merge our changes from the story to the integration branch (same process, just start by right-clicking the story branch).  Also, don’t forget that merges are made as local changes and still have to be committed, so add a check in comment and do just that.

Wait for the integration build to pass and check the acceptance tests pass to let us know that the story is complete.  Once it is we can now kill the story branch.  To do this, simply right click the branch in source control explorer and select delete!

image

Again the deletes are pending changes until we check them in.  So again, add a comment and check in the changes.

Our branch hierarchy now changes from this:

image

to this:

image

Nice and simple.  And not a lot of overhead.

As for the other stories we follow the exact same procedure when they are complete:

1. Make the changes needed for the story in the story branch
2. Pull and merge changes from the integration branch into the story branch
3. Deal with any merge conflicts
4. Check that the story is still OK
5. Push and merge the changes back up to the integration branch.  You shouldn’t have any merge conflicts for this merge.
6. Check that the Continuous Integration build passes
7. Run any other tests on the integrated code as required.

And we’re done!

When all stories are complete in the sprint there should be no story branches left. The only time this won’t be true is when the team finished the sprint with incomplete stories.

What About Bugs?

Bugs in an agile team are just the same as stories.  Create a bug fix branch each time you need to fix a bug just the same as we do for stories.

About the only time you might not do this is when you are doing really simple bug fixes such as spelling mistakes where the changes are just cosmetic and there are no real logic or functional changes.

What About the Build Server?

So a question you may have is what do we do with builds and the build server? Answer: it’s up to you.  Given how short lived the story branches are likely to be it’s probably not worth creating build definitions for each story, but again, it’s completely up to you. Regardless of whether you have a build per story or not, you should have a continuous integration build tied to the integration branch.  When each story is complete and merged back to the integration branch a build should be triggered that compiles the code, runs all the unit tests and so forth to confirm that the code in the integration branch is OK and nothing went wrong during the merge.