Archive for the ‘Programming’ Category

5 Epiphanies in Software Development

Monday, November 26th, 2007

Cedric’s post regarding software headaches got me thinking about the more pleasant learning experiences I have had in the last 15 or so years I have been writing code (for real machines not Species and STs). So I have pulled together a short list of Epiphanies that most mortal developers go through sooner or later.

  • Object Orientation: on the face of it OO isn’t that complicated; you create a set of classes that represent both real world objects and behavioral entities. This is a simple enough concept, however it takes a while for it to click and for you to understand why it is a good thing.
  • Functional Programming: who’d have thought that Lambda Calculus wouldn’t be intuitive? Seriously though, Functional Programming is a category of programming languages that is essentially based on mathematical functions and their evaluation without the need for state. This isn’t as easy a concept to grasp as OO and even when you do, the benefits can be non-obvious until that epiphany moment.
  • Recursion: a very powerful thing; used well it can achieve great things, used badly it will crash your program.
  • Do one thing well: certain companies are famous for their feature rich products. People love their features, product managers get paid for adding extra features and features are what (lazy) marketers concentrate on. However more features means unnecessary complexity of code and complex code leads to bugs. More features means that the important features get less attention. The trick is to do one thing well and make sure your code, program or service can interoperate with others that also do one thing well.
  • Proprietary Software is only good for the vendor: if you give a company that you do not control, monopoly provision over the support of a piece of software, that you rely on, you expose yourself to significant risk. Unless you have a very big stick to hit them with they will not be responsive to your needs (in my experience even if you do have a very big stick they still won’t be very responsive). If you need to fix a bug in a piece of software and you need to fix it immediately then you need the source code - end of story. Also you probably don’t need all the features they are offering (see Do one thing well).

I’m sure there are more… would love to hear other people’s epiphanies.

PS I almost titled this 101 Epiphanies in Software Development, but thought better of it

Adam Smith vs. Fred Brooks

Thursday, June 28th, 2007

It seems to me that much of the two essays, The Surgical Team and The Mythical Man-Month, in Fred Brooks’ important work are elaborations on Adam Smith’s theories of Specialisation, Division of Labour and production line optimisation, but applied specifically to software development. Which is an excellent and worthwhile cause, as software engineering still lacks the rigour associated with other engineering professions.

I need to re-read both of the great men’s works but I think there is mileage in applying more of Adam Smith’s principles to software engineering.


JRuby developers join Sun

Monday, September 11th, 2006

The two lead developers on the Open Source project JRuby have been hired by Sun. Charles Nutter and Thomas Enebo will be working on JRuby fulltime now at Sun.

Sun seem to be putting a lot of effort into supporting other languages on the VM and specifically compiling to Java bytecode. It’s a truly interesting development, although Visual Basic on the VM still makes me shudder.

Charles has some more on his blog.

(via Tor Norbye)


tabs vs spaces

Tuesday, July 4th, 2006

I for one am sick of the debate over whether tabs or spaces should be used to indent code (there are more important things to worry about). We all want the same thing and that is for code to format right no matter what editor we are using. It looks like someone has come up with a simple yet brilliant solution for this, and what’s more they have code (which is open source).

Link: Elastic tabstops (via Joel)

Installing Java on Gentoo

Monday, July 3rd, 2006

This is the message I get when I naively try to install Sun’s J2SE on Gentoo. I thought that Sun had gone to lengths to reach out and make distributing Java easier for Linux distros, or was that just Ubuntu? Either way, it pisses me off every time I have to download Java separately.



>>> Emerging (4 of 4) dev-java/sun-jdk-1.5.0.07 to /
!!! jdk-1_5_0_07-linux-i586.bin not found in /usr/portage/distfiles

!!! dev-java/sun-jdk-1.5.0.07 has fetch restriction turned on.
!!! This probably means that this ebuild's files must be downloaded
!!! manually. See the comments in the ebuild for more information.

* Please download jdk-1_5_0_07-linux-i586.bin from:
* http://javashoplm.sun.com/ECom/docs/Welcome.jsp[truncated]
* Select the Linux self-extracting file
* and move it to /usr/portage/distfiles

Update: Looks like Gentoo and Sun have been talking and this restriction will be removed soon! Fantastic news. Now if they just open source Java… (via Kit Peters)


Does Sun have any grace?

Thursday, June 22nd, 2006

Will Sun have the grace to reverse their bad decision?


YAGNI Development Assistant

Friday, May 26th, 2006

I know a few people who could use this, I’m not usually one to post short posts with just a link but this one made me laugh so much with recognition I’ll be sneezing coffee for the rest of the day.

IDE Feature Request: The Yagni Development Assistant

FYI: YAGNI: You Aren’t Going to Need It

Pick a format any format

Friday, May 26th, 2006

So it seems everyone is saying it now, christ, people are even meta saying it: pick one feed format and stick with it. All the aggregators worth their salt support all of them so it really doesn’t matter. I know when I add a feed to Bloglines I hate it when I get presented with 4 versions of the same thing just delivered to a different specification.

As the old ironic adage goes ‘the great thing about standards is that there are so many to choose from’. Having just reimplemented The Humor Archives Feed using Rome I fully sympathise with Sam, Nick and Dion’s position. My first implementation of a feed simply cranked out RSS 2.0 directly using JDOM, however with Rome one is encouraged to use an abstraction to populate the feed and then specify the feed type. Once this is done the specific feed specification compliant XML is cranked out to a Writer. This is all very good, however, it takes the minimal subset approach, so for example on an RSS 2.0 Feed type it is impossible without going to the specific implementation to set the TTL field as it isn’t a minimal subset feature of feeds.

I looked at using just the RSS 2.0 beans within the Rome framework but couldn’t see any obvious way to write them to an output stream or writer. I only looked briefly, however; if anyone knows, please let me know. As I only want RSS 2.0 perhaps Rome isn’t for me.

VB6 for the Java Platform

Friday, May 19th, 2006

No I’m not joking, Sun have actually gone and done this! Some screenshots, but very little in the way of justification, can be found here. I’m hoping it’s to try and get people away from VB6 by providing a migration path to a sensible platform, but who knows.

I can already predict what will happen. I’ll be working away happily on a client application and suddenly the Java/Groovy/JRuby/Whatever code will drop out into some VB6 code that someone thought would be a good idea to copy/paste across from their spreadsheet macro. I’ll then have to spend hours fighting my way through a never refactored, organically fudged wad of VB6 spoodge. The last thing we need is more excuses for people to hang on to their sacred code.

Reflective Access to Parameter Names

Friday, April 7th, 2006

On the Mustang JSR there has been a lot of lively discussion on the benefits and drawbacks of providing reflective access to method parameter names. Currently in Java using the standard APIs there is no way to access the parameter names of a compiled Java class’s methods. In fact unless a Java class is compiled using the -g flag with javac the parameter names are completely discarded from the resultant bytecode. If the class is compiled with -g then they are preserved in the bytecode and can be accessed by bytecode reading libraries (this is what IDEs sometimes do in order to present more useful method information when source and javadoc are absent).

There have been calls, prompted by this RFE, for the addition of reflective methods to the java.lang.reflect.Method class to access the names of parameters as specified by the developer at coding time. These calls have been made by some members of JSR 270. To try and summarise a very long thread, here are the arguments for and against.

Arguments against:

  • code that used the parameter names would become dependent on that parameter name remaining the same across revisions

Actually that’s it as far as I can see, if anyone can think of others please let me know.

Arguments for:

  • Use for dynamic language support in Java; many dynamic languages such as Python and Ruby benefit from having access to parameter names for certain language features such as method calls like this (pseudo code):

    calculateArea(width := 10, height := 20)

    This is cool because you don’t need to remember which way around the parameters go and it is also more readable.

  • Logging: if you have reflective access to the parameter names then you can record them for the purposes of logging.
  • IDE support: at the moment if you only have access to a compiled binary that hasn’t been compiled with the -g option then the IDE will display generated variable names such as string1, int0 and so on, which don’t aid the developer at all.
  • External add-ons to a Java application such as a rules engine could certainly make use of parameter names, enabling the user to define rules and validation using the same names as the method’s parameters, all without having access to the source or specially compiled binaries.
  • AOP frameworks could certainly use these features too.
  • Binding of Java classes to web services by the generation of WSDL from class files would create much more meaningful/readable WSDL with the parameter names included.
  • It will be easier to add features such as closures to Java in the future using a well understood method of reflective parameter name access with the appropriate syntactic sugar.

I would be keen to have this feature in Java SE but as far as I’m aware it won’t be in Mustang (Java SE 6) but will be put forward as a recommendation to the Dolphin (Java SE 7) JSR.
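For readers on a newer JDK, a rough sketch of what reading parameter names reflectively looks like: Java 8 later added java.lang.reflect.Parameter, and javac keeps the real names when given the -parameters flag. The class ParameterNames and the calculateArea method below are made up purely for illustration.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.ArrayList;
import java.util.List;

public class ParameterNames {

    // Hypothetical method whose parameter names we want to read back.
    public static int calculateArea(int width, int height) {
        return width * height;
    }

    // Returns the parameter names of the first declared method with the given name.
    public static List<String> namesOf(Class<?> type, String methodName) {
        List<String> names = new ArrayList<>();
        for (Method method : type.getDeclaredMethods()) {
            if (method.getName().equals(methodName)) {
                for (Parameter parameter : method.getParameters()) {
                    // Real names only if the class was compiled with -parameters;
                    // otherwise the JVM synthesises names such as arg0, arg1.
                    names.add(parameter.getName());
                }
                break;
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(namesOf(ParameterNames.class, "calculateArea"));
    }
}
```

Whether this prints the real names or the synthesised ones depends entirely on how the class was compiled, which is exactly the dependency on build settings discussed above.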

The Language Arms Race

Monday, March 13th, 2006

It seems at the moment that there is an arms race on with language specifications. I’m particularly referring to Java and C# but there are languages pushing them (the spec leads) on to include more and more language features. C# is introducing a raft of new features in C# 3.0; this is old news, and the fact that Java introduced a set of new features in J2SE 5.0 is even older news. I don’t think anyone would disagree that Java is behind in this race and it was really only the emergence of features such as autoboxing and generics in C# that pushed the guys at Sun to introduce these features in Java.

Now there is even more momentum to introduce even more language features to Java, things such as closures. Now from a selfish point of view I would love closures and many more features in the language when I’m coding. This is because I have been writing code now for almost 10 years and coding Java for about 8 years, plus working with other languages that have these features (Ruby, Python and even Perl), and so more features don’t pose much of a learning curve to me. However what it will do is make Java more inaccessible to people new to coding. It will also introduce more ways to do one thing, which for a commercial powerhouse and mainstay language is almost certainly a bad thing.

I have dealt with many code bases produced by inexperienced or misguided developers throughout my career and one thing that has saved these code bases from unmitigated disaster is the fact that there are only a few ways of achieving things in Java. It keeps the developer under control and stops them straying too far from the path. For those who are experienced developers the simplicity and lack of features is also a good thing. A concept (program) based upon a few simple rules (features) is often more elegant than one based on many complex ones. I think studying mathematics has given me an appreciation of the beauty in keeping things simple and this principle applies just as well to coding and language specifications.

I’m, after initial hesitation, now keen to add scripting support to Java; this will hopefully stem the tide of additions to the Java language features and keep it clean. Obviously the use of these languages will need to be governed correctly to ensure projects don’t turn into the Tower of Babel but in the end I think the Java platform, if not the language, will be a richer technology.

Oh and C# can add what they want because I’ll be able to code in JRuby which is nice ;)

Ivy Dependency Management

Thursday, March 2nd, 2006

Came across Ivy recently, which seems to have completely slipped under my radar until now. Ivy is an incredibly configurable dependency management tool. Now hold on, it’s not one of these stupid head in the clouds tools; it is an actual practical tool for managing extremely complex sets of dependencies.

It hooks in with Ant or runs standalone and allows the declarative statement of your Java (or anything else actually, Ruby, .NET, C++, whatever) dependencies. It resolves these dependencies from repositories that could be on your local machine, in a Subversion repository or even in one of the main Maven repositories.

So why is this tool better than Maven? Well it comes from the old Unix school of thought, do one thing well. It does dependency management extremely well and it doesn’t worry about anything else. The other stuff is left to applications better suited to that job (mostly Ant actually). It can be used straight out of the box for modest requirements or configured to totally custom and/or complex requirements. [TBH I really don’t like Maven2; it’s poorly documented, lacks flexibility and is just too much work to get going properly.]

Ivy also does nice graphs of your dependencies along with HTML reports which help plan new releases of your dependent projects.

Its documentation could be a little better but hey you can’t have everything.

Java IDEs and Web Development

Saturday, February 11th, 2006

Why does every Java IDE that supports Web development force the structure of your coding artifacts on disk to be the same as a war file? To my mind it is an extremely un-user friendly structure. I’ve just been looking at the latest copy of Webtools for the Eclipse IDE and it is no different.

Take a look at the attached screen shot to see what I mean.

[Screenshot: web-inf]

Now not only does it force me to keep everything so it is displayed in the horrible war file format, but because of this it doesn’t leave me any way to organise my extraneous artifacts. I mean what the hell does WEB-INF mean to anyone intuitively? Do I have to dump all my hibernate configuration with my log4j configuration with my application configuration and my urlfilter configuration? Quite quickly we are going to end up with a mess of crap in the classes directory, whereas I would like to have a configuration directory with a subdirectory for each component that needs configuration files.

Where do I put my unit tests? Where do I put classes and jars that I depend upon at compile time but not at runtime, for example JAXB compiler jar files? I could have them as “external jars” but that stops my web project being self-contained, and I like my projects to be self-contained. I want to check out from source control and be able to compile without spending two weeks tracking down dependencies.

The solution to my mind is fairly simple: some form of mapping needs to be allowed that enables you to declaratively instruct the IDE what should be where for the war but still allows you to structure things in the way you wish. This is what I usually do with Ant but Eclipse isn’t yet clever enough to understand my Ant file.

So although I’ll be evaluating Webtools some more I’m immediately turned off by it.

Waterfall 2006

Monday, February 6th, 2006

<sarcasm>
As we all know the Waterfall approach is probably the best project delivery mechanism ever conceived and it is ideally suited to custom software projects.
</sarcasm>

More in this vein can be found on the Waterfall 2006 homepage.

As a bit of fun send this site to some old guard PMs and see how long it takes them to realise it’s a spoof.

toevery ‘loop’

Sunday, January 22nd, 2006

As multicore processors become more prevalent it seems to me that developers are going to have to get smarter about writing applications that exploit this fact. Most programming languages don’t work well for parallel processing of even simple tasks (other than the ones used on supercomputers).

However it has occurred to me that it wouldn’t be too hard to add a basic feature to languages to denote that things can be done at the same time.

For example in Java (5.0) I could write a for loop as follows:


for (Customer customer : customers) {
   customer.giveDiscount();
}

However wouldn’t it be good if I could run this in parallel (where possible, the VM can work out when) by writing this.


toevery (Customer customer : customers) {
   customer.giveDiscount();
}

In a number of cases the data structures being iterated through in classic for loops don’t guarantee order anyway and so this new ‘loop’ would be better. This construct should be a signal to the VM that the coder doesn’t mind if the task in the code block executes in parallel; it makes no odds to the use case he is coding. The VM should monitor where there is resource available which would be best spent performing this task.
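Something in this spirit can already be approximated with library code rather than new syntax. The following is only a sketch: the Customer class is a made-up stand-in, and the scheduling is done by an ExecutorService from java.util.concurrent rather than by the VM itself, so it is a cruder mechanism than the one proposed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ToEvery {

    // Hypothetical customer; the counter is atomic so that
    // giveDiscount() is safe to call from any thread.
    public static class Customer {
        private final AtomicInteger discounts = new AtomicInteger();
        public void giveDiscount() { discounts.incrementAndGet(); }
        public int discountCount() { return discounts.get(); }
    }

    // Calls giveDiscount() on every customer, in no particular order,
    // and returns once all of the calls have completed.
    public static void giveDiscountToEvery(List<Customer> customers) {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (Customer customer : customers) {
                tasks.add(() -> { customer.giveDiscount(); return null; });
            }
            pool.invokeAll(tasks); // blocks until every task has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            throw new IllegalStateException("interrupted while giving discounts", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Customer> customers = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            customers.add(new Customer());
        }
        giveDiscountToEvery(customers);
        System.out.println(customers.get(0).discountCount());
    }
}
```

The catch is the same one a real toevery would have: the body of the loop must be safe to run concurrently, which is why the hypothetical Customer uses an atomic counter.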

Just a thought.

Syntactical Sugar

Sunday, October 30th, 2005

Now I like Ruby as much as the next guy, I really do, but there are some advocates out there that give it a bad name. This blog entry for example seems to imply that the fewer lines of code a particular method can be written in the better the language. Whilst I’m all for brevity in code I also like my code to be readable. The first piece of code, whilst terrible, could be figured out by a junior developer, whilst the second piece, written in Ruby, is a little more cryptic IMO. It’s hard for me to judge as I have been programming in both languages for some time, but I reckon anyone who has programmed in any language would be able to figure out the Java code, whilst I’m not so sure about the Ruby. I certainly wouldn’t implement this method in this way in Ruby (and for that matter I wouldn’t write the Java the way the Java is written).

Apparently someone called Fred wrote that lines of code is proportional to time to develop a project and to overall maintenance costs. Now I can’t find a copy of the paper but I’m guessing he is using SLOC (source lines of code) as an indicator of complexity within the same language and with some assumptions about levels of clarity. So if one project has 2000 lines of code and another has 4000 lines of code built in the same technology with the same level of readability then yes, the second will take twice as long to develop and maintain. This is somewhat of a no brainer.

Anyway for fun I thought I would see how few lines of Java I could write the function in… I got two (excluding the method signature and braces, and the unnecessary line break before the System.out.print call, added so that it would fit on the page):


static void fmtString(String format, String number){
  int k = 0;
  for(int i = 0; i < format.length(); i++)
    System.out.print((format.charAt(i)+"").equals("#") ? number.charAt(k++) : format.charAt(i));
}

Now this is horrible and completely unmaintainable but according to the logic of the above mentioned blog post this will reduce my delivery time and maintenance costs - BOLLOCKS.
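For contrast, here is a sketch of how the same behaviour might read when clarity rather than line count is the goal. It returns the string instead of printing it, and the FormatNumber class and the phone-number style format in main are only illustrative assumptions, not taken from the blog entry in question.

```java
public class FormatNumber {

    // Replaces each '#' in the format with the next character of the number;
    // every other character of the format is copied through unchanged.
    // Like the two-liner, it assumes the number has at least as many
    // characters as the format has '#' placeholders.
    public static String fmtString(String format, String number) {
        StringBuilder result = new StringBuilder();
        int nextDigit = 0;
        for (char c : format.toCharArray()) {
            if (c == '#') {
                result.append(number.charAt(nextDigit));
                nextDigit++;
            } else {
                result.append(c);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(fmtString("(###) ###-####", "5551234567")); // prints (555) 123-4567
    }
}
```

It is several times longer by SLOC, and I would still rather maintain this one.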

Whilst I believe in certain situations Ruby can bring down delivery and maintenance cost as well as delivery time it is no silver bullet and SLOC has nothing to do with anything. Ruby is not always the right solution and Java is not always the right solution.

State of the Monkey

Wednesday, October 5th, 2005

Here is an interesting and funny satire on the state of Java.

Do I agree with it? A bit, quite a lot actually, although I don’t think a whole new programming language is the solution. Maybe I’m too attached to Java, maybe I’m falling foul of A Dozen Ways To Sustain Irrational Technology Selections.

I must make a note to check out Ruby but somehow I can’t see it being used on the commercial projects I’m involved with in the near term.

Some Hibernate Optimization Rules of Thumb

Sunday, September 25th, 2005

So I decided to have a go at further optimizing The Humor Archives Hibernate code. First thing I did was to investigate Query Caching. Query Caching is different to the 2nd level cache in Hibernate (which defaults to EHCache, which I have blogged about before) in that the Query Cache keeps a store of previous queries and the results returned. Internally it is not dissimilar to a HashMap keyed on the SQL query string and valued on the object graph returned, although it is slightly cleverer than this as it knows when to update stored object graphs when other Hibernate queries modify the objects within said graph.
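To make the HashMap analogy concrete, here is a toy sketch of the idea. It is emphatically not Hibernate’s implementation: the ToyQueryCache class, its method names and the choice to evict stale entries (rather than update them) are all simplifying assumptions of mine.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

public class ToyQueryCache {

    // Cached results keyed on the query string.
    private final Map<String, List<String>> results = new HashMap<>();
    // Which cached queries read from which table, so that writes can evict them.
    private final Map<String, Set<String>> queriesByTable = new HashMap<>();

    // Returns the cached result for the query, running the loader only on a miss.
    public List<String> get(String query, String table, Supplier<List<String>> loader) {
        List<String> cached = results.get(query);
        if (cached == null) {
            cached = loader.get();
            results.put(query, cached);
            queriesByTable.computeIfAbsent(table, t -> new HashSet<>()).add(query);
        }
        return cached;
    }

    // Called after a write to the table: drops every cached query that read from it.
    public void invalidate(String table) {
        Set<String> stale = queriesByTable.remove(table);
        if (stale != null) {
            results.keySet().removeAll(stale);
        }
    }

    public static void main(String[] args) {
        ToyQueryCache cache = new ToyQueryCache();
        Supplier<List<String>> loader = () -> {
            System.out.println("hitting the database");
            return Arrays.asList("article 1", "article 2");
        };
        cache.get("from Article", "article", loader); // miss: loader runs
        cache.get("from Article", "article", loader); // hit: loader is skipped
        cache.invalidate("article");                  // simulate an update
        cache.get("from Article", "article", loader); // miss again
    }
}
```

The point of the sketch is the second map: without some record of which cached queries touch which data, there is no way to know what to throw away after a write.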

So I implemented Query Caching for some of the more expensive queries on the site and set some timers up on the code to let me know how long they were taking. One expensive query dropped from taking 500 millis down to about 50 millis. This was an enormous win for me.

Now, you may be thinking ‘500 millis, that’s a long query’ and you would be right. Basically the category pages were returning all the articles within that category (100-500 articles); not only that, but the article->category relationship was many-to-many, so an article can be in many categories and a category can contain many articles. Many-to-many relationships are notoriously expensive due to the fact that there is potentially a cartesian of a cartesian (indexing avoids this), but still it’s not cheap. Compounding this, I must have been in a hurry when I wrote the query, as it used a sub-query. Well, actually it was using the elements ‘function’ of Hibernate’s HQL, which in the PostgreSQL dialect manifests itself as a subquery.

Rewriting this query to use the ‘left join fetch’ mechanism sped the query up from around 1000 millis (yeah, I know) to a more reasonable 150 millis (still too slow in my book).

So back to the huge number of articles being returned. I decided, or rather got around to, implementing pagination (pagination is putting a list of things onto many pages and listing the page numbers at the bottom - like search engines do - goooooogle). The pain with pagination is you need to know: the number of pages, the page you are on, whether it’s the first page or the last page and the number of results on a page. To know the number of pages you need to find the ceiling of the number of results divided by the number of results per page (a partly filled last page still counts as a page). You could implement this using two queries, one to count the number of results and one to return the page (using offsets and limits in PostgreSQL). However it’s also possible to use a scrollable result set to do this - performance is about the same as two queries but code complexity is lower. This scheme I implemented and performance improved again! Now we were down to just 10-20 millis for this query.
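The pagination bookkeeping itself is only a few lines of integer arithmetic. A minimal sketch, with a made-up Pagination helper class and one-based page numbers assumed; the page count is the result count divided by the page size, rounded up:

```java
public class Pagination {

    // Number of pages needed to show totalResults items, pageSize per page.
    // Integer ceiling division: a partly filled last page still counts.
    public static int pageCount(int totalResults, int pageSize) {
        return (totalResults + pageSize - 1) / pageSize;
    }

    // Zero-based row offset of the first result on the given one-based page.
    public static int offset(int page, int pageSize) {
        return (page - 1) * pageSize;
    }

    public static boolean isFirstPage(int page) {
        return page == 1;
    }

    public static boolean isLastPage(int page, int totalResults, int pageSize) {
        return page >= pageCount(totalResults, pageSize);
    }

    public static void main(String[] args) {
        System.out.println(pageCount(101, 20)); // prints 6
        System.out.println(offset(3, 20));      // prints 40
    }
}
```

With the two query approach, the offset and the page size are what you would hand to the paging query, for example via setFirstResult and setMaxResults on a Hibernate Query.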

Interestingly, the list of articles on the homepage doesn’t need to be joined with the categories and so the query is a lot simpler. The articles do however need to contain their attachments (a one-to-many relationship). At first I thought that the left join fetch would give me a speed up - it did with the category queries. However it actually slowed the query down. To understand why, we must understand how Hibernate works. If we have an object that has an associated list as a property, Hibernate by default queries the object and then does a separate query for each item in the associated list of objects. So if we have an article with 5 attachments it will do a query to return the article and the ids of all the attachments and then it will do 5 more queries, one for each attachment.

Now this behaviour can be circumvented by using the ‘left join fetch’ mechanism mentioned above. This way Hibernate only does one query with, you’ve guessed it, a left join; this will have only one round trip to the database (network IO is the bottleneck usually). So why isn’t this faster than the default multi query method? Well, as I was using the Query Cache it seems that a large query with a large object graph (i.e. the left join fetch) was slower to be drawn from the Query Cache than a set of smaller queries. My empirical, unscientific, evidence suggests a 2x difference when using the Query Cache.

So in short here are the rules of thumb:

- If you have often repeated queries use the query cache
- If you are using the query cache and you have a one-to-many you will probably be best not using the ‘left join fetch’
- If you have a many-to-many then try the left join fetch method, it should be an improvement even with the query cache
- Scrollable result sets don’t have much advantage over two queries

As with any optimization work your mileage will vary. All applications are different but I hope this has given you some ideas of what you can play with.

This was meant to be a short entry and look what happened - a long rambly entry with no firm conclusion.

The Humor Archives

Saturday, March 12th, 2005

So I have finally got around to updating my oldest (surviving) website The Humor Archives. This website has been around in some shape or form for 10 years and has up until two days ago been running on the same back end code; written in Perl with a hand rolled database (yes, I wrote my own database for it).

So what did I use to rewrite the site? Well to understand my decisions we have to look first at the non-functional requirements of the site. It must be able to run on a single processor Pentium III with 384MB of main memory; not only that but the site’s operation must not negatively impact the other 20 or so sites running on the same box! Further, the redevelopment must not take me more than a week of evenings to complete. As a final requirement all the site’s URLs must remain the same (I’ll come back to this).

OK, these requirements seem fairly doable but what volume of traffic do we have to serve? To be honest it’s not a super high traffic website; the figures break down like this: daily visitors: 6-7,000; page requests: 70-80,000. These are averages and so as a quick thumb in the air guess I would say about 30 maybe 40 concurrent user sessions at peak.

So with these requirements I obviously chose Java as the base technology, firstly because I know it well but more importantly because Java provides me with a means of limiting the impact of this site on the overall performance of the physical machine. By limiting heap size and various other JVM tweaks I can get the most out of the box by pooling and reusing resources. The pooling and reuse starts at Sun J2SE 1.5’s super cool garbage collection and general optimisation and works up through Apache Commons DBCP to Hibernate’s EHCache and beyond - it all works transparently and it works well.

First of all I started with a proof of concept using Tapestry (3.0.1), which I have to say I really liked; the component model is fantastic and something that I hope other frameworks will learn from. However there were a number of things I wasn’t completely happy with. Firstly I found the configuration files to be overly complex and under-documented; secondly I didn’t like losing control of some of the lower level concerns such as session management, creation and so forth. For my site to be truly efficient I aimed to have it truly stateless but a number of Tapestry components were using sessions. Thirdly I was having to seriously hack the framework to get ‘friendly urls’ or rather the same urls as the old site. Getting the old urls to work was as simple as configuring and installing urlrewrite; the problem was getting Tapestry to generate the old urls within links. Tapestry for very noble reasons manages all url creation, which is a good thing in most cases but not for mine. I understand that Tapestry 3.1 will bring in more support for ‘friendly urls’ but that is currently in Alpha.

I took a look at Spring after my adventure with Tapestry but found the documentation to be lacking and so abandoned that fairly quickly - I also wasn’t liking the monolithic configuration file it seems to favour. So in the interests of timeliness I converted the tapestry site to Struts (you gotta love the ASF). I did think about using JSF but had already spent two evenings farting around with Tapestry and Spring and wanted to get something up and running ASAP.

So to conclude (about time, it has taken almost as long to write this blog entry as it did to write the site) these are the technologies in use (roughly from the bottom of the stack to the top):

Oh and I also managed to use CSS for all positioning which was sweet.

Nutch patches accepted

Tuesday, August 31st, 2004

I’m currently spending some of my free time developing patches for Nutch, an OSS search engine. Recently a number of these patches were accepted into the main line of development. The patches I submitted do various things, from parsing MS Word properties such as title and author to allowing MP3 audio files to be searched. There is still one outstanding patch of mine to be accepted and that is a rewrite of the HTTP fetching routines (the protocol side of the spider/robot).

Nutch is a pretty exciting project in my opinion with big players like Yahoo and Tim O’Reilly (of O’Reilly books fame) participating. Yahoo have a demo of Nutch on their site.