iTunes Export Scala 0.1 Released

I am launching a new project: iTunesExport Scala. This is a port of the original .Net application to Scala. Scala is a hybrid Object Oriented/Functional language that compiles to Java class files and executes on the JVM.

I am launching this port to enable Mac OS X users to access iTunes Export features. The original iTunes Export application was written in .Net and does not run on Mac OS X.

The project home page is here: http://www.ericdaugherty.com/dev/itunesexport/scala/

The project is hosted at Google Code. You can check out the Google Code project home page if you want to browse the source tree or track issues, etc.

This is my first Scala application so I would appreciate any feedback on the source code. My goal is to continue to add features until parity is reached between this and the original.

Maven versus Ant

The Java world has two popular build tools, Ant and Maven. I've been using Ant for as long as I can remember. Maven is a newer tool aimed at addressing some of the pains of Ant and also providing a more complete experience.

For a long time I avoided Maven. I tried out Maven 1.0 early on and was frustrated with the lack of control and the underdeveloped ecosystem. With its dependency management system, dealing with dependencies that are not in public repositories is annoying, and early on most libraries didn't have versions in the public repositories.

I'm working on a Scala port of my iTunes Export application and I had to decide how I wanted to build the project. Here are the pros/cons I saw:

Ant
  • Well Known - I know how it works, and there are tons of examples
  • Malleable - I can make it do what I want. I never have to fight the tool to make it work the way I want.
Maven
  • Less boilerplate code - Maven provides many features, like compiling and assembling packages, 'for free'.
  • Dependency Management - Project dependencies are automatically downloaded from public repositories and your releases can be uploaded to public repositories for distribution.
  • It's the new black - It is the current 'standard'.
For iTunesExport Scala I'm using Ant, although I am providing a Maven pom file as well for developers that prefer Maven. The basic Maven project was easy to set up, but I found myself struggling to make other things work, like unit testing (it doesn't work out of the box with ScalaTest) and packaging. In Ant, I already have predefined targets for most of what I want, and it takes just a few seconds to tweak them to my exact desires.

For larger projects, I think Maven is worthwhile. The dependency management is important as projects grow and the Maven ecosystem is large and growing. For small projects, I'm sticking with Ant.

iTunes XML Parsing - .Net vs. Scala - UPDATE

In an earlier post I compared a .Net and Scala implementation of an iTunes Music Library XML parser. The Scala version took a long time to 'load the XML file from disk'. Since then I realized that the default behavior of the parsers is to validate the Schema. I had disabled this behavior in the .Net version but not the Scala/Java version. Much of the 3 seconds to 'load' the XML is really an external HTTP request to download the xsd file and perform the validation. However, using the Scala library I do not see an easy way to disable the Schema validation (Under the covers the Scala library is delegating to existing Java APIs).
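One possible way around this is to go to the underlying Java API directly. The sketch below assumes the slow step is the parser fetching the external definition file named in the document's DOCTYPE, and that the JVM's default parser honors the Xerces load-external-dtd feature; PlistLoader is a hypothetical name, and I have not confirmed this is what the Scala library needs.

```scala
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Document

object PlistLoader {
  // Xerces feature name; assumed to be supported by the parser in use.
  private val LoadExternalDtd =
    "http://apache.org/xml/features/nonvalidating/load-external-dtd"

  // Parses the XML text without validating and without fetching the external DTD.
  def parse(xml: String): Document = {
    val factory = DocumentBuilderFactory.newInstance()
    factory.setValidating(false)
    factory.setFeature(LoadExternalDtd, false)
    factory.newDocumentBuilder()
      .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))
  }
}
```

If the parser in use does not recognize the feature, setFeature throws a ParserConfigurationException, so this is easy to try and discard.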

Oracle buys Sun?

Oracle and Sun announced an agreement for Oracle to acquire Sun. What do I think?

It's better than IBM. And it is aimed at IBM.

IBM and Sun seemed to have a lot of overlap. Certainly IBM has a well defined hardware business, and acquiring Sun's hardware business doesn't really help. IBM is heavily invested in Java so I'm sure they would have enjoyed more influence over its growth and direction, but it didn't really seem compelling.

Oracle on the other hand puts the final piece of the puzzle together. Oracle is now a credible Enterprise IT partner. With the addition of Sun's hardware business, Oracle can compete with IBM at nearly every level. If I'm a large company, I can turn to Oracle for:

Storage Solutions
Servers
Database
Enterprise Software Packages
Custom Enterprise Software Development (Java, WebLogic, etc.)

If you are a large company, you can either hand your checkbook over to Oracle or IBM. Or Microsoft+HP.

As a user of Java and MySQL, I'm hopeful that they will at least continue on as is and hopefully improve. I don't expect any real changes in either for a while though.

iTunes XML Parsing - .Net vs. Scala

As part of my iTunesExport utility I wrote a (C#.Net) module to parse the iTunes XML file and provide two simple collections, Playlists and Tracks. As part of my effort to learn Scala, I rewrote this module in Scala. While the libraries don't provide the exact same interface and functionality, they are effectively the same. The stats:

(Physical) Lines of Code count:

.Net: 459
Scala: 226

The Scala version accomplished the same functionality in 1/2 the lines of code. While LOC count is a bit arbitrary, I think it is an important point. The Scala code is much more concise, but maintains or even improves the readability (if you have a reasonable familiarity with Scala). In this specific case Scala's handling of public properties provided a big reduction. Ex:

Scala:

val trackId: Double = ...

.Net:

private string id;

public string Id
{
    get { return id; }
}

While it may seem that this provides much of the savings, the Scala version provides a much more verbose toString method, so some of the per-property savings is actually understated in this comparison.
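To make the property comparison concrete, here is a minimal sketch of a Scala class whose constructor parameters double as public read-only properties. The class and field names are hypothetical and are not taken from the actual iTunesExport source.

```scala
// Hypothetical Track class for illustration; the field names are assumptions.
// Each `val` constructor parameter becomes a public read-only property,
// replacing the private field plus getter pair needed in C#.
class Track(val trackId: Int, val name: String) {
  // A hand-written toString, which adds lines back to the Scala count.
  override def toString: String = s"Track(trackId=$trackId, name=$name)"
}
```

With this, new Track(42, "One").trackId reads the property directly, with no explicit getter.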

While developing I did find myself consistently following the pattern:
  • Identify logical functionality I wanted to extract into a helper method
  • Write the code for the helper method
  • Realize it was just a single line and move it back inline
Simply put, I found the structure and features of the Scala language very well suited to write high level understandable code that performed the same functionality in fewer lines of code, compared to C# or Java.

Performance was something else. As a functional language Scala is well suited to be a scalable language. It also runs on the JVM, benefiting from a significant amount of work building a high performance virtual environment. That said, this implementation was purely single threaded. The initial results were somewhat surprising:

Scala 4,800 ms
.Net 650 ms

That seemed like a pretty significant difference, so I took a deeper look. The Scala library to load the XML file from disk took nearly 3 seconds! This is a huge hit and accounts for much of the performance difference. The remaining functionality took ~1,800 ms, nearly 3x the entire .Net solution. Where does the time go?

~3000 ms - reading the XML file from disk using Scala's XML library
~100 ms - parsing the main library attributes.
~1700 ms - parsing the tracks
~60 ms - parsing the playlists

Clearly loading the file and parsing the tracks are the biggest targets. I'll leave loading the XML out of scope as it is a Scala library and focus on the tracks. For each of the entities (Library, Playlist, Track) I use a Trait that parses the plist XML into a Map and then assign the properties from the map. For each track, parsing the XML into a Map took 1 ms or less, but assigning the variables from the map (and doing any type conversions) took up to 4 ms but more commonly took 1 ms or less. At the single ms level the simple act of measuring introduces a significant impact, so I'm not sure these numbers are meaningful.
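The trait-plus-map approach can be sketched roughly as follows. This is a simplified illustration, not the actual code: PlistEntity and ParsedTrack are hypothetical names, and the input is assumed to be already flattened into alternating key and value strings rather than real XML nodes.

```scala
// Hypothetical names throughout; this is not the actual iTunesExport code.
trait PlistEntity {
  // Pairs up alternating key/value entries,
  // e.g. Seq("Artist", "U2", "Track ID", "17").
  def toMap(flat: Seq[String]): Map[String, String] =
    flat.grouped(2).collect { case Seq(k, v) => k -> v }.toMap
}

class ParsedTrack(flat: Seq[String]) extends PlistEntity {
  private val props = toMap(flat)

  // Properties are assigned from the map, with type conversion where needed.
  // The -1 and "" defaults for missing keys are assumptions for this sketch.
  val trackId: Int = props.get("Track ID").map(_.toInt).getOrElse(-1)
  val artist: String = props.getOrElse("Artist", "")
}
```

Every track pays for building an intermediate Map and then for one lookup per property, which is the kind of extra work that could add up across thousands of tracks.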

With over 2600 tracks in the library, even spending 1 ms per track is forever (and indeed the average time per track without the measurement was closer to 0.65 ms). The additional complexity introduced by my Scala approach caused a significantly slower execution time. In the end, clean code is not necessarily fast code.

That said, this test does not really hit on the strength of Scala, which is scaling in multi-threaded environments. With less than 8 hours of experience writing Scala, I'm sure my code is far from efficient.
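The numbers are at least self-consistent: 2600 tracks at roughly 0.65 ms each is about 1,690 ms, in line with the ~1700 ms measured for the tracks. One way to get a per-track average without timing each track individually is to time the whole batch once and divide. A minimal sketch, where Timing.timed is a hypothetical helper and the workload is a stand-in for the real parsing:

```scala
object Timing {
  // Runs the block once and returns its result with the elapsed milliseconds.
  def timed[A](block: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = block
    (result, (System.nanoTime() - start) / 1e6)
  }
}

object TimingExample {
  def main(args: Array[String]): Unit = {
    val trackCount = 2600
    // Stand-in workload; the real code would parse each track here.
    val (total, ms) = Timing.timed((1 to trackCount).map(_ * 2).sum)
    println(s"total=$total, average per item: ${ms / trackCount} ms")
  }
}
```

A single run like this still includes JVM warm-up, so the result is only a rough guide.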

Update: I realized that the default behavior of the parsers is to validate the Schema. I've disabled this behavior in the .Net version but not the Scala/Java version. Much of the 3 seconds to 'load' the XML is really an external HTTP request and the validation. However, using the Scala library I do not see an easy way to disable the Schema validation.

Apple's plist XML (Properties List) is PAINFUL

I started poking around with Scala and tried to re-implement the iTunes Music Library parser I wrote in .Net as a Scala library to give me a real problem to solve. This effort reminded me of how painful Apple's plists are to work with.

Apple saves the iTunes Library information in a binary file as well as in an XML file using its plist format. It is great that they provide the XML data that enables tools like iTunesExport, but they certainly could have made working with the XML easier.

Simply put, they store key value pairs as:
<key>Artist</key><string>U2</string>
There is no clean way to associate the 'value' with a 'key' other than the order of the nodes in XML. This makes things like XPath parsing very difficult.
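In practice this means walking the child elements in document order and pairing each key with whatever element follows it. Here is a minimal sketch using the standard Java DOM API from Scala. PlistDict is a hypothetical helper, not the code from my library, and it only handles flat dictionaries with simple values.

```scala
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.{Element, Node}

object PlistDict {
  // Child elements in document order, skipping text nodes such as whitespace.
  private def childElements(parent: Node): List[Element] = {
    val nodes = parent.getChildNodes
    (0 until nodes.getLength)
      .map(i => nodes.item(i))
      .collect { case e: Element => e }
      .toList
  }

  // Pairs each <key> with the element that immediately follows it.
  def toMap(dict: Element): Map[String, String] =
    childElements(dict).sliding(2).collect {
      case List(k, v) if k.getTagName == "key" =>
        k.getTextContent -> v.getTextContent
    }.toMap

  // Convenience for a snippet whose root element is the <dict> itself.
  def parseDict(xml: String): Map[String, String] = {
    val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
      .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))
    toMap(doc.getDocumentElement)
  }
}
```

Nested values such as a dict or array inside a dict would be flattened to their text content here, so a real parser needs to recurse on the value element's tag name.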

For a more in depth explanation, read this post.

I was able to get the Scala library to parse the plist but it wasn't as clean as I hoped. If I stay motivated I may finish up a Scala version of the tool to provide a version that will work on OS X (Scala compiles down to Java classes).

Google Provided Personal Transportation Device

Google posted an interesting 6 minute video tour of their new container data center. It's a reasonably interesting video, but my favorite part is the "Google provided personal transportation device".

The inside views of the containers are interesting. Obviously all the equipment here is tailored for maximum efficiency.