Shurly

I've recently been looking at Node.js with a view to using it on a few projects I have in mind. This evening I have finished my first Node project, and I thought I'd tell you all about it.

Node is an "Evented I/O platform for V8 JavaScript". It's server-side JavaScript. The thought of that initially made me nervous, but I soon realised that there's no reason for JS to be limited to the client-side. This is the same mental block that makes many people believe PHP should only be used for building web pages when it's more than capable of performing many other roles.

The core concept behind Node is to ensure that nothing blocks. What this means in practice is lots of callbacks and a very thoughtful development process. It was interesting to see my approach to writing code changing such that whether it would block was foremost on my mind.

In order to get to know Node better I decided to rewrite JMP.LI. I've pushed the code up to our GitHub account and called it Shurly (insert Airplane jokes here).

The current service is implemented in PHP and uses a simple disk-based key-value store. It uses no libraries beyond PHP itself and runs to about 400 lines of code, not including templates. To match this environment I have not used any external libraries save for a module to access MongoDB. Not including templates, the Node version runs to around 600 lines of code.

I normally work very hard to ensure code reuse and elegant design, but given that one of my primary goals was to build a fast service I have made sacrifices in that area. Two things I definitely wanted to retain were some sort of simple templating and the ability to use different data storage backends with minimum fuss. I managed to include both while retaining decent performance.

Templates

There were two issues with performance with the templates. The first is having to load the templates from disk every time they are needed. This was easily solved by preloading them into variables during startup.

I initially had the files being loaded in sequence, with the callback for each triggering the next. The callback for the final file then kicked off the data initialisation. This only happens at startup so I decided to simplify the code and allow the data module to initialise while the files were being loaded. This leaves a very small chance that upon restart people might get served a page containing the default text, like this:

Loading...Loading...Loading

They'd see it three times, once for each of the header, index and footer templates. The window of risk, however, is extremely tiny.
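For illustration, the preloading described above might look roughly like this. It's a sketch rather than the actual Shurly code: the `templates` object, the `preloadTemplates` function, the `.html` extension and the optional `done` callback are all names I've made up here.

```javascript
const fs = require('fs');
const path = require('path');

// Each template starts out as placeholder text, so a request that arrives
// before loading has finished still has something to render.
const templates = {
  header: 'Loading...',
  index: 'Loading...',
  footer: 'Loading...'
};

// Kick off all the reads at once; none of them waits for another.
// `done` is optional and fires once every read has completed.
function preloadTemplates(dir, done) {
  const names = Object.keys(templates);
  let pending = names.length;
  names.forEach(function (name) {
    // The '.html' extension is an assumption made for this sketch.
    fs.readFile(path.join(dir, name + '.html'), 'utf8', function (err, text) {
      if (!err) {
        templates[name] = text; // on error the placeholder simply stays
      }
      pending -= 1;
      if (pending === 0 && done) {
        done();
      }
    });
  });
}
```

Calling `preloadTemplates(dir)` without a callback and initialising the data module straight afterwards gives the behaviour described: everything starts in parallel, and the placeholders are only visible until the reads complete.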

The replacements in the templates are handled by simply replacing all instances of %%var%% with the variable value. Templating at its simplest!
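A minimal sketch of that kind of replacement, assuming the variables arrive as a plain object. The `render` name and the example URL are mine, not taken from the Shurly source.

```javascript
// Replace every %%name%% in the template with the matching value.
// split/join is used instead of a regular expression so that neither the
// variable name nor the value needs any escaping.
function render(template, vars) {
  return Object.keys(vars).reduce(function (output, name) {
    return output.split('%%' + name + '%%').join(String(vars[name]));
  }, template);
}

const page = render('<a href="%%url%%">%%url%%</a>', {
  url: 'http://example.com/abc'
});
console.log(page);
```

Placeholders with no matching variable are left in the output untouched, which makes a missing value easy to spot in the rendered page.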

Data providers

The other thing I wanted to include was changeable data providers. This was not rocket science: the code simply loads the module specified in the configuration file. If a module exports a variable called data that implements init, shortenURL and get, and those functions do the same job as their counterparts in the existing MongoDB module, everything will work fine.
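To make that contract a bit more concrete, here is a rough sketch of a check for a loaded module, plus an in-memory stand-in provider. Only the names data, init, shortenURL and get come from the description above; the callback signatures, `checkProvider` and `memoryProvider` are assumptions for the sake of the example.

```javascript
const REQUIRED = ['init', 'shortenURL', 'get'];

// Verify that a loaded module exposes the expected interface.
function checkProvider(mod) {
  const data = mod && mod.data;
  if (!data) {
    throw new Error('provider module must export `data`');
  }
  REQUIRED.forEach(function (fn) {
    if (typeof data[fn] !== 'function') {
      throw new Error('provider is missing ' + fn + '()');
    }
  });
  return data;
}

// A throwaway in-memory provider. The callback-style signatures are guesses.
const memoryProvider = {
  data: (function () {
    const urls = new Map();
    let counter = 0;
    return {
      init: function (callback) { callback(null); },
      shortenURL: function (longUrl, callback) {
        counter += 1;
        const key = counter.toString(36);
        urls.set(key, longUrl);
        callback(null, key);
      },
      get: function (key, callback) {
        callback(null, urls.has(key) ? urls.get(key) : null);
      }
    };
  }())
};

// In the service itself the module would come from the configuration file,
// for example via require(), rather than being defined inline like this.
const data = checkProvider(memoryProvider);
```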

The one major compromise I had to make is to have the data module actually generate the next short URL. While not the end of the world, this doesn't sit right with my brain. I could have used a callback to have the main code do it, but this seemed even uglier so I've left it as it is.

The main reason for this is that the data module, by necessity, handles the atomic selection of the next short URL to be used. The fact that the current implementation is not actually atomic is neither here nor there; it should be! I can live with it until I can come up with something better.
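One way the selection could be made atomic is MongoDB's `$inc` operator, which increments a field on the server in a single step. The sketch below assumes a present-day official Node.js driver (4.x or later) and its `findOneAndUpdate` call, which is not what Shurly currently uses. The counter document, the `seq` field and the base-36 encoding are all made up for illustration.

```javascript
// `counters` is expected to be a MongoDB collection object from the
// official Node.js driver. The document id and field name are hypothetical.
async function nextShortCode(counters) {
  const result = await counters.findOneAndUpdate(
    { _id: 'shorturl' },
    { $inc: { seq: 1 } },                      // incremented on the server
    { upsert: true, returnDocument: 'after' }  // create if missing, return new value
  );
  // Drivers before 6.0 wrap the document as { value: doc };
  // 6.0 and later return the document directly.
  const doc = result && result.value ? result.value : result;
  // Base 36 is only an example of turning a number into a short code.
  return doc.seq.toString(36);
}
```

Because the read and the increment happen as one operation on the server, two requests arriving at the same moment can't be handed the same number, which is exactly the guarantee a read-then-write approach lacks.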

Summary

Hopefully I'll get some time to do a bit more testing on this code, iron out some rough edges, and make it production-ready. If I do then I'll definitely consider putting it in to replace the existing JMP.LI backend.

I also want to implement a MySQL data provider that optionally uses memcache to cache keys. It would also be interesting to experiment with adding a simple, internal cache to the Mongo provider to save round-trips to the server, where it would also queue up updates to the hit stats and use periodic batch writes instead of doing it on-the-fly. I also want to sort the race condition in that module which means figuring out how to execute an increment operation using the mongo driver.

So that's that. Another late night of writing code all in the name of learning comes to an end. Fun.

No, seriously... FUN!
