Wednesday, September 14, 2011

Back with a bang!

So the last post was in 2009 and I’m hoping the next addition to this blog will happen sooner than 2013… So what has been going on in databaseland?

New name – Trunk

The main project has had a name change – AudioDB was too media-centric and, as a database, its name should reflect its generalised nature. The new name – at least as far as codenames go – is now “Trunk”, which I think reflects the true nature of the project much better!

CCR Out – C# Async In!

Well, the codebase has indeed been ported to .NET 4 – no surprise there. The big shocker, however, is that the reliance on the Microsoft Concurrency and Coordination Runtime (aka CCR) has ceased!

The database source has been rewritten to use the new asynchronous language support, which at the time of writing is still in CTP state but will be putting in an appearance in the next version of the .NET Framework – that’s v4.5 or v5 or something like it…

The performance is equal to that of the CCR and, being a language feature, it has a much more natural syntax, which makes it much easier to write asynchronous code in a clean and logical manner.

Read more here
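
To give a flavour of the difference, here is a minimal sketch of the async/await style. This is not real Trunk code – the device type and method names are invented for illustration, and it targets the released Task API rather than the CTP’s TaskEx helpers.

```csharp
using System;
using System.Threading.Tasks;

// Illustrative stand-in for a buffer device - not the real Trunk types.
class BufferDevice
{
    // Pretend this is an overlapped read of a single page.
    public Task<byte[]> ReadPageAsync(long pageId)
    {
        return Task.Run(() => new byte[8192]);
    }

    // Pretend this is an overlapped write of a single page.
    public Task WritePageAsync(long pageId, byte[] data)
    {
        return Task.Run(() => Console.WriteLine("Wrote page {0}", pageId));
    }
}

class Program
{
    // With async/await the CCR-style "create port, post message, register
    // a receiver for the result" dance collapses into straight-line code.
    static async Task CopyPageAsync(BufferDevice source, BufferDevice target, long pageId)
    {
        byte[] page = await source.ReadPageAsync(pageId);  // suspends without blocking a thread
        await target.WritePageAsync(pageId, page);         // resumes when the write completes
    }

    static void Main()
    {
        CopyPageAsync(new BufferDevice(), new BufferDevice(), 42L).Wait();
    }
}
```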

Refactoring The Beast

The amount of work needed to rip out the CCR was huge, so while the code was lying in pieces I took the opportunity to plug in the Enterprise Library v5 and bring in an IoC container – Unity. The code already made use of the standard .NET component IoC pattern – overriding GetService to give derived classes and other containers the opportunity to supply service implementations – and in many ways this is still preferred for certain services, since the database and its internal classes are so hierarchical in nature. However, for some objects using Unity makes more sense – obtaining the global caching buffer device, for example.
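
For the curious, here is a rough sketch of the two styles side by side. The interface and class names are made up for illustration, and Unity here means the v2 container that ships with Enterprise Library 5.

```csharp
using System;
using Microsoft.Practices.Unity;

// Illustrative service contract only - not the actual Trunk interfaces.
interface ICachingBufferDevice { }
class CachingBufferDevice : ICachingBufferDevice { }

// The hierarchical style: an object satisfies GetService itself or defers
// to whatever owns it, so services flow down the database object tree.
class DatabasePage : IServiceProvider
{
    private readonly IServiceProvider _owner;
    public DatabasePage(IServiceProvider owner) { _owner = owner; }

    public virtual object GetService(Type serviceType)
    {
        // A derived class can override this to supply its own implementation;
        // otherwise the request bubbles up to the owner.
        return _owner != null ? _owner.GetService(serviceType) : null;
    }
}

class Program
{
    static void Main()
    {
        // Global services such as the caching buffer device sit more
        // naturally in the Unity container as singletons.
        var container = new UnityContainer();
        container.RegisterType<ICachingBufferDevice, CachingBufferDevice>(
            new ContainerControlledLifetimeManager());

        ICachingBufferDevice device = container.Resolve<ICachingBufferDevice>();
        Console.WriteLine(device.GetType().Name);
    }
}
```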

So all in all I broke the codebase (causing upwards of 1000 syntax errors) and slowly refactored and hooked it all up again.

Trunk SQL

So one of the reasons that development slowed down was that creating databases and tables using the codebase required writing an awful lot of code – you know – create message, post message, check result, blah blah blah – it was, in short, a real pain, even for test purposes! So now the Trunk solution is getting its very own SQL grammar. This is a major undertaking but, like so many things in the Trunk project, it is one that will grow slowly from a core set of functionality.

Now for those of you who have experience of parsers – the thought of writing a SQL parser would fill you with dread – no surprise – SQL is a large grammar and has a number of quirks that make processing it a challenging prospect – thankfully much of the heavy lifting has been taken care of by a lexer/parser generator tool called ANTLR.

This tool allows me to concentrate on defining the grammar without having to actually write the lexer (the thing that tokenises the input text) or the parser (the thing that converts tokens into larger blocks) – hell, I don’t even have to write the standard code to walk over the parse tree and build the actual actions. Really, ANTLR is a fine piece of work – check information about the C# port of it here.
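
As a taster, this is roughly what driving the generated lexer and parser looks like from C#. It assumes the ANTLR v3 C# runtime, and the TrunkSqlLexer/TrunkSqlParser class names and the statement rule are purely illustrative – they depend entirely on how the grammar is declared and only exist once the ANTLR tool has been run over the grammar file.

```csharp
using System;
using Antlr.Runtime;

class SqlDriver
{
    static void Main()
    {
        // TrunkSqlLexer/TrunkSqlParser are generated by running the ANTLR
        // tool over the grammar; the class and rule names here are
        // illustrative and depend on how the grammar is declared.
        var input = new ANTLRStringStream("CREATE DATABASE MyTestDb");
        var lexer = new TrunkSqlLexer(input);       // tokenises the raw text
        var tokens = new CommonTokenStream(lexer);  // buffers the token stream
        var parser = new TrunkSqlParser(tokens);    // recognises statements

        parser.statement();                         // invoke the top-level rule
        Console.WriteLine("Syntax errors: {0}", parser.NumberOfSyntaxErrors);
    }
}
```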

Currently Trunk-SQL supports the following commands:

  • CREATE DATABASE
  • USE DATABASE
  • CREATE TABLE

Not exactly setting the world on fire as yet but these are early days and one must learn to walk before one goes unicycling…

So with SQL in place (albeit a severely stripped-down version of it) I can now test the database creation code much more easily – from a single file-group to multiple file-groups each with multiple devices – and this will go a long way towards making the code more robust!

As it happens the database creation code is shaping up rather well – the data and log devices are both in good enough shape to move attention to the CREATE TABLE statement.

Testing and debugging this will be a lengthy affair – there are a number of data-types and constraints to deal with, together with making sure that tables with a large number of columns also work as expected. In this version of the database, an individual row will not be allowed to be more than around 8000 bytes as the entire row must be able to fit into a data page.
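
To make that constraint concrete, here is a little back-of-the-envelope sketch. The page size, overhead figure and column sizes are assumptions for illustration only, not the actual Trunk numbers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class RowSizeCheck
{
    // Assumed numbers for illustration: an 8 KB page with a small amount of
    // header and slot overhead leaves roughly 8000 bytes for a single row.
    const int PageSize = 8192;
    const int PageOverhead = 192;
    const int MaxRowSize = PageSize - PageOverhead;

    // Fixed storage sizes for a handful of illustrative column types.
    static int ColumnSize(string type, int length)
    {
        switch (type)
        {
            case "int": return 4;
            case "bigint": return 8;
            case "datetime": return 8;
            case "char": return length;   // fixed-length character data
            default: throw new ArgumentException("Unknown type: " + type);
        }
    }

    static void Main()
    {
        // A hypothetical wide table: an id, a timestamp and fourteen char(600) columns.
        var columns = new List<Tuple<string, int>>
        {
            Tuple.Create("int", 0),
            Tuple.Create("datetime", 0),
        };
        for (int i = 0; i < 14; i++)
            columns.Add(Tuple.Create("char", 600));

        int rowSize = columns.Sum(c => ColumnSize(c.Item1, c.Item2));
        Console.WriteLine("Row size is {0} bytes - {1}", rowSize,
            rowSize <= MaxRowSize ? "fits on a data page" : "too big for a single data page");
    }
}
```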

After this work has been completed, next up will be inserting data, swiftly followed by indexing. The Table Index Manager already exists in code but as yet has not been exercised or tested at all.

With that in place the task of SELECT/UPDATE/DELETE will be attacked – this will undoubtedly require yet another piece of serious programming – a query optimiser looks like a nasty piece of rules-based programming to me…