Sunday, December 11, 2005

Data Version Control

This is just an idea I had:

People are always changing the way they organise their data. It is in the nature of knowledge that concepts are fluid and change over time. The same principle applies to IT projects. When a new project starts, the developers have immature ideas about the domain they are working in, so they create simple data models. As the project matures, so do their concepts of the domain. The resulting changes may be simple, like adding a property to a class of objects, or bigger, as when two distinct concepts need to merge or one concept splits in two. This can be a lot of work for a database developer. The database always reflects only the latest view of the domain, in a very static way. Changing the database structure also means changing all the data that is already inside it, or that is to be imported later. The point is that data is treated as static.

On the other hand, we have version control systems that work on documents, and on program code in particular. Code is allowed to change. Files may be added or deleted, and classes can be added and modified. All these changes can be reverted as well, because we have version control systems like Subversion. Documents in such a system are tagged with revision numbers, and the differences between consecutive revisions are stored. Let's say that code is flexible; code is allowed to be fluid.

This train of thought leads to a combination of the two ideas. Would it not be possible to apply the concepts and techniques of version control systems to database management systems and the data that depends on them?

The first consequence could be that whenever a data structure changes, such as one or more database tables, the diff (the difference between the old and the new structure) is stored as well. This diff should be seen as a procedure that converts data from one structure (or revision) to another.
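To make the "diff as a procedure" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the record layout, the `split_name` function, and the `DIFFS` registry are hypothetical and do not come from any existing system. It shows a structural change (one field splitting into two) stored as a function keyed by the revisions it connects.

```python
from typing import Callable, Dict, Tuple

Record = Dict[str, object]


def split_name(record: Record) -> Record:
    """Hypothetical diff from revision 1 to revision 2.

    In revision 1 a person has a single 'name' field; in revision 2
    that concept has split into 'first_name' and 'last_name'.
    """
    upgraded = dict(record)  # copy, so the old record is left untouched
    first, _, last = str(upgraded.pop("name")).partition(" ")
    upgraded["first_name"] = first
    upgraded["last_name"] = last
    return upgraded


# Stored diffs, keyed by (from_revision, to_revision).
DIFFS: Dict[Tuple[int, int], Callable[[Record], Record]] = {
    (1, 2): split_name,
}

old = {"name": "Ada Lovelace"}
new = DIFFS[(1, 2)](old)
```

The stored diff is not a textual patch of the table definition but an executable description of how data moves from one revision to the next.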

The second consequence could be that a piece of data is not only interpreted as a datatype such as a word, string, object, or array, but is connected to a revision number as well. Old data can then be used by newer versions of an application by applying the change procedures to it. And, the other way around, newer versions of the data may be used by older applications.

Such a system would allow you to change your data structures without having to change legacy data. The old data would be upgraded automatically when it enters your application.
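A rough sketch of how this could look, again in Python and again under stated assumptions: the `Versioned` wrapper, the `STEPS` table, and the three example revisions are all hypothetical. Each piece of data carries its revision number, and a chain of small change procedures converts it, in either direction, to the revision the application expects.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Payload = Dict[str, object]


@dataclass(frozen=True)
class Versioned:
    """A piece of data tagged with the revision of its structure."""
    revision: int
    payload: Payload


def add_email(p: Payload) -> Payload:
    # Revision 1 -> 2: a new 'email' property, unknown for old data.
    return {**p, "email": None}


def drop_email(p: Payload) -> Payload:
    # Revision 2 -> 1: older applications do not know 'email'.
    return {k: v for k, v in p.items() if k != "email"}


def name_to_full_name(p: Payload) -> Payload:
    # Revision 2 -> 3: 'name' is renamed to 'full_name'.
    q = dict(p)
    q["full_name"] = q.pop("name")
    return q


def full_name_to_name(p: Payload) -> Payload:
    # Revision 3 -> 2: undo the rename.
    q = dict(p)
    q["name"] = q.pop("full_name")
    return q


# Change procedures between adjacent revisions, in both directions.
STEPS: Dict[Tuple[int, int], Callable[[Payload], Payload]] = {
    (1, 2): add_email,
    (2, 1): drop_email,
    (2, 3): name_to_full_name,
    (3, 2): full_name_to_name,
}


def convert(item: Versioned, target: int) -> Versioned:
    """Walk the chain of change procedures until the target revision."""
    revision, payload = item.revision, item.payload
    step = 1 if target > revision else -1
    while revision != target:
        payload = STEPS[(revision, revision + step)](payload)
        revision += step
    return Versioned(revision, payload)


legacy = Versioned(1, {"name": "Ada"})
upgraded = convert(legacy, 3)      # old data entering a new application
downgraded = convert(upgraded, 1)  # new data read by an old application
```

Note that going backwards can lose information: a downgrade to revision 1 drops the `email` field, so a round trip only restores the original when the newer fields were never filled in.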

Concepts are fluid. It would be great if we could write software that supports that.

Saturday, December 10, 2005

Creative Commons

"Creative Commons" is an easy way to allow other people to use your creative work in ways that you control. If you want others to use your work without them having to ask you first, you can declare so by placing a Creative Commons link/image on your website.

I did so on My AI repository website. It is free and easy. Just go to Creative Commons, press the big "Publish" button and follow the wizard. At the end you will have a piece of HTML code you can paste on your website. It is a link to the full license.

It is part of the Web 2.0 thing where everyone can publish anything and everything. It is a good thing to be clear about what others may do with your pictures/movies/code/writing.