Posted by: rcosic | 08/04/2008

Database generator

I’m a big fan of the famous Nikola Tesla. Not just because he is my national hero, but because he was simply a genius. I wish I could have such ideas! Well, I don’t, and I won’t bother …. I do have a few small, interesting ideas though.

One new, yet old, topic that I want to dwell on a little is a small solution that caught my eye.
It’s about a C# application (though it could just as well be a Clipper app, for example) that generates data, with the emphasis on how realistic the generated data is, so that it makes a good foundation for testing an application.

Here is what I have been thinking about and (so far) coded …

Basically, it’s a small C# project which I named DataGenerator. It consists of a couple of files:
Program.cs – the entry point of the app; it calls the main form,
Form.cs – the main form, where the database parameters are entered (TextBoxes, combos, radio buttons),
DataManager.cs – a singleton class with general-purpose functions of the data layer (methods like Connect, Disconnect, Execute, List, Store, etc.),
DataGenerator.cs – a class which generates the data depending on the structure of the database,
DataSampler.cs – a class which samples the data based on some algorithm (uniform distribution, random picking, and so on).

The first three files are self-explanatory, so I’ll jump to the last two, DataGenerator.cs and DataSampler.cs:

DataGenerator.cs is structured as a set of methods that add arrays of data to the database (for example: AddAccount, AddEmployee) or adjust the data according to business logic and rules (for example: UpdateCompany, SetParameters, CalculateLoan). Each method should be as simple and short as possible, and it should be clear what it does. It should be possible to call each one multiple times from a loop and as part of a transaction.

DataSampler.cs is the class that shouldn’t have anything to do with the database; it is just math. It contains methods that implement different algorithms for sorting and mixing the data by some criteria: random values from a set of n, a uniform distribution of values over n, and so on. It should be flexible and extensible so that new algorithms can be added later. It should also be able to supply (or fetch from some web service) sets of ready-made data, such as first names and surnames, randomly generated street names, descriptions, etc. This idea actually lit a bulb in my head when I was playing Dungeons & Dragons with an RPG generator (creating character names, for example).
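To make the sampling idea a bit more concrete, here is a minimal sketch of two such algorithms behind a common interface. The names ISampler, RandomPickSampler and UniformSampler are hypothetical, invented only for this illustration; they are not the actual contents of DataSampler.cs, and "uniform distribution" is read here simply as cycling through the set so that every value is used equally often.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface: each sampling algorithm just hands out the next value.
public interface ISampler<T>
{
    T Next();
}

// Random picking: returns a random element of the set on every call.
public sealed class RandomPickSampler<T> : ISampler<T>
{
    private readonly IReadOnlyList<T> _values;
    private readonly Random _random;

    public RandomPickSampler(IReadOnlyList<T> values, int seed)
    {
        if (values == null || values.Count == 0)
            throw new ArgumentException("At least one value is required.", nameof(values));
        _values = values;
        _random = new Random(seed); // a fixed seed makes a test run repeatable
    }

    public T Next()
    {
        return _values[_random.Next(_values.Count)];
    }
}

// Uniform distribution (assumed meaning): cycles through the set,
// so after k full cycles every value has been used exactly k times.
public sealed class UniformSampler<T> : ISampler<T>
{
    private readonly IReadOnlyList<T> _values;
    private int _position;

    public UniformSampler(IReadOnlyList<T> values)
    {
        if (values == null || values.Count == 0)
            throw new ArgumentException("At least one value is required.", nameof(values));
        _values = values;
    }

    public T Next()
    {
        T value = _values[_position];
        _position = (_position + 1) % _values.Count;
        return value;
    }
}
```

A new algorithm would then be just another class implementing ISampler<T>, which is one possible way to get the extensibility mentioned above.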

As I’ve already said, the aim of such an app would be to generate a set of data as close to real life as possible, to produce the most realistic result. And that is best shown with an example…

Let’s say we have an app which should maintain employees, or members of some organization. In that case, we want to test the performance of the system with, let’s say, 10, 20, 50, 100, 1000, or even more persons entered. The problem is how to generate such a set of data, both in terms of volume (maybe it is not possible to just copy the rows because of analytic data or some database constraints) and in terms of realism (there is no point in having thousands of persons named John Doe with the same data inside).
The suggestion is that the employees be sampled so that you can parameterize their number and the application samples them (in whatever way the developer wishes) as, let’s say, one half male and one half female workers, each with a unique first and last name and address, evenly spread zip codes (let’s say, every 50 workers in one state, and then each one living in a bigger town), aged from 18 to 70 (not just any random number, because of employment regulations), placed in company branches and offices by some algorithm, and so on.
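The employee example could be sketched roughly like this. Everything here is an assumption made for illustration: the Employee class, the EmployeeSampler name, and the tiny name pools are hypothetical, and addresses, zip codes and branch assignment are left out to keep it short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical employee record holding only the fields needed for this sketch.
public sealed class Employee
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public char Gender { get; set; } // 'M' or 'F'
    public int Age { get; set; }
}

public static class EmployeeSampler
{
    // Tiny illustrative pools; a real run would load much larger sets.
    private static readonly string[] MaleNames = { "Ivan", "Marko", "Luka", "Petar" };
    private static readonly string[] FemaleNames = { "Ana", "Marija", "Petra", "Iva" };
    private static readonly string[] Surnames = { "Horvat", "Novak", "Kovac", "Babic", "Juric" };

    public static List<Employee> Generate(int count, int seed)
    {
        // Each gender can supply at most (first names x surnames) unique full names.
        int capacity = 2 * Math.Min(MaleNames.Length, FemaleNames.Length) * Surnames.Length;
        if (count < 0 || count > capacity)
            throw new ArgumentOutOfRangeException(nameof(count),
                "The name pools cannot supply that many unique names.");

        var random = new Random(seed);
        var employees = new List<Employee>(count);

        for (int i = 0; i < count; i++)
        {
            bool male = i % 2 == 0;   // alternate genders: half male, half female
            string[] firstNames = male ? MaleNames : FemaleNames;
            int k = i / 2;            // running index within this gender

            employees.Add(new Employee
            {
                // Walk through every (first name, surname) pair once, so names stay unique.
                FirstName = firstNames[k % firstNames.Length],
                LastName = Surnames[k / firstNames.Length],
                Gender = male ? 'M' : 'F',
                Age = random.Next(18, 71) // upper bound is exclusive, so ages are 18..70
            });
        }
        return employees;
    }
}
```

Calling EmployeeSampler.Generate(20, 1) would give 10 male and 10 female employees, all with different full names and ages between 18 and 70. The number of employees is the parameter, and the fixed seed makes the same set reproducible between test runs.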

Also, to keep it simple, there should be at least three functions which can be called by pressing a button on the form:
Generate – generation of data based on the given parameters,
Clean – deletion of data (also based on the given parameters) so you can perform the test generation over and over again, and
Backup/Print – backing up the data (if the database should be shipped somewhere externally for testing or evaluation), and a report about the generated data to make sure everything was entered properly.

I realize that in some cases such a ‘small project’ just isn’t worth a dime to develop (if the data set is very small), or that it can take a while to develop (if the system is complex), but I think that in most cases it pays for its development, especially by easing the life of testers, who are, as we know, an integral part of the team in Visual Studio 2005, or rather in Team System as a whole.

