You are here: Home Consultants Consultant Articles Do Flat Files leave your flat?

Do Flat Files leave your flat?

by janselmo — last modified 2007-06-22 12:24 PM

 

Do flat files leave you flat?

By Kelly Bell

All rights

                     

In the past 10 or 20 years, we have all witnessed an important

evolution in data processing database design. Before this, most

all databases were essentially ‘flat files’.

 

That is, they were sequential files, ones where you must read record 3 before you can read record 4. Relational databases were essentially the product of the cost of disk space.

 

With ever accelerating hardware, disk seek times were narrowing, and database designers were content to lean more towards the database space requirements than towards the access time for that data.

 

So came the relational database. Without exploring all the ins and outs of a relational database, it is essentially one where a particular piece of information, can be stored once for each record or entity, and not necessarily stored with each record or entity.

 

Here is an example. We are designing a database for prison inmates. We have 12 prisons within our jurisdiction, and we need an address for each prison inmate. In the flat file model, a street address, complete with city, state, and zip would be included within each record. In this way, you could easily write a query program that located and read the record and retrieved the inmates mailing address, which is stored with each inmate 

record.

 

In the relational model, since we only have 12 prisons, then each inmate in these prisons must have one of these 12 addresses. The model would dictate that the designer would create a separate table, called ADDRESS, and store the 12 addresses in this new table along with a code of say 01 thru 12. This 2 byte code would then be included with each inmate’s information. The logic here is that instead of allowing say 30 bytes for the address in each of 12 million inmate records (360,000,000 bytes), you can just include the 2 byte code (24,000,000 bytes) and save over 300 million bytes.

 

Relational databases have taken over the world in the past 10 or 20 years, and with good reason. These databases are compact and efficient. In the example above, more inmates can be stored before having to obtain more hardware storage for a growing system. And if a prison address should change, the change is a simple one.

 

This evolution is not without its drawbacks, however. I’ll support this statement with these 2 points:

 

  •         Because of the very nature of their design, disk drives as we know them will always access flat files faster, and
  •         To properly access a relational database, you must have more skill and training than to access a traditional flat file database. 

To substantiate the first point, let’s briefly examine the way a hard drive works without getting ourselves too lost in the technicalities. A hard disk is divided up into little sections or parts. These parts are usually called sectors, but they may sometimes be called something else. Each of these sectors has a unique number assigned to it. This number never changes and is also known as the disk address. To open a flat file, your computer first takes the sector number of the first sector in the file and reads, or seeks it and places it into memory where it is available to the program. There may be only a few bytes of useable information in this sector, but a sector is the most granular entity a disk drive can read. Usually, in a flat file there are many records in a sector, so you may do one seek and obtain lots of data.

 

The disk drive gets more of a workout when using a relational database, however. Since a disk drive read/write arm can only read a sector, and sectors typically have multiple records within them, on average each disk drive seek retrieves one record. I say ‘on average’ because this is not always true, but for my simplified example it is.

 

So, we have 2 databases, one is flat and one is relational. To read 48 records from the relational model will take at least 48 seeks from the disk drive. If two tables are joined,  then it will be 96 seeks. The flat model could take many, many less, perhaps 3 or 4. 

 

My point is that while relational databases are very good at preserving space, they are not always good at reducing seek time, and if you have only a small amount of data, a flat model would be better on both scores.

 

I am currently working a contract in a government public health facility. I predominantly use 2 databases in my work. One is a relational database, and one is an older, flat file database. In my department, we are blessed with all skill levels. There are persons who are predominantly scientists, and do not know how to, and don’t care to learn about electronic data processing technologies, some who know a little about data, some who want to learn more if they had time, all the way up to those who are knowledgeable in data access methodologies and use that knowledge every day.

 

One of the great things about my flat file model is that even those with limited data processing skills can use this model to get the information they are after. This not only enhances the user’s ability to extract and use information, more importantly it gives them something sorely lacking today – a positive experience with computers. Think when you were in school and you were learning how to build an algorithm to solve a particular data processing problem. Read a record, process it, read another one, process it. All those simple data processing fundamentals apply to all flat files, so anyone with just a tiny bit of experience can use them, and can access their own data.

 

Don’t write off flat file databases as you go forward. They have their place in modern data processing problem solving. For short bits of data they are fast, simple and even the most inexperienced data processing professional can master them.

 

 

###

 


Personal tools