You are here: Home Consultants Consultant Articles Creating PreFunctioned Databases

Creating PreFunctioned Databases

by admin last modified 2004-09-22 12:15 AM

 Creating "Pre Functioned" databases


I have just completed some consulting at a government research facility here in Atlanta. We have a large user community who are casual SAS users. That is, they are MD and MS  level health care professionals and have minimal formal training in data processing, but need the flexibility that comes with knowing how to write and modify relatively simple programs that can extract information from the databases we maintain.

The data processing personnel here have paid particular attention to creating databases that assist the health care professional with this task by placing data elements within the SAS datasets that are "pre functioned", a term I coined to refer to some of the preprocessed variables in these data. As an example, in this database there is a date of onset of a particular health condition, along with a birth date of the patient. A common calculation for the users in our group to perform would be to calculate the age of the patient at the date of onset by subtracting the birth date from the onset date.

In our environment, during the phase the user database is being built, we do this often needed calculation and create a separate variable, CAGE_YRS, which is the age of the patient in question at the time of the onset of the reported health condition. When there are more than one date within your database, you should ask yourself the question "What calculations with these dates can I do now to keep them from being done by every end user here?"

Another great example: There is an array in this database that contains up to 40 costarts, which are text fields describing some symptom or complaint experienced by the patient, such as "Headache" or "Runny Nose" for example. A common calculation done by the user community would be to count up these costarts, or separate symptoms to establish if only a single costart was reported, or multiple costarts were reported, and if so, how many
.

 

 We would do this calculation at the time the database is being built and establish a new data element, NUM_SYMPTOMS, which contains the integer number of entries in the costart array. The user could interrogate this single data element to establish the number of costarts reported, as opposed to the more lengthy, time consuming, and most importantly tedious and error prone task of describing the array and looping thru it to establish this often needed bit of information.

 

One last example using the costart array from above. Often, users would be looking for a particular costart, perhaps "Runny Nose" for example. To prevent every user from establishing the array and writing search logic to interrogate every element for the single value we would concatenate all 40 of the elements in the costart array into one long text field. The user then can, with a single function such as INDEX, CONTAINS or INSPECT  search for costarts within this one text field, a much simpler and more direct way to get the needed information.

The advantages of these "pre functioned" data elements are many. First, the non data processing personnel using the data can more easily and more quickly extract the needed information from the database on there own, without IT intervention. Their programs are more straightforward and understandable without, to take the example above, the added code of describing the costart array, and looping thru it just to count the number of entries.

Secondly, with a user community of perhaps 25, using the "pre functioned" data strategy, the data base administrator has reduced the number of calculations from 25 (one for each user) to one, the one established at the time the database is built. The more users using the data, the more economical your database becomes, in terms of cpu time usage.

Lastly, this strategy empowers more of the user community by allowing professionals of a lower experience/skill level to use the database effectively, as the more difficult calculations needed to use the data have already been done. Your database will become more socialized and understood by more people within your organization.


Keep these principles in mind in the future as you build databases which could possibly be used by non DP professionals, as was the case at my organization. The bother of including these one time calculations far outweighs the benefits in user productivity, reduced CPU usage, and a more DP aware user community.

 

Generously Contributed by Kelly Bell


Personal tools