Chapter 10

Creating Expert Tables and Queries


Each important new fact that flows into an organization must find a place to live. Frequently, a database provides just such a residence. The tabular structure of a database allows facts to be collected (organized), protected (validated), and dissected (analyzed) in an orderly fashion.

With the Jet database engine, you can collect specific data and then develop an application layer over the data using the non-Jet components in Access. Before you build your forms, reports, and program code, however, you should always make sure you have created a solid data structure as their foundation. This chapter covers the key elements of providing a solid data structure, from modeling and normalizing a schema to applying relation tables, lookup tables, and flattening queries.

He's Attached to His Data


"Many demos I gave of Access 1.0 to members of the press went something like my meeting with Stewart Alsop of InfoWorld. My goal was to show Stewart several of the exciting features in Access, but in order to do this, I wanted to show him the product using some of his existing data.

"He willingly pointed me to some of his data (stored in a non-Microsoft product's format), which I imported into Access and began to browse. While I was showing him how easy it was to make ad-hoc queries on the data, his eyes went "I never noticed that before, can we drill-down into this data?". For the next hour, we queried, sorted, filtered, and printed his records as he analyzed and trended his company information, not once noticing that we had spent the entire time focusing on his data and had barely even discussed any specific features in Access!"


Tod Nielsen, General Manager, Microsoft Access Business Unit

The Jet database engine, a standalone component that ships with Access, provides the advanced data management features that you would expect from a market-leading product like Access: "engine-level" rules (automatically enforced at the record and field level), relationships between primary and foreign keys to maintain data integrity (called "referential integrity"), and cascading updates and deletes based on established relationships.

Of course, simply having these features in your applications does not make them into expert solutions. Creating an effective and optimized application involves properly applying the powerful features of Jet. You must know when and how to use the built-in features, as well as know how to write code that goes beyond the limitations of the built-in features.


Lest you think that the Jet engine was built only for use with Access, Jet 3.5 actually ships with Access/Office, Merchant Server, Picture It, Publisher, Team Manager, Visual Basic, Visual C++, and Visual J++.

In fact, if you look in the files shipped with Microsoft's home management application, Bob, you'll find MSAJT110.DLL-yup, it's Jet!

In addition to knowing how best to use Jet, building an expert solution involves structuring the data to provide the optimal combination of usability and performance. There are some basic rules to use when creating relational database structures (brush up on the concept of "normalization" if you do not already know it, or read "Design Step 1-Modeling the Data Structure" in this chapter). In addition to the basic rules, many data management scenarios are unique to a specific application and require the traditional brainstorming session at the whiteboard to discover the best approach.

In this chapter, I've tried to pull together examples relevant to the areas in Jet where I see developers struggling the most frequently. This chapter covers table and query concepts designed to help you resolve common database management and structure predicaments. The techniques here catapult you to a higher level of database productivity and your applications to a higher level of performance. Unfortunately, the art of good database design and structure cannot be covered in a single chapter, so I have used my discretion in selecting pertinent examples of data management challenges.


In this chapter, I will often use the term "schema" to refer to the overall design of a database.

Reviewing the Best of the New Features

With the number of significant changes made to Access itself across versions 2, 95, and 97, you would expect the list of recent feature additions in Jet to be quite lengthy. However, the biggest changes in Jet have taken place "under the hood" rather than in the features list or object model.

In the past, Access called the Jet database engine DLL directly, which hard-wired each version of Access to a specific version of Jet and meant that Access Basic also called that version of Jet directly. With the removal of Access Basic and its replacement with VBA came the opportunity to add proper support to Access for an object-centric model. Now, VBA communicates with the type library for the Data Access Objects in DAO350.DLL to determine what objects, properties, methods, and events are available for database work. It does so in the same fashion that VBA looks to any other referenced type library in your Access application (from Excel to ActiveX controls) to discern how to use it and to compile code against it. One of the services that DAO provides is access to the Jet engine.


A type library is a file that lists descriptions of the object model (objects, properties, and methods) of a program or service provider. Chapter 11, "Expert Approaches to VBA," provides an enhanced explanation of these files.

For the purposes of this chapter, think of the DAO type library as the organizer of database services on Access' behalf. If Access needs to work with an MDB file, it asks DAO to work with Jet to link to the file and manipulate its data. On the other hand, if Access needs to work with an ODBC data source, it asks DAO to initiate ODBCDirect functionality instead.
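To make this distinction concrete, here is a minimal sketch in DAO 3.5 syntax of asking DAO for each type of data service; the workspace names are arbitrary:

Dim wrkJet As Workspace
Dim wrkODBC As Workspace

' Ask DAO for a Jet workspace; data operations go through the Jet engine
Set wrkJet = DBEngine.CreateWorkspace("JetWork", "admin", "", dbUseJet)

' Ask DAO for an ODBCDirect workspace; Jet is bypassed entirely
Set wrkODBC = DBEngine.CreateWorkspace("ODBCWork", "admin", "", dbUseODBC)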

DAO has undergone a structural revamp to expose it as a 32-bit in-process Automation server. This server engine is implemented in the DAO350.DLL file, which provides data services to Access. DAO can provide services for Access to talk to ODBC data sources (such as SQL Server) and to the Jet engine routines stored in the primary Jet DLL MSJET35.DLL. Thus, Access' dependency on Jet has diminished in favor of a dependency on multiple data services provided by the Data Access Objects library.

Why should you care about this improvement? Because any product that fully utilizes VBA can now call DAO; thus, any code and data structures you build or prototype in Access can be shared with or moved to other VBA-centric components of your applications. As an example, you can process data in Access and send it to Excel using tricky DDE or slow Automation, or you can now alternatively create a database in Access, write and test your DAO code there, and then copy and paste it into Excel, running it directly from Excel VBA.

You should also care about the fact that DAO provides access to more than one type of back-end server through a common syntax. Thus, you can now create an application that builds a recordset against Jet, then move the data from Jet to another back end supported by DAO, without changing any of your code. This is a significant improvement in the way you can code.

There are several improvements in DAO/Jet that are noteworthy. Let's look first at the ones that you inherit in your applications but have no control over. Chief among the internal changes that should make your data operations faster is the use of implicit transactions: Jet now automatically wraps repetitive write operations in an internal transaction to improve their speed.


This does not remove the need for transactions that provide the ability to undo an operation, but it does remove the need to add transactions whose only purpose is to provide speed.


Some of the enhancements in the prior list were introduced in Access 95 and enhanced in 97.

In addition to internal changes, Jet and DAO also include some improvements that are exposed to developers. Many of the exposed enhancements deal with ODBCDirect (as described in Chapter 19, "Exploring Client/Server Issues,") or with replication; this section will introduce the ones that don't.

Two new objects have been added to the DAO object model: the Connection object and the Errors collection. The Connection object is used to create a connection to an ODBC data source, bypassing Jet. This object facilitates ODBCDirect and is discussed in Chapter 19, "Exploring Client/Server Issues."

The Errors collection improves upon the single-dimension structure provided by the Err and Error$ functions by allowing for the accumulation of more than one data access error. The benefits of the collection structure become immediately apparent when performing operations against ODBC data sources, which can generate multiple error messages per error event. (See Chapter 19 for more information on the Errors collection.)
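As a brief sketch, an error handler can walk the collection and report every message generated by the last data access operation:

Dim errCur As Error

' List each accumulated data access error in the Debug window
For Each errCur In DBEngine.Errors
  Debug.Print errCur.Number, errCur.Description
Next errCur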

DAO 3.0 introduced a new EditMode property, which applies to Recordset objects. This property shows the current status of the edit buffer of the recordset. The edit buffer holds either the current field values (after an Edit method has been invoked on the recordset) or the new field values being entered (after an AddNew method). Compare the value returned by this property to the following constants: dbEditNone (no editing operation is in progress), dbEditInProgress (the Edit method has been invoked and the current record is in the edit buffer), and dbEditAdd (the AddNew method has been invoked and a new record is in the edit buffer).

The new CancelUpdate method works well in conjunction with EditMode. CancelUpdate discards the information in the edit buffer but does not move the record pointer. To cancel an edit in prior versions of Access, you had to move the record pointer off of the dirty record without issuing an Update method, then move back, which made for slightly sloppy code. Now you can cancel changes explicitly, as the example in Listing 10.1 shows.

Listing 10.1 AES_Tbl.Mdb-Using CancelUpdate to Discard an Edit

Sub CancelUpdateDemo()
' Purpose: Demonstrate the CancelUpdate method
' The declarations and recordset setup frame the original
'   fragment so that it runs against the sample table tblComp
Dim dbs As Database
Dim rst As Recordset
Dim varWork As Variant

Set dbs = CurrentDb
Set rst = dbs.OpenRecordset("tblComp", dbOpenDynaset)
rst.Edit ' Begin an edit so that the edit buffer is active

' Get a value from the user
varWork = InputBox("Enter new name or Esc to cancel:", "Edit", rst!CompName)
If Len(varWork) = 0 Then ' User pressed Esc
  If rst.EditMode <> dbEditNone Then ' Edit in progress
    rst.CancelUpdate
  End If
Else ' Make the change
  ' Make certain the current record is being edited
  If rst.EditMode = dbEditNone Then ' No edit in progress
    rst.Edit
  End If
  rst!CompName = varWork ' Set the value
  rst.Update
End If
End Sub

An additional new property for recordset objects is AbsolutePosition. This property provides the relative record number of the current record within the recordset. The setting is a zero-based long integer value that is read (to detect the current position) or written (to move the record pointer). The setting returns -1 to indicate that there is no current record (the recordset is empty).


The AbsolutePosition setting is analogous to the record number shown in the navigation buttons control on a form and should be taken just as lightly. In a relational database, a record's location in a recordset is informational only, and should not be used to identify or refer to the record.

You should use the Bookmark property of a record if you need to store its location in the recordset and return to it later, and use the primary key or other designated unique value of the record to identify it to users.
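Here is a minimal sketch of that technique, assuming a recordset named rst is already open:

Dim varBkmk As Variant

varBkmk = rst.Bookmark   ' Remember the current record
rst.MoveLast             ' Do work elsewhere in the recordset
rst.Bookmark = varBkmk   ' Return to the remembered record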

Another new property in DAO 3.0 is RecordsAffected, which applies to QueryDef and Database objects. Immediately following an action query (delete, insert, or update), RecordsAffected describes the number of records that were impacted, as shown in Listing 10.2.

Listing 10.2 AES_Tbl.Mdb-Counting Records Affected by an Update

Sub RecordsAffectedDemo()
Dim dbs As Database
Dim strSQL As String

Set dbs = CurrentDb
strSQL = "UPDATE tblComp SET Employees = (Employees + 1)" _
  & " WHERE CompID BETWEEN 4 AND 6"
dbs.Execute strSQL
Debug.Print dbs.RecordsAffected ' Count of records just updated
End Sub


This property also works with data in ODBC data sources, and can be extremely useful when used against server data. See Chapter 19, "Exploring Client/Server Issues," for examples.

DAO 3.0 also introduced the GetRows method of a Recordset object, which creates a variant array from a selection of data. The only argument to the method is the number of records to fetch from the target object. The array is dynamically sized to two dimensions, with the first containing fields and the second containing rows. Here is a syntax example:

varComp = rst.GetRows(5)

GetRows starts fetching information at the current position of the recordset pointer and grabs as many records as are specified. After the fetch, the pointer is relocated to the record after the last one retrieved.

As with all array measurements, call the UBound function to determine the size of the array after a GetRows. Because GetRows rebuilds the array each time it is called, UBound will indicate how many records were actually fetched. See the example in Listing 10.3 for clarification.


If you attempt to retrieve a record that cannot be fetched because it has been deleted from the recordset, GetRows does not generate an error. Instead, it retrieves records up to the problem record and stops there. Always compare the bounds of the target array with the number of records requested to make sure that they are the same. If they are not, there was a problem with a fetch or the end of the recordset was reached.

Because no error is generated when the expected number of rows is not fetched, you can request more records than are left in the recordset without generating an error. However, calling GetRows again once the recordset pointer is at EOF will generate an error.

Listing 10.3 AES_Tbl.Mdb-Using the GetRows Method to Fill an Array From Data

Sub GetRowsDemo()
' Purpose: Demonstrate the new GetRows method
Dim dbs As Database
Dim iintCol As Integer
Dim iintRow As Integer
Dim rst As Recordset
Dim varComp As Variant

Set dbs = CurrentDb
Set rst = dbs.OpenRecordset("tblComp", dbOpenDynaset)
rst.MoveLast ' Pull the entire table into the buffer
rst.MoveFirst ' Set the starting position
varComp = rst.GetRows(rst.RecordCount) ' Get all rows
' Increment the UBound by 1 to compensate for zero-based arrays
Debug.Print CStr(UBound(varComp, 1) + 1) & " fields returned"
Debug.Print CStr(UBound(varComp, 2) + 1) & " records returned"
' Dump the contents of the array
For iintRow = 0 To UBound(varComp, 2)
  For iintCol = 0 To UBound(varComp, 1)
    Debug.Print varComp(iintCol, iintRow);
  Next iintCol
  Debug.Print
Next iintRow
End Sub

Running the code in Listing 10.3 produces the output shown in figure 10.1.

Fig. 10.1

This Debug window shows the results of dumping an array populated with the GetRows method on a recordset.


You can build very large arrays with this method, either by fetching many rows into the array or by including Memo or OLE Object fields in the source recordset. Be careful not to create arrays that use so much memory that they reduce the performance of your application.


Exploring Complicated Data Management Situations

While no two business problems are solved by exactly the same database structure, there are various types of data tables that are common to most databases. You will usually include tables with these purposes in each business application database: parent tables, child tables, lookup tables, and relation tables.

In addition to these designations, there are other types of tables that play supporting roles in an application. Some applications need tables to archive historical data or to log events, while other applications make use of various types of summary, temporary, or work tables. As you evolve standard approaches to designing and building databases, you will develop your own definitions and standards for the various application tables commonly used in your environment.


Review Chapter 7, "Understanding Application Architecture," for a detailed description of the various other categorizations I use to classify application tables.

One of the keys to creating a solid and effective database structure is to understand these various roles of tables (and queries) and how to optimize their use. There are many decision points when designing a database structure, and usually more than one way to solve a specific problem.

I see several challenging situations arise frequently during database construction efforts. In the next few topics, I distill these common situations into examples that help you understand the structural challenges posed by data tables and prepare you to address similar challenges when they appear in your applications.

Designing a Database-An Example Using an Everyday Situation

The more database applications you create, the more likely you are to bump into one of the most common problems in database construction: the creation of a contact management system. Such a system provides us with several widespread real-world problems to address here, as exemplified when you try to classify the relationships between companies, people, and the extra information for both the companies and the people.

When I approach any database design project, I usually create several diagrams of the relationships between the data items. Database diagrams (often called "data models") help to convey data dependencies visually and provide an easy format for discussion and review.


When designing a database application, a whiteboard is the most important tool besides a computer. A complex database design session involves a team of designers doing a lot of drawing and erasing.

You cannot create database diagrams until you've determined the basic "entities" (real-world objects) and "attributes" (the properties of the entity objects) that will be stored in the database. For example, company information is one type of entity in a contact management system. This information has attributes that include name, number of employees, date established, and so on. (In Access terminology, you will distill entity information into one or more tables and the attributes into table fields.)

Design Step 1-Modeling the Data Structure

When modeling a new database, create a standard list of steps that works best for your development team, and follow it. You must explore the various structures for data items and their relationships, and distill the structure that provides the best combination of practicality and performance.

Here is an example of simple steps that you might follow in a data modeling process:

  1. List the entities (groups of data) that must be accommodated by the application. These groupings will become one or more tables in the final database, but at design time you should think only in terms of data items and the relationships between them.

For our sample contact management system, Table 10.1 lists the primary entities.

Table 10.1 Primary Entities in a Contact Management System

Entity      Description
Companies   The top of the data pyramid in the system
Contacts    People, many of whom work at companies
Addresses   Both companies and people possess these
Telephones  Companies and people have phones

  2. Describe the relationships between the entities. Often, there are mutual dependencies between data items, as in the case of companies that have multiple people (employees or related parties), and people that can work at multiple companies.

One simple way to describe database relationships is to use the language of objects and their roles (a technique called "Object Role Modeling" has developed around this approach, as exemplified in products like Asymetrix's InfoModeler). Here is an example of sentences that describe objects (entities) and their roles: "Company employs Person"; "Person works for Company."

Figure 10.2 displays a diagram that visually represents the described relationships (roles) between the entities. Notice that the diagram includes our modeling language.

Fig. 10.2

This diagram uses modeling language to convey the complex relationships between database entities.


It is easier to involve users in the design process when you use database diagrams modeled with English-like syntax than with diagrams that are highly technical.

  3. Describe the attributes for the entities. Once you have a rough idea of the data groupings and relationships, determine what details must be collected for each entity. For example, address information usually includes a location name, one or two address lines, and information about the city, state, and postal code.

During this process, note any areas where information will be duplicated or defines its own entity. As an example, the attributes you could track for people include first name, last name, and one or more telephone numbers.

At this point in your modeling, you should begin to think in terms of tables and fields, and begin to collect sample field values. Figure 10.3 shows the initial tabular layout that you might create using the information about people just described. The table tblCont contains the contact information.

Fig. 10.3

The known information about people has been turned into a tabular layout.

Closely analyzing the items that are known about each person reveals that a telephone number can occur more than once for each person. The next step addresses this.

  4. Determine how entity information can be "normalized." "Normalization" is a formal process applied to database structures that involves determining how to group and structure the data to best fit the relational model. (Jet is based on the relational database structure model and facilitates normalized tables.)

Because normalization is a formal process, much written material exists about it from both the philosophical and implementation standpoints. Thus, I chose to not turn this chapter into yet another full treatise on the rules of normalization.

From a practical standpoint, the essence of the normalization process can be summed up in a few simple techniques-I'll summarize them here and note how they apply to the sample data at hand. The heart of the process is to remove repeating and redundant data items from each record; in our sample, the repeating telephone numbers are removed from the people records and placed in a child table, as figure 10.4 shows.

Fig. 10.4

Phone number information from the tabular layout in figure 10.3 has been removed into a separate child table to allow for multiple values per person.

When each record is unique within a table, contains a unique identifier, and does not contain data that is unrelated to the unique identifier, your database structure is essentially "normalized".

The formal process of normalizing a database design includes several levels ("forms") of the state of normalization. The level attained in the previous discussion is called "third normal form", where unique keys identify records of non-repeating, non-compound, related data items.

Achievement of the third normal form is commonly used as the objective when designing a relational database system. There is a total of five levels of normalization that can be attained, but most database designers stop at third normal form for practical reasons. Consult a book on relational database theory if you are interested in learning about the fourth and fifth normal forms.

  5. Describe the structure of the attributes. Once you have firmed up the descriptions of entities and their attributes, define the characteristics of the attributes. In Access terms, this equates to determining the properties for each defined field.

As an example, names stored for people need to be given a data type (Text), a length in characters, rules for testing the data on entry and editing, and so on.

When defining a database structure, one of the most important attributes of a field is whether or not it participates in a relationship. A field that constitutes a unique record identifier (primary key) may need to link to similar fields in related tables (foreign keys). The fields that constitute primary key/foreign key pairs have common attributes, such as the same data type and, usually, the same name.

If you follow the commonly accepted database design wisdom that favors a centralized (also called "integrated") data dictionary, you use the same name for each field in a primary key/foreign key pair.

A data dictionary is simply the definition of a database's objects and their relationships, and an integrated data dictionary is one that treats fields that appear in different tables but have the same purpose as one field definition. The centralized definition of such a common field requires that the field's name and other properties are the same in each occurrence of the field.

When you apply the previous rules to the other entities in this data model, you discover that the final database structure closely resembles the role model you originally diagrammed in figure 10.2 earlier. Figure 10.5 shows the base tables you end up with after normalizing the attributes for companies and people. (The base table names reflect the abbreviations "Comp" for company, "Cont" for contact, "Addr" for address, and "Phon" for phone.)

Fig. 10.5

This contact management system structure has been normalized to provide a relational data structure.

Notice in the figure that once you've normalized the structure, the database objects in the diagram are identified as tables rather than collections of attributes. When you have gotten to this point in the design process, it is wise to "concept test" the preliminary tables using sample records (this does not need to be done in Access; it can be done on paper or a whiteboard). Additionally, once a database design has been drafted at the table level, you can begin to consider the primary key identifiers for the records in each table, and the foreign key fields that will provide the links between the tables.

The example database design now includes the primary objects in the database. The next step is to identify and resolve the unique challenges presented by the finalized design.

Design Step 2-Answering the Tough Questions

When you have created a draft model of the database structure, you still don't have the answers to all of the database structure issues. You've survived the first round of the design battle, but the seeds have been sown for the more serious questions to be asked.

During your detailed application research efforts (as described in Chapter 3, "Preparing for Development"), you discovered the various types of data that must be stored by the contact management system. Now, further along in the design process, you must review the design notes and determine what non-standard data items and relationships were discussed during the research. You must make certain that the design facilitates these issues. As an example, during the design of this contact management system, a user stated that the system needed to store pager numbers in addition to other telephone numbers. You would review the design at this point to determine if the proposed database structure and table tblContPhon will accommodate this type of data or not. If not, you may need to revise the proposed structure.

The first question to ask yourself when reviewing a proposed data model is, "Does this model accommodate all relationships between entities?" Sometimes, an initial database design has an overabundance of join lines (relationships) between tables, and can be simplified. At other times, data models get more complex as more "What if?" scenarios are run against them.

Take the example of the relationship between companies and people. Our current model assumes that people and companies are directly related. In this scenario, there is a direct relationship between one company and many people, and also between one person and many companies (a person can hold two jobs, right?).

Inherent in this scenario, however, is this limitation: although a person works for a company, a person usually works at one primary company location each day. How does this model provide information about the relationship between people and specific company addresses? In other words, assume that Bob works at Jones Company, and Jones Company has three locations. When you join Bob's contact record to Jones Company's record, and then join Jones Company's record to its addresses, by inference Bob works at all three addresses. A good data management model must be smarter than this and allow you to indicate which of the three addresses relates to Bob.

To confront this issue, you may have to alter the data structure, the relationships between entities, or both. In this specific case, you have several options for organizing the data; the third and most complete option, diagrammed in figure 10.6, gives people both a direct relationship to companies and a relationship to one or more specific company addresses.

Fig. 10.6

This diagram shows how records of people in the database have a relationship to companies and a relationship to one or more addresses for the company.

Notice that the third option creates a type of circularity in the data. While the relationships shown in figure 10.6 do reflect the real world, they make querying the data and keeping entry and edits synchronized more of a challenge for the developer.

A second type of unaccommodated data relationship comes to light in further analysis. Thus far, we've provided for telephones that are attached to addresses, with the addresses attached to companies. In this layout, how does a user enter a toll-free or central switchboard number for a company, which are essentially address-independent telephones?

One approach is to provide direct numbers for companies in the same way that we've provided for direct numbers attached to people. This approach allows telephones to belong to companies without the need for an address relationship.

An alternate approach would be to restrict the data entry through a policy stating that all company telephone numbers entered into the system must be assigned to an address, even if this designation is somewhat arbitrary. Therefore, a toll-free phone number would be assigned to the corporate headquarters address, the marketing/sales office address, or another reasonable candidate.

A third question to ask at this point in the data model review is, "Does the data structure accommodate and allow orphaned records?" Certainly a telephone number or address record without a parent person or company is devoid of context, and thus these records should not be allowed to exist without a parent. People, however, present a different challenge: How do you track people in the database that do not have a relationship to a company (retired persons, homemakers, the self-employed, and so on)?

Answering this question means that one of two situations must be allowed, as follows:

  1. People must be allowed into the database even if they have no matching company or company address entities. To enable this, there cannot be a referential integrity requirement in the database that insists each person belongs to at least one company.
  2. People that have no parent company must relate to a phantom company record entered into the database simply to preserve the data integrity. To enable this scenario, one ("dummy") company record must be entered that provides the required parent record for all contact people that have no actual relationship to a company entity.

Either of these options is viable, and which one you choose is determined by personal preference or by the needs of the database platform and the application.

Going through this type of analysis provides new insight into the data structure and allows you to refine your model to match the world it must function in. Without this type of analysis, a database design that looks good often does not provide an adequate storehouse for its data. I often see database designs that seem adequate to the designers, but are challenged by the users the very first day-"Hey, Mr. Developer, how do I enter this kind of information into the system?"

Design Step 3-Implementing Relation Tables

Our preliminary data design now provides for a variety of relationships between entities. Some of these relationships are two-way, called "many-to-many relations," as in the case of companies and people. Expressing many-to-many relationships in a conceptual diagram can be done with table join lines that point in both directions, as in Figure 10.6 and others in this chapter.

However, when translating the conceptual diagram to an actual database structure, it becomes apparent that Access does not provide a way to store pointers between two tables in a many-to-many relationship in either one of the tables themselves. In other words, the relationships between the three tables in figure 10.6 cannot be expressed in Access using only the three tables. Instead, an intermediary table (commonly called a "relation table") must be placed between the two tables in a many-to-many relationship to provide the linkage.

Figure 10.7 shows how most of the table relationships in this current sample database structure require the presence of a relation table to maintain the many-to-many links.

Fig. 10.7

Many-to-many relationships in a database are facilitated by inserting relation tables between the base tables.

In general, an intermediary relation table consists of nothing more than records with foreign key values pointing back to the primary keys of the two tables it relates to. In other words, the relation table trelCompCont that you build to link tblComp to tblCont would contain a company ID and a contact ID in each record, as shown in Table 10.2. (The Comment item in the table is for explanatory purposes and is not in the data structure.)

Table 10.2 Entries in a Relation Table Allow for Many-to-Many Data Relationships

CompID  ContID  Comment
100     1       Bob (1) works at Jones Company (100)
100     2       Fred (2) also works at Jones Company (100)
101     1       Bob (1) also works at Smith Company (101)


A relation table usually does not allow more than one instance of a defined relationship between two records; thus the two foreign key values in the relation-table record can often constitute a primary key for the table.
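As an illustration of this design, a relation table like trelCompCont could be built with Jet SQL data definition statements run through DAO. This is only a sketch: the constraint names are arbitrary, and it assumes the ID fields are Long Integer keys in tblComp and tblCont:

Dim dbs As Database

Set dbs = CurrentDb
' Create the relation table with a compound primary key and
' foreign key constraints back to the two related tables
dbs.Execute "CREATE TABLE trelCompCont (CompID LONG, ContID LONG," _
  & " CONSTRAINT pkCompCont PRIMARY KEY (CompID, ContID)," _
  & " CONSTRAINT fkComp FOREIGN KEY (CompID) REFERENCES tblComp (CompID)," _
  & " CONSTRAINT fkCont FOREIGN KEY (ContID) REFERENCES tblCont (ContID))"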

Relation tables add a layer of complexity to a system, albeit a necessary one. At every point in your data diagram where a two-way arrow is used to indicate a many-to-many relationship, a relation table is required.

For a developer, the presence of relation tables introduces extra objects into table joins and thus provides additional performance challenges during development. For users, ad-hoc queries can be confusing due to the presence of the intermediary table and may add an additional training burden when deploying the system (see "Creating Complex Queries" later in this chapter).

Design Step 4-Expanding and Contracting the Roles of Tables

Hopefully, you're keeping up with me so far. We're three topics into the database design tutorial, with two topics to go. Now might be a good time to stretch and grab a hot drink to sip while you read further.

By definition, a relation table need not contain attributes other than the two keys of the linked records. However, once a relation table is in place, it can provide you with a handy storage device that solves certain data problems.

For example, in the discussion of the relationship between companies and their people, we have not defined how to store the job title for each person at each company. A relation table between companies and people provides the natural home for such a data item, because the job title is specifically a function of the intersection between a person and a company. Thus, you could expand the relation table trelCompCont shown in Table 10.2 in the previous section to include the job title, as shown in Table 10.3. (As before, the Comment column is for clarification and is not in the data table.)

Table 10.3 A Relation Table Can Also Store Attributes for a Specified Relationship

CompID  ContID  Title       Comment
100     1       Clerk       Bob's title at Jones Company
100     2       VP Finance  Fred's title at Jones Company
101     1       Bookkeeper  Bob's title at Smith Company

Carrying this example further, we find that people relate to company addresses in the same manner, in that they have a specific relationship to each address. (Physicians provide a good example of this situation because they often work at two or more clinics, with one being the primary location.) Thus, you locate an address type descriptor in each record of the relation table between people and addresses. Continuing the example of physicians, "Primary Office" and "Fridays Only" would be common address type descriptors found in the relation table.

Just as reviewing a database structure can lead to the addition of tables (such as relation tables), a complete analysis of a database design often finds opportunity for contraction as well. Sometimes, two tables that are similar in form and function can be combined into a single table.

For example, you've noticed by now that the use of separate address tables for people and companies introduces some redundancy into the database structure. The same is true of the three tables for telephone numbers. Each of these provides an opportunity to consolidate data items. However, before you blindly consolidate, consider the unique circumstances in each case.

For addresses, a given address is uniquely identified by a primary key, and its relationship to its parent is managed within a relation table. Thus, you could combine the addresses for people and companies into a single table, while keeping the two separate relation tables that identify the links between addresses and their owners. Because the address table does not store the foreign key of its owner, the table can serve multiple masters without compromising its data. In other words, the first record can be an address for Bob, while the second record can as easily be the address for Jones Company. Because the "parentage" of the address record is outside of it in a relation table, the address table itself doesn't care whose addresses it stores.

A more complicated example is provided by direct telephones. The data diagram for this information (refer back to figure 10.7) shows a many-to-many join between people and phones, as well as one-to-many joins between addresses and phones. (The relationship between addresses and phones is one-to-many rather than many-to-many because a telephone number cannot be related to more than one address record.) At first blush, you may be tempted to merge the three telephone structures into a single table. However, a single telephone number in the merged table would then belong to only one of the following types of parent record: company, contact, or address.

Figure 10.8 shows one option for restructuring the tables to support this kinship. In the diagram, a separate field is placed in the telephone table tblCommPoint for each type of parent. While this structure is effective, it is not easy to use (the table reflects only one kind of data, but it has a different foreign key field for each of its different parents).

Fig. 10.8

The second, third, and fourth fields in this table are each a foreign key and reflect the fact that each table record has one of three possible parents.

A different approach to this problem involves creating an all-purpose foreign key field that serves as a pointer to any parent. Because this field can only have one data type, the primary key values of the three different parents must all be the same data type. For example, a telephone table with a single foreign key field might include two records, one which has the company address table as its parent and the other that is related to the contact address table. Table 10.4 shows these two records. (The Comment column is for clarification and is not in the data table.)

Table 10.4 The Structure of a Table That Has Two Parents

ParentID  PhoneID  Number    Comment
100       1251     555-1212  Number for a company. ParentID points to the company table.
768       1252     555-1313  Number for a contact person. ParentID points to the contact table.

In both of these situations described for telephones, a condition called "multiple parentage" has been created because a specific record could be participating in any of several different relationships. While structures such as the two previously described are sometimes necessary and certainly provide economies of scale when storing and querying data, they are confusing to work with.


In this particular example, you solve the problem of multiple parentage by removing the foreign keys from the combined telephone table altogether, and by using relation tables instead. Using a relation table helps to keep the actual data table "clean," without any fields to provide the "hack" that allows telephone records with different parents to exist in the same table. In the current example, this means that each telephone record has a unique ID only and no field that points to its parent. Instead, between the company address table and the telephone table there is a relation table. Also, between the contact address table and the telephone table lies a different relation table. Table 10.5 lists the required objects in this model.

Table 10.5 This Mix of Tables Supports Multiple Parentage for a Telephone Table

Table                  Purpose
Company Address        Company addresses
Company Address Phone  Company addresses related to telephones
Telephone              Both company and contact telephones
Contact Address        Contact addresses
Contact Address Phone  Contact addresses related to telephones


A database structure built around relation tables allows you to change the model as the database grows without restructuring the actual base table. All future changes to the storage approach take place in the relation table instead. For example, adding warehouse address information to the structure listed in Table 10.5 would require the addition of a warehouse address table and a table to relate the addresses to telephones, but no change would be needed in the telephone table.

The previous discussion was necessary because of the existence of one-to-many linkages between addresses and telephones. If there is a many-to-many relationship between your data items, the use of a relation table is mandated automatically and determines the structure.

Further exploration of the database structure can often provide additional opportunities for improvement. Ideally, a good database design team catches all data structure issues and opportunities early in the design process. Nevertheless, some people simply think better from a picture and will generate new ideas only once structure diagrams like those in the figures in this chapter are available.

In the case of our current example, as we review our drafted structure we note that no provision has been made for accommodating additional technology available for contacting people, specifically pagers and e-mail. The first temptation might be to create another table or tables to store the contact points. On closer examination, however, you find that the relationship between people or companies and direct phones is quite similar to that between people or companies and other communication points.

Only three changes must be made in order to accommodate additional communication (contact) points into the structure already designed for telephone numbers:

  1. The table, field, and related relation table must be renamed to reflect the revised contents. Up to now, we've been describing a telephone table. In our sample database, we'll rename the fields and table for telephone information to reflect the concept "communication point" instead.
  2. The field width of the telephone number field must be widened to allow e-mail addresses, which are longer. Table 10.6 shows the new mixture of sample communication point records. (The Comment column is for clarification and is not in the data table.)

Table 10.6 This Table Contains Different Varieties of Communication Points

CommPointID  CommPoint         Comment
7            206-555-1212x145  A telephone number
8            206-555-4666      A pager number
9            Bob@Jones.Com     An e-mail address

  3. An additional field must be added to the relation table to allow for differentiation between phones, pagers, and e-mail addresses. Each relationship between a person and a communication point for that person should convey the type of contact information that it stores. Table 10.7 shows some relation table records that join contact people to communication point records. (A sketch of adding this field with SQL follows the table.)

Table 10.7 Sample Relation Table Records That Describe the Relationship Between Contacts and Their Direct Contact Points

ContID  CommPointID  CommPointType
456     7            Direct line
102     8            Pager
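As mentioned in step 3, the differentiation field can be added with a Jet SQL statement. This is only a sketch, and trelContComm is an assumed name for the relation table between contacts and communication points:

Dim dbs As Database

Set dbs = CurrentDb
' Add the type descriptor field to the relation table
dbs.Execute "ALTER TABLE trelContComm ADD COLUMN CommPointType TEXT (20)"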

In the previous four sections, we've analyzed the nature of the contact management data and the data relationships in the proposed structure and found ways to get more data into fewer tables. We've done so partly by combining data tables and shifting organizational work to relation tables. A finalized data structure that solves the problems described in the four sections is shown in figure 10.9.

Fig. 10.9

This data structure accommodates multiple parents for the tblAddr and tblCommPoint tables and displays the many-to-many relationships.


At this stage, you may want to launch Access and review the tables in the sample database for this chapter: AES_TBL.MDB. The tables in the database reflect the final schema design shown in Figure 10.9, and include sample records.

Design Step 5-Defining the Supporting Tables

Once the core data structure is defined, you must revisit each entity (table) in the database and extrapolate the supporting information required by its attributes (fields).

The most common type of supporting information in a database structure is the lookup table. Lookup tables provide several benefits to a database structure: they validate data entry by limiting a field to a fixed set of values, they provide the row sources for combo and list boxes on forms, and they translate stored codes into longer display strings.

Database designers must walk a fine line when using lookup tables. The use of these tables provides a solid, centralized manner for enforcing data integrity in the database. Thus, it's tempting to use them for every field that has non-variable values.

The temptation to use lookup tables to validate and populate certain fields in a database can become extreme and can lead to an overabundance of such tables in the database. (As an example, I once saw a people table that had a Number of Children field fed by a lookup table. The lookup values were: "1," "2," "3," "4," "5," and "6 or more." Because this data was not fed into the computer based on a hard copy document with similar values, I saw no benefit in this structure over a validation rule in the table that simply enforced a non-negative number range instead.)

There is no art to creating lookup tables, as the majority of them have only one or two fields-a key value and a display string. (When the key value is self-explanatory, no additional display string is required.)

The example of the United States postal codes provides a common scenario. We now have short state abbreviation code values ("AK," "AZ," and so on) that are defined and maintained by the U.S. Postal Service, and these are used by the majority of database systems. However, even before the government's involvement, most computer system designers chose to define and store a shorter value for states ("Alas," "Ariz," and the like) in the base table data and then translate the short code to a full string via a lookup table as needed.
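For instance, a query can expand the stored code into the full string at display time. In this sketch, the lookup table name tlkpState and its field names are assumptions for illustration:

SELECT tblAddr.AddrID, tlkpState.StateName
FROM tblAddr INNER JOIN tlkpState
ON tblAddr.State = tlkpState.StateCode;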

At the simplest level, some lookups have only one field. Lookup tables of this type are not intended to translate short codes into longer strings (as with the previous state example). Instead, single-field lookups provide only one purpose: data entry validation. Tables of this type are used to populate combo boxes on entry forms in order to limit the values placed in a field by users.

At a more complex level, lookup tables can present a slightly more intricate set of opportunities. Consider the decision whether to include multiple fields in a single lookup table in order to accommodate multiple uses, or to create several lookup tables instead.

An example of this situation is presented by address records. Earlier in this chapter, we defined a structure where addresses attached to people can indicate personal addresses or company addresses. Thus, it is useful to catalog each relationship between address records and people by type (such as "home"). This type code is stored in the relation table between people and addresses.

Consider that addresses are collected in the database for companies and individuals. In order to create a single lookup table that can be used for both types of addresses, a flag field must be placed in the lookup table to use when building filtered sublists from the table's records.

Figure 10.10 shows the lookup table for address types, with a flag field captioned Company/Contact that allows for the creation of one value list for company addresses (where the field equals "C") and a different list for personal addresses (where the field equals "P").

Fig. 10.10

These lookup table values are flagged to allow them to populate more than one value list.

The benefit of the layout shown in the figure is that it allows you to build two combo or list sources from a single lookup table. By distinctly selecting the Address Type ID values based upon a Company/Contact flag value of "C" or "P," a list of address types for a specific purpose can be built.
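For example, a combo box row source for company address types might use a query like the one below; the table name tlkpAddrType and the flag field name CompCont are assumptions, because the figure does not show the actual object names:

SELECT AddrTypeID FROM tlkpAddrType
WHERE CompCont = "C"
ORDER BY AddrTypeID;

Substituting "P" in the criteria yields the personal address type list from the same table.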

Notice that the table in the figure includes Address Type ID values that could easily be grouped into two levels of detail (in other words, broken into two lookup fields). Addresses for companies are of the flavor corporate or retail (their Address Type ID begins with either "CORP" or "RETAIL"), and addresses for people are for home or work (their Address Type ID begins with either "BUSN" or "HOME"). Codifying these lookup values in one field provides the easiest maintenance, as well as ease of readability and querying of the base table data.

For example, consider the fact that you could find all business addresses for individuals by using a WHERE clause criteria of AddrTypeID LIKE "BUSN*" in the address table. You would not need to join the address table to the lookup table in this scenario to answer the question because the lookup value stored in the address table is descriptive. In contrast, if the lookup table ID values were numeric and that number was stored in the address table, the number itself would not have any descriptive value. Every query with criteria based on the lookup table information would require joining the lookup table to the addresses.

An additional opportunity provided by lookup tables is to redefine a single lookup table to include multiple disparate value types. This can be accomplished by adding a flag field to the table to provide the classification used for selecting only a subset of the lookup values at any given time.


In the previous example, the flag field Company/Contact was used to select a subset of information from the lookup table in order to populate two different types of combo boxes from the same table. I have seen developers take this example to extremes by combining all of an application's lookup tables into a single table when performance-tuning a slow-running application. The table contains a short code and long code field, as well as a flag field for type in each record. All lists in the application query the master lookup table with a WHERE clause against the type value to grab the specific records they need.

The basis for this strategy is a performance theory stating that a single lookup table serving multiple needs performs better than multiple tables. A file server dedicated to an application keeps the most commonly used tables for that application (and their indexes) in cache memory at all times. The theory postulates that a combined lookup table would be used so frequently that it would always exist in the RAM cache, thus increasing form performance.


I am not advocating that you structure your databases in this manner and combine all of your lookups into a single table. However, I document the performance theory for you here so that you can explore it as an option when attempting to speed up your solutions.


Reviewing all of the fields in our sample database layout, we find the need for six lookup tables. Adding these lookup tables into the database structure, plus the field information and relation tables defined in the prior steps, helps us to arrive at a final database structure. The structure and its relationships, once built, appear as in figure 10.11.

Fig. 10.11

This Access Relationships dialog shows the final normalized and optimized data structure resulting from the discussions in this chapter.

By using the example of a contact management system with several many-to-many relationships, I've provided a set of real-world database needs that illustrates some of the challenges you will face when defining a database layout. If you are new to Access application development, you may feel that the example provided is somewhat extreme, and the type of structure shown in the figure is rare. In actual fact, business data storage needs very frequently require the kind of analysis and approaches detailed in the previous five sections. With minor adjustments, you will find yourself using the techniques here in your applications over and over again.

Under-normalizing a Database

Previously discussed in this chapter was the importance of a well-organized database structure, following the rules of "normalization." Most of the time, you will strive to create a database that adheres to the proper usage of parent, child, lookup, and relation tables. Sometimes, however, there are specific tables, or even specific databases, that you may choose not to split into the lowest possible level of component parts.

The consequence of not splitting tables is repeated data. If a table of data is not broken down into its smallest logical units, the same data items may occur more than once in the table. In Table 10.8, the names of contact persons have not been separated from their telephone information into an independent table, so the same person's name is duplicated in multiple telephone records.

Table 10.8 This Table Is Under-normalized Because the Repeating Name Data Should Be in a Different Table

AddrID  FirstName  LastName  Telephone
142     Bob        Hopeful   206-555-1212x145
143     Bob        Hopeful   206-555-4466
144     Bing       Crawford  206-555-1896

Data that is not fully normalized is commonly referred to as "under-normalized", "de-normalized", or "flattened". Here are a few scenarios where an under-normalized database layout may prove valuable: feeding a data warehouse or other research copy of production data, simplifying the record sources for reports and statistical analysis, and exporting data to systems that expect a single flat structure.

Under-normalizing a database structure often occurs as you design it; you determine at design time that the structure should be flat to serve one of the purposes in the previous list. Alternately, you can de-normalize an existing database by creating additional tables that store the flattened versions of the normalized data. The flattened tables may be stored in the back-end database, created in a separate database, or exported to another data storage system.

The most common use of flattened tables is in data warehousing, a concept described in Chapter 7, "Understanding Application Architecture." Data warehouses are created by making copies of production data on a regular basis and restructuring the copied data to provide the best possible ease-of-use for researchers. Often, the restructuring involves flattening (combining) several related tables into one.

An alternative to flattening data by appending it into new table structures is to flatten the data with queries instead. Multiple tables can be added to a single query, which produces a datasheet of the combined data that can be used for statistical analysis or reporting.

The determination to use flattened tables versus flattening queries is made based on these factors: how current the data must be (a flattening query always reflects the live source data, while a flattened table is a snapshot that must be rebuilt), how quickly the data must be served (a prebuilt table saves the cost of re-running a multiple-table join), and whether users will be allowed to edit the result (see the caution that follows).


Editing flattened data is a one-way street-changes to the flattened data are not reflected in the source tables from which it came. Worse still, the changes will be lost the next time the flattened data is rebuilt from the source data. It is rare that users would be allowed to edit copies of historical data.

It can be useful to create and save a set of queries that flatten the data in every database you build. This provides users or system administrators with a starting point for data research and analysis, rather than requiring them to create complex joins themselves every time multiple tables are required in the result set. You can document for your users how to do research into the data by using the saved queries.

Using saved queries for data analysis does have certain limitations. For example, in the sample database for this chapter, the creation of family join and grand join queries that flatten all the data into a single datasheet is problematic, because the complex data structure as expressed in figure 10.11 does not allow you to see all information in the database with a single simple query. This problem is discussed further in "Creating Complex Queries" later in this chapter. (Refer back to Chapter 7, "Understanding Application Architecture," for the join terminology, such as family join, that I use to describe queries.)

Nevertheless, saved queries that pull together commonly used views of data can be built and used as report sources or for ad-hoc queries. An example of such a query from the sample contact database is shown in figure 10.12; the query combines company records with their addresses and the phone numbers for the addresses. This type of query would commonly be used to search for address or phone information or print phone list reports.

Fig. 10.12

This query "flattens" related table data into a single structure for easier analysis.

Cataloging and Finding Record Attributes

As you structure a database, you frequently find that a single data record needs to collect multiple attributes related to the record. This is usually referred to as "keywording" the data records. The challenge of keywording crops up regularly as you build expert solutions.

Keywording a data record involves attaching multiple attributes of the same type to a specific record. The simplest way to give a record multiple attributes is to place the attributes in a delimited string in a text field within the record. Using our contact information example, a field could be added to the people table to catalog each person's favorite foods, and the field could contain delimited values like those shown in Table 10.9.

Table 10.9 Assigning Multiple Keywords to a Single Record

ContID   FirstName   Favorites
1        Bob         Ham; Pizza


Access' new multi-select list box control is an excellent device for capturing this type of keyword information on forms. See the example in "Using Multi-Select List Boxes" in Chapter 13, "Mastering Combo and List Boxes."


When multiple values are concatenated in a single field this way, the data structure is not normalized with respect to the rules provided in "Designing a Database" earlier in the chapter. Also, saving, extracting, and finding individual values within the delimited keyword list usually involves quite a bit of programmatic overhead in the application. Data structured this way presents several problems for the developer as well as the users. Some of these problems are:

Table 10.10 Each Keyword in a Delimited List Must be Delimited the Same Way

ContID   FirstName   Favorites
1        Bob         Ham; Pizza;

Despite their shortcomings, delimited lists can provide a simple way to store repeating values that are homogeneous. Data structured this way is also very easy to query using AND and OR logic: simply request all people records where the keyword values are LIKE "*ham;*" OR LIKE "*pizza;*".
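For example, a query along the following lines (a sketch, assuming the Favorites field shown in Table 10.10 lives in the contact table tblCont) returns every person who likes ham or pizza; because Jet's LIKE comparisons are not case-sensitive, "*ham;*" matches "Ham;" in the data:

SELECT ContID, FirstName
FROM tblCont
WHERE Favorites LIKE "*ham;*"
    OR Favorites LIKE "*pizza;*"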

When the number of records or keywords is large, the ability to search the attribute data using indexes can be valuable. Similarly, you may determine that you want the keyword values in a normalized structure, or you may determine that writing the code to deal with delimited keyword strings is too much work. In these instances, using a separate keyword table for the values may prove more effective.

At the simplest level, a keyword table is like any other child table in a parent-child relationship. Continuing with our example of people's favorite foods, a child table for food favorites would need only a foreign key field with the parent record's ID value and a keyword field. Unlike the single field with repeating values, a keyword field that stores a single value can be indexed for performance, and can also be used in SELECT DISTINCT statements for creating lists of available values or for computing keyword statistics with functions like Count. Table 10.11 shows the information for Bob's favorite foods from Table 10.10 removed from his contact record and placed in a keyword table structure.

Table 10.11 Multiple Keywords Stored Using a Normalized Structure

ContID   Favorite
1        Ham
1        Pizza
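As a sketch of the payoff, the two queries below (assuming the keyword table is named tblFav and uses the Favorite field shown in Table 10.11) build a list of unique keyword values and a usage count per keyword, respectively; neither is practical against a delimited string field:

SELECT DISTINCT Favorite FROM tblFav

SELECT Favorite, Count(Favorite) AS Uses
FROM tblFav
GROUP BY Favorite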

At a more complex level, the use of keyword tables presents a dilemma for developers when there are a significant number of keyword types available in the data (for example, you want to track keywords for favorite films in addition to keywords for favorite foods). If a single record can have multiple different types of keyword values attached to it, the developer will be inclined to create one keyword table for each keyword type-one for foods, as shown in Table 10.11, and another one for films.

When the number of different keyword types is small and there is no need in the project plan to add future keyword types to the system, the most "pure" data structure is provided by using a separate table for each type of keyword. Thus, cataloging favorite foods and favorite films would involve two keyword tables. If, on the other hand, the data structure must provide growth and flexibility, combining keywords into a single table is worth exploring.

Let's use the example at hand to clarify the point. Assume that the design requires that a single person in our sample database can have many different types of favorites: favorite foods, favorite films, favorite colors, favorite leaders, and so on. If we create a keyword table for each of these favorites, we have an easy database to manage but a cluttered and rigid structure; the database cannot be extended to handle another type of favorite without adding an additional keyword table.

If, on the other hand, we create a single favorites table, we will need to supply a field for the "type of favorite" in order to clarify the nature of each keyword. Figure 10.13 shows an example of such a table. Notice that the structure we defined in Table 10.11 has been expanded to include a favorite type code.

Fig. 10.13

This keyword table contains both a type field and a value field.

I describe records following this metaphor as "tag/value pairs" (the type code for the keyword is the "tag"). Structuring records in this fashion allows for infinite flexibility in the actual data because additional tags can be created by the user during entry, or by a system administrator as an addition to a lookup table. In either case, tags can come and go without any change to the schema.

The challenge in a tag/value pair layout comes from the need to query the data. Looking for any individual value in the keyword table presents no problem, because the search criteria is not compound and thus only one record per parent is found. In other words, the following SQL statement would yield a list of parent ID records for people that like pizza:

SELECT ContID FROM tblFav
WHERE FavType = "Food" AND FavValue = "Pizza"

It is also not difficult to find people that like ham or pizza using tag/value pairs. Any of the queries in listing 10.4 provides the desired result.

Listing 10.4 Three Approaches to Finding Tag/Value Pair Keyword Records
Matching Ham or Pizza

SELECT DISTINCT ContID FROM tblFav
WHERE FavType = "Food" AND FavValue IN ("Ham", "Pizza")

SELECT DISTINCT ContID FROM tblFav
WHERE (FavType = "Food" AND FavValue = "Ham")
OR (FavType = "Food" AND FavValue = "Pizza")

SELECT DISTINCT ContID FROM tblFav
WHERE FavType = "Food"
AND (FavValue = "Ham" OR FavValue = "Pizza")

A more difficult situation arises when you want to find all people that enjoy ham and pizza. Of course, the following SQL string returns no records because it is nonsensical given the data structure:

SELECT DISTINCT ContID FROM tblFav
WHERE (FavType = "Food" AND FavValue = "Ham")
AND (FavType = "Food" AND FavValue = "Pizza")

The problem here is that there's no way to use a single simple SELECT statement to answer the question because the target values are not in a single record; they are in more than one keyword record but the AND operator applies the compound criteria to each single record. Thus, more sophisticated queries must be crafted to answer the compound question.

One option is to "flatten" the data so that it looks more like a single record, then to use a standard compound WHERE clause against the virtual record. A SQL statement that accomplishes this is shown below, and the query in design view is in figure 10.14.

SELECT DISTINCT tblFav.ContID
FROM tblFav
    INNER JOIN tblFav AS tblFav_1 ON tblFav.ContID = tblFav_1.ContID
WHERE tblFav.FavType="Food" AND tblFav.FavValue="Ham"
    AND tblFav_1.FavType="Food" AND tblFav_1.FavValue="Pizza"

Fig. 10.14

This query tricks Access into performing a multi-keyword search by joining multiple copies of the same base table.

This method is limited because the SQL statement must include one copy of the base table for each major criterion, and must be rewritten for each request to reflect the number of FavType values that will be searched. In other words, expanding the previous query to include people whose favorite colors include blue would require an additional copy of tblFav in the join and the addition of the criteria: AND tblFav_2.FavType="Color" AND tblFav_2.FavValue="Blue".

If the number of criteria options you must present to users can be limited, you can actually create one parameterized version of this query for each unique number of criteria, and save and reuse these queries. This creates an ad-hoc search capability against saved queries that can be used by unsophisticated users. Thus, the query in figure 10.14 previously could be redesigned as shown in figure 10.15 to use parameters, and saved as qryFav_2Criteria. Additional canned queries qryFav_3Criteria, qryFav_4Criteria, and so on could also be created, each with an additional copy of tblFav to apply criteria against.
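As a sketch of what qryFav_2Criteria might look like in Jet SQL (the parameter names Type1 through Value2 are assumptions that match the parameter-setting code in Listing 10.5), the saved query simply declares its parameters and uses them as criteria:

PARAMETERS Type1 Text, Value1 Text, Type2 Text, Value2 Text;
SELECT DISTINCT tblFav.ContID
FROM tblFav
    INNER JOIN tblFav AS tblFav_1 ON tblFav.ContID = tblFav_1.ContID
WHERE tblFav.FavType = [Type1] AND tblFav.FavValue = [Value1]
    AND tblFav_1.FavType = [Type2] AND tblFav_1.FavValue = [Value2]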

Fig. 10.15

This parameterized query follows the same search model as in figure 10.14 but uses criteria arguments passed in by the user.


Depending on your system configuration, Jet should have no trouble joining several copies of a table to itself in this fashion. Access had no problem running a version of this query for me that contained ten copies of tblFav and thus accepted twenty parameters to test ten tag/value pair criteria.

A canned query like this can be run from code easily, because the code does not need to build any SQL statements; it simply sets the query's parameters, as shown in Listing 10.5.

Listing 10.5 Setting Query Parameters from Code

Dim dbs As Database
Dim qdf As QueryDef
Dim rst As Recordset

Set dbs = CurrentDb
Set qdf = dbs.QueryDefs("qryFav_2Criteria")
qdf!Type1 = "Food"
qdf!Value1 = "Ham"
qdf!Type2 = "Food"
qdf!Value2 = "Pizza"
Set rst = qdf.OpenRecordset()

Self-joining a table to copies of itself in this fashion does not always provide the best performance, and is limited by the number of self-joins that the user's system resources can accommodate.


A self-join is inefficient because it creates a "cross-product" result set, meaning that each value in the table is joined to all other matching values. For example, if there are six values for the ID number 1, each of the records with this ID is joined to itself plus the other five. This produces a total of 36 (six multiplied by six) records in the join's result set. As you continue to add more copies of a table to the self-join, this growth continues geometrically.

There are two alternative approaches to this style, both of which involve dynamically creating SQL statements in code that test the desired criteria.

The first SQL statement approach utilizes Jet's ability to perform a sub-select (a query within a query). In Jet, sub-selections are handled with the IN operator, as in the example in Listing 10.6.

Listing 10.6 A Sub-Select Query Using an IN Clause

SELECT * FROM tblCont
WHERE tblCont.ContID IN
    (SELECT tblFav.ContID FROM tblFav
    WHERE tblFav.FavType = "Food"
    AND tblFav.FavValue = "Ham")
AND tblCont.ContID IN
    (SELECT tblFav.ContID FROM tblFav
    WHERE tblFav.FavType = "Food"
    AND tblFav.FavValue = "Pizza")

In the syntax shown, the primary SELECT statement takes the results of two sub-select statements and combines them with AND, finding only the records that match both criteria sets.


In my testing, Jet had no problem running a query of this style with ten IN clauses compounded together, but this type of query will eventually run out of resources at some finite number of criteria (and it's not very fast).

A better approach to tag/value queries that involve "and" comparisons is intriguing: use "or" comparisons and then count the matches to find the correct target number. In other words, find all people that like ham or pizza, count the matches, and select only the people that matched both criteria.

Figure 10.16 shows the query design to achieve this. The SQL statement for the query must be rewritten from code for each use, in order to place the appropriate criteria into the WHERE clause and the appropriate match count into the HAVING clause, as shown in this example:

Listing 10.7 AES_Tbl.Mdb-Counting Keyword Matches to Find Records That Satisfy Both Criteria

SELECT ContID FROM tblFav
WHERE (FavType = "Food" AND FavValue = "Ham")
OR (FavType = "Food" AND FavValue = "Pizza")
GROUP BY ContID
HAVING Count(ContID) = 2

Fig. 10.16

A GROUP BY query that counts record occurrences provides a handy device to find keyword values.


In the previous syntax, note the difference between the HAVING clause and the WHERE clause. WHERE is applied to the base records in the table, while HAVING is applied to the data after it has been restricted and grouped. HAVING is essentially a WHERE clause restriction against the aggregated result set.

While each of the techniques in this topic is viable and useful, none are simple to use for ad-hoc queries. Either the user must understand the type codes and values available in the data in order to ask the appropriate question, or Basic code must be run to write an SQL string into a query object that drives the search. In either case, querying keyword data is usually facilitated in an application by the use of a form that places a "wrapper" around the complex process of selecting and using multiple keywords.


Test each of these techniques against the actual data in your application before determining the one to use. Depending on the type of data, the number of records in the keyword table, the uniqueness of keyword table indexes, the average number of keywords used in a search, the average users' hardware, and other factors, one technique here may show a significant performance increase over the others. In general, the technique using HAVING will run the fastest.

Also, when repeated operations must be run against the results of the keywords search, use INSERT INTO or SELECT INTO in the SQL string to place the chosen ID values in a temporary table, then base forms or other processes on the temporary table to avoid running the keyword search more than once.
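As a sketch of this tip (the scratch table name tmpFavMatch is an assumption), the HAVING query from Listing 10.7 can write its matching ID values to a temporary table like this:

SELECT ContID INTO tmpFavMatch
FROM tblFav
WHERE (FavType = "Food" AND FavValue = "Ham")
    OR (FavType = "Food" AND FavValue = "Pizza")
GROUP BY ContID
HAVING Count(ContID) = 2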

Building High-Powered Queries

Once you've mastered the challenge of refining your database table layouts, your next hurdle becomes mastering the retrieval of information from the tables. Not all applications allow users to browse, sort, or filter table datasheets to scrutinize data attributes. Instead, filtered views of data are provided to users through forms and reports. The data retrieval engine for these forms and reports is Jet's query processor.

Understanding the optimal way to utilize this query processor, as well as the nature of queries themselves, will assist you in creating expert solutions.

Creating Faster Queries

Queries are the "big machines" that run the factory of your application. They provide the horsepower for record selection on forms and reports, along with data analysis, record exports, and bulk operations. Consequently, you should always spend some time at the end of your application development cycle "tuning" your queries to make them run as fast as possible.

Here are a few guidelines for getting the best performance from queries:


Data operations may slow down over time as the database file becomes "fragmented," with table records scattered over many different locations in the MDB file and perhaps many different locations on disk. In this scenario, compacting helps to speed up queries and other table data retrieval because it copies all records for each table into a contiguous area within the database file and copies all index entries into contiguous order by primary key (this is called "clustering").
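Compacting can be run from the Tools menu or from code. Here is a minimal DAO sketch (the file paths are placeholders); note that the source database must be closed, so code like this usually runs from a separate administrative database:

Dim strOld As String
Dim strNew As String

strOld = "C:\Data\Contacts.Mdb"  ' Placeholder: the fragmented database
strNew = "C:\Data\Contact2.Mdb"  ' Placeholder: destination must not exist yet
DBEngine.CompactDatabase strOld, strNew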


Even though sorting in queries utilizes indexes where available, sorting is one of the greatest performance hits on a query. Use sorting only when required, and never use sorting in a query that will be subordinated inside another query-when nesting queries, apply sorting in the highest level query only.

Because the Jet group put some effort into improving sort performance in Jet 3.5, this tip is less urgent than in pre-97 releases of Access. You may no longer need to think of sorting as a serious performance negative, but you will still want to minimize where it occurs in your applications.

column1 operator expression1 OR column2 operator expression2

column1 operator expression1 AND column2 operator expression2

SELECT
    InvcNum,
    InvcDate,
    Month([InvcDate]) As InvcMon
FROM tblInvc


I once saw timing tests for Access 2 that measured the difference between an expression in a comparison (such as our WHERE Month([InvcDate]) BETWEEN 10 AND 12) and a direct comparison (such as WHERE InvcDate BETWEEN #10/1/96# AND #12/31/96#). The test showed the direct comparison variation as more than 100 times faster than the expression-based comparison!

I have not attempted to duplicate these timing tests in Access 97, but it is safe to say that the performance differential between the two techniques, if not quite as extreme, will still be measurably significant.
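For reference, the two variations look like this side by side, written against the tblInvc invoice table from the fragment above (annotated with ' comments in the style of Listing 10.8):

' Slower: the expression must be computed for every record and defeats indexes
SELECT InvcNum FROM tblInvc
WHERE Month([InvcDate]) BETWEEN 10 AND 12

' Faster: a direct comparison can be optimized against an index on InvcDate
SELECT InvcNum FROM tblInvc
WHERE InvcDate BETWEEN #10/1/96# AND #12/31/96#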

Listing 10.8 Two SQL Approaches to Grouping and Counting Addresses by State

' This query groups and counts on the StatID field in tlkpStat
SELECT
    tlkpStat.StatName,
    tlkpStat.StatID,
    Count(tlkpStat.StatID) AS CountOfStatID
FROM tlkpStat
    INNER JOIN tblAddr ON tlkpStat.StatID = tblAddr.StatID
GROUP BY
    tlkpStat.StatName,
    tlkpStat.StatID

' This query groups on the StatID field in tlkpStat and counts on tblAddr
SELECT
    tlkpStat.StatName,
    tblAddr.StatID,
    Count(tblAddr.StatID) AS CountOfStatID
FROM tlkpStat
    INNER JOIN tblAddr ON tlkpStat.StatID = tblAddr.StatID
GROUP BY
    tlkpStat.StatName,
    tblAddr.StatID;

Listing 10.9 Using the First Function to Remove One Criterion From the GROUP BY

SELECT
    First(tlkpStat.StatName) As StatName,
    tlkpStat.StatID,
    Count(tlkpStat.StatID) AS CountOfStatID
FROM tlkpStat
    INNER JOIN tblAddr ON tlkpStat.StatID = tblAddr.StatID
GROUP BY
    tlkpStat.StatID

Understanding Join Types

Access provides effective capabilities for joining multiple data tables, based on the SQL model. Understanding the various join options allows you to create more useful and powerful queries in your applications.

Essentially, you can join tables according to one of two models: an inner join, which returns only the records that have matching join values in both tables, or an outer join, which returns every record from one "driving" table along with any matching records from the other table.

Because an outer join has a driving table, it is described as driven from either the "right" or the "left," as defined by the direction of the one and many sides of the join line in the visual representation of the query layout (the Access query grid).

I'll quickly illustrate these join types. Assume that our sample data from the contact management system includes records for five contact people in tblCont (ID values 1 through 5), but only two of them have been assigned personal preferences in the table of favorites tblFav (ID values 1 and 2).

Figure 10.17 shows an example of a simple multi-table query that joins these two tables with an inner join. The datasheet returned shows only the records that have matching join field (ID) values in both tables.

Fig. 10.17

This datasheet results from an inner join on tblCont and tblFav, showing only the items that are common between them.

Listing 10.10 shows the join syntax behind the query shown in figure 10.17.

Listing 10.10 AES_Tbl.Mdb-The Inner Join Query That Produces the
Figure 10.17 Datasheet

SELECT
    tblCont.ContID,
    FirstName,
    LastName,
    FavType,
    FavValue
FROM tblCont
    INNER JOIN tblFav ON tblCont.ContID = tblFav.ContID

Figure 10.18 takes the same query and executes it using LEFT JOIN syntax, which returns all records from the driving table tblCont joined to any matching rows in tblFav.

Fig. 10.18

This datasheet results from a left outer join on tblCont and tblFav, which returns all the records for tblCont.

Listing 10.11 shows the join syntax behind the query shown in figure 10.18.

Listing 10.11 AES_Tbl.Mdb-The Left Outer Join Query That Produces
the Figure 10.18 Datasheet

SELECT
    tblCont.ContID,
    FirstName,
    LastName,
    FavType,
    FavValue
FROM tblCont
    LEFT OUTER JOIN tblFav ON tblCont.ContID = tblFav.ContID

Notice that the layout of the query remained the same-only one join type keyword ("INNER") changed (to "LEFT OUTER") to facilitate a completely different type of relationship between the tables.


Applying the third join type ("RIGHT OUTER") to the two tables in this example would produce the same result as the inner join: when tblFav becomes the driving table, there are no values in it that are not in tblCont, so only records with ID values 1 and 2 show up in the result set.


A child table in a database almost always contains only values that are linked to its parent table. Thus, a right outer join from a child to its parent is rarely used, because there is no benefit in including all child records in a join regardless of a matching parent. For this reason, the most common join type in a relational database is a left outer join from a parent table to its child, with the parent as the driving table.


When multiple tables are included in a join, the join statements become "nested." Nesting involves the use of parentheses to track the grouping of SELECT statements and JOIN statements when there is more than one of each in the SQL. Nested joins are simple to create in the query design grid by dragging multiple tables into the grid. When writing nested joins from code, however, you must pay attention to the compound nesting (parenthetical grouping) prescribed in the syntax, as in the example of a four-table join in Listing 10.12.

Listing 10.12 Use Parentheses to Group Join Statements Where More Than
One Appear in the SQL

SELECT
    table1.field1,
    table2.field2,
    table3.field3,
    table4.field4
FROM table1 INNER JOIN
    ((table3 INNER JOIN table4 ON table3.field3 = table4.field3)
    INNER JOIN table2 ON table2.field2 = table3.field2)
ON table1.field1 = table2.field1

This syntax flow reflects the nesting code generated by the query design grid. When creating nested joins from code, you may prefer to make your code more readable by using the syntax structure shown in Listing 10.13. The syntax in the listing is supported by Jet but is easier to read than the syntax generated by the query grid in Listing 10.12.

Listing 10.13 The Join Statement in Listing 10.12 Rewritten for Readability

SELECT
    table1.field1,
    table2.field2,
    table3.field3,
    table4.field4
FROM table1 INNER JOIN
    (table2 INNER JOIN
        (table3 INNER JOIN table4
        ON table3.field3 = table4.field3)
    ON table2.field2 = table3.field2)
ON table1.field1 = table2.field1

You can create highly complex inner joins in Jet, using either code or the query design grid. However, outer joins are much more limited. For example, you cannot have multiple tables outer-joined to the same table. In other words, the query design in figure 10.19 is not allowed in Jet SQL.

Fig. 10.19

This type of multiple outer join query is disallowed by Jet.

You can, however, have one table that is inner-joined to another table and whose result set is then outer-joined to still other tables. From an SQL perspective, this means that an INNER JOIN can create a virtual "driving table" for a LEFT JOIN or RIGHT JOIN, but not the reverse. Thus, the relationship in figure 10.21 is allowed, but not the relationship in figure 10.20.
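A minimal sketch of the allowed pattern, reusing the generic table names from the previous listings; the inner-joined pair of table1 and table2 acts as the driving side of the outer join to table3:

SELECT
    table1.field1,
    table3.field3
FROM (table1 INNER JOIN table2
    ON table1.field1 = table2.field1)
LEFT JOIN table3 ON table2.field2 = table3.field2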

Fig. 10.20

A query like this that attempts an inner join nested in an outer join is disallowed by Jet.

Fig. 10.21

Jet allows queries like this with an outer join driving an inner join.

Notice that joining tables in each of the examples thus far has involved an equivalence, where a value in one table must exactly equal the matching value in the other table (table1.field1 = table2.field1). Joins built on equality comparisons are termed "equi-joins," and all joins built in the Access query design grid are equi-joins: the linked fields between two tables are compared for equality when the join is run.

In the SQL view of query design, or in program code, you can create joins that aren't equi-joins, that cannot be expressed in the query design grid, and that do not use the = operator. Such queries join tables by using field comparisons with the non-equality join operators, such as >=.

To create advanced queries for compound comparisons, use this syntax:

SELECT
    field1,
    field2
FROM table1 INNER JOIN table2
    ON table1.field1 operator table2.field1

The placeholder operator indicates a relational comparison operator: >, >=, <, <=, or <>.

A join using an inequality operator on record key values has limited usefulness and may produce nonsensical results, but an inequality join on non-key values can be useful in statistical analysis of data. Using our contact database to build an example, the syntax in Listing 10.14 is a standard inner join that uses an equivalence and says "show me the state name for the state in address record 1."

Listing 10.14 A Standard Inner Join Using an Equi-Join

SELECT
    tblAddr.AddrID,
    tblAddr.StatID,
    tlkpStat.StatName
FROM tlkpStat
    INNER JOIN tblAddr ON tlkpStat.StatID = tblAddr.StatID
WHERE tblAddr.AddrID=1;

The syntax in Listing 10.15 takes the join from Listing 10.14 and simply changes the equality operator = in the join to <>. The join now says "show me the state names that are not used by address record 1." The result set of the join contains a record for every state lookup value except the state in address record 1.

Listing 10.15 The Join in Listing 10.14 Expressed With an Inequality in
the Join

SELECT
    tblAddr.AddrID,
    tblAddr.StatID,
    tlkpStat.StatName
FROM tlkpStat
    INNER JOIN tblAddr ON tlkpStat.StatID <> tblAddr.StatID
WHERE tblAddr.AddrID=1;

A final join strategy available in program code but not the query design grid allows you to join tables on multiple fields in a compound relationship that does not depend on the AND operator. Normal multi-field join operations use AND to link two compound field join statements, but the following example using OR to compound the links is also valid in Jet SQL:

SELECT
    table1.field1,
    table2.field3
FROM table1 INNER JOIN table2
    ON table1.field1 = table2.field1
    OR table1.field2 = table2.field2

Again, the utility of this type of statement is limited to somewhat obscure data analysis joins.

In the sample database for this chapter, some very common data management challenges with respect to joins are exhibited. First and foremost is how to express the relationships between the tables within SQL statements and the query design window. The driving queries that produce most of the figures in this chapter are included in the sample database.

Figure 10.22 shows the relationships built for the sample contact management database. Note that while building relationships between the tables, I also defined referential integrity for the relationships. Referential integrity is the use of the database engine or code to maintain the linkage between parent and child data in two tables. In Jet, enforced referential integrity can also include cascading update and deletion rules, and Access reflects enforced relationships by displaying the one and many symbols at the ends of the join lines in the Relationships window. I defined cascading updates for all one-to-many relationships in the sample database.

Fig. 10.22

This Access Relationships dialog shows a complete database structure with defined relationships and referential integrity.

The database diagrams you create during the design phase express relationships conceptually, while the relationships you create in Access itself are specifically for data management and to aid in building queries. Here are two specific examples of the difference:

As an additional preference item for lookup tables, I define cascading updates from a lookup table to its base table(s), but I do not define cascading deletes. When a system administrator changes a lookup table value, I want the new value to cascade into all related transaction table entries. On the other hand, a lookup table value that exists in transaction data should never be deleted, because live data records would be left orphaned. Thus, I disallow cascading deletions from lookup tables into base table data because I assume that the deletion of a lookup table value is an error on the part of the database administrator, and I do not want this error to cascade into any base table records.
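A minimal DAO sketch of this preference, assuming the tlkpStat lookup table and tblAddr base table from earlier listings (the relation name is arbitrary); including dbRelationUpdateCascade turns on cascading updates, while omitting dbRelationDeleteCascade leaves deletions restricted:

Dim dbs As Database
Dim rel As Relation
Dim fld As Field

Set dbs = CurrentDb
' Cascade updates from the lookup table, but never cascade deletes
Set rel = dbs.CreateRelation("tlkpStattblAddr", "tlkpStat", "tblAddr", _
    dbRelationUpdateCascade)
Set fld = rel.CreateField("StatID")  ' Key field in the lookup table
fld.ForeignName = "StatID"           ' Matching field in the base table
rel.Fields.Append fld
dbs.Relations.Append rel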

Creating Complex Queries

Relying heavily on relation tables and inner joins, as in our example database, causes significant data-retrieval challenges. With inner joins, records are retrieved based upon the presence of a common key in both joined tables. When a key value in one table has no match in the other, no record is returned for that value. When working with data structured like our contact information, this situation can produce undesirable results. For example, people with no addresses are not displayed in the results of a query inner-joining people and their addresses.

Union queries and sub-queries provide you with the tools you need to solve this problem. Consider the queries in figures 10.23 and 10.24. The query in the first figure (fig. 10.23) returns a list of people and their addresses. This query is an inner join and thus will not list people that have no address records.

Fig. 10.23

This inner join query returns only people who have address records.

Fig. 10.24

This query uses a subquery only to return people with no address records.

The second query (figure 10.24) returns only people with no addresses, by virtue of this query criteria:

WHERE tblCont.ContID Not In
    (SELECT trelContAddr.ContID FROM trelContAddr)

This criteria returns records that are in the contact table but not in the relation table that links contacts to addresses, thus listing contacts without addresses. In order to display all contacts, whether or not they have addresses, the UNION keyword is applied to append the results of the two queries I've just shown, as in the statement in Listing 10.16.

Listing 10.16 AES_Tbl.Mdb-A Union Query That Appends People With No
Addresses to People With Addresses

SELECT DISTINCTROW
    tblCont.*,
    trelContAddr.MailStop,
    trelContAddr.AddrTypeID,
    tblAddr.AddrID,
    tblAddr.AddrName,
    tblAddr.Addr1,
    tblAddr.Addr2,
    tblAddr.City,
    tblAddr.StatID,
    tblAddr.PostalCode
FROM tblCont
    INNER JOIN
    (tblAddr INNER JOIN trelContAddr
    ON tblAddr.AddrID = trelContAddr.AddrID)
    ON tblCont.ContID = trelContAddr.ContID
UNION
SELECT DISTINCTROW
    tblCont.*,
    "" AS MailStop,
    "" AS AddrTypeID,
    "" AS AddrID,
    "<none>" AS AddrName,
    "" AS Addr1,
    "" AS Addr2,
    "" AS City,
    "" AS StatID,
    "" AS PostalCode
FROM tblCont
WHERE tblCont.ContID Not In
    (SELECT trelContAddr.ContID FROM trelContAddr)


Notice that there are placeholder fields (as expressions) in the second SELECT statement in this query. When creating a union query, each participating SELECT statement must return the same number of fields as the first statement.

The first SELECT in this query draws its fields from three joined tables, while the second draws from a single table, so dummy expressions are added to the second SELECT to make the field counts match.


Figure 10.25 shows the result of the query in Listing 10.16. Note that the contact person Gill Bates, who has no address, is displayed in the result set.


In the figure, I've reordered the fields from the order shown in the prior SQL statement to show you how the address data is returned by the union.

Fig. 10.25

A union query between people that have addresses and people that don't displays all people records.


You should establish a style that defines how much clutter you prefer in the Database window with respect to compound queries and their component parts. If you prefer a sparse style, you would create complex queries that perform all required operations in their SQL statement, as shown in my previous example.

If, on the other hand, you have no objection to littering the Database window with query components, you might create each of the two SELECT strings shown in figures 10.23 and 10.24 and save them as two stored queries, then define your UNION query to simply join the results of the component queries, as in this example:


SELECT * FROM qryContAddr
UNION
SELECT * FROM qryContNoAddr

An additional structural challenge exemplified by the current sample contact database structure is that our design allows a given attribute table to service more than one parent table (this is called "multiple parentage"). For example, the contact point table tblCommPoint contains phone numbers for address records as well as direct (non address-related) phone numbers for contact persons.

To properly utilize tables structured this way, you must be careful not to join each of the parent tables to the shared child table in query design. While multiple parentage is an acceptable database layout strategy, queries that utilize it improperly suffer from "circular references" and will not produce accurate results.

Consider the following scenario. Each person in our database can have phone numbers that are attached to the person's address, and also phone numbers that are attached directly to the person (such as a cellular phone). You want to create a query that shows each person's address-related telephones alongside their direct phones.

Following the data structure, your first inclination would be to create the query shown in figure 10.26. This query properly reflects the question you are attempting to answer, but in doing so exhibits a circular reference from tblCommPoint to both the address and the contact tables. While this query does not generate an error, it also generates no matching records.

Fig. 10.26

This query contains a circular reference to tblCommPoint and returns no records.

The more appropriate solution to the problem is to add a second copy of the telephone number table to the query, so that each parent table can reference its own child table. This layout allows one parent to find its matches in tblCommPoint, and the other parent to find its own matches in the aliased tblCommPoint_1, as shown in figure 10.27. The results of the query are shown in figure 10.28.

Fig. 10.27

Placing two copies of tblCommPoint into the query from figure 10.26 removes the circular reference.

Fig. 10.28

The results of running the query in figure 10.27 show address-related phone numbers and direct phone numbers for people.

No discussion of high-powered query options would be complete without a mention of GROUP BY (aggregate) queries. Queries that "roll up" or summarize data are highly valuable when studying and manipulating data.

As you group records in a query, you can apply SQL aggregate functions to the data to affect the rollup and to provide additional information about the data in the query's result set. The SQL aggregate functions apply to a group of records called a domain.

When grouping data, one domain equates to all of the records in the base table that constitute (roll up to) a single record in the aggregate result set. In other words, assume that an address table has six records, three for Washington and three for Oregon. When a GROUP BY query is run against this data to group by the State field, the query's result set will have two records (one per unique state). However, each of the two records will be based on a different domain of three records each (the data records for that state). Thus, any domain aggregate function applied in the query will be first applied to the records in the Oregon domain, then to the Washington domain.
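A sketch of the query just described, assuming an address table tblAddr with a State field holding the state name:

SELECT State, Count(State) AS Addresses
FROM tblAddr
GROUP BY State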

Understanding this functionality is important because many Access users misconstrue how aggregate functions, especially First() and Last(), are applied to the underlying records. Essentially, the process is as follows:

  1. Records are sorted according to the fields in the GROUP BY clause in the statement, sorting from left to right.
  2. Each block of records that constitutes a domain is identified. These blocks are the groups of records identified by the lowest level of grouping.
  3. Aggregate functions are applied to each domain, and the results are displayed as a record in the result set.

Here are the aggregate functions that can be used in queries: Avg, Count, First, Last, Max, Min, StDev, StDevP, Sum, Var, and VarP.

Using our sample contact database, assume that we want to find all people that have two or more direct phone numbers (cellular and car phones). Figure 10.29 shows a GROUP BY query that answers the question.

Fig. 10.29

This query uses a count to find people that have more than one direct phone number.

The SQL statement in Listing 10.17 drives the query shown in the figure.

Listing 10.17 AES_Tbl.Mdb-The SQL for the Query in Figure 10.29 Uses the SQL Count Function

SELECT
    trelContCommPoint.ContID,
    First(tblCont.FirstName) AS FirstName,
    First(tblCont.LastName) AS LastName,
    Count(trelContCommPoint.ContID) AS Count
FROM tblCont
    INNER JOIN trelContCommPoint ON tblCont.ContID = trelContCommPoint.ContID
WHERE trelContCommPoint.CommPointType Like "Phone*"
GROUP BY trelContCommPoint.ContID
HAVING Count(trelContCommPoint.ContID) > 1;

Driving the DAO From Program Code

As the name implies, the most important function of the Data Access Objects is to provide you with a way to manipulate data and the container objects that store it. For many developers, the DAO's ability to create a Recordset object attached to a Database object is the only part of this engine that they see.

Lurking within the DAO's object model, however, is the gateway to the actual structure of your Jet databases. In addition to manipulating recordsets, DAO can be made to perform these tasks:

Because many of the listed tasks can be performed from the Access user interface, their utility is marginal. For example, I rarely see developers creating and modifying security settings from program code written against the DAO. However, other features of the DAO can be downright exciting, especially the ability to create new tables and fields.

In the next two sections, I provide you with a practical example of writing DAO-based code. The example derives from the "Creating Data Structure Documents" section in Chapter 4, "Creating Design Specifications." In that chapter, I explained that I like to draft our database designs in Excel. Excel as a design tool provides easy sorting, grouping, dragging, copying, automating, printing and similar tasks that are useful during the design process. At the end of the design process, I am left with a workbook filled with information and no database structure. I solve this problem with DAO code.

Structuring and Importing Schema Design Information

From Access, I've written code to extract the database design from Excel and build the tables listed there. The code for this is not complex, but the benefit is huge. My team members can spend several weeks at a customer site working with the users, designing the database structure in Excel and writing the specification document in Word. When the design is complete and approved by the users and by me, we simply run our Schema Builder tool against the workbook and produce a new database. For a large application, the tool can save several hours of typing table definitions from the specification document.


The Schema Builder tool is provided on the CD as SCHEMA.MDB, and the specification worksheet that drives it is there as well, in SPECSHEL.XLS.

Here is an outline of how the Schema Builder tool works:

  1. Populate a specification worksheet with a database definition. During the design phase, we create an Excel worksheet that contains table and field attribute information. A worksheet is a handy tool because it can easily be modified on a laptop during design meetings, e-mailed to design team members for review, and quickly searched/audited in aggregate. Prototyping the database structure in Access during the design phase takes significantly more time than doing so in a worksheet.
  2. Populate a table in the Schema Builder with the design information. We run a routine in the Schema Builder to import the specification information from the Excel worksheet into an Access definition table where it can drive the database-building process.
  3. Create tables and properties based on the database definition. The main Schema Builder routines are run and use the information in the definition table to create tables, fields, properties, and indexes.

The starting point for the schema building process is an Excel worksheet created during the application design phase. The schema definition worksheet is shown in Figure 10.30.

Fig. 10.30

The Excel schema design worksheet used by the Schema Builder tool.

The columns in the Excel worksheet match the items of information required by Jet to build a database structure, plus some informational items that are useful in the design and specification process. The columns we use in our design worksheet are listed in Table 10.12.

Table 10.12 The Columns in Our Excel Database Design Worksheet

Column          Contains
Table Name*     Table name
Order*          Field order within the table
Field Name*     Field name
Caption*        Field Caption property
Type*           Field data Type and Size properties
Description*    Field Description property
Ctl             Default control type to use for this field on forms
Index*          Type of index for this field
Reqd*           Field Required property
Linkage         Relationship of this field to a parent
Format*         Field Format property
Decimals*       Field DecimalPlaces property
Default*        Field DefaultValue property
Input Mask*     Field InputMask property
Value Source    Source of lookup values for field
Validation*     Field ValidationRule property
Comments        Field comments

* These items are translated into Jet database property values by the Schema Builder and used during database creation. Worksheet columns in the table not noted with * are informational for developers but not used by the tool.

Extracting the information from Excel is a simple matter of using Automation code. Each of the columns in the worksheet that will provide structure information to the Schema Builder is named as a range in Excel, so the tool can extract information from the worksheet based on range names and place the information into an Access work table.

The extraction process is table driven so that a developer can customize it for each operation. The process is as follows:

  1. The developer determines the columns that he or she is interested in importing and flags the columns for import in a configuration table lci_sb_tblXLSImportMap. The table is shown in Figure 10.31. The table contains a record for each field in the database definition table that drives the Schema Builder. In turn, the fields in the database definition are mapped to Jet field properties (AllowZeroLength, Attributes, and so on). A field in the definition table holds information about one field's property in the database to be built.
  2. The Schema Builder code looks at the configuration table to determine which fields to extract from the Excel worksheet. If the field is flagged in the configuration table by virtue of a value in the RangeName column, the tool looks for the matching range name in Excel and imports values from it.
  3. The tool iterates through the worksheet and grabs the targeted information, placing each value in a destination table lci_sb_tblSchemaBuilder in the Access database. This is the definition table used by the Schema Builder. The name of the target definition table field for the information extracted from the worksheet is listed in the configuration table's LoadField column.

Fig. 10.31

The Schema Builder imports values from the design worksheet that match the range names in the RangeName column in this configuration table.

The process of importing from Excel is table-driven so that a developer can customize a worksheet for a particular project and still import the worksheet by changing values in the configuration table. For example, a client may be excited about using the worksheet to help us design their database, but may want to simplify the worksheet by removing certain fields such as Format and Validation. Users often expect that we will create these properties during database development using our discretion. A reduced-scope worksheet can still be imported by the tool by removing the appropriate RangeName values from the configuration table.

Using a configuration table also allows the system to grow later. The table in the figure has field names in the schema definition table (in the LoadField column) that do not have a corresponding range (yet) in the design worksheet. Because our code is generic, however, and reads the configuration table, adding more columns to the worksheet to match fields in the definition table would require only the entry of the corresponding range name in the configuration table. No code would need to be changed.


When you create tools or design application processes, try to make them easily reconfigurable like the example described here. Values in a database table and enumerated constants in code are two ways to provide configuration information to a process so that it can easily be modified later without changing program logic.


The extraction code that is guided by the configuration table to pull information from the Excel worksheet is shown in the next three listings. All three listings combine to make the extraction routine lci_sb_XLSDefnGet(). I have removed variable declarations and simplified the listings to make them more readable, but you can infer the workings of a variable from its LNC tag and base name. For the complete code, see the example database on the CD-ROM.


If you are interested in DAO coding but not concerned with learning the mechanics of Excel Automation, skip ahead to the section "Building a Database From Code."

Listing 10.17 shows the setup portion of the import routine. The code creates recordsets for the configuration table and for the definition table where the imported information will be placed. The code also opens an Automation instance of Excel and opens a schema worksheet.

Listing 10.17 Schema.Mdb-This Code Begins the Process of Importing Excel
Data into an Access Table

Set dbs = CurrentDb
' Get list of ranges to import
' This recordset contains the records from the configuration table
' that list a range name to import
Set rstXLSMap = dbs.OpenRecordset _
    ("SELECT * FROM lci_sb_tblXLSImportMap WHERE LoadTable = '" _
    & cstrSchemaTbl & "' AND RangeName IS NOT NULL", dbOpenSnapshot)
rstXLSMap.MoveLast
' The strCol array will hold the column letter represented by each range
' The array is populated in Listing 10.18
ReDim strCol(rstXLSMap.RecordCount) ' Array for ranges to import
' Get the target table
' This recordset will receive the new definition records created
' with information from the worksheet
Set rstBuilder = dbs.OpenRecordset(cstrSchemaTbl, dbOpenDynaset)
' Open the schema worksheet
Set xlsapp = CreateObject("Excel.Application")
xlsapp.Workbooks.Open "C:\Data\Schema.Xls"
Set xlswkb = xlsapp.ActiveWorkbook
Set xlswks = xlswkb.Worksheets("Schema")
rstXLSMap.MoveFirst

The second block of Excel import code is shown in Listing 10.18. To facilitate the import, the worksheet row that holds the column headers (row 7 in Figure 10.30) has been given a range name of "Headers." The code in the listing identifies this row and places its number in a variable intHeadingRow. The variable is referenced by the import code to determine the starting row for the import process (in other words, the data to import starts on the row after the header row).

Next, the code populates an array with the column numbers that go with the range names to be imported. Each import range in the worksheet was originally defined as one column. The code pulls the column letter, such as "E", from the range and puts it into the array. This column letter is used later when walking down the rows in the worksheet, because the import code must read an entire row by cells in order to create one row for the Access table. To move across one worksheet row and only extract specific cells, the column letters in the array are used to create half of a target cell's address; the row number provides the other half of the address.

Listing 10.18 Schema.Mdb-This Code Establishes an Array and Other
Variables That Map the Excel Data For Import

' Extract the dimensions of the Headers range
' Knowing where the header range is tells us where the data to import starts
' A range reference looks like this: =Schema!$7:$7
strWork = xlswkb.Names("Headers").RefersTo
intLoc = InStr(1, strWork, "!$")
intHeadingRow = Mid(strWork, intLoc + 2, 1)
iintLoop = 1
' Loop through the import range list and build an array with column
' designations, one for each range to import
Do Until rstXLSMap.EOF
    intWorksheet = Len(xlswks.Name) ' Length of worksheet name
    ' The RefersTo property returns the bounds of the range
    ' referred to by the range name, like =Schema!$7:$7
    strWork = xlswkb.Names(rstXLSMap!RangeName).RefersTo
    If Left(strWork, intWorksheet + 3) = "=" & xlswks.Name & "!$" Then
        ' Extract the letter of the range column, such as E
        strWork = Mid(strWork, intWorksheet + 4)
        intLoc = InStr(1, strWork & ":", ":")
        ' Add the column reference to the column array
        strCol(iintLoop) = Left(strWork, intLoc - 1)
    End If
    rstXLSMap.MoveNext
    iintLoop = iintLoop + 1
Loop
' Find last row in worksheet (as measured by column A data)
intRowMax = xlswks.Range("A1:A1").SpecialCells(xlLastCell).Row

The final portion of the import routine is shown in Listing 10.19. This code uses the values and objects established in the previous two listings to loop through the worksheet, extracting cell values and loading them into the database definition table in Access.

The code loops from the first data row (after the header row) to the end of the active area of the worksheet. As it loops, it creates a row/column cell address from the saved array values built in the previous listing and pulls a value from the identified cell. The cell's value is loaded into the Access database definition table.

Listing 10.19 Schema.Mdb-Desired Cells in the Worksheet are Identified
and Used to Populate the Access Table

' Loop through each row in the worksheet that has field definitions
For iintRow = intHeadingRow + 1 To intRowMax
    ' Only proceed if the first cell in the row has data
    If Not IsEmpty(xlswks.Range("A" & iintRow).Value) Then
        rstXLSMap.MoveFirst
        iintLoop = 1
        rstBuilder.AddNew ' Create a definition table record
        ' Loop through the list of ranges to import
        Do Until rstXLSMap.EOF
            ' Load the destination table with the worksheet value
            ' A Boolean field like Required in the definition table must be
            ' translated from the text on the worksheet, all other values
            ' are stuffed into the table verbatim
            Select Case rstBuilder.Fields(rstXLSMap!LoadField).Type
                Case dbBoolean ' Convert Yes/No on worksheet into Boolean
                    rstBuilder(rstXLSMap!LoadField) = _
                        (xlswks.Range(strCol(iintLoop) & CStr(iintRow)).Value = "Yes")
                Case Else
                    rstBuilder(rstXLSMap!LoadField) = _
                        xlswks.Range(strCol(iintLoop) & CStr(iintRow)).Value
            End Select
            rstXLSMap.MoveNext
            iintLoop = iintLoop + 1
        Loop
        rstBuilder.Update
    End If
Next iintRow
xlsapp.Application.[Quit]
Set xlsapp = Nothing


The Excel code in the previous listing is "legacy" code, something we wrote years ago and continue to milk. It has not been upgraded to newer versions of Excel over the years and may not reflect the best approach under Excel 97. Because this chapter is not an Excel tutorial, I did not spend time optimizing the extraction code; you should feel free to do so.

After running the code in the three listings in this section, the schema definition information has been pulled from Excel into an Access table. The populated table is shown in the section that follows.

Building a Database From Code

In the previous section, we explored Automation code to run an extraction process that pulled database definition information from an Excel worksheet. The Excel information was transferred to an Access definition table called lci_sb_tblSchemaBuilder, as shown in Figure 10.32.

Fig. 10.32

This database definition table contains enough information to allow the Schema Builder to build a complete database.


The process described here depends on a database definition table, but the table need not be populated from Excel. The techniques can still provide value to your development process even if you are not interested in designing databases in a worksheet.


As long as you can populate the database definition table with the required information, you can run the Schema Builder routines against it. To populate the table without Excel, you could import schema information from another modeling program such as InfoModeler, key the information in by hand, or create an Access form to help you define the table and field layouts that populate the table.


The database definition table shown in the figure provides the information for the schema creation process, a process that involves DAO code. The process simply iterates through the records in the schema definition table and builds a database structure from the information.

The following four listings divulge the majority of the core code in the Schema Builder. I have removed variable declarations, error trapping, and special scenario testing in order to simplify the listings to make them more readable. You can infer the type of a variable from its LNC tag and the purpose from the name or value assignment. For the complete code, see the example database on the CD-ROM.

The driving routine for the creation of tables and fields is in Listing 10.20.

Listing 10.20 Schema.Mdb-This Routine Takes Schema Definition Information
From a Table and Creates New Tables and Fields

Public Function lci_sb_TblCreate() As Boolean
' Purpose: Create new tables from field definitions in a table

    Const cstrSchemaTbl = "lci_sb_tblSchemaBuilder"

    Set dbs = CurrentDb
    ' Get table names
    Set rstTblDefn = dbs.OpenRecordset _
        ("SELECT sb_TableName, sb_Description FROM " & cstrSchemaTbl _
        & " WHERE sb_TableName IS NOT NULL AND sb_OrdinalPosition = 0" _
        & " ORDER BY sb_TableName", dbOpenSnapshot)
    ' Loop through definition table
    Do Until rstTblDefn.EOF
        ' Note: Jet requires that we create the table and fields first,
        ' then add the properties after the table is appended
        ' Create the new table
        varTbl = rstTblDefn!sb_TableName
        Set rstFldDefn = dbs.OpenRecordset _
            ("SELECT * FROM " & cstrSchemaTbl _
            & " WHERE sb_TableName = '" & varTbl & "' AND sb_OrdinalPosition > 0" _
            & " ORDER BY sb_OrdinalPosition", dbOpenSnapshot)
        Set tdf = dbs.CreateTableDef(varTbl)
        ' Loop through one table's fields and create them
        Do Until rstFldDefn.EOF
            ' Create fields
            blnRet = lci_sb_FldCreate(rstFldDefn, tdf) ' Create one field
            rstFldDefn.MoveNext
        Loop
        dbs.TableDefs.Append tdf ' Append table to database
        ' Cannot add custom property to table until after it's appended
        dbs.TableDefs.Refresh
        intRet = lci_sb_TblPrpCreate(tdf, "Description", dbText _
            , rstTblDefn!sb_Description)
        ' Loop through one table's fields and create properties
        rstFldDefn.MoveFirst
        Do Until rstFldDefn.EOF
            ' Add field properties
            Set fld = tdf.Fields(rstFldDefn!sb_Name)
            blnRet = lci_sb_FldPrpCreate(rstFldDefn, fld) ' Create properties
            If blnRet And Not IsNull(rstFldDefn!sb_Index) Then
                Call lci_sb_IdxCreate(rstFldDefn, tdf) ' Create field index
            End If
            rstFldDefn.MoveNext
        Loop
        rstTblDefn.MoveNext
    Loop

End Function

The flow of the code in the listing is dissected as follows:

  1. Create the table list. A recordset rstTblDefn is created that contains the table names to be built. The recordset grabs records from the definition table that are flagged as table definition records; my flagging strategy uses a value of 0 in the column order position to designate a table definition record. Records with a non-zero column order value are field definition records and follow after the table definition. Refer back to Figure 10.32 for clarification.
  2. Create a table. Each new table is defined by creating a TableDef object variable, using this line:

    Set tdf = dbs.CreateTableDef(varTbl)

    To build a table using DAO code, you must first create a TableDef object, as shown. Then you must add fields to the table object's Fields collection (see step 3 following). Finally, you must use the TableDefs.Append method to add the defined table to the database structure (see step 4).

  3. Create the fields. After a table is created, its fields are added. The field information for one table is gathered from the definition table by creating a second recordset rstFldDefn. This recordset is looped and each new field for the table is created by the function lci_sb_FldCreate(), as described in Listing 10.21 following.
  4. Save the table. Jet requires that a table and its fields be added to the database structure before properties such as the field caption can be added. Once the code has created and named a table, created and named the fields, and assigned field data types, it adds the table to the database like this:

    dbs.TableDefs.Append tdf ' Append table to database

    The DAO does its best to keep information current for the benefit of your program code, but over the years I have seen instances where code can get ahead of the object structure it is creating in Jet. To prevent this, I habitually refresh DAO collections after populating them, as shown in this line from the listing:

    dbs.TableDefs.Refresh

  5. Add the properties. After a table and its basic field definitions are saved to the database structure, you can add property information to either. The code in the listing makes a second pass through the recordset of table fields in order to attach properties to each field. The work is done by a subordinate function lci_sb_FldPrpCreate() that is described in Listing 10.22:

    Set fld = tdf.Fields(rstFldDefn!sb_Name)

    blnRet = lci_sb_FldPrpCreate(rstFldDefn, fld) ' Create properties

  6. Create the indexes. After properties for a field are added, the field's definition information is inspected to see whether the field will be indexed. If so, an Index object is created and appended to the table's Indexes collection by lci_sb_IdxCreate() in Listing 10.23:

    Call lci_sb_IdxCreate(rstFldDefn, tdf) ' Create field index

  7. Repeat. Steps 2 through 6 are repeated for each table listed in the controlling recordset of table names (rstTblDefn).
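
To make the pattern easier to see in isolation, here is a minimal standalone sketch of steps 2 through 5, stripped of the Schema Builder's looping. The table name, field name, and description text here are hypothetical values invented for illustration:

Dim dbs As Database
Dim tdf As TableDef
Dim fld As Field
Dim prp As Property

Set dbs = CurrentDb

' Step 2: create the TableDef (table name is hypothetical)
Set tdf = dbs.CreateTableDef("tblExample")

' Step 3: create a field and add it to the table's Fields collection
Set fld = tdf.CreateField("ExampleID", dbLong)
tdf.Fields.Append fld

' Step 4: save the table to the database, then refresh the collection
dbs.TableDefs.Append tdf
dbs.TableDefs.Refresh

' Step 5: custom (Access-defined) properties can be added only now
Set prp = tdf.CreateProperty("Description", dbText, "A sample table")
tdf.Properties.Append prp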

The looping routine in the previous listing calls several subordinate functions that use DAO code to create database objects. The first function called creates fields in the target table, as shown in Listing 10.21.

Listing 10.21 Schema.Mdb-DAO Code to Create New Fields in a Table Definition

Public Function lci_sb_FldCreate(rrstDefn As Recordset _
  , rtdf As TableDef) As Boolean
' Purpose:   Create a new field based on definition table
' Arguments: rrstDefn:=Schema definition recordset
'            rtdf:=Tabledef to create field in

  Dim fld As Field      ' New field
  Dim intRet As Integer ' Returned by field type enum

  Set fld = rtdf.CreateField(rrstDefn!sb_Name)
  intRet = lci_sb_FldSizeGet(rrstDefn!sb_TypeDesc) ' Get TEXT size or -1
  If intRet <> -1 Then ' Only set Size if one was returned
    fld.Size = intRet
  End If
  intRet = lci_sb_FldTypeToEnum(rrstDefn!sb_TypeDesc) ' Get field type
  fld.Type = intRet
  rtdf.Fields.Append fld ' Append field to table
  lci_sb_FldCreate = True ' Assume success; error trapping removed for brevity

End Function

To create a new field, the code in the previous listing first creates a new field object and names it:

Set fld = rtdf.CreateField(rrstDefn!sb_Name)

Next, the code sets the size and data type of the field. Two functions are called to help with this process: lci_sb_FldSizeGet() and lci_sb_FldTypeToEnum(). The first function simply takes the field type description from the definition table, such as "TEXT 12", and extracts the field size ("12") for assignment to the field's Size property. The second function uses this same field type string but translates the field type (such as "TEXT") into an integer value that specifies a valid Jet Type property (for example, a Text field is represented internally by the Jet field type value 10).
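
The complete helpers ship in the sample database on the CD-ROM; the following sketch shows one plausible implementation, assuming the definition table stores descriptions such as "TEXT 12" and "LONG". Only the two function names and the "TEXT 12" format come from the text; the keyword vocabulary and the internals are my assumptions:

' lci_sb_FldSizeGet: pull the size from a description like "TEXT 12";
'   return -1 when the description carries no size
Public Function lci_sb_FldSizeGet(rvarTypeDesc As Variant) As Integer
  Dim intPos As Integer
  intPos = InStr(rvarTypeDesc, " ")
  If intPos > 0 Then
    lci_sb_FldSizeGet = CInt(Mid(rvarTypeDesc, intPos + 1))
  Else
    lci_sb_FldSizeGet = -1 ' No size component in the description
  End If
End Function

' lci_sb_FldTypeToEnum: map a type keyword to a Jet Type property value;
'   the keyword list here is a guess at the definition table's vocabulary
Public Function lci_sb_FldTypeToEnum(rvarTypeDesc As Variant) As Integer
  Dim strType As String
  Dim intPos As Integer
  strType = Trim(rvarTypeDesc)
  intPos = InStr(strType, " ")
  If intPos > 0 Then ' Discard any trailing size, as in "TEXT 12"
    strType = Left(strType, intPos - 1)
  End If
  Select Case strType
    Case "TEXT"
      lci_sb_FldTypeToEnum = dbText   ' 10
    Case "LONG"
      lci_sb_FldTypeToEnum = dbLong   ' 4
    Case "DOUBLE"
      lci_sb_FldTypeToEnum = dbDouble ' 7
    Case "DATETIME"
      lci_sb_FldTypeToEnum = dbDate   ' 8
    Case "MEMO"
      lci_sb_FldTypeToEnum = dbMemo   ' 12
  End Select
End Function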

Finally, the code in the listing adds the new field to the table definition:

rtdf.Fields.Append fld ' Append field to table

For each new field that is added to the table, several properties must be set to the values described in the definition table. The properties include Caption, DefaultValue, and ValidationRule. Each of these properties must be created and appended to the new field after the table is saved to the database structure.

The code in Listing 10.22 shows how to create new properties and their values or set the value of existing properties. Notice the differentiation I made in the previous sentence: between new and existing properties. When a new field is created, it comes with a base set of properties provided by Jet, plus you can add your own. The difference is not immediately obvious when exploring Access.

In a nutshell, Jet has a core set of properties that define a field. For example, the properties from the definition table described in this section that are native to Jet are AllowZeroLength, DefaultValue, Required, ValidationRule, and ValidationText.

These properties can be set directly during creation of a new field using the syntax:

fieldobject.property = value
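
For example, with a Field object variable fld in hand (a hedged illustration, not a line from the Schema Builder), native properties are set by simple assignment:

fld.Required = True          ' Native Jet property; no CreateProperty needed
fld.AllowZeroLength = False  ' Also native, so direct assignment works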

In contrast, Jet is not aware by default of properties that Access creates to benefit the Access interface or application model. Jet allows Access to add custom properties to a Field object, and after they are added, Jet treats them as its own. The properties the Schema Builder adds via the database definition table that are actually custom Access properties, not native to Jet, are Caption, DecimalPlaces, Description, Format, and InputMask.

To see all of the properties that are native to Jet, browse the Field object in the DAO library via the Object Browser.

Because several of the properties created via the Schema Builder are native to Jet, the routine lci_sb_FldCreate() in Listing 10.21 could go on to set the Jet-specific properties before appending the field to the new table. Alternatively, the Jet-specific and the Access custom properties can all be added at the same time, after the table is saved. I have opted for the latter model, but either approach is equally appropriate.

Listing 10.22 shows the routine that the Schema Builder calls to add property values to a new field. The routine, lci_sb_FldPrpCreate(), accepts a pointer to the newly created field and adds the ten property values from the definition table recordset to the field. (To save space, the listing shows only one of the ten properties; the syntax for each omitted property parallels the one shown.)

The second routine in the listing, lci_sb_FldPrpCreateOne(), is called by the first routine (lci_sb_FldPrpCreate()) to do the actual work of property creation. Notice how the routine can set both Jet (native) and Access (custom) properties. This dual personality is provided by the routine's inline error trapping (On Error Resume Next). The function first attempts to set the prescribed property value, assuming that the property exists:

rfld.Properties(rstrPrpName).Value = rvarPrpValue

If the property does exist, as in the case of AllowZeroLength and other native properties listed above, the property value is set by this line and no error is generated. On the other hand, if the property is a custom Access property, it will not exist for a new field until it is explicitly created. The code in Listing 10.22 handles this by creating and appending the new property to the field like this:

Set prp = rfld.CreateProperty(rstrPrpName, rintPrpType, rvarPrpValue)

rfld.Properties.Append prp

Listing 10.22 Schema.Mdb-DAO Code to Create New Properties in a Field Definition

Public Function lci_sb_FldPrpCreate(rrstDefn As Recordset _
  , rfld As Field) As Boolean
' Purpose:   Add field properties based on definition table
' Arguments: rrstDefn:=Schema definition recordset
'            rfld:=Field to add properties to

  Dim blnRet As Boolean ' Property creator return value

  ' AllowZeroLength property
  If Not IsNull(rrstDefn!sb_AllowZeroLength) Then
    blnRet = lci_sb_FldPrpCreateOne(rfld, "AllowZeroLength" _
      , dbInteger, rrstDefn!sb_AllowZeroLength)
  End If

  ' ...All other field properties are created here;
  '   the code has been removed...

  lci_sb_FldPrpCreate = blnRet ' Return the last property's success

End Function

Function lci_sb_FldPrpCreateOne(rfld As Field, rstrPrpName As String _
  , rintPrpType As Integer, rvarPrpValue As Variant) As Boolean
' Purpose:   Create one field property
' Arguments: rfld:=Field to create property on
'            rstrPrpName:=Property name
'            rintPrpType:=Property type
'            rvarPrpValue:=Property value

  Const cstrErrPrpNonExistent = 3270

  Dim prp As Property

  On Error Resume Next ' Inline trapping: errors set Err rather than halting

  Err.Clear ' Next line will generate an error if no property
  rfld.Properties(rstrPrpName).Value = rvarPrpValue
  If Err <> 0 Then ' Property not found
    If Err <> cstrErrPrpNonExistent Then ' Non-handleable error
      GoTo lci_sb_FldPrpCreateOne_Exit
    Else ' Property not found
      Err.Clear ' Next lines will generate an error if property info invalid
      Set prp = rfld.CreateProperty(rstrPrpName, rintPrpType, rvarPrpValue)
      rfld.Properties.Append prp
      If Err <> 0 Then ' Abort
        GoTo lci_sb_FldPrpCreateOne_Exit
      End If
    End If
  End If
  lci_sb_FldPrpCreateOne = True ' Success

lci_sb_FldPrpCreateOne_Exit:

End Function

The final piece of the puzzle in generating a new database from DAO code is the creation of indexes. The Schema Builder checks each field's definition record to see whether its index information is blank. If it is not, the routine in Listing 10.23 is called to create an index and append it to the new table.

Listing 10.23 Schema.Mdb-This Routine Creates a New Index Programmatically

Public Sub lci_sb_IdxCreate(rrstDefn As Recordset, rtdf As TableDef)
' Purpose:   Create a new index based on definition table
' Arguments: rrstDefn:=Schema definition recordset
'            rtdf:=Tabledef to create index for

  Dim fld As Field
  Dim idx As Index

  Select Case rrstDefn!sb_Index
    Case "PK" ' Primary key
      Set idx = rtdf.CreateIndex("PrimaryKey")
      Set fld = idx.CreateField(rrstDefn!sb_Name)
      idx.Primary = True
      idx.Fields.Append fld
      rtdf.Indexes.Append idx
    Case "D" ' Index that allows duplicates
      Set idx = rtdf.CreateIndex(rrstDefn!sb_Name)
      Set fld = idx.CreateField(rrstDefn!sb_Name)
      idx.Fields.Append fld
      rtdf.Indexes.Append idx
    Case "U" ' Unique index
      Set idx = rtdf.CreateIndex(rrstDefn!sb_Name)
      Set fld = idx.CreateField(rrstDefn!sb_Name)
      idx.Unique = True
      idx.Fields.Append fld
      rtdf.Indexes.Append idx
  End Select

End Sub

Notice in the listing the specific steps required to create an index. You must first create and name an Index-type object:

Set idx = rtdf.CreateIndex("PrimaryKey")

An Index object is actually a collection of the fields that make up the index. Thus, you must next create a field to place in the index, using the CreateField method:

Set fld = idx.CreateField(rrstDefn!sb_Name)

If an index field will serve as the primary key of the table, set the index's Primary property value accordingly:

idx.Primary = True

Otherwise, leave the Primary property at its default of False and use the Unique property to specify whether the index prohibits (True) or permits (False) duplicate values:

idx.Unique = True

Finally, your code must add the field to the index and the index to the table:

idx.Fields.Append fld

rtdf.Indexes.Append idx
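
The Schema Builder creates only single-field indexes, but because an Index object holds a collection of fields, the same pattern extends naturally to composite keys. The following sketch, with hypothetical field names and assuming tdf holds the TableDef under construction, builds a two-field primary key:

Dim idx As Index
Dim fld As Field

Set idx = tdf.CreateIndex("PrimaryKey")
idx.Primary = True

' Append each participating field, in key order
Set fld = idx.CreateField("OrderID")
idx.Fields.Append fld
Set fld = idx.CreateField("LineItem")
idx.Fields.Append fld

tdf.Indexes.Append idx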

Figure 10.33 shows a table that was created directly from an Excel-based definition by the Schema Builder shown in this section.

Fig. 10.33

The Schema Builder created this table and its properties from information in Excel.


It took the Schema Builder only one second of execution time to create the four tables shown in the definition table in Figure 10.32. We've used this tool to create a database with over a hundred tables in just a few minutes. Compare those few minutes with the manual entry time required to build a hundred tables in the Access interface, and you'll quickly see why programming the DAO can be so rewarding.

Whether or not you choose to use the tool described in this section, the techniques it demonstrates are exciting and powerful. Few database products give you as much control over database objects and their properties as Access does when coupled with Jet and the Data Access Objects. Coding against the DAO object model not only lets you create database objects as shown here; it also lets you modify the properties of objects in bulk, write your own documentation routines, or export your database structure to another platform, all from VBA code.
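
As a small taste of such code, the following documentation sketch (my own illustration, not part of the Schema Builder; the routine name is hypothetical) prints every non-system table and its fields to the Debug window:

Public Sub lci_DocTables()
' Purpose: List all non-system tables and their fields

  Dim dbs As Database
  Dim tdf As TableDef
  Dim fld As Field

  Set dbs = CurrentDb
  For Each tdf In dbs.TableDefs
    If Left(tdf.Name, 4) <> "MSys" Then ' Skip Jet system tables
      Debug.Print tdf.Name
      For Each fld In tdf.Fields
        Debug.Print "  " & fld.Name & " (Type " & fld.Type & ")"
      Next fld
    End If
  Next tdf

End Sub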

From Here...

In this chapter, you've explored some of the issues that crop up regularly when designing and building tables and queries. A solid, intelligent database structure is one of the fundamental elements of an expert solution.


© 1996, QUE Corporation, an imprint of Macmillan Publishing USA, a Simon and Schuster Company.