Watson

At the beginning of this section a number of questions were posed, typical of those that might occur in a university environment. These questions included:

Who is currently majoring in computer science?
Who taught CS 100 in the fall of 2012?
Where is Dr. O’Neal’s office located?

It was then claimed that a database system should make it possible to easily answer these questions.

So far we have discussed two of the three relation database query operators: “select” and “project”. These two operators can be used together to construct tables that answer very specific questions about the data stored in a relation.

For example, to answer the question “Who is currently majoring in computer science?”, we can construct a relation that contains the names of all CS majors. All that is required is the following sequence of two relational queries:

A relation containing the names of all CS majors.

The first command creates a table, named CS_Majors, that is based on the “Students” relation but only contains those tuples that contain information about CS majors. The second statement projects the name attribute from the “CS_Majors” relation, producing a table that contains only the names of CS majors.

It is interesting to note that during the discussion of the programmer’s view of data in , we constructed a program that essentially answered the same question. That program, presented in , is ten lines long and requires knowledge of file systems and programming concepts in order to be developed.[5]

The other questions listed above can be answer by constructing a series of relational commands that use both “select” and “project”. For example, “Who taught CS 100 in the Fall of 2012?” can be answered with the following sequence of commands:

A relation containing the name of the instructor who taught CS 100 in the Fall quarter of 2012.

The first “select” gets information on CS 100 courses. The second “select” narrows this down to CS 100 courses taught in 2012. The third “select” narrows the data even further to the sections of CS 100 taught during the fall quarter of 2012. The final statement projects only the faculty members’ names, so that the resulting relation answers the original question: “Who taught CS 100 in the Fall of 2012?”. For our example university database, the final relation consists of a single entry “ONeal M.B.” Given different data, the final table might have contained multiple entries, since it is possible that a number of different instructors could have taught separate sections of CS 100 in the fall of 2012.

While the above example is easy to understand, writing separate “select” statements for every attribute condition we need to express can become cumbersome. Many database programs allow you to “and” together a number of different condition tests, where all of the tests must yield true for a tuple to be selected. The first three statements of the above example can be reduced to a single expression by using such a “compound select”. Hence, the question of who taught CS 100 in the Fall of 2012 could be determined by the following two relational expressions.

CS100_Fall_2012 ← Select from Courses where Course = “CS100” and
Year = 2012 and
Quarter = “Fall”
CS100_Fall_2012_Instructors ← Project Fname from CS100_Fall_2012

You may have noticed that all of the examples in this subsection applied the “select” operator before the “project”. A common error made by students who are just learning to write queries that involve multiple relational expressions, is to perform the “project” too early, or to leave out necessary attributes when crafting a needed projection.

For example, let’s say that in trying to determine the names of CS 100 Fall 2012 instructors, I began by first making a list of instructor names and then tried to narrow the list to those instructors who taught CS 100 in the Fall of 2012. In other words, I took the following approach:

Instructors ← Project Fname from Courses

CS100_Fall_2012_Instructors ← Select from Instructors where ???????

The problem that I would encounter is that the table with the instructor names does not contain the proper fields to allow me to narrow the list any further – it is just a list of names and nothing more. However, you should not conclude that placing “projects” before “selects” is necessarily wrong. The following sequence of relational expressions, which does begin with a “project” statement, is an acceptable, albeit somewhat longer, solution to the problem.

Course_Data ← Project Fname, Course, Quarter, Year from Courses

CS100_Fall_2012 ← Select from Course_Data where Course = “CS100” and Year = 2012 and Quarter = “Fall”

CS100_Fall_2012_Instructors ← Project Fname from CS100_Fall_2012

This solution concentrates first on determining the attributes needed to solve the problem, rather than the tuples needed to do so. Such an approach makes a lot of sense when dealing with tables that contain large numbers of attributes, since it helps us to focus on which parts of a relation are relevant to the question being answered.

Exercises for

Using the relations presented in , , and , develop relational queries to solve each of the following problems.

Where is Dr. O’Neal’s office located?
What course did Dr. Kurtz teach in fall 2013?
What are the names of all male math majors who are between the ages of 19 and 21, inclusive?
What are the names and telephone numbers of faculty in the math department?
List the student numbers (but no other information) of female CS students.
List the salaries of computer science faculty members (but, to protect confidentiality, don’t include any other information in the table).

Footnotes

[5] Actually, does not contain a “complete” program, only the main parts of one. The entire program would be substantially longer.