Created by Anett Szabo, last modified by Anett Szabo 25 Jul 2007, at 02:28 PM
Suppose that you want to start lumping together information from
multiple rows. For example, you're interested in JOINing users with
their classified ads. That will give you one row per ad posted. But
you want to mush all the rows together for a particular user and just
look at the most recent posting time. What you need is the GROUP BY
construct:
select users.user_id, users.email, max(classified_ads.posted)
from users, classified_ads
where users.user_id = classified_ads.user_id
group by users.user_id, users.email
order by upper(users.email);
USER_ID EMAIL MAX(CLASSI
---------- ----------------------------------- ----------
39406 102140.1200@compuserve.com 1998-10-08
39842 102144.2651@compuserve.com 1998-12-13
41426 50@seattle.va.gov 1997-01-13
37428 71730.345@compuserve.com 1998-11-24
35970 aaibrahim@earthlink.net 1998-11-08
36671 absolutsci@aol.com 1998-10-06
35781 alevy@agtnet.com 1997-07-14
40111 alexzorba@aol.com 1998-09-25
39060 amchiu@worldnet.att.net 1998-12-11
35989 andrew.c.beckner@bankamerica.com 1998-08-13
33923 andy_roo@mit.edu 1998-12-10
The
group by users.user_id, users.email tells SQL to "lump
together all the rows that have the same values in these two columns."
In addition to the grouped by columns, we can run aggregate functions on
the columns that aren't being grouped. For example, the MAX above
applies to the posting dates for the rows in a particular group. We can
also use COUNT to see at a glance how active and how recently active a
user has been:
select users.user_id, users.email, count(*), max(classified_ads.posted)
from users, classified_ads
where users.user_id = classified_ads.user_id
group by users.user_id, users.email
order by upper(users.email);
USER_ID EMAIL COUNT(*) MAX(CLASSI
---------- ----------------------------------- ---------- ----------
39406 102140.1200@compuserve.com 3 1998-10-08
39842 102144.2651@compuserve.com 3 1998-12-13
41426 50@seattle.va.gov 1 1997-01-13
37428 71730.345@compuserve.com 3 1998-11-24
35970 aaibrahim@earthlink.net 1 1998-11-08
36671 absolutsci@aol.com 2 1998-10-06
35781 alevy@agtnet.com 1 1997-07-14
40111 alexzorba@aol.com 1 1998-09-25
39060 amchiu@worldnet.att.net 1 1998-12-11
35989 andrew.c.beckner@bankamerica.com 1 1998-08-13
33923 andy_roo@mit.edu 1 1998-12-10
A publisher who was truly curious about this stuff probably wouldn't be
interested in these results alphabetically. Let's find our most
recently active users. At the same time, let's get rid of the unsightly
"MAX(CLASSI" at the top of the report:
select users.user_id,
users.email,
count(*) as how_many,
max(classified_ads.posted) as how_recent
from users, classified_ads
where users.user_id = classified_ads.user_id
group by users.user_id, users.email
order by how_recent desc, how_many desc;
USER_ID EMAIL HOW_MANY HOW_RECENT
---------- ----------------------------------- ---------- ----------
39842 102144.2651@compuserve.com 3 1998-12-13
39968 mkravit@mindspring.com 1 1998-12-13
36758 mccallister@mindspring.com 1 1998-12-13
38513 franjeff@alltel.net 1 1998-12-13
34530 nverdesoto@earthlink.net 3 1998-12-13
34765 jrl@blast.princeton.edu 1 1998-12-13
38497 jeetsukumaran@pd.jaring.my 1 1998-12-12
38879 john.macpherson@btinternet.com 5 1998-12-12
37808 eck@coastalnet.com 1 1998-12-12
37482 dougcan@arn.net 1 1998-12-12
Note that we were able to use our
correlation names of
"how_recent" and "how_many" in the ORDER BY clause. The
desc ("descending") directives in the ORDER BY clause
instruct Oracle to put the largest values at the top. The default sort
order is from smallest to largest ("ascending").
Upon close inspection, the results are confusing. We instructed Oracle
to rank first by date and second by number of postings. Yet for
1998-12-13 we don't see both users with three total postings at the top.
That's because Oracle dates are precise to the second even when the
hour, minute, and second details are not displayed by SQL*Plus. A
better query would include the clause
order by trunc(how_recent) desc, how_many desc
where the built-in Oracle function
trunc truncates each date
to midnight on the day in question.
Finding co-moderators: The HAVING Clause
The WHERE clause restricts which rows are returned. The HAVING clause
operates analogously but on groups of rows. Suppose, for example, that
we're interested in finding those users who've contributed heavily to
our discussion forum:
select user_id, count(*) as how_many
from bboard
group by user_id
order by how_many desc;
USER_ID HOW_MANY
---------- ----------
34474 1922
35164 985
41112 855
37021 834
34004 823
37397 717
40375 639
...
33963 1
33941 1
33918 1
7348 rows selected.
Seventy three hundred rows. That's way too big considering that we are
only interested in nominating a couple of people. Let's restrict to
more recent activity. A posting contributed three years ago is not
necessarily evidence of interest in the community right now.
select user_id, count(*) as how_many
from bboard
where posting_time + 60 > sysdate
group by user_id
order by how_many desc;
USER_ID HOW_MANY
---------- ----------
34375 80
34004 79
37903 49
41074 46
...
1120 rows selected.
We wanted to kill rows, not groups, so we did it with a WHERE clause.
Let's get rid of the people who are already serving as maintainers.
select user_id, count(*) as how_many
from bboard
where not exists (select 1 from
bboard_authorized_maintainers bam
where bam.user_id = bboard.user_id)
and posting_time + 60 > sysdate
group by user_id
order by how_many desc;
The concept of User ID makes sense for both rows and groups, so we can
restrict our results either with the extra WHERE clause above or by
letting the relational database management system produce the groups and
then we'll ask that they be tossed out using a HAVING clause:
select user_id, count(*) as how_many
from bboard
where posting_time + 60 > sysdate
group by user_id
having not exists (select 1 from
bboard_authorized_maintainers bam
where bam.user_id = bboard.user_id)
order by how_many desc;
This doesn't get to the root cause of our distressingly large query
result: we don't want to see groups where
how_many is less
than 30. Here's the SQL for "show me users who've posted at least 30
messages in the past 60 days, ranked in descending order of volubility":
select user_id, count(*) as how_many
from bboard
where posting_time + 60 > sysdate
group by user_id
having count(*) >= 30
order by how_many desc;
USER_ID HOW_MANY
---------- ----------
34375 80
34004 79
37903 49
41074 46
42485 46
35387 30
42453 30
7 rows selected.
We had to do this in a HAVING clause because the number of rows in a
group is a concept that doesn't make sense at the per-row level on which
WHERE clauses operate.
Oracle 8's SQL parser is too feeble to allow you to use the
how_many correlation variable in the HAVING clause. You
therefore have to repeat the count(*) incantation.
Set Operations: UNION, INTERSECT, and MINUS
Oracle provides set operations that can be used to combine rows
produced by two or more separate SELECT statements. UNION pools
together the rows returned by two queries, removing any duplicate
rows. INTERSECT combines the result sets of two queries by removing any
rows that are not present in both. MINUS combines the results of two
queries by taking the the first result set and subtracting from it any
rows that are also found in the second. Of the three, UNION is the most
useful in practice.
In the ArsDigita Community System ticket tracker, people reporting a bug
or requesting a feature are given a menu of potential deadlines. For
some projects, common project deadlines are stored in the
ticket_deadlines table. These should appear in an HTML
SELECT form element. We also, however, want some options like "today",
"tomorrow", "next week", and "next month". The easiest way to handle
these is to query the dual table to perform some date
arithmetic. Each of these queries will return one row and if we UNION
four of them together with the query from ticket_deadlines,
we can have the basis for a very simple Web script to present the
options:
select
'today - ' || to_char(trunc(sysdate),'Mon FMDDFM'),
trunc(sysdate) as deadline
from dual
UNION
select
'tomorrow - '|| to_char(trunc(sysdate+1),'Mon FMDDFM'),
trunc(sysdate+1) as deadline
from dual
UNION
select
'next week - '|| to_char(trunc(sysdate+7),'Mon FMDDFM'),
trunc(sysdate+7) as deadline
from dual
UNION
select
'next month - '|| to_char(trunc(ADD_MONTHS(sysdate,1)),'Mon FMDDFM'),
trunc(ADD_MONTHS(sysdate,1)) as deadline
from dual
UNION
select
name || ' - ' || to_char(deadline, 'Mon FMDDFM'),
deadline
from ticket_deadlines
where project_id = :project_id
and deadline >= trunc(sysdate)
order by deadline
will produce something like
<form>
<select name="deadline_choice">
<option value="2000-10-28">today - Oct 28
</option><option value="2000-10-29">tomorrow - Oct 29
</option><option value="2000-11-04">next week - Nov 4
</option><option value="2000-11-28">next month - Nov 28
</option><option value="2000-12-01">V2.0 freeze - Dec 1
</option><option value="2000-12-15">V2.0 ship - Dec 15
</option></select>
</form>
The INTERSECT and MINUS operators are seldom used. Here are some
contrived examples. Suppose that you collect contest entries by Web
users, each in a separate table:
create table trip_to_paris_contest (
user_id references users,
entry_date date not null
);
create table camera_giveaway_contest (
user_id references users,
entry_date date not null
);
Now let's populate with some dummy data:
-- all three users love to go to Paris
insert into trip_to_paris_contest values (1,'2000-10-20');
insert into trip_to_paris_contest values (2,'2000-10-22');
insert into trip_to_paris_contest values (3,'2000-10-23');
-- only User #2 is a camera nerd
insert into camera_giveaway_contest values (2,'2000-05-01');
Suppose that we've got a new contest on the site. This time we're
giving away a trip to Churchill, Manitoba to photograph polar bears. We
assume that the most interested users will be those who've entered both
the travel and the camera contests. Let's get their user IDs so that we
can notify them via email (spam) about the new contest:
select user_id from trip_to_paris_contest
intersect
select user_id from camera_giveaway_contest;
USER_ID
----------
2
Or suppose that we're going to organize a personal trip to Paris and
want to find someone to share the cost of a room at the Crillon. We can
assume that anyone who entered the Paris trip contest is interested in
going. So perhaps we should start by sending them all email. On the
other hand, how can one enjoy a quiet evening with the absinthe bottle
if one's companion is constantly blasting away with an electronic flash?
We're interested in people who entered the Paris trip contest but who
did not enter the camera giveaway:
select user_id from trip_to_paris_contest
minus
select user_id from camera_giveaway_contest;
USER_ID
----------
1
3
---
based on SQL for Web Nerds
Created by Anett Szabo, last modified by Anett Szabo 25 Jul 2007, at 02:07 PM
The architect of any new information system must decide how much
responsibility for data management the new custom software should take
and how much should be left to packaged software and the operating
system. This chapter explains what kind of packaged data management
software is available, covering files, flat file database management
systems, the RDBMS, object-relational database management systems, and
object databases. This chapter also introduces the SQL language.
What Do You Need for Transaction Processing?
Data
processing folks like to talk about the "ACID test" when deciding
whether or not a database management system is adequate for handling
transactions. An adequate system has the following properties:
Atomicity
Results
of a transaction's execution are either all committed or all rolled
back. All changes take effect, or none do. That means, for Joe User's
money transfer, that both his savings and checking balances are
adjusted or neither are. For a Web content management example, suppose
that a user is editing a comment. A Web script tells the database to
"copy the old comment value to an audit table and update the live table
with the new text". If the hard drive fills up after the copy but
before the update, the audit table insertion will be rolled back.
Consistency
The
database is transformed from one valid state to another valid state.
This defines a transaction as legal only if it obeys user-defined
integrity constraints. Illegal transactions aren't allowed and, if an
integrity constraint can't be satisfied then the transaction is rolled
back. For example, suppose that you define a rule that postings in a
discussion forum table must be tied to a valid user ID. Then you hire
Joe Novice to write some admin pages. Joe writes a delete-user page
that doesn't bother to check whether or not the deletion will result in
an orphaned discussion forum posting. The DBMS will check, though, and
abort any transaction that would result in you having a discussion
forum posting by a deleted user.
Isolation
The
results of a transaction are invisible to other transactions until the
transaction is complete. For example, if you are running an accounting
report at the same time that Joe is transferring money, the accounting
report program will either see the balances before Joe transferred the
money or after, but never the intermediate state where checking has
been credited but savings not yet debited.
Durability
Once
committed (completed), the results of a transaction are permanent and
survive future system and media failures. If the airline reservation
system computer gives you seat 22A and crashes a millisecond later, it
won't have forgotten that you are sitting in 22A and also give it to
someone else. Furthermore, if a programmer spills coffee into a disk
drive, it will be possible to install a new disk and recover the
transactions up to the coffee spill, showing that you had seat 22A.
That
doesn't sound too tough to implement, does it? And, after all, one of
the most refreshing things about the Web is how it encourages people
without formal computer science backgrounds to program. So why not
build your Internet bank on a transaction system implemented by an
English major who has just discovered Perl?
Because you still need indexing.
Finding Your Data (and Fast)
One
facet of a database management system is processing inserts, updates,
and deletes. This all has to do with putting information into the
database. Sometimes it is also nice, though, to be able to get data
out. And with popular sites getting 100 hits per second, it pays to be
conscious of speed.
Flat files work okay if they are very
small. A Perl script can read the whole file into memory in a split
second and then look through it to pull out the information requested.
But suppose that your on-line bank grows to have 250,000 accounts. A
user types his account number into a Web page and asks for his most
recent deposits. You've got a chronological financial transactions file
with 25 million entries. Crunch, crunch, crunch. Your server
laboriously works through all 25 million to find the ones with an
account number that matches the user's. While it is crunching, 25 other
users come to the Web site and ask for the same information about their
accounts.
You have two choices: (1) buy a 64-processor
Sun E10000 server with 64 GB of RAM, or (2) build an index file. If you
build an index file that maps account numbers to sequential transaction
numbers, your server won't have to search all 25 million records
anymore. However, you have to modify all of your programs that insert,
update, or delete from the database to also keep the index current.
This
works great until two years later when a brand new MBA arrives from
Harvard. She asks your English major cum Perl hacker for "a report of
all customers who have more than $5,000 in checking or live in Oklahoma
and have withdrawn more than $100 from savings in the last 17 days." It
turns out that you didn't anticipate this query so your indexing scheme
doesn't speed things up. Your server has to grind through all the data
over and over again.
Enter the Relational Database
You
are building a cutting-edge Web service. You need the latest and
greatest in computer technology. That's why you use, uh, Unix. Yeah.
Anyway, even if your operating system was developed in 1969, you
definitely can't live without the most modern database management
system available. Maybe this guy E.F. Codd can help:
"Future
users of large data banks must be protected from having to know how the
data is organized in the machine (the internal representation). ...
Activities of users at terminals and most application programs should
remain unaffected when the internal representation of data is changed
and even when some aspects of the external representation are changed.
Changes in data representation will often be needed as a result of
changes in query, update, and report traffic and natural growth in the
types of stored information. "Existing noninferential, formatted data systems provide
users with tree-structured files or slightly more general network
models of the data. In Section 1, inadequacies of these models are
discussed. A model based on n-ary relations, a normal form for
data base relations, and the concept of a universal data sublanguage
are introduced. In Section 2, certain operations on relations (other
than logical inference) are discussed and applied to the problems of
redundancy and consistency in the user's model."
Sounds
pretty spiffy, doesn't it? Just like what you need. That's the abstract
to "A Relational Model of Data for Large Shared Data Banks", a paper
Codd wrote while working at IBM's San Jose research lab. It was
published in the Communications of the ACM in June, 1970.
From an application programmer's point of view, the biggest innovation in the relational database is that one uses a declarative
query language, SQL (an acronym for Structured Query Language and
pronounced "ess-cue-el" or "sequel"). Most computer languages are procedural.
The programmer tells the computer what to do, step by step, specifying
a procedure. In SQL, the programmer says "I want data that meet the
following criteria" and the RDBMS query planner figures out how to get
it. There are two advantages to using a declarative language. The first
is that the queries no longer depend on the data representation. The
RDBMS is free to store data however it wants. The second is increased
software reliability. It is much harder to have "a little bug" in an
SQL query than in a procedural program. Generally it either describes
the data that you want and works all the time or it completely fails in
an obvious way.
Another benefit of declarative languages
is that less sophisticated users are able to write useful programs. For
example, many computing tasks that required professional programmers in
the 1960s can be accomplished by non-technical people with
spreadsheets. In a spreadsheet, you don't tell the computer how to work
out the numbers or in what sequence. You just declare "This cell will be 1.5 times the value of that other cell over there."
How Does This RDBMS Thing Work?
This
is all you need to know to be a Caveman Database Programmer: A
relational database is a big spreadsheet that several people can update
simultaneously.
Each table in the database is one
spreadsheet. You tell the RDBMS how many columns each row has. For
example, in our mailing list database, the table has two columns: name and email.
Each entry in the database consists of one row in this table. An RDBMS
is more restrictive than a spreadsheet in that all the data in one
column must be of the same type, e.g., integer, decimal, character
string, or date. Another difference between a spreadsheet and an RDBMS
is that the rows in an RDBMS are not ordered. You can have a column
named row_number and ask the RDBMS to return the rows
ordered according to the data in this column, but the row numbering is
not implicit as it would be with Visicalc or its derivatives such as
Lotus 1-2-3 and Excel. If you do define a row_number
column or some other unique identifier for rows in a table, it becomes
possible for a row in another table to refer to that row by including
the value of the unique ID.
Here's what some SQL looks like for the mailing list application
create table mailing_list (
email varchar(100) not null primary key,
name varchar(100)
);
The table will be called mailing_list and will have two columns, both variable length character strings. We've added a couple of integrity constraints on the email column. The not null will prevent any program from inserting a row where name is specified but email
is not. After all, the whole point of the system is to send people
e-mail so there isn't much value in having a name with no e-mail
address. The primary key tells the database that this
column's value can be used to uniquely identify a row. That means the
system will reject an attempt to insert a row with the same e-mail
address as an existing row. This sounds like a nice feature, but it can
have some unexpected performance implications. For example, every time
anyone tries to insert a row into this table, the RDBMS will have to
look at all the other rows in the table to make sure that there isn't
already one with the same e-mail address. For a really huge table, that
could take minutes, but if you had also asked the RDBMS to create an
index for mailing_list on email then the
check becomes almost instantaneous. However, the integrity constraint
still slows you down because every update to the mailing_list table will also require an update to the index and therefore you'll be doing twice as many writes to the hard disk.
That
is the joy and the agony of SQL. Inserting two innocuous looking words
can cost you a factor of 1000 in performance. Then inserting a sentence
(to create the index) can bring you back so that it is only a factor of
two or three. (Note that many RDBMS implementations, including Oracle,
automatically define an index on a column that is constrained to be
unique.)
Anyway, now that we've executed the Data Definition Language "create table" statement, we can move on to Data Manipulation Language: an INSERT.
insert into mailing_list (name, email)
values ('Philip Greenspun','philg@mit.edu');
Note that we specify into which columns we are inserting. That way, if someone comes along later and does
alter table mailing_list add (phone_number varchar(20));
the
Oracle syntax for adding a column), our INSERT will still work. Note
also that the string quoting character in SQL is a single quote. If you
want to have a single quote in the string, double the single quote in
"O'Grady":
insert into mailing_list (name, email)
values ('Michael O''Grady','ogrady@fastbuck.com');
Having
created a table and inserted some data, at last we are ready to
experience the awesome power of the SQL SELECT. Want your data back?
SQL> select * from mailing_list;
EMAIL NAME PHONE_NUMBER
------------------------- ------------------------- ------------
philg@mit.edu Philip Greenspun
ogrady@fastbuck.com Michael O'Grady
2 rows selected.
Note that there are no values in the phone_number
column because we haven't set any. As soon as we do start to add phone
numbers, we realize that our data model was inadequate. This is the
Internet and Joe Typical User will have his pants hanging around his
knees under the weight of a cell phone, beeper, and other personal
communication accessories. One phone number column is clearly
inadequate and even work_phone and home_phone
columns won't accommodate the wealth of information users might want to
give us. The clean database-y way to do this is to remove our phone_number column from the mailing_list
table and define a helper table just for the phone numbers. Removing or
renaming a column is easy nowadays, but for the beauty of clean code we
drop the old table and create two new ones:
drop table mailing_list;
create table mailing_list (
email varchar(100) not null primary key,
name varchar(100)
);
create table phone_numbers (
email varchar(100) not null references mailing_list(email),
number_type varchar(15) check (number_type in ('work','home','cell','beeper')),
phone_number varchar(20) not null
);
Note that in this table the email column is not
a primary key. That's because we want to allow multiple rows with the
same e-mail address. If you are hanging around with a database nerd
friend, you can say that there is a relationship between the rows in the phone_numbers table and the mailing_list table. In fact, you can say that it is a many-to-one relation because many rows in the phone_numbers table may correspond to only one row in the mailing_list table.
Another item worth noting about our two-table data model is that we do not store the user's name in the phone_numbers table. That would be redundant with the mailing_list
table and potentially self-redundant as well, if, for example,
"robert.loser@fastbuck.com" says he is "Robert Loser" when he types in
his work phone and then "Rob Loser" when he puts in his beeper number,
and "Bob Lsr" when he puts in his cell phone number while typing on his
laptop's cramped keyboard.
Anyway, enough database nerdism. Let's populate the phone_numbers table:
SQL> insert into phone_numbers values ('ogrady@fastbuck.com','work','(800) 555-1212');
ORA-02291: integrity constraint (SCOTT.SYS_C001080) violated - parent key not found
Ooops! When we dropped the mailing_list table, we lost all the rows. The phone_numbers
table has a referential integrity constraint ("references
mailing_list") to make sure that we don't record e-mail addresses for
people whose names we don't know. We have to first insert the two users
into mailing_list:
insert into mailing_list (name, email)
values ('Philip Greenspun','philg@mit.edu');
insert into mailing_list (name, email)
values ('Michael O''Grady','ogrady@fastbuck.com');
insert into phone_numbers values ('ogrady@fastbuck.com','work','(800) 555-1212');
insert into phone_numbers values ('ogrady@fastbuck.com','home','(617) 495-6000');
insert into phone_numbers values ('philg@mit.edu','work','(617) 253-8574');
insert into phone_numbers values ('ogrady@fastbuck.com','beper','(617) 222-3456');
Note
that the last four INSERTs use an evil SQL shortcut and don't specify
the columns into which we are inserting data. The system defaults to
using all the columns in the order that they were defined. Except for
prototyping and playing around, we don't recommend ever using this
shortcut. The first three INSERTs work fine, but what about the last one, where Mr. O'Grady misspelled "beeper"?
ORA-02290: check constraint (SCOTT.SYS_C001079) violated
We asked Oracle at table definition time to check (number_type in ('work','home','cell','beeper')) and it did. The database cannot be left in an inconsistent state.
Let's say we want all of our data out. Email, full name, phone numbers. The most obvious query to try is a join.
SQL> select * from mailing_list, phone_numbers;
EMAIL NAME EMAIL TYPE NUMBER
---------------- ---------------- ---------------- ------ --------------
philg@mit.edu Philip Greenspun ogrady@fastbuck. work (800) 555-1212
ogrady@fastbuck. Michael O'Grady ogrady@fastbuck. work (800) 555-1212
philg@mit.edu Philip Greenspun ogrady@fastbuck. home (617) 495-6000
ogrady@fastbuck. Michael O'Grady ogrady@fastbuck. home (617) 495-6000
philg@mit.edu Philip Greenspun philg@mit.edu work (617) 253-8574
ogrady@fastbuck. Michael O'Grady philg@mit.edu work (617) 253-8574
6 rows selected.
Yow! What happened? There are only two rows in the mailing_list table and three in the phone_numbers table. Yet here we have six rows back. This is how joins work. They give you the Cartesian product
of the two tables. Each row of one table is paired with all the rows of
the other table in turn. So if you join an N-row table with an M-row
table, you get back a result with N*M rows. In real databases, N and M
can be up in the millions so it is worth being a little more specific
as to which rows you want:
select *
from mailing_list, phone_numbers
where mailing_list.email = phone_numbers.email;
EMAIL NAME EMAIL TYPE NUMBER
---------------- ---------------- ---------------- ------ --------------
ogrady@fastbuck. Michael O'Grady ogrady@fastbuck. work (800) 555-1212
ogrady@fastbuck. Michael O'Grady ogrady@fastbuck. home (617) 495-6000
philg@mit.edu Philip Greenspun philg@mit.edu work (617) 253-8574
3 rows selected.
Probably
more like what you had in mind. Refining your SQL statements in this
manner can sometimes be more exciting. For example, let's say that you
want to get rid of Philip Greenspun's phone numbers but aren't sure of
the exact syntax.
SQL> delete from phone_numbers;
3 rows deleted.
Oops. Yes, this does actually delete all the rows in the table. You probably wish you'd typed
delete from phone_numbers where email = 'philg@mit.edu';
but it is too late now. Therefore be careful before executing "delete".
There
is one more fundamental SQL statement to learn. Suppose that Philip
moves to Hollywood to realize his long-standing dream of becoming a
major motion picture producer. Clearly a change of name is in order,
though he'd be reluctant to give up the e-mail address he's had since
1976. Here's the SQL:
SQL> update mailing_list set name = 'Phil-baby Greenspun' where email = 'philg@mit.edu';
1 row updated.
SQL> select * from mailing_list;
EMAIL NAME
-------------------- --------------------
philg@mit.edu Phil-baby Greenspun
ogrady@fastbuck.com Michael O'Grady
2 rows selected.
As with DELETE, don't play around with UPDATE statements unless you have a WHERE clause at the end.
---
based on SQL for Web Nerds