#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# The following websites contains a basic overview of many SQL
# commands and concepts. We will be going through the concepts and many
# of the interactive exercises from these websites
#
# https://www.w3schools.com/sql/
#
# https://sqlbolt.com/
#
# https://www.sqlitetutorial.net/
#
# https://mode.com/sql-tutorial/
#
# https://sqlzoo.com/
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# See this page
#
# https://sqlbolt.com/lesson/introduction
#
# For an overview of :
#
# - What is Structured Query Language (SQL) - pronounced "sequel"
#
# - What is a "Relational Database"
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
15 Relational Databases and SQL (using R)
We will spend the next few sections focusing on “Structured Query Language (SQL)”, the language that is used for accessing data in a relational database. Remember that a relational database stores its information as a collection of several “tables”. Each table arranges its data in rows and columns.
15.1 Web resources for learning SQL
15.2 What software do I need to use SQL?
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# The sqldf package in R. ####
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# To use SQL you must have access to software that interacts with
# a relational database. As mentioned above many such software packages
# exist, e.g.
#
# - Microsoft SQL Server
# - Oracle Relational Database Management System
# - MySql
# - Postgres
# - MariaDb
# - many many others
#
# In addition many custom software packages exist that use
# SQL to manipulate relational databases.
#
# Many of these software packages tend to be overly complex to install.
# However, the underlying SQL language that is used by all of these
# different software packages tends to be very, very similar.
#
# For this class we will be using R to learn the basic SQL commands.
# The sqldf package in R includes the sqldf function that takes a single
# character value. The value is expected to be a valid SQL command.
# Instead of referring to database "tables" the sqldf function interprets
# database table names found in the commands as referring to R dataframes.
# This makes sense since R dataframes are very similar in structure to
# relational database tables.
#
# The version of SQL that is used by default with the sqldf package
# is the "sqlite" version. You can search online for more information
# about sqlite if you want to understand more about the specifics
# of this "flavor" of SQL. The following website is a good place to
# start: https://www.sqlitetutorial.net/
#
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
15.2.1 The sqldf R package
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Using the sqldf R package ####
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# SQL is NOT an "R" concept ... however we can use SQL in R via
# the sqldf package.
#
# The "sqldf" package in R allows you to use the sqldf() function
# to run sql statements using R dataframes instead of database tables.
# We will be using this package to introduce the first sql commands
# we will learn.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# install and load (i.e. require) the sqldf package
if(!require(sqldf)){install.packages("sqldf");require(sqldf);}
Loading required package: sqldf
Loading required package: gsubfn
Loading required package: proto
Loading required package: RSQLite
15.3 data for this part of the tutorial
You can download the data for this tutorial from the following files.
grades.csv This is a .csv file. You can read the contents of the file into R with the read.csv function as shown below.
sqlbolt_tables-v007.RData This is a .RData file. You can read the contents of the file into R with the load function as shown below.
You can load these tables into your R session with the following commands.
= read.csv("grades.csv")
grades load("sqlbolt_tables-v007.RData")
Below are the contents of these tables:
# Tables that we will be using in this presentation.
sqldf("select * from grades")
id student year honors test1 test2 test3 major
1 s0100 joe fr FALSE 70 82 NA IDS
2 s0200 sue fr FALSE 80 80 79 IDS
3 s0300 anne fr TRUE 90 95 92 IDS
4 s0400 frank so TRUE 100 95 91 ACC
5 s0500 bob so FALSE 30 NA NA IDS
6 s0600 samantha so TRUE 100 70 97 IDS
7 s0700 larry ju FALSE 32 80 NA FIN
8 s0800 bill se FALSE 84 90 92 ACC
9 s0900 lucricia fr TRUE 80 100 88 IDS
10 s1000 margaret se FALSE 59 47 NA IDS
11 s1100 dalia se FALSE 85 92 93 FIN
12 s1200 zvi ju TRUE 90 98 92 ACC
13 s1300 mike ju TRUE 90 86 79 IDS
14 s1400 david se TRUE 90 87 87 IDS
15 s1500 dana so FALSE 100 93 91 FIN
sqldf("select * from movies")
id title director year length_minutes
1 1 Toy Story John Lasseter 1995 81
2 2 A Bug's Life John Lasseter 1998 95
3 3 Toy Story 2 John Lasseter 1999 93
4 4 Monsters, Inc. Pete Docter 2001 92
5 5 Finding Nemo Andrew Stanton 2003 107
6 6 The Incredibles Brad Bird 2004 116
7 7 Cars John Lasseter 2006 117
8 8 Ratatouille Brad Bird 2007 115
9 9 WALL-E Andrew Stanton 2008 104
10 10 Up Pete Docter 2009 101
11 11 Toy Story 3 Lee Unkrich 2010 103
12 12 Cars 2 John Lasseter 2011 120
13 13 Brave Brenda Chapman 2012 102
14 14 Monsters University Dan Scanlon 2013 110
15 87 WALL-G Brenda Chapman 2042 92
sqldf("select * from orders")
OrderID CustomerID EmployeeID OrderDate ShipperID
1 10248 90 5 7/4/1996 3
2 10249 81 6 7/5/1996 1
3 10250 34 4 7/8/1996 2
4 10251 84 3 7/8/1996 1
5 10252 76 4 7/9/1996 2
6 10253 34 3 7/10/1996 2
7 10254 14 5 7/11/1996 2
8 10255 68 9 7/12/1996 3
9 10256 88 3 7/15/1996 2
10 10257 35 4 7/16/1996 3
11 10258 20 1 7/17/1996 1
12 10259 13 4 7/18/1996 3
13 10260 55 4 7/19/1996 1
14 10261 61 4 7/19/1996 2
15 10262 65 8 7/22/1996 3
16 10263 20 9 7/23/1996 3
17 10264 24 6 7/24/1996 3
18 10265 7 2 7/25/1996 1
19 10266 87 3 7/26/1996 3
20 10267 25 4 7/29/1996 1
21 10268 33 8 7/30/1996 3
22 10269 89 5 7/31/1996 1
23 10270 87 1 8/1/1996 1
24 10271 75 6 8/1/1996 2
25 10272 65 6 8/2/1996 2
26 10273 63 3 8/5/1996 3
27 10274 85 6 8/6/1996 1
28 10275 49 1 8/7/1996 1
29 10276 80 8 8/8/1996 3
30 10277 52 2 8/9/1996 3
31 10278 5 8 8/12/1996 2
32 10279 44 8 8/13/1996 2
33 10280 5 2 8/14/1996 1
34 10281 69 4 8/14/1996 1
35 10282 69 4 8/15/1996 1
36 10283 46 3 8/16/1996 3
37 10284 44 4 8/19/1996 1
38 10285 63 1 8/20/1996 2
39 10286 63 8 8/21/1996 3
40 10287 67 8 8/22/1996 3
41 10288 66 4 8/23/1996 1
42 10289 11 7 8/26/1996 3
43 10290 15 8 8/27/1996 1
44 10291 61 6 8/27/1996 2
45 10292 81 1 8/28/1996 2
46 10293 80 1 8/29/1996 3
47 10294 65 4 8/30/1996 2
48 10295 85 2 9/2/1996 2
49 10296 46 6 9/3/1996 1
50 10297 7 5 9/4/1996 2
51 10298 37 6 9/5/1996 2
52 10299 67 4 9/6/1996 2
53 10300 49 2 9/9/1996 2
54 10301 86 8 9/9/1996 2
55 10302 76 4 9/10/1996 2
56 10303 30 7 9/11/1996 2
57 10304 80 1 9/12/1996 2
58 10305 55 8 9/13/1996 3
59 10306 69 1 9/16/1996 3
60 10307 48 2 9/17/1996 2
61 10308 2 7 9/18/1996 3
62 10309 37 3 9/19/1996 1
63 10310 77 8 9/20/1996 2
64 10311 18 1 9/20/1996 3
65 10312 86 2 9/23/1996 2
66 10313 63 2 9/24/1996 2
67 10314 65 1 9/25/1996 2
68 10315 38 4 9/26/1996 2
69 10316 65 1 9/27/1996 3
70 10317 48 6 9/30/1996 1
71 10318 38 8 10/1/1996 2
72 10319 80 7 10/2/1996 3
73 10320 87 5 10/3/1996 3
74 10321 38 3 10/3/1996 2
75 10322 58 7 10/4/1996 3
76 10323 39 4 10/7/1996 1
77 10324 71 9 10/8/1996 1
78 10325 39 1 10/9/1996 3
79 10326 8 4 10/10/1996 2
80 10327 24 2 10/11/1996 1
81 10328 28 4 10/14/1996 3
82 10329 75 4 10/15/1996 2
83 10330 46 3 10/16/1996 1
84 10331 9 9 10/16/1996 1
85 10332 51 3 10/17/1996 2
86 10333 87 5 10/18/1996 3
87 10334 84 8 10/21/1996 2
88 10335 37 7 10/22/1996 2
89 10336 60 7 10/23/1996 2
90 10337 25 4 10/24/1996 3
91 10338 55 4 10/25/1996 3
92 10339 51 2 10/28/1996 2
93 10340 9 1 10/29/1996 3
94 10341 73 7 10/29/1996 3
95 10342 25 4 10/30/1996 2
96 10343 44 4 10/31/1996 1
97 10344 89 4 11/1/1996 2
98 10345 63 2 11/4/1996 2
99 10346 65 3 11/5/1996 3
100 10347 21 4 11/6/1996 3
101 10348 86 4 11/7/1996 2
102 10349 75 7 11/8/1996 1
103 10350 41 6 11/11/1996 2
104 10351 20 1 11/11/1996 1
105 10352 28 3 11/12/1996 3
106 10353 59 7 11/13/1996 3
107 10354 58 8 11/14/1996 3
108 10355 4 6 11/15/1996 1
109 10356 86 6 11/18/1996 2
110 10357 46 1 11/19/1996 3
111 10358 41 5 11/20/1996 1
112 10359 72 5 11/21/1996 3
113 10360 7 4 11/22/1996 3
114 10361 63 1 11/22/1996 2
115 10362 9 3 11/25/1996 1
116 10363 17 4 11/26/1996 3
117 10364 19 1 11/26/1996 1
118 10365 3 3 11/27/1996 2
119 10366 29 8 11/28/1996 2
120 10367 83 7 11/28/1996 3
121 10368 20 2 11/29/1996 2
122 10369 75 8 12/2/1996 2
123 10370 14 6 12/3/1996 2
124 10371 41 1 12/3/1996 1
125 10372 62 5 12/4/1996 2
126 10373 37 4 12/5/1996 3
127 10374 91 1 12/5/1996 3
128 10375 36 3 12/6/1996 2
129 10376 51 1 12/9/1996 2
130 10377 72 1 12/9/1996 3
131 10378 24 5 12/10/1996 3
132 10379 61 2 12/11/1996 1
133 10380 37 8 12/12/1996 3
134 10381 46 3 12/12/1996 3
135 10382 20 4 12/13/1996 1
136 10383 4 8 12/16/1996 3
137 10384 5 3 12/16/1996 3
138 10385 75 1 12/17/1996 2
139 10386 21 9 12/18/1996 3
140 10387 70 1 12/18/1996 2
141 10388 72 2 12/19/1996 1
142 10389 10 4 12/20/1996 2
143 10390 20 6 12/23/1996 1
144 10391 17 3 12/23/1996 3
145 10392 59 2 12/24/1996 3
146 10393 71 1 12/25/1996 3
147 10394 36 1 12/25/1996 3
148 10395 35 6 12/26/1996 1
149 10396 25 1 12/27/1996 3
150 10397 60 5 12/27/1996 1
151 10398 71 2 12/30/1996 3
152 10399 83 8 12/31/1996 3
153 10400 19 1 1/1/1997 3
154 10401 65 1 1/1/1997 1
155 10402 20 8 1/2/1997 2
156 10403 20 4 1/3/1997 3
157 10404 49 2 1/3/1997 1
158 10405 47 1 1/6/1997 1
159 10406 62 7 1/7/1997 1
160 10407 56 2 1/7/1997 2
161 10408 23 8 1/8/1997 1
162 10409 54 3 1/9/1997 1
163 10410 10 3 1/10/1997 3
164 10411 10 9 1/10/1997 3
165 10412 87 8 1/13/1997 2
166 10413 41 3 1/14/1997 2
167 10414 21 2 1/14/1997 3
168 10415 36 3 1/15/1997 1
169 10416 87 8 1/16/1997 3
170 10417 73 4 1/16/1997 3
171 10418 63 4 1/17/1997 1
172 10419 68 4 1/20/1997 2
173 10420 88 3 1/21/1997 1
174 10421 61 8 1/21/1997 1
175 10422 27 2 1/22/1997 1
176 10423 31 6 1/23/1997 3
177 10424 51 7 1/23/1997 2
178 10425 41 6 1/24/1997 2
179 10426 29 4 1/27/1997 1
180 10427 59 4 1/27/1997 2
181 10428 66 7 1/28/1997 1
182 10429 37 3 1/29/1997 2
183 10430 20 4 1/30/1997 1
184 10431 10 4 1/30/1997 2
185 10432 75 3 1/31/1997 2
186 10433 60 3 2/3/1997 3
187 10434 24 3 2/3/1997 2
188 10435 16 8 2/4/1997 2
189 10436 7 3 2/5/1997 2
190 10437 87 8 2/5/1997 1
191 10438 79 3 2/6/1997 2
192 10439 51 6 2/7/1997 3
193 10440 71 4 2/10/1997 2
194 10441 55 3 2/10/1997 2
195 10442 20 3 2/11/1997 2
196 10443 66 8 2/12/1997 1
sqldf("select * from customers")
CustomerID CustomerName ContactName Address City PostalCode Country
1 1 Alfreds Futterkiste Maria Anders Obere Str. 57 Berlin 12209 Germany
2 2 Ana Trujillo Emparedados y helados Ana Trujillo Avda. de la Constituci\xf3n 2222 M\xe9xico D.F. 5021 Mexico
3 3 Antonio Moreno Taquer\xeda Antonio Moreno Mataderos 2312 M\xe9xico D.F. 5023 Mexico
4 4 Around the Horn Thomas Hardy 120 Hanover Sq. London WA1 1DP UK
5 5 Berglunds snabbk\xf6p Christina Berglund Berguvsv\xe4gen 8 Lule\xe5 S-958 22 Sweden
6 6 Blauer See Delikatessen Hanna Moos Forsterstr. 57 Mannheim 68306 Germany
7 7 Blondel p\xe8re et fils Fr\xe9d\xe9rique Citeaux 24, place Kl\xe9ber Strasbourg 67000 France
8 8 B\xf3lido Comidas preparadas Mart\xedn Sommer C/ Araquil, 67 Madrid 28023 Spain
9 9 Bon app' Laurence Lebihans 12, rue des Bouchers Marseille 13008 France
10 10 Bottom-Dollar Marketse Elizabeth Lincoln 23 Tsawassen Blvd. Tsawassen T2F 8M4 Canada
11 11 B's Beverages Victoria Ashworth Fauntleroy Circus London EC2 5NT UK
12 12 Cactus Comidas para llevar Patricio Simpson Cerrito 333 Buenos Aires 1010 Argentina
13 13 Centro comercial Moctezuma Francisco Chang Sierras de Granada 9993 M\xe9xico D.F. 5022 Mexico
14 14 Chop-suey Chinese Yang Wang Hauptstr. 29 Bern 3012 Switzerland
15 15 Com\xe9rcio Mineiro Pedro Afonso Av. dos Lus\xedadas, 23 S\xe3o Paulo 05432-043 Brazil
16 16 Consolidated Holdings Elizabeth Brown Berkeley Gardens 12 Brewery London WX1 6LT UK
17 17 Drachenblut Delikatessend Sven Ottlieb Walserweg 21 Aachen 52066 Germany
18 18 Du monde entier Janine Labrune 67, rue des Cinquante Otages Nantes 44000 France
19 19 Eastern Connection Ann Devon 35 King George London WX3 6FW UK
20 20 Ernst Handel Roland Mendel Kirchgasse 6 Graz 8010 Austria
21 21 Familia Arquibaldo Aria Cruz Rua Or\xf3s, 92 S\xe3o Paulo 05442-030 Brazil
22 22 FISSA Fabrica Inter. Salchichas S.A. Diego Roel C/ Moralzarzal, 86 Madrid 28034 Spain
23 23 Folies gourmandes Martine Ranc\xe9 184, chauss\xe9e de Tournai Lille 59000 France
24 24 Folk och f\xe4 HB Maria Larsson \xc5kergatan 24 Br\xe4cke S-844 67 Sweden
25 25 Frankenversand Peter Franken Berliner Platz 43 M\xfcnchen 80805 Germany
26 26 France restauration Carine Schmitt 54, rue Royale Nantes 44000 France
27 27 Franchi S.p.A. Paolo Accorti Via Monte Bianco 34 Torino 10100 Italy
28 28 Furia Bacalhau e Frutos do Mar Lino Rodriguez Jardim das rosas n. 32 Lisboa 1675 Portugal
29 29 Galer\xeda del gastr\xf3nomo Eduardo Saavedra Rambla de Catalu\xf1a, 23 Barcelona 8022 Spain
30 30 Godos Cocina T\xedpica Jos\xe9 Pedro Freyre C/ Romero, 33 Sevilla 41101 Spain
31 31 Gourmet Lanchonetes Andr\xe9 Fonseca Av. Brasil, 442 Campinas 04876-786 Brazil
32 32 Great Lakes Food Market Howard Snyder 2732 Baker Blvd. Eugene 97403 USA
33 33 GROSELLA-Restaurante Manuel Pereira 5\xaa Ave. Los Palos Grandes Caracas 1081 Venezuela
34 34 Hanari Carnes Mario Pontes Rua do Pa\xe7o, 67 Rio de Janeiro 05454-876 Brazil
35 35 HILARI\xd3N-Abastos Carlos Hern\xe1ndez Carrera 22 con Ave. Carlos Soublette #8-35 San Crist\xf3bal 5022 Venezuela
36 36 Hungry Coyote Import Store Yoshi Latimer City Center Plaza 516 Main St. Elgin 97827 USA
37 37 Hungry Owl All-Night Grocers Patricia McKenna 8 Johnstown Road Cork Ireland
38 38 Island Trading Helen Bennett Garden House Crowther Way Cowes PO31 7PJ UK
39 39 K\xf6niglich Essen Philip Cramer Maubelstr. 90 Brandenburg 14776 Germany
40 40 La corne d'abondance Daniel Tonini 67, avenue de l'Europe Versailles 78000 France
41 41 La maison d'Asie Annette Roulet 1 rue Alsace-Lorraine Toulouse 31000 France
42 42 Laughing Bacchus Wine Cellars Yoshi Tannamuri 1900 Oak St. Vancouver V3F 2K1 Canada
43 43 Lazy K Kountry Store John Steel 12 Orchestra Terrace Walla Walla 99362 USA
44 44 Lehmanns Marktstand Renate Messner Magazinweg 7 Frankfurt a.M. 60528 Germany
45 45 Let's Stop N Shop Jaime Yorres 87 Polk St. Suite 5 San Francisco 94117 USA
46 46 LILA-Supermercado Carlos Gonz\xe1lez Carrera 52 con Ave. Bol\xedvar #65-98 Llano Largo Barquisimeto 3508 Venezuela
47 47 LINO-Delicateses Felipe Izquierdo Ave. 5 de Mayo Porlamar I. de Margarita 4980 Venezuela
48 48 Lonesome Pine Restaurant Fran Wilson 89 Chiaroscuro Rd. Portland 97219 USA
49 49 Magazzini Alimentari Riuniti Giovanni Rovelli Via Ludovico il Moro 22 Bergamo 24100 Italy
50 50 Maison Dewey Catherine Dewey Rue Joseph-Bens 532 Bruxelles B-1180 Belgium
51 51 M\xe8re Paillarde Jean Fresni\xe8re 43 rue St. Laurent Montr\xe9al H1J 1C3 Canada
52 52 Morgenstern Gesundkost Alexander Feuer Heerstr. 22 Leipzig 4179 Germany
53 53 North/South Simon Crowther South House 300 Queensbridge London SW7 1RZ UK
54 54 Oc\xe9ano Atl\xe1ntico Ltda. Yvonne Moncada Ing. Gustavo Moncada 8585 Piso 20-A Buenos Aires 1010 Argentina
55 55 Old World Delicatessen Rene Phillips 2743 Bering St. Anchorage 99508 USA
56 56 Ottilies K\xe4seladen Henriette Pfalzheim Mehrheimerstr. 369 K\xf6ln 50739 Germany
57 57 Paris sp\xe9cialit\xe9s Marie Bertrand 265, boulevard Charonne Paris 75012 France
58 58 Pericles Comidas cl\xe1sicas Guillermo Fern\xe1ndez Calle Dr. Jorge Cash 321 M\xe9xico D.F. 5033 Mexico
59 59 Piccolo und mehr Georg Pipps Geislweg 14 Salzburg 5020 Austria
60 60 Princesa Isabel Vinhoss Isabel de Castro Estrada da sa\xfade n. 58 Lisboa 1756 Portugal
61 61 Que Del\xedcia Bernardo Batista Rua da Panificadora, 12 Rio de Janeiro 02389-673 Brazil
62 62 Queen Cozinha L\xfacia Carvalho Alameda dos Can\xe0rios, 891 S\xe3o Paulo 05487-020 Brazil
63 63 QUICK-Stop Horst Kloss Taucherstra\xdfe 10 Cunewalde 1307 Germany
64 64 Rancho grande Sergio Guti\xe9rrez Av. del Libertador 900 Buenos Aires 1010 Argentina
65 65 Rattlesnake Canyon Grocery Paula Wilson 2817 Milton Dr. Albuquerque 87110 USA
66 66 Reggiani Caseifici Maurizio Moroni Strada Provinciale 124 Reggio Emilia 42100 Italy
67 67 Ricardo Adocicados Janete Limeira Av. Copacabana, 267 Rio de Janeiro 02389-890 Brazil
68 68 Richter Supermarkt Michael Holz Grenzacherweg 237 Gen\xe8ve 1203 Switzerland
69 69 Romero y tomillo Alejandra Camino Gran V\xeda, 1 Madrid 28001 Spain
70 70 Sant\xe9 Gourmet Jonas Bergulfsen Erling Skakkes gate 78 Stavern 4110 Norway
71 71 Save-a-lot Markets Jose Pavarotti 187 Suffolk Ln. Boise 83720 USA
72 72 Seven Seas Imports Hari Kumar 90 Wadhurst Rd. London OX15 4NB UK
73 73 Simons bistro Jytte Petersen Vinb\xe6ltet 34 K\xf8benhavn 1734 Denmark
74 74 Sp\xe9cialit\xe9s du monde Dominique Perrier 25, rue Lauriston Paris 75016 France
75 75 Split Rail Beer & Ale Art Braunschweiger P.O. Box 555 Lander 82520 USA
76 76 Supr\xeames d\xe9lices Pascale Cartrain Boulevard Tirou, 255 Charleroi B-6000 Belgium
77 77 The Big Cheese Liz Nixon 89 Jefferson Way Suite 2 Portland 97201 USA
78 78 The Cracker Box Liu Wong 55 Grizzly Peak Rd. Butte 59801 USA
79 79 Toms Spezialit\xe4ten Karin Josephs Luisenstr. 48 M\xfcnster 44087 Germany
80 80 Tortuga Restaurante Miguel Angel Paolino Avda. Azteca 123 M\xe9xico D.F. 5033 Mexico
81 81 Tradi\xe7\xe3o Hipermercados Anabela Domingues Av. In\xeas de Castro, 414 S\xe3o Paulo 05634-030 Brazil
82 82 Trail's Head Gourmet Provisioners Helvetius Nagy 722 DaVinci Blvd. Kirkland 98034 USA
83 83 Vaffeljernet Palle Ibsen Smagsl\xf8get 45 \xc5rhus 8200 Denmark
84 84 Victuailles en stock Mary Saveley 2, rue du Commerce Lyon 69004 France
85 85 Vins et alcools Chevalier Paul Henriot 59 rue de l'Abbaye Reims 51100 France
86 86 Die Wandernde Kuh Rita M\xfcller Adenauerallee 900 Stuttgart 70563 Germany
87 87 Wartian Herkku Pirkko Koskitalo Torikatu 38 Oulu 90110 Finland
88 88 Wellington Importadora Paula Parente Rua do Mercado, 12 Resende 08737-363 Brazil
89 89 White Clover Markets Karl Jablonski 305 - 14th Ave. S. Suite 3B Seattle 98128 USA
90 90 Wilman Kala Matti Karttunen Keskuskatu 45 Helsinki 21240 Finland
91 91 Wolski Zbyszek ul. Filtrowa 68 Walla 01-012 Poland
sqldf("select * from north_american_cities")
city country population latitude longitude
1 Guadalajara Mexico 1500800 20.65970 -103.34961
2 Toronto Canada 2795060 43.65323 -79.38318
3 Houston United States 2195914 29.76043 -95.36980
4 New York United States 8405837 40.71278 -74.00594
5 Philadelphia United States 1553165 39.95258 -75.16522
6 Havana Cuba 2106146 23.05407 -82.34519
7 Mexico City Mexico 8555500 19.43261 -99.13321
8 Phoenix United States 1513367 33.44838 -112.07404
9 Los Angeles United States 3884307 34.05223 -118.24368
10 Ecatepec de Morelos Mexico 1742000 19.60184 -99.05067
11 Montreal Canada 1717767 45.50169 -73.56726
12 Chicago United States 2718782 41.87811 -87.62980
15.4 SQL SELECT statement
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# SQL select statement ####
#
# The select statement is used to retrieve information from a relational
# database table (or in our case, from an R dataframe). The select statement
# has several "clauses". The following are the allowed clauses. The ... after
# each clause name represents additional text that needs to be typed. We will
# learn how to use each of these different clauses below.
#
# The following are the clauses of a SQL select statement:
#
# select ...
# from ...
# where ...
# group by ..
# having ...
# order by ..
# limit ... (NOTE: the limit clause is found in many SQL implementations but is not standard)
#
#
# NOTES:
#
# 1. All of the clauses are optional, except for the select clause.
#
# 2. The clauses that appear in a particular select statement must be
# in the order shown above.
#
# 3. The limit clause is not a standard part of SQL but is commonly used in many flavors
# of the SQL language.
#
# 4. According to the ANSI SQL standard, SQL is NOT case sensitive.
# In other words you can type "SELECT ..." or "select ...".
#
# Often, programmers will capitalize the clause names in select statements however,
# this is not required.
#
# Some SQL products do make SQL case sensitive. This is especially true
# when it comes to names of tables and names of columns in the tables.
#
# The sqldf package that we will be using is case-insensitve (i.e. case does
# NOT matter) for everything EXCEPT for table names (i.e. dataframe names).
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
15.5 How to use the R sqldf() function
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# The sqldf function accepts a single argument that is expected to
# be a SQL SELECT command. The table names in the SELECT command are expected to
# be R dataframes.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# See help info about the sqldf package
help(package="sqldf")
# See help about the sqldf function that is part of the sqldf package
?sqldf
15.6 Intro to the SQL SELECT command
# select clause specifies which columns you'd like to
# see in the results.
sqldf("select student, test1, test2
from grades")
student test1 test2
1 joe 70 82
2 sue 80 80
3 anne 90 95
4 frank 100 95
5 bob 30 NA
6 samantha 100 70
7 larry 32 80
8 bill 84 90
9 lucricia 80 100
10 margaret 59 47
11 dalia 85 92
12 zvi 90 98
13 mike 90 86
14 david 90 87
15 dana 100 93
15.7 SELECT * FROM …..
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# SELECT * FROM ....
#
# You can type * instead of the column names to get all of the columns.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sqldf("select * from grades") # display the entire grades table
id student year honors test1 test2 test3 major
1 s0100 joe fr FALSE 70 82 NA IDS
2 s0200 sue fr FALSE 80 80 79 IDS
3 s0300 anne fr TRUE 90 95 92 IDS
4 s0400 frank so TRUE 100 95 91 ACC
5 s0500 bob so FALSE 30 NA NA IDS
6 s0600 samantha so TRUE 100 70 97 IDS
7 s0700 larry ju FALSE 32 80 NA FIN
8 s0800 bill se FALSE 84 90 92 ACC
9 s0900 lucricia fr TRUE 80 100 88 IDS
10 s1000 margaret se FALSE 59 47 NA IDS
11 s1100 dalia se FALSE 85 92 93 FIN
12 s1200 zvi ju TRUE 90 98 92 ACC
13 s1300 mike ju TRUE 90 86 79 IDS
14 s1400 david se TRUE 90 87 87 IDS
15 s1500 dana so FALSE 100 93 91 FIN
15.8 capturing the output of sqldf in R
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# The output of sqldf is another dataframe ####
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# In a relational database there is usually a way to
# capture the output of a select statement in a different table
# however, the actual mechanism is not standard. MANY sql database
# software products use the "into" clause of the select statement
# to do so. I am not going to talk about that here because that
# really depends on the exact sql product you are using.
#
# In R, when using the sqldf function, you can capture the output of sqldf
# in another variable. This simply creates a new R dataframe (which in sqldf
# is analogous to a database table).
#
# Remember though that the actual sqldf() function in R is NOT technically
# part of the SQL language. It is simply how we can use SQL commands to
# manipulate R dataframes.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Example: Capture the output of the sql command in a new dataframe
= sqldf("select student, test1, test2
justTest1 from grades")
# Here is the original table
sqldf("select * from grades")
id student year honors test1 test2 test3 major
1 s0100 joe fr FALSE 70 82 NA IDS
2 s0200 sue fr FALSE 80 80 79 IDS
3 s0300 anne fr TRUE 90 95 92 IDS
4 s0400 frank so TRUE 100 95 91 ACC
5 s0500 bob so FALSE 30 NA NA IDS
6 s0600 samantha so TRUE 100 70 97 IDS
7 s0700 larry ju FALSE 32 80 NA FIN
8 s0800 bill se FALSE 84 90 92 ACC
9 s0900 lucricia fr TRUE 80 100 88 IDS
10 s1000 margaret se FALSE 59 47 NA IDS
11 s1100 dalia se FALSE 85 92 93 FIN
12 s1200 zvi ju TRUE 90 98 92 ACC
13 s1300 mike ju TRUE 90 86 79 IDS
14 s1400 david se TRUE 90 87 87 IDS
15 s1500 dana so FALSE 100 93 91 FIN
# Here is the new table
sqldf("select * from justTest1")
student test1 test2
1 joe 70 82
2 sue 80 80
3 anne 90 95
4 frank 100 95
5 bob 30 NA
6 samantha 100 70
7 larry 32 80
8 bill 84 90
9 lucricia 80 100
10 margaret 59 47
11 dalia 85 92
12 zvi 90 98
13 mike 90 86
14 david 90 87
15 dana 100 93
15.9 case-sensitivity in SQL
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# A note about case-sensitivity ####
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# NOTE: as mentioned above, sqldf is case-INsensitive regarding parts of the
# command OTHER THAN the table names (i.e. the dataframe names).
# To demonstrate, the following will produce the same results as the
# command above.
# same results as above
sqldf("SELECT STUDENT, TEST1, TEST2
FROM grades")
student test1 test2
1 joe 70 82
2 sue 80 80
3 anne 90 95
4 frank 100 95
5 bob 30 NA
6 samantha 100 70
7 larry 32 80
8 bill 84 90
9 lucricia 80 100
10 margaret 59 47
11 dalia 85 92
12 zvi 90 98
13 mike 90 86
14 david 90 87
15 dana 100 93
# However, the following command will NOT work since there is no variable
# named GRADES (the name is grades)
# ERROR - should be "grades", not "GRADES"
sqldf("SELECT STUDENT, TEST1, TEST2
FROM GRADES") # Error: no such table: GRADES
Error: no such table: GRADES
15.10 order by
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ORDER BY ####
#
# The ORDER BY clause allows you to specify the order that
# the rows should appear in the results.
# Numbers are ordered numerically.
# Character data is ordered alphabetically.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Order by test1
sqldf("select student, year, test1, test2
from grades
order by test1")
student year test1 test2
1 bob so 30 NA
2 larry ju 32 80
3 margaret se 59 47
4 joe fr 70 82
5 sue fr 80 80
6 lucricia fr 80 100
7 bill se 84 90
8 dalia se 85 92
9 anne fr 90 95
10 zvi ju 90 98
11 mike ju 90 86
12 david se 90 87
13 frank so 100 95
14 samantha so 100 70
15 dana so 100 93
# Order by student
sqldf("select student, year, test1, test2
from grades
order by student")
student year test1 test2
1 anne fr 90 95
2 bill se 84 90
3 bob so 30 NA
4 dalia se 85 92
5 dana so 100 93
6 david se 90 87
7 frank so 100 95
8 joe fr 70 82
9 larry ju 32 80
10 lucricia fr 80 100
11 margaret se 59 47
12 mike ju 90 86
13 samantha so 100 70
14 sue fr 80 80
15 zvi ju 90 98
# Order by year
sqldf("select student, year, test1, test2
from grades
order by year")
student year test1 test2
1 joe fr 70 82
2 sue fr 80 80
3 anne fr 90 95
4 lucricia fr 80 100
5 larry ju 32 80
6 zvi ju 90 98
7 mike ju 90 86
8 bill se 84 90
9 margaret se 59 47
10 dalia se 85 92
11 david se 90 87
12 frank so 100 95
13 bob so 30 NA
14 samantha so 100 70
15 dana so 100 93
#..........................................................................
# You can specify asc to indicate "ascending" order (this is the default)
#
# You can specify desc to indicate "descending" order
#..........................................................................
# specify asc for order (i.e. ascending) - this is the default.
# if you dont' specify asc you will get the same results.
sqldf("select student, year, test1, test2
from grades
order by test1 asc")
student year test1 test2
1 bob so 30 NA
2 larry ju 32 80
3 margaret se 59 47
4 joe fr 70 82
5 sue fr 80 80
6 lucricia fr 80 100
7 bill se 84 90
8 dalia se 85 92
9 anne fr 90 95
10 zvi ju 90 98
11 mike ju 90 86
12 david se 90 87
13 frank so 100 95
14 samantha so 100 70
15 dana so 100 93
# reversed - order is descending, ie. desc
sqldf("select student, year, test1, test2
from grades
order by test1 desc")
student year test1 test2
1 frank so 100 95
2 samantha so 100 70
3 dana so 100 93
4 anne fr 90 95
5 zvi ju 90 98
6 mike ju 90 86
7 david se 90 87
8 dalia se 85 92
9 bill se 84 90
10 sue fr 80 80
11 lucricia fr 80 100
12 joe fr 70 82
13 margaret se 59 47
14 larry ju 32 80
15 bob so 30 NA
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# You can specify more than one column in the order by clause
# The 2nd specified column will only have an effect for rows
# that have the same value in the first specified column.
#
# Notice that there is a separate asc or desc indicator
# for each of the columns in the order by clause
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#.............................
# ordering by a single column
#.............................
# order the data in increasing order of test1 value
# Notice that the year values are NOT in order
sqldf("select student, year, test1
from grades
order by test1 asc")
student year test1
1 bob so 30
2 larry ju 32
3 margaret se 59
4 joe fr 70
5 sue fr 80
6 lucricia fr 80
7 bill se 84
8 dalia se 85
9 anne fr 90
10 zvi ju 90
11 mike ju 90
12 david se 90
13 frank so 100
14 samantha so 100
15 dana so 100
# order the data in increasing order of year value (alphabetical order)
# Notice that the test values are NOT in order
sqldf("select student, year, test1
from grades
order by year asc")
student year test1
1 joe fr 70
2 sue fr 80
3 anne fr 90
4 lucricia fr 80
5 larry ju 32
6 zvi ju 90
7 mike ju 90
8 bill se 84
9 margaret se 59
10 dalia se 85
11 david se 90
12 frank so 100
13 bob so 30
14 samantha so 100
15 dana so 100
#..................................................
# order the data by two different columns
#..................................................
# Order the data alphabetically by year.
# For all of the rows for a particular year, sort the test1 values
# in ascending numeric order.
sqldf("select student, year, test1, test2
from grades
order by year asc, test1 asc")
student year test1 test2
1 joe fr 70 82
2 sue fr 80 80
3 lucricia fr 80 100
4 anne fr 90 95
5 larry ju 32 80
6 zvi ju 90 98
7 mike ju 90 86
8 margaret se 59 47
9 bill se 84 90
10 dalia se 85 92
11 david se 90 87
12 bob so 30 NA
13 frank so 100 95
14 samantha so 100 70
15 dana so 100 93
# Order the data by alphabetically by year.
# For all of the rows for a particular year, sort the test1 values
# in descending numeric order.
sqldf("select student, year, test1, test2
from grades
order by year asc, test1 desc")
student year test1 test2
1 anne fr 90 95
2 sue fr 80 80
3 lucricia fr 80 100
4 joe fr 70 82
5 zvi ju 90 98
6 mike ju 90 86
7 larry ju 32 80
8 david se 90 87
9 dalia se 85 92
10 bill se 84 90
11 margaret se 59 47
12 frank so 100 95
13 samantha so 100 70
14 dana so 100 93
15 bob so 30 NA
# Order the data by test1.
# For all the rows that have the same value for test1 sort the rows by test2
sqldf("select student, test1, test2
from grades
order by test1 desc, test2 desc")
student test1 test2
1 frank 100 95
2 dana 100 93
3 samantha 100 70
4 zvi 90 98
5 anne 90 95
6 david 90 87
7 mike 90 86
8 dalia 85 92
9 bill 84 90
10 lucricia 80 100
11 sue 80 80
12 joe 70 82
13 margaret 59 47
14 larry 32 80
15 bob 30 NA
# Show the rows in descending order of the test1 grades.
# If several students got the same grade on test1 then show
# those rows in alphabetical order of the student names.
sqldf("select student, test1, test2
from grades
order by test1 desc, student asc")
student test1 test2
1 dana 100 93
2 frank 100 95
3 samantha 100 70
4 anne 90 95
5 david 90 87
6 mike 90 86
7 zvi 90 98
8 dalia 85 92
9 bill 84 90
10 lucricia 80 100
11 sue 80 80
12 joe 70 82
13 margaret 59 47
14 larry 32 80
15 bob 30 NA
#..................................................
# order the data by three different columns
#..................................................
# Order all of the rows by year
#
# among the rows that have the same year
# order within those rows by test1
#
# among the rows that have the same year and test1 values
# order within those rows by test2
sqldf("select student, year, test1, test2
from grades
order by year, test1 desc, test2 desc")
student year test1 test2
1 anne fr 90 95
2 lucricia fr 80 100
3 sue fr 80 80
4 joe fr 70 82
5 zvi ju 90 98
6 mike ju 90 86
7 larry ju 32 80
8 david se 90 87
9 dalia se 85 92
10 bill se 84 90
11 margaret se 59 47
12 frank so 100 95
13 dana so 100 93
14 samantha so 100 70
15 bob so 30 NA
15.11 limit clause
###############################################################.
# LIMIT <M> and LIMIT <M> OFFSET <N>
###############################################################.
#
# The LIMIT clause must come at the very end of the SQL
# statement. You can use it to display only some of the
# rows that would otherwise have been displayed.
###############################################################.
For example:
# Display the first 3 students
sqldf("SELECT *
FROM grades
ORDER BY student
LIMIT 3")
id student year honors test1 test2 test3 major
1 s0300 anne fr TRUE 90 95 92 IDS
2 s0800 bill se FALSE 84 90 92 ACC
3 s0500 bob so FALSE 30 NA NA IDS
###############################################################.
# In general the limit clause has two different forms:
#
# <SOME SELECT STATEMENT>
# limit M
#
# (where M is an integer) results in only the first
# M rows of data from what would normally have been displayed
# had the LIMIT clause not been specified.
#
# <SOME SELECT STATEMENT>
# limit M offset N
#
# (where both M and N are integers)
# starts the output from the N+1'th row of what would normally
# have been displayed without the limit clause and then
# displays the next M rows of data.
#
# LIMIT is often used in combination with the ORDER BY clause.
###############################################################.
For example:
# Display the first 3 students
sqldf("SELECT *
FROM grades
ORDER BY student
LIMIT 3")
id student year honors test1 test2 test3 major
1 s0300 anne fr TRUE 90 95 92 IDS
2 s0800 bill se FALSE 84 90 92 ACC
3 s0500 bob so FALSE 30 NA NA IDS
# Display the next 3 students
sqldf("SELECT *
FROM grades
ORDER BY student
LIMIT 3 OFFSET 3")
id student year honors test1 test2 test3 major
1 s1100 dalia se FALSE 85 92 93 FIN
2 s1500 dana so FALSE 100 93 91 FIN
3 s1400 david se TRUE 90 87 87 IDS
# Display the next 3 students
sqldf("SELECT *
FROM grades
ORDER BY student
LIMIT 3 OFFSET 6")
id student year honors test1 test2 test3 major
1 s0400 frank so TRUE 100 95 91 ACC
2 s0100 joe fr FALSE 70 82 NA IDS
3 s0700 larry ju FALSE 32 80 NA FIN
15.12 Using calculations in the select clause
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Using calculations in the select clause ####
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# You can use math or SQL functions to modify the values
# that are returned. We will discuss SQL functions later.
# For now, let's demonstrate the concept using math.
# See the questions and answers below:
#/////////////////////////////////////////////////////////////////////////
# QUESTION
#
# Display the student, test1 and test2 columns from the grades table.
# In addition, display a column named that contains
# the values from the test2 column with 10 points added to each number.
#/////////////////////////////////////////////////////////////////////////
# ANSWER
sqldf("select student, test1, test2, test2 + 10
from grades")
student test1 test2 test2 + 10
1 joe 70 82 92
2 sue 80 80 90
3 anne 90 95 105
4 frank 100 95 105
5 bob 30 NA NA
6 samantha 100 70 80
7 larry 32 80 90
8 bill 84 90 100
9 lucricia 80 100 110
10 margaret 59 47 57
11 dalia 85 92 102
12 zvi 90 98 108
13 mike 90 86 96
14 david 90 87 97
15 dana 100 93 103
15.13 Renaming columns
#/////////////////////////////////////////////////////////////////////////
# QUESTION
#
# Answer the same question, but this time change the name of the new
# column to "test2_curved".
#/////////////////////////////////////////////////////////////////////////
# ANSWER
# Use AS to change the name of the column in the output
sqldf("select student, test1, test2, test2 + 10 as test2_curved
from grades")
student test1 test2 test2_curved
1 joe 70 82 92
2 sue 80 80 90
3 anne 90 95 105
4 frank 100 95 105
5 bob 30 NA NA
6 samantha 100 70 80
7 larry 32 80 90
8 bill 84 90 100
9 lucricia 80 100 110
10 margaret 59 47 57
11 dalia 85 92 102
12 zvi 90 98 108
13 mike 90 86 96
14 david 90 87 97
15 dana 100 93 103
# You don't actually need to write "AS"
sqldf("select student, test1, test2, test2 + 10 test2_curved
from grades")
student test1 test2 test2_curved
1 joe 70 82 92
2 sue 80 80 90
3 anne 90 95 105
4 frank 100 95 105
5 bob 30 NA NA
6 samantha 100 70 80
7 larry 32 80 90
8 bill 84 90 100
9 lucricia 80 100 110
10 margaret 59 47 57
11 dalia 85 92 102
12 zvi 90 98 108
13 mike 90 86 96
14 david 90 87 97
15 dana 100 93 103
15.14 ‘Quoting’ column names that contain spaces (or other special characters)
In general, column names and table names should only contain letters and numbers. If the name of a column or a table needs to contain spaces or other “special characters” (e.g. !@#$%^&*()_+~~ etc.) you need to ‘quote’ the name using ‘apostrophes’ or “quotes”. See the following example which contains a space in the new column name.
# You must 'quote column names that contain spaces'
sqldf("select student, test1, test2, test2 + 10 'test2 curved'
from grades")
student test1 test2 test2 curved
1 joe 70 82 92
2 sue 80 80 90
3 anne 90 95 105
4 frank 100 95 105
5 bob 30 NA NA
6 samantha 100 70 80
7 larry 32 80 90
8 bill 84 90 100
9 lucricia 80 100 110
10 margaret 59 47 57
11 dalia 85 92 102
12 zvi 90 98 108
13 mike 90 86 96
14 david 90 87 97
15 dana 100 93 103
Note that in the SQLite dialect of SQL (which is what is being used in R with sqldf) you can use ‘apostrophes’ in place of quotes. We used ‘apostrophes’ instead of “quotes” to quote the new column name to avoid issues that would arise if we used “quotes” since we are already using “quotes” to surround the entire SQL statement. See the next section for more info.
15.15 ‘single quotes’ vs “double quotes”
In many areas of technology, ‘apostrophes’ are used to ‘quote’ text instead of “quotation marks”. Apostrophes are used in this way, are referred to as ‘single quotes’ whereas quotation marks are referred to as “double quotes” (since an apostophe has one short line while a single quotation mark is comprised of two short lines).
In SQL, there are two different situations where something needs to be quoted. One situation is as shown above to create column headings that contain spaces or special characters. Another situation is where you want to include some literal text (see the description of “concatenation” below).
In this presentation we are using ‘single quotes’ for most purposes inside our SQL statements. This is because SQLite (i.e. the dialect of SQL we are using here) allows for the use of either ‘single quotes’ or “double quotes” wherever you need to quote something. Using ‘single quotes’ allows us to sidestep issues that would arise with “double quotes”. Using double quotes inside the SQL statement would interfere with the “double quotes” that we are using to “quote” the entire sql statement in our call to sqldf(” … “). Other SQL envirnoments that do not rely on sqldf() do not have this issue.
It should be noted though that the ANSI SQL standard, calls for using double quotes for column names (as in the example above) and for using single quotes to quote textual values (as in the concatenation example below).
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# 'single quotes' (ie. 'apostrophes') vs "double quotes" (ie. "quotes") ####
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Explicit textual values need to be 'quoted'.
#
# Standard SQL uses single quotes for quoting values.
#
# Many sql software packages allow for "double quotes" however, that is not
# officially part of the standard.
#
# The sqlite dialect of SQL (which is the default dialect used by sqldf)
# DOES recognize "double quotes" too. However, we will try to stick to
# 'single quotes' in order to adhere to the standard SQL notation.
#
# The following example combines the id and the student name into a single
# value with a dash (i.e. - ) between them. Notice the 'single quotes' around
# the dash.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
15.16 Concatenation: ‘abc’ || ‘def’ is ‘abcdef’
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# To "concatenate" two values means to "paste" them together
# into a single character value.
# ANSI SQL defines the || as the "concatenation" operator.
# This is similar to the & operator in Excel.
#
# Sqlite (which is used by R's sqldf package) uses the ANSI standard || operator.
#
# See the examples below.
#
# NOTE: Some other database software products use non-standard
# operators for concatenation. For example:
#
# * MySql does not have a concatenation operator. MySql uses || for logical or.
# Rather, it uses the function, concat(a, b), to concatenate a with b.
#
# * Microsoft SQL Server uses + for concatenation.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#///////////////////////////////////////////////////////////////
# QUESTION
#
# Show a column named id_student that contains the values from the
# the id and student columns concatenated together. Also show the
# test1, test2 and test3 columns.
#///////////////////////////////////////////////////////////////
# ANSWER
sqldf("select id || student as id_student, test1, test2, test3
from grades")
id_student test1 test2 test3
1 s0100joe 70 82 NA
2 s0200sue 80 80 79
3 s0300anne 90 95 92
4 s0400frank 100 95 91
5 s0500bob 30 NA NA
6 s0600samantha 100 70 97
7 s0700larry 32 80 NA
8 s0800bill 84 90 92
9 s0900lucricia 80 100 88
10 s1000margaret 59 47 NA
11 s1100dalia 85 92 93
12 s1200zvi 90 98 92
13 s1300mike 90 86 79
14 s1400david 90 87 87
15 s1500dana 100 93 91
#///////////////////////////////////////////////////////////////
# QUESTION
#
# Modify the answer to the previous question so that the
# "id_student" column contains a dash between the
# id and the name of the student.
#///////////////////////////////////////////////////////////////
# ANSWER:
#
# You need to use 'quotes' around the '-' dash.
# First concatenate the id with '-' (don't forget the 'single quotes').
# Then concatenate the student's name.
sqldf("select id || '-' || student as id_student, test1, test2, test3
from grades")
id_student test1 test2 test3
1 s0100-joe 70 82 NA
2 s0200-sue 80 80 79
3 s0300-anne 90 95 92
4 s0400-frank 100 95 91
5 s0500-bob 30 NA NA
6 s0600-samantha 100 70 97
7 s0700-larry 32 80 NA
8 s0800-bill 84 90 92
9 s0900-lucricia 80 100 88
10 s1000-margaret 59 47 NA
11 s1100-dalia 85 92 93
12 s1200-zvi 90 98 92
13 s1300-mike 90 86 79
14 s1400-david 90 87 87
15 s1500-dana 100 93 91
15.17 Rules for naming tables and columns
NOTE:
- “IDS1020-Intro to IDS” students can ignore this section.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Rules for naming tables and columns:
#
# What happens if you use non-standard characters in the names.
#
# - ANSI standard SQL - use "double quotes" for non-standard column or
# table names ( NOT 'single quotes' )
#
# - some other flavors of SQL may use other symbols for this purpose.
# For example:
# o Microsoft Access uses [square brackets] and
# o MySql uses `grave accent` characters (AKA `backticks`)
# (https://en.wikipedia.org/wiki/Grave_accent)
#
# - sqlite (i.e. the default version of SQL for sqldf) allows all of the
# above, i.e. "double quotes", `back ticks` and [square brackets]
# to surround non-standard names. However, it is best to use
# "double quotes" as that is the ANSI standard.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Every version of SQL has its own rules for which characters are allowed
# to be used in table names and column names. To be safe that your code will work
# on any version of SQL the best bet is to stick to the following rules for
# both table names and for column names:
#
# 1. only use letters, numbers and underscores
# 2. start the name with a letter
#
# However, many SQL versions allow for non-standard characters to be included
# in a table name or column name. However, if you use a non-standard character
# then you must 'quote' the name of the table or name of the column
# in SQL commands using the quoting rules provided by the version of SQL
# you are using. When using sqldf, if you use a non-standard character
# (e.g. a period, a space, etc) then you should surround the name of the
# table or column with "double quotes" (`backticks` and [square brackets] also
# work but "double quotes" are preferred as that is the ANSI standard).
# You cannot use 'single quotes' for this purpose.
#
# To demonstrate we can use the iris dataframe that is built into R.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
15.17.1 WARNING: In SQLite, you must use “double quotes” for non standard table or column names.
NOTE:
“IDS1020-Intro to IDS” students can ignore this section.
“IDS2460-Data Management” you should be aware of the following issue.
This doesn’t come up too often but if you are using sqldf to work with R dataframes you should be aware of the following issue. R dataframes often use periods in the column names. This is non-standard for SQL.
# Show the first 10 rows of the iris dataframe that comes built into R.
head(iris,10)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
#//////////////////////////////////////////////////////////////////////////
# QUESTION
#
# Use sqldf to display just the Sepal.Length, Sepal.Width and Species columns
# from the "iris" dataframe.
#//////////////////////////////////////////////////////////////////////////
# ANSWER
#
# This dataframe has column names that contain periods. A period
# is NOT a standard character for table and column names. Therefore
# we must use "double quotes" or `back ticks` to quote these column names.
#
# This does NOT work because of the periods in the column names
# sqldf("select Sepal.Length, Sepal.Width, Species
# from iris") # ERROR: no such column Sepal.Length
# This works - i.e. use "double quotes". Note that we used
# 'single quotes' to quote the entire select statement.
sqldf('select "Sepal.Length", "Sepal.Width", Species from iris')
Sepal.Length Sepal.Width Species
1 5.1 3.5 setosa
2 4.9 3.0 setosa
3 4.7 3.2 setosa
4 4.6 3.1 setosa
5 5.0 3.6 setosa
6 5.4 3.9 setosa
7 4.6 3.4 setosa
8 5.0 3.4 setosa
9 4.4 2.9 setosa
10 4.9 3.1 setosa
11 5.4 3.7 setosa
12 4.8 3.4 setosa
13 4.8 3.0 setosa
14 4.3 3.0 setosa
15 5.8 4.0 setosa
16 5.7 4.4 setosa
17 5.4 3.9 setosa
18 5.1 3.5 setosa
19 5.7 3.8 setosa
20 5.1 3.8 setosa
21 5.4 3.4 setosa
22 5.1 3.7 setosa
23 4.6 3.6 setosa
24 5.1 3.3 setosa
25 4.8 3.4 setosa
26 5.0 3.0 setosa
27 5.0 3.4 setosa
28 5.2 3.5 setosa
29 5.2 3.4 setosa
30 4.7 3.2 setosa
31 4.8 3.1 setosa
32 5.4 3.4 setosa
33 5.2 4.1 setosa
34 5.5 4.2 setosa
35 4.9 3.1 setosa
36 5.0 3.2 setosa
37 5.5 3.5 setosa
38 4.9 3.6 setosa
39 4.4 3.0 setosa
40 5.1 3.4 setosa
41 5.0 3.5 setosa
42 4.5 2.3 setosa
43 4.4 3.2 setosa
44 5.0 3.5 setosa
45 5.1 3.8 setosa
46 4.8 3.0 setosa
47 5.1 3.8 setosa
48 4.6 3.2 setosa
49 5.3 3.7 setosa
50 5.0 3.3 setosa
51 7.0 3.2 versicolor
52 6.4 3.2 versicolor
53 6.9 3.1 versicolor
54 5.5 2.3 versicolor
55 6.5 2.8 versicolor
56 5.7 2.8 versicolor
57 6.3 3.3 versicolor
58 4.9 2.4 versicolor
59 6.6 2.9 versicolor
60 5.2 2.7 versicolor
61 5.0 2.0 versicolor
62 5.9 3.0 versicolor
63 6.0 2.2 versicolor
64 6.1 2.9 versicolor
65 5.6 2.9 versicolor
66 6.7 3.1 versicolor
67 5.6 3.0 versicolor
68 5.8 2.7 versicolor
69 6.2 2.2 versicolor
70 5.6 2.5 versicolor
71 5.9 3.2 versicolor
72 6.1 2.8 versicolor
73 6.3 2.5 versicolor
74 6.1 2.8 versicolor
75 6.4 2.9 versicolor
76 6.6 3.0 versicolor
77 6.8 2.8 versicolor
78 6.7 3.0 versicolor
79 6.0 2.9 versicolor
80 5.7 2.6 versicolor
81 5.5 2.4 versicolor
82 5.5 2.4 versicolor
83 5.8 2.7 versicolor
84 6.0 2.7 versicolor
85 5.4 3.0 versicolor
86 6.0 3.4 versicolor
87 6.7 3.1 versicolor
88 6.3 2.3 versicolor
89 5.6 3.0 versicolor
90 5.5 2.5 versicolor
91 5.5 2.6 versicolor
92 6.1 3.0 versicolor
93 5.8 2.6 versicolor
94 5.0 2.3 versicolor
95 5.6 2.7 versicolor
96 5.7 3.0 versicolor
97 5.7 2.9 versicolor
98 6.2 2.9 versicolor
99 5.1 2.5 versicolor
100 5.7 2.8 versicolor
101 6.3 3.3 virginica
102 5.8 2.7 virginica
103 7.1 3.0 virginica
104 6.3 2.9 virginica
105 6.5 3.0 virginica
106 7.6 3.0 virginica
107 4.9 2.5 virginica
108 7.3 2.9 virginica
109 6.7 2.5 virginica
110 7.2 3.6 virginica
111 6.5 3.2 virginica
112 6.4 2.7 virginica
113 6.8 3.0 virginica
114 5.7 2.5 virginica
115 5.8 2.8 virginica
116 6.4 3.2 virginica
117 6.5 3.0 virginica
118 7.7 3.8 virginica
119 7.7 2.6 virginica
120 6.0 2.2 virginica
121 6.9 3.2 virginica
122 5.6 2.8 virginica
123 7.7 2.8 virginica
124 6.3 2.7 virginica
125 6.7 3.3 virginica
126 7.2 3.2 virginica
127 6.2 2.8 virginica
128 6.1 3.0 virginica
129 6.4 2.8 virginica
130 7.2 3.0 virginica
131 7.4 2.8 virginica
132 7.9 3.8 virginica
133 6.4 2.8 virginica
134 6.3 2.8 virginica
135 6.1 2.6 virginica
136 7.7 3.0 virginica
137 6.3 3.4 virginica
138 6.4 3.1 virginica
139 6.0 3.0 virginica
140 6.9 3.1 virginica
141 6.7 3.1 virginica
142 6.9 3.1 virginica
143 5.8 2.7 virginica
144 6.8 3.2 virginica
145 6.7 3.3 virginica
146 6.7 3.0 virginica
147 6.3 2.5 virginica
148 6.5 3.0 virginica
149 6.2 3.4 virginica
150 5.9 3.0 virginica
# Same thing but we used "double quotes" to quote the entire select statement.
# Therefore we used \" for every double quote that is inside the command.
sqldf("select \"Sepal.Length\", \"Sepal.Width\", Species from iris")
Sepal.Length Sepal.Width Species
1 5.1 3.5 setosa
2 4.9 3.0 setosa
3 4.7 3.2 setosa
4 4.6 3.1 setosa
5 5.0 3.6 setosa
6 5.4 3.9 setosa
7 4.6 3.4 setosa
8 5.0 3.4 setosa
9 4.4 2.9 setosa
10 4.9 3.1 setosa
11 5.4 3.7 setosa
12 4.8 3.4 setosa
13 4.8 3.0 setosa
14 4.3 3.0 setosa
15 5.8 4.0 setosa
16 5.7 4.4 setosa
17 5.4 3.9 setosa
18 5.1 3.5 setosa
19 5.7 3.8 setosa
20 5.1 3.8 setosa
21 5.4 3.4 setosa
22 5.1 3.7 setosa
23 4.6 3.6 setosa
24 5.1 3.3 setosa
25 4.8 3.4 setosa
26 5.0 3.0 setosa
27 5.0 3.4 setosa
28 5.2 3.5 setosa
29 5.2 3.4 setosa
30 4.7 3.2 setosa
31 4.8 3.1 setosa
32 5.4 3.4 setosa
33 5.2 4.1 setosa
34 5.5 4.2 setosa
35 4.9 3.1 setosa
36 5.0 3.2 setosa
37 5.5 3.5 setosa
38 4.9 3.6 setosa
39 4.4 3.0 setosa
40 5.1 3.4 setosa
41 5.0 3.5 setosa
42 4.5 2.3 setosa
43 4.4 3.2 setosa
44 5.0 3.5 setosa
45 5.1 3.8 setosa
46 4.8 3.0 setosa
47 5.1 3.8 setosa
48 4.6 3.2 setosa
49 5.3 3.7 setosa
50 5.0 3.3 setosa
51 7.0 3.2 versicolor
52 6.4 3.2 versicolor
53 6.9 3.1 versicolor
54 5.5 2.3 versicolor
55 6.5 2.8 versicolor
56 5.7 2.8 versicolor
57 6.3 3.3 versicolor
58 4.9 2.4 versicolor
59 6.6 2.9 versicolor
60 5.2 2.7 versicolor
61 5.0 2.0 versicolor
62 5.9 3.0 versicolor
63 6.0 2.2 versicolor
64 6.1 2.9 versicolor
65 5.6 2.9 versicolor
66 6.7 3.1 versicolor
67 5.6 3.0 versicolor
68 5.8 2.7 versicolor
69 6.2 2.2 versicolor
70 5.6 2.5 versicolor
71 5.9 3.2 versicolor
72 6.1 2.8 versicolor
73 6.3 2.5 versicolor
74 6.1 2.8 versicolor
75 6.4 2.9 versicolor
76 6.6 3.0 versicolor
77 6.8 2.8 versicolor
78 6.7 3.0 versicolor
79 6.0 2.9 versicolor
80 5.7 2.6 versicolor
81 5.5 2.4 versicolor
82 5.5 2.4 versicolor
83 5.8 2.7 versicolor
84 6.0 2.7 versicolor
85 5.4 3.0 versicolor
86 6.0 3.4 versicolor
87 6.7 3.1 versicolor
88 6.3 2.3 versicolor
89 5.6 3.0 versicolor
90 5.5 2.5 versicolor
91 5.5 2.6 versicolor
92 6.1 3.0 versicolor
93 5.8 2.6 versicolor
94 5.0 2.3 versicolor
95 5.6 2.7 versicolor
96 5.7 3.0 versicolor
97 5.7 2.9 versicolor
98 6.2 2.9 versicolor
99 5.1 2.5 versicolor
100 5.7 2.8 versicolor
101 6.3 3.3 virginica
102 5.8 2.7 virginica
103 7.1 3.0 virginica
104 6.3 2.9 virginica
105 6.5 3.0 virginica
106 7.6 3.0 virginica
107 4.9 2.5 virginica
108 7.3 2.9 virginica
109 6.7 2.5 virginica
110 7.2 3.6 virginica
111 6.5 3.2 virginica
112 6.4 2.7 virginica
113 6.8 3.0 virginica
114 5.7 2.5 virginica
115 5.8 2.8 virginica
116 6.4 3.2 virginica
117 6.5 3.0 virginica
118 7.7 3.8 virginica
119 7.7 2.6 virginica
120 6.0 2.2 virginica
121 6.9 3.2 virginica
122 5.6 2.8 virginica
123 7.7 2.8 virginica
124 6.3 2.7 virginica
125 6.7 3.3 virginica
126 7.2 3.2 virginica
127 6.2 2.8 virginica
128 6.1 3.0 virginica
129 6.4 2.8 virginica
130 7.2 3.0 virginica
131 7.4 2.8 virginica
132 7.9 3.8 virginica
133 6.4 2.8 virginica
134 6.3 2.8 virginica
135 6.1 2.6 virginica
136 7.7 3.0 virginica
137 6.3 3.4 virginica
138 6.4 3.1 virginica
139 6.0 3.0 virginica
140 6.9 3.1 virginica
141 6.7 3.1 virginica
142 6.9 3.1 virginica
143 5.8 2.7 virginica
144 6.8 3.2 virginica
145 6.7 3.3 virginica
146 6.7 3.0 virginica
147 6.3 2.5 virginica
148 6.5 3.0 virginica
149 6.2 3.4 virginica
150 5.9 3.0 virginica
# This also works - use `back ticks`
sqldf('select `Sepal.Length`, `Sepal.Width`, Species from iris')
Sepal.Length Sepal.Width Species
1 5.1 3.5 setosa
2 4.9 3.0 setosa
3 4.7 3.2 setosa
4 4.6 3.1 setosa
5 5.0 3.6 setosa
6 5.4 3.9 setosa
7 4.6 3.4 setosa
8 5.0 3.4 setosa
9 4.4 2.9 setosa
10 4.9 3.1 setosa
11 5.4 3.7 setosa
12 4.8 3.4 setosa
13 4.8 3.0 setosa
14 4.3 3.0 setosa
15 5.8 4.0 setosa
16 5.7 4.4 setosa
17 5.4 3.9 setosa
18 5.1 3.5 setosa
19 5.7 3.8 setosa
20 5.1 3.8 setosa
21 5.4 3.4 setosa
22 5.1 3.7 setosa
23 4.6 3.6 setosa
24 5.1 3.3 setosa
25 4.8 3.4 setosa
26 5.0 3.0 setosa
27 5.0 3.4 setosa
28 5.2 3.5 setosa
29 5.2 3.4 setosa
30 4.7 3.2 setosa
31 4.8 3.1 setosa
32 5.4 3.4 setosa
33 5.2 4.1 setosa
34 5.5 4.2 setosa
35 4.9 3.1 setosa
36 5.0 3.2 setosa
37 5.5 3.5 setosa
38 4.9 3.6 setosa
39 4.4 3.0 setosa
40 5.1 3.4 setosa
41 5.0 3.5 setosa
42 4.5 2.3 setosa
43 4.4 3.2 setosa
44 5.0 3.5 setosa
45 5.1 3.8 setosa
46 4.8 3.0 setosa
47 5.1 3.8 setosa
48 4.6 3.2 setosa
49 5.3 3.7 setosa
50 5.0 3.3 setosa
51 7.0 3.2 versicolor
52 6.4 3.2 versicolor
53 6.9 3.1 versicolor
54 5.5 2.3 versicolor
55 6.5 2.8 versicolor
56 5.7 2.8 versicolor
57 6.3 3.3 versicolor
58 4.9 2.4 versicolor
59 6.6 2.9 versicolor
60 5.2 2.7 versicolor
61 5.0 2.0 versicolor
62 5.9 3.0 versicolor
63 6.0 2.2 versicolor
64 6.1 2.9 versicolor
65 5.6 2.9 versicolor
66 6.7 3.1 versicolor
67 5.6 3.0 versicolor
68 5.8 2.7 versicolor
69 6.2 2.2 versicolor
70 5.6 2.5 versicolor
71 5.9 3.2 versicolor
72 6.1 2.8 versicolor
73 6.3 2.5 versicolor
74 6.1 2.8 versicolor
75 6.4 2.9 versicolor
76 6.6 3.0 versicolor
77 6.8 2.8 versicolor
78 6.7 3.0 versicolor
79 6.0 2.9 versicolor
80 5.7 2.6 versicolor
81 5.5 2.4 versicolor
82 5.5 2.4 versicolor
83 5.8 2.7 versicolor
84 6.0 2.7 versicolor
85 5.4 3.0 versicolor
86 6.0 3.4 versicolor
87 6.7 3.1 versicolor
88 6.3 2.3 versicolor
89 5.6 3.0 versicolor
90 5.5 2.5 versicolor
91 5.5 2.6 versicolor
92 6.1 3.0 versicolor
93 5.8 2.6 versicolor
94 5.0 2.3 versicolor
95 5.6 2.7 versicolor
96 5.7 3.0 versicolor
97 5.7 2.9 versicolor
98 6.2 2.9 versicolor
99 5.1 2.5 versicolor
100 5.7 2.8 versicolor
101 6.3 3.3 virginica
102 5.8 2.7 virginica
103 7.1 3.0 virginica
104 6.3 2.9 virginica
105 6.5 3.0 virginica
106 7.6 3.0 virginica
107 4.9 2.5 virginica
108 7.3 2.9 virginica
109 6.7 2.5 virginica
110 7.2 3.6 virginica
111 6.5 3.2 virginica
112 6.4 2.7 virginica
113 6.8 3.0 virginica
114 5.7 2.5 virginica
115 5.8 2.8 virginica
116 6.4 3.2 virginica
117 6.5 3.0 virginica
118 7.7 3.8 virginica
119 7.7 2.6 virginica
120 6.0 2.2 virginica
121 6.9 3.2 virginica
122 5.6 2.8 virginica
123 7.7 2.8 virginica
124 6.3 2.7 virginica
125 6.7 3.3 virginica
126 7.2 3.2 virginica
127 6.2 2.8 virginica
128 6.1 3.0 virginica
129 6.4 2.8 virginica
130 7.2 3.0 virginica
131 7.4 2.8 virginica
132 7.9 3.8 virginica
133 6.4 2.8 virginica
134 6.3 2.8 virginica
135 6.1 2.6 virginica
136 7.7 3.0 virginica
137 6.3 3.4 virginica
138 6.4 3.1 virginica
139 6.0 3.0 virginica
140 6.9 3.1 virginica
141 6.7 3.1 virginica
142 6.9 3.1 virginica
143 5.8 2.7 virginica
144 6.8 3.2 virginica
145 6.7 3.3 virginica
146 6.7 3.0 virginica
147 6.3 2.5 virginica
148 6.5 3.0 virginica
149 6.2 3.4 virginica
150 5.9 3.0 virginica
# *** WARNING!!! ***
#
# BE CAREFUL - don't use 'single quotes' here. Single quotes will NOT
# work to quote column names or table names.
#
# The following will simply display the words 'Sepal.Length' and 'Sepal.Width'
# for every row in the output.
sqldf("select 'Sepal.Length', 'Sepal.Width', Species from iris")
'Sepal.Length' 'Sepal.Width' Species
1 Sepal.Length Sepal.Width setosa
2 Sepal.Length Sepal.Width setosa
3 Sepal.Length Sepal.Width setosa
4 Sepal.Length Sepal.Width setosa
5 Sepal.Length Sepal.Width setosa
6 Sepal.Length Sepal.Width setosa
7 Sepal.Length Sepal.Width setosa
8 Sepal.Length Sepal.Width setosa
9 Sepal.Length Sepal.Width setosa
10 Sepal.Length Sepal.Width setosa
11 Sepal.Length Sepal.Width setosa
12 Sepal.Length Sepal.Width setosa
13 Sepal.Length Sepal.Width setosa
14 Sepal.Length Sepal.Width setosa
15 Sepal.Length Sepal.Width setosa
16 Sepal.Length Sepal.Width setosa
17 Sepal.Length Sepal.Width setosa
18 Sepal.Length Sepal.Width setosa
19 Sepal.Length Sepal.Width setosa
20 Sepal.Length Sepal.Width setosa
21 Sepal.Length Sepal.Width setosa
22 Sepal.Length Sepal.Width setosa
23 Sepal.Length Sepal.Width setosa
24 Sepal.Length Sepal.Width setosa
25 Sepal.Length Sepal.Width setosa
26 Sepal.Length Sepal.Width setosa
27 Sepal.Length Sepal.Width setosa
28 Sepal.Length Sepal.Width setosa
29 Sepal.Length Sepal.Width setosa
30 Sepal.Length Sepal.Width setosa
31 Sepal.Length Sepal.Width setosa
32 Sepal.Length Sepal.Width setosa
33 Sepal.Length Sepal.Width setosa
34 Sepal.Length Sepal.Width setosa
35 Sepal.Length Sepal.Width setosa
36 Sepal.Length Sepal.Width setosa
37 Sepal.Length Sepal.Width setosa
38 Sepal.Length Sepal.Width setosa
39 Sepal.Length Sepal.Width setosa
40 Sepal.Length Sepal.Width setosa
41 Sepal.Length Sepal.Width setosa
42 Sepal.Length Sepal.Width setosa
43 Sepal.Length Sepal.Width setosa
44 Sepal.Length Sepal.Width setosa
45 Sepal.Length Sepal.Width setosa
46 Sepal.Length Sepal.Width setosa
47 Sepal.Length Sepal.Width setosa
48 Sepal.Length Sepal.Width setosa
49 Sepal.Length Sepal.Width setosa
50 Sepal.Length Sepal.Width setosa
51 Sepal.Length Sepal.Width versicolor
52 Sepal.Length Sepal.Width versicolor
53 Sepal.Length Sepal.Width versicolor
54 Sepal.Length Sepal.Width versicolor
55 Sepal.Length Sepal.Width versicolor
56 Sepal.Length Sepal.Width versicolor
57 Sepal.Length Sepal.Width versicolor
58 Sepal.Length Sepal.Width versicolor
59 Sepal.Length Sepal.Width versicolor
60 Sepal.Length Sepal.Width versicolor
61 Sepal.Length Sepal.Width versicolor
62 Sepal.Length Sepal.Width versicolor
63 Sepal.Length Sepal.Width versicolor
64 Sepal.Length Sepal.Width versicolor
65 Sepal.Length Sepal.Width versicolor
66 Sepal.Length Sepal.Width versicolor
67 Sepal.Length Sepal.Width versicolor
68 Sepal.Length Sepal.Width versicolor
69 Sepal.Length Sepal.Width versicolor
70 Sepal.Length Sepal.Width versicolor
71 Sepal.Length Sepal.Width versicolor
72 Sepal.Length Sepal.Width versicolor
73 Sepal.Length Sepal.Width versicolor
74 Sepal.Length Sepal.Width versicolor
75 Sepal.Length Sepal.Width versicolor
76 Sepal.Length Sepal.Width versicolor
77 Sepal.Length Sepal.Width versicolor
78 Sepal.Length Sepal.Width versicolor
79 Sepal.Length Sepal.Width versicolor
80 Sepal.Length Sepal.Width versicolor
81 Sepal.Length Sepal.Width versicolor
82 Sepal.Length Sepal.Width versicolor
83 Sepal.Length Sepal.Width versicolor
84 Sepal.Length Sepal.Width versicolor
85 Sepal.Length Sepal.Width versicolor
86 Sepal.Length Sepal.Width versicolor
87 Sepal.Length Sepal.Width versicolor
88 Sepal.Length Sepal.Width versicolor
89 Sepal.Length Sepal.Width versicolor
90 Sepal.Length Sepal.Width versicolor
91 Sepal.Length Sepal.Width versicolor
92 Sepal.Length Sepal.Width versicolor
93 Sepal.Length Sepal.Width versicolor
94 Sepal.Length Sepal.Width versicolor
95 Sepal.Length Sepal.Width versicolor
96 Sepal.Length Sepal.Width versicolor
97 Sepal.Length Sepal.Width versicolor
98 Sepal.Length Sepal.Width versicolor
99 Sepal.Length Sepal.Width versicolor
100 Sepal.Length Sepal.Width versicolor
101 Sepal.Length Sepal.Width virginica
102 Sepal.Length Sepal.Width virginica
103 Sepal.Length Sepal.Width virginica
104 Sepal.Length Sepal.Width virginica
105 Sepal.Length Sepal.Width virginica
106 Sepal.Length Sepal.Width virginica
107 Sepal.Length Sepal.Width virginica
108 Sepal.Length Sepal.Width virginica
109 Sepal.Length Sepal.Width virginica
110 Sepal.Length Sepal.Width virginica
111 Sepal.Length Sepal.Width virginica
112 Sepal.Length Sepal.Width virginica
113 Sepal.Length Sepal.Width virginica
114 Sepal.Length Sepal.Width virginica
115 Sepal.Length Sepal.Width virginica
116 Sepal.Length Sepal.Width virginica
117 Sepal.Length Sepal.Width virginica
118 Sepal.Length Sepal.Width virginica
119 Sepal.Length Sepal.Width virginica
120 Sepal.Length Sepal.Width virginica
121 Sepal.Length Sepal.Width virginica
122 Sepal.Length Sepal.Width virginica
123 Sepal.Length Sepal.Width virginica
124 Sepal.Length Sepal.Width virginica
125 Sepal.Length Sepal.Width virginica
126 Sepal.Length Sepal.Width virginica
127 Sepal.Length Sepal.Width virginica
128 Sepal.Length Sepal.Width virginica
129 Sepal.Length Sepal.Width virginica
130 Sepal.Length Sepal.Width virginica
131 Sepal.Length Sepal.Width virginica
132 Sepal.Length Sepal.Width virginica
133 Sepal.Length Sepal.Width virginica
134 Sepal.Length Sepal.Width virginica
135 Sepal.Length Sepal.Width virginica
136 Sepal.Length Sepal.Width virginica
137 Sepal.Length Sepal.Width virginica
138 Sepal.Length Sepal.Width virginica
139 Sepal.Length Sepal.Width virginica
140 Sepal.Length Sepal.Width virginica
141 Sepal.Length Sepal.Width virginica
142 Sepal.Length Sepal.Width virginica
143 Sepal.Length Sepal.Width virginica
144 Sepal.Length Sepal.Width virginica
145 Sepal.Length Sepal.Width virginica
146 Sepal.Length Sepal.Width virginica
147 Sepal.Length Sepal.Width virginica
148 Sepal.Length Sepal.Width virginica
149 Sepal.Length Sepal.Width virginica
150 Sepal.Length Sepal.Width virginica
15.17.2 Aside: using R “raw strings” (only for those learning R)
NOTE:
- “IDS1020-Intro to IDS” students can ignore this section.
If you are learning SQL but are not interested in learning more about R right now you can safely skip this section.
# RAW STRINGS in R - i.e. r"( ... )"
#
# You can also use a "raw string" - introduced in R 4.0
# Any string (ie. character value) that appears between r"( .... )"
# is quoted as exactly the value that appears between the r"( and )"
# symbols. For detailed info about R's new raw string feature see this page:
#
# https://r4ds.hadley.nz/strings.html#sec-raw-strings
#
# You can use an R "raw string" to quote anything without
# resorting to backslahes or other techniques.
#
# To create a "raw string" in R place the text that you want to quote
# between r"( and )"
# The r stands for raw, not "R".
# The text being quoted can safely include any characters you like.
# For example, the following "strangeValue" gets displayed just fine.
= r"( Backslash: \ Quote: " Apostrophe: ' )"
strangeValue
cat(strangeValue) # Backslash: \ Quote: " Apostrophe: '
Backslash: \ Quote: " Apostrophe: '
# You can use r"(raw strings)" to very simply quote any SQL command
# that you want to pass to sqldf, regardless of what is in the SQL.
sqldf( r"(select "Sepal.Length", "Sepal.Width", Species from iris)" )
Sepal.Length Sepal.Width Species
1 5.1 3.5 setosa
2 4.9 3.0 setosa
3 4.7 3.2 setosa
4 4.6 3.1 setosa
5 5.0 3.6 setosa
6 5.4 3.9 setosa
7 4.6 3.4 setosa
8 5.0 3.4 setosa
9 4.4 2.9 setosa
10 4.9 3.1 setosa
11 5.4 3.7 setosa
12 4.8 3.4 setosa
13 4.8 3.0 setosa
14 4.3 3.0 setosa
15 5.8 4.0 setosa
16 5.7 4.4 setosa
17 5.4 3.9 setosa
18 5.1 3.5 setosa
19 5.7 3.8 setosa
20 5.1 3.8 setosa
21 5.4 3.4 setosa
22 5.1 3.7 setosa
23 4.6 3.6 setosa
24 5.1 3.3 setosa
25 4.8 3.4 setosa
26 5.0 3.0 setosa
27 5.0 3.4 setosa
28 5.2 3.5 setosa
29 5.2 3.4 setosa
30 4.7 3.2 setosa
31 4.8 3.1 setosa
32 5.4 3.4 setosa
33 5.2 4.1 setosa
34 5.5 4.2 setosa
35 4.9 3.1 setosa
36 5.0 3.2 setosa
37 5.5 3.5 setosa
38 4.9 3.6 setosa
39 4.4 3.0 setosa
40 5.1 3.4 setosa
41 5.0 3.5 setosa
42 4.5 2.3 setosa
43 4.4 3.2 setosa
44 5.0 3.5 setosa
45 5.1 3.8 setosa
46 4.8 3.0 setosa
47 5.1 3.8 setosa
48 4.6 3.2 setosa
49 5.3 3.7 setosa
50 5.0 3.3 setosa
51 7.0 3.2 versicolor
52 6.4 3.2 versicolor
53 6.9 3.1 versicolor
54 5.5 2.3 versicolor
55 6.5 2.8 versicolor
56 5.7 2.8 versicolor
57 6.3 3.3 versicolor
58 4.9 2.4 versicolor
59 6.6 2.9 versicolor
60 5.2 2.7 versicolor
61 5.0 2.0 versicolor
62 5.9 3.0 versicolor
63 6.0 2.2 versicolor
64 6.1 2.9 versicolor
65 5.6 2.9 versicolor
66 6.7 3.1 versicolor
67 5.6 3.0 versicolor
68 5.8 2.7 versicolor
69 6.2 2.2 versicolor
70 5.6 2.5 versicolor
71 5.9 3.2 versicolor
72 6.1 2.8 versicolor
73 6.3 2.5 versicolor
74 6.1 2.8 versicolor
75 6.4 2.9 versicolor
76 6.6 3.0 versicolor
77 6.8 2.8 versicolor
78 6.7 3.0 versicolor
79 6.0 2.9 versicolor
80 5.7 2.6 versicolor
81 5.5 2.4 versicolor
82 5.5 2.4 versicolor
83 5.8 2.7 versicolor
84 6.0 2.7 versicolor
85 5.4 3.0 versicolor
86 6.0 3.4 versicolor
87 6.7 3.1 versicolor
88 6.3 2.3 versicolor
89 5.6 3.0 versicolor
90 5.5 2.5 versicolor
91 5.5 2.6 versicolor
92 6.1 3.0 versicolor
93 5.8 2.6 versicolor
94 5.0 2.3 versicolor
95 5.6 2.7 versicolor
96 5.7 3.0 versicolor
97 5.7 2.9 versicolor
98 6.2 2.9 versicolor
99 5.1 2.5 versicolor
100 5.7 2.8 versicolor
101 6.3 3.3 virginica
102 5.8 2.7 virginica
103 7.1 3.0 virginica
104 6.3 2.9 virginica
105 6.5 3.0 virginica
106 7.6 3.0 virginica
107 4.9 2.5 virginica
108 7.3 2.9 virginica
109 6.7 2.5 virginica
110 7.2 3.6 virginica
111 6.5 3.2 virginica
112 6.4 2.7 virginica
113 6.8 3.0 virginica
114 5.7 2.5 virginica
115 5.8 2.8 virginica
116 6.4 3.2 virginica
117 6.5 3.0 virginica
118 7.7 3.8 virginica
119 7.7 2.6 virginica
120 6.0 2.2 virginica
121 6.9 3.2 virginica
122 5.6 2.8 virginica
123 7.7 2.8 virginica
124 6.3 2.7 virginica
125 6.7 3.3 virginica
126 7.2 3.2 virginica
127 6.2 2.8 virginica
128 6.1 3.0 virginica
129 6.4 2.8 virginica
130 7.2 3.0 virginica
131 7.4 2.8 virginica
132 7.9 3.8 virginica
133 6.4 2.8 virginica
134 6.3 2.8 virginica
135 6.1 2.6 virginica
136 7.7 3.0 virginica
137 6.3 3.4 virginica
138 6.4 3.1 virginica
139 6.0 3.0 virginica
140 6.9 3.1 virginica
141 6.7 3.1 virginica
142 6.9 3.1 virginica
143 5.8 2.7 virginica
144 6.8 3.2 virginica
145 6.7 3.3 virginica
146 6.7 3.0 virginica
147 6.3 2.5 virginica
148 6.5 3.0 virginica
149 6.2 3.4 virginica
150 5.9 3.0 virginica
# NOTE - R's new "raw string" syntax is very useful for
# writing regular expression patterns in R
# For example:
#
# without raw strings - you NEED \\double-backslashes
= "\\d+\\.\\d+"
pattern cat(pattern) # \d+\.\d+
\d+\.\d+
# with raw strings - you DON'T need the \\double-backslashes
= r"(\d+\.\d)"
pattern cat(pattern) # \d+\.\d+
\d+\.\d
= c("one", "1.593", "278.123", "999")
charNums grep(pattern, charNums, value=TRUE)
[1] "1.593" "278.123"
15.18 where clause
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# where clause ####
#
# The where clause allows you to specify the rows that you want.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# The where clause include a logical expression. The expression is evaluated
# separately for each row in the table (i.e. dataframe in our case).
# The data from the row is substituted into the logical expression. If the
# logical expression results in TRUE then information from the row becomes
# encorporated into the output. If the logical expression results in FALSE
# for the row then the row does not appear in the output.
# Only the rows that have a TRUE result are displayed
sqldf("select student, test1, test2
from grades
where test1 > test2")
student test1 test2
1 frank 100 95
2 samantha 100 70
3 margaret 59 47
4 mike 90 86
5 david 90 87
6 dana 100 93
# You can get all of the columns by using the * instead of the column names
sqldf("select *
from grades
where test1 > test2")
id student year honors test1 test2 test3 major
1 s0400 frank so TRUE 100 95 91 ACC
2 s0600 samantha so TRUE 100 70 97 IDS
3 s1000 margaret se FALSE 59 47 NA IDS
4 s1300 mike ju TRUE 90 86 79 IDS
5 s1400 david se TRUE 90 87 87 IDS
6 s1500 dana so FALSE 100 93 91 FIN
sqldf("select student, test1, test2
from grades
where test1 >= test2")
student test1 test2
1 sue 80 80
2 frank 100 95
3 samantha 100 70
4 margaret 59 47
5 mike 90 86
6 david 90 87
7 dana 100 93
sqldf("select student, honors, test1, test2
from grades
where honors == TRUE and test1 < 90 and test2 < 90")
[1] student honors test1 test2
<0 rows> (or 0-length row.names)
sqldf("select student, honors, test1, test2
from grades
where honors != TRUE and test1 > 90 and test2 > 90")
student honors test1 test2
1 dana FALSE 100 93
# NOTE:
#
# When using the relational operators > < >= <=
# with character data, values that would appear earlier in a dictionary
# are considered "less than" values that would appear later in the dictionary.
15.19 SQL operators
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# SQL operators
#
# Different flavors of SQL sometimes use different operators.
# Look up the documentation for your particular flavor of SQL.
# See the SQLite operators here
# https://www.tutorialspoint.com/sqlite/sqlite_operators.htm
# They are summarized below.
#
# R operator sqlite operator
# ----------- ---------------
#
# ARITHMETIC OPERATORS
# addition + same as R
# subtraction - same as R
# multiplication * same as R
# regular division / / depends on the type of data
# integer division %/% / depends on the type of data
# modulus (remainder) %% %
#
# LOGICAL OPERATORS
# equal to == = or == (they both work)
# not equal to != != or <> (they both work)
# greater than > same as R
# less than < same as R
# greater or equal >= same as R
# less or equal <= same as R
#
# logical not ! not
# logical and && and
# logical or || or
#
# CHARACTER OPERATORS
# concatenation paste0(a,b) a || b
#
# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
#
# NOTE - the following operators exist in sqlite but mean different
# things than what they mean in R.
# We will probably not cover the details of these operators
# (unless we have extra time) but you should be aware that
# they do NOT do the same thing as in R.
#
# R operator sqlite operator
# ----------- ---------------
# BITWISE OPERATORS - we will probably not cover these unless we have extra time.
#
# bitwise "and" not avaialble &
# bitwise "or" not avaialble |
# bitwise "not" not avaialble ~
# left shift bits not avaialble <<
# right shift bits not avaialble >>
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
15.20 Performing calculations with SQL
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Using SQL to perform calculations ####
#
# In the sqlite flavor of SQL you can perform calculations by simply
# using SELECT without a FROM clause.
#
# Some other versions of SQL (e.g. Oracle) require a FROM clause in every select.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# RESULTS of following query
#
# 3+2 23/5 23.0/5
# 5 4 4.6
sqldf("select 3+2, 23/5, 23.0 / 5") # 5 4 4.6 (headings show calculations)
3+2 23/5 23.0 / 5
1 5 4 4.6
# Same result but headings show Sum, IntegerDivision, RealDivision
#
# Sum IntegerDivision RealDivision
# 5 4 4.6
sqldf("select 3+2 as Sum,
23/5 as IntegerDivision,
23.0 / 5 as RealDivision") # 5 4 4.6 (same results with differnet headings)
Sum IntegerDivision RealDivision
1 5 4 4.6
# You don't actullay need the word "as" but it makes the code more readable.
# The following will produce the same results as above.
sqldf("select 3+2 Sum,
23/5 IntegerDivision,
23.0 / 5 RealDivision") # 5 4 4.6 (same results with differnet headings)
Sum IntegerDivision RealDivision
1 5 4 4.6
# The column heading must only use legal column heading characters.
# If you want to add non-legal column-heading characters, you can put the
# column heading in "double quotes".
# The following adds spaces to some of the column headings.
sqldf('select 3+2 "Sum",
23/5 "Integer Division",
23.0 / 5 "Real Division"') # 5 4 4.6 (same results with differnet headings)
Sum Integer Division Real Division
1 5 4 4.6
# If you perform calculations that do not refer to the columns in a table,
# you will get back the result of the calculation for every row of the table.
#
# The following will return multiple rows of the answers, one row for each
# of the grades table.
#
# You generally don't want to do this, which is why we left off the FROM
# clause in our earlier examples.
sqldf('select 3+2 "Sum",
23/5 "Integer Division",
23.0 / 5 "Real Division"
from grades') # you probably dont want the from clause here
Sum Integer Division Real Division
1 5 4 4.6
2 5 4 4.6
3 5 4 4.6
4 5 4 4.6
5 5 4 4.6
6 5 4 4.6
7 5 4 4.6
8 5 4 4.6
9 5 4 4.6
10 5 4 4.6
11 5 4 4.6
12 5 4 4.6
13 5 4 4.6
14 5 4 4.6
15 5 4 4.6
sqldf("select 3 as curve, student, test1, test1 + 3 as curved_test1 from grades")
curve student test1 curved_test1
1 3 joe 70 73
2 3 sue 80 83
3 3 anne 90 93
4 3 frank 100 103
5 3 bob 30 33
6 3 samantha 100 103
7 3 larry 32 35
8 3 bill 84 87
9 3 lucricia 80 83
10 3 margaret 59 62
11 3 dalia 85 88
12 3 zvi 90 93
13 3 mike 90 93
14 3 david 90 93
15 3 dana 100 103
15.21 Reminder - how NA works in R
NOTE:
- “IDS1020-Intro to IDS” students can ignore this section.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Remember how NA works in R ####
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ANY expression that contains NA results in NA
= 100
franksSalary = 200
tinasSalary = NA
nancysSalary = NA
nedsSalary
# all of the following result in NA
> nancysSalary # NA franksSalary
[1] NA
== nedsSalary # NA nancysSalary
[1] NA
+ nancysSalary # NA franksSalary
[1] NA
# The only way to test for NA in R is with the is.na function
= c(100,200,NA,NA)
salarys == NA # NA NA NA NA salarys
[1] NA NA NA NA
is.na(salarys) # FALSE FALSE TRUE TRUE
[1] FALSE FALSE TRUE TRUE
# In R, you can check for non-na values using not operator ( ! )
!is.na(salarys) # TRUE TRUE FALSE FALSE
[1] TRUE TRUE FALSE FALSE
#..................................................................
# Remember in R that when you perform any operation with NA
# the result is NA. For example:
#
# > NA + 10
# [1] NA
#
# > NA == NA
# [1] NA
#
# > NA != NA
# [1] NA
#
# SQL has the same approach to NULL values. Any operation with NULL
# results in NULL.
#..................................................................
15.22 NULL in SQL
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# NULL values in SQL ####
#
# In relational database tables, the value NULL is used to record
# a value that is "not known". For example a salary value of NULL
# is treated as a salary that hasn't been recorded in the database yet.
# This is very different from a salary of 0 which would be the case if
# someone is volunteering or donating their time.
#
# Note that when using sqldf to run SQL commands in R
# the data is stored in an R dataframe. Therefore these types of values
# appear in the dataframe as NA but the SQL statements use the term NULL.
# In an actual relational database these values would also appear in the
# table as NULL (not as NA)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#...............................................................
# Check for values that are NULL in the following way:
#
# WHERE COLUMN_NAME IS NULL
# (see examples below)
#
# NOTE: DO NOT USE "== NULL". The == operator will NOT work with NULL
#...............................................................
# show all rows for which the test3 grade is NULL
#
# NOTE: Remember that R does not understand NULL, R uses NA.
# Remember that SQL understands NULL and does not understand NA.
# Since sqldf uses R dataframes, the sqldf function automatically
# translates NULL into NA and vice versa when using SQL code to
# access data in R dataframes.
sqldf("select *
from grades
where test3 is NULL")
id student year honors test1 test2 test3 major
1 s0100 joe fr FALSE 70 82 NA IDS
2 s0500 bob so FALSE 30 NA NA IDS
3 s0700 larry ju FALSE 32 80 NA FIN
4 s1000 margaret se FALSE 59 47 NA IDS
# DON'T DO THE FOLLOWING !!! IT DOESN'T WORK!!!
sqldf("select *
from grades
where test3 = NULL") # use IS NULL!!!
[1] id student year honors test1 test2 test3 major
<0 rows> (or 0-length row.names)
#...............................................................
# Check for values that aren't NULL in the following way:
#
# WHERE COLUMN_NAME IS NOT NULL
# (see examples below)
#
# NOTE: DO NOT USE "!= NULL". The != operator will NOT work with NULL
#...............................................................
# show all rows for which the test3 grade is NOT NULL (i.e. isn't NA)
sqldf("select *
from grades
where test3 is NOT NULL")
id student year honors test1 test2 test3 major
1 s0200 sue fr FALSE 80 80 79 IDS
2 s0300 anne fr TRUE 90 95 92 IDS
3 s0400 frank so TRUE 100 95 91 ACC
4 s0600 samantha so TRUE 100 70 97 IDS
5 s0800 bill se FALSE 84 90 92 ACC
6 s0900 lucricia fr TRUE 80 100 88 IDS
7 s1100 dalia se FALSE 85 92 93 FIN
8 s1200 zvi ju TRUE 90 98 92 ACC
9 s1300 mike ju TRUE 90 86 79 IDS
10 s1400 david se TRUE 90 87 87 IDS
11 s1500 dana so FALSE 100 93 91 FIN
# DON'T DO THE FOLLOWING !!! IT DOESN'T WORK!!!
sqldf("select *
from grades
where test3 != NULL") # use IS NOT NULL !!!
[1] id student year honors test1 test2 test3 major
<0 rows> (or 0-length row.names)
15.23 Practice
# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
# QUESTION
#
# curve all test3 grades by 10 points (NULLs remain NULL)
#
# (remember that since we are using R, the NULL values
# appear as NA)
# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
sqldf("select student, test3, test3 + 10 as curved_test3
from grades
order by student")
student test3 curved_test3
1 anne 92 102
2 bill 92 102
3 bob NA NA
4 dalia 93 103
5 dana 91 101
6 david 87 97
7 frank 91 101
8 joe NA NA
9 larry NA NA
10 lucricia 88 98
11 margaret NA NA
12 mike 79 89
13 samantha 97 107
14 sue 79 89
15 zvi 92 102
# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
# QUESTION
#
# Show rows where both test2 and test3 are NULL
# REMEMBER when looking for NULL values:
#
# USE: value IS NULL # correct
# DON'T use: value = NULL # wrong
#
# USE: value IS NOT NULL # correct
# DON'T use: value != NULL # wrong
# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
sqldf("select student, test1, test2, test3
from grades
where test2 IS NULL and test3 IS NULL
order by student")
student test1 test2 test3
1 bob 30 NA NA
# DON'T DO THE FOLLOWING !!! IT DOESN'T WORK!!!
sqldf("select student, test1, test2, test3
from grades
where test2 == NULL and test3 == NULL
order by student")
[1] student test1 test2 test3
<0 rows> (or 0-length row.names)
15.24 Tutorials sqlbolt.com and w3chools.com
Please see the tutorials at the following locations.
- https://sqlbolt.com/ These pages cover basic SELECT statements. They refer to the following tables
- movies
- north_american_cities
- https://www.w3schools.com/sql/sql_join.asp This page teaches how to work with more than one table. They refer to the following tables
- customers
- orders
The file <sqlbolt_tables-v007.RData> contains the dataframes with the information for each table that are used on these websites. These dataframes can be loaded into R from this file with the following command.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# sqlbolt.com and https://www.w3schools.com/sql/sql_join.asp ####
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
load("sqlbolt_tables-v007.RData")
Below are the contents of these tables:
sqldf("select * from movies")
id title director year length_minutes
1 1 Toy Story John Lasseter 1995 81
2 2 A Bug's Life John Lasseter 1998 95
3 3 Toy Story 2 John Lasseter 1999 93
4 4 Monsters, Inc. Pete Docter 2001 92
5 5 Finding Nemo Andrew Stanton 2003 107
6 6 The Incredibles Brad Bird 2004 116
7 7 Cars John Lasseter 2006 117
8 8 Ratatouille Brad Bird 2007 115
9 9 WALL-E Andrew Stanton 2008 104
10 10 Up Pete Docter 2009 101
11 11 Toy Story 3 Lee Unkrich 2010 103
12 12 Cars 2 John Lasseter 2011 120
13 13 Brave Brenda Chapman 2012 102
14 14 Monsters University Dan Scanlon 2013 110
15 87 WALL-G Brenda Chapman 2042 92
sqldf("select * from orders")
OrderID CustomerID EmployeeID OrderDate ShipperID
1 10248 90 5 7/4/1996 3
2 10249 81 6 7/5/1996 1
3 10250 34 4 7/8/1996 2
4 10251 84 3 7/8/1996 1
5 10252 76 4 7/9/1996 2
6 10253 34 3 7/10/1996 2
7 10254 14 5 7/11/1996 2
8 10255 68 9 7/12/1996 3
9 10256 88 3 7/15/1996 2
10 10257 35 4 7/16/1996 3
11 10258 20 1 7/17/1996 1
12 10259 13 4 7/18/1996 3
13 10260 55 4 7/19/1996 1
14 10261 61 4 7/19/1996 2
15 10262 65 8 7/22/1996 3
16 10263 20 9 7/23/1996 3
17 10264 24 6 7/24/1996 3
18 10265 7 2 7/25/1996 1
19 10266 87 3 7/26/1996 3
20 10267 25 4 7/29/1996 1
21 10268 33 8 7/30/1996 3
22 10269 89 5 7/31/1996 1
23 10270 87 1 8/1/1996 1
24 10271 75 6 8/1/1996 2
25 10272 65 6 8/2/1996 2
26 10273 63 3 8/5/1996 3
27 10274 85 6 8/6/1996 1
28 10275 49 1 8/7/1996 1
29 10276 80 8 8/8/1996 3
30 10277 52 2 8/9/1996 3
31 10278 5 8 8/12/1996 2
32 10279 44 8 8/13/1996 2
33 10280 5 2 8/14/1996 1
34 10281 69 4 8/14/1996 1
35 10282 69 4 8/15/1996 1
36 10283 46 3 8/16/1996 3
37 10284 44 4 8/19/1996 1
38 10285 63 1 8/20/1996 2
39 10286 63 8 8/21/1996 3
40 10287 67 8 8/22/1996 3
41 10288 66 4 8/23/1996 1
42 10289 11 7 8/26/1996 3
43 10290 15 8 8/27/1996 1
44 10291 61 6 8/27/1996 2
45 10292 81 1 8/28/1996 2
46 10293 80 1 8/29/1996 3
47 10294 65 4 8/30/1996 2
48 10295 85 2 9/2/1996 2
49 10296 46 6 9/3/1996 1
50 10297 7 5 9/4/1996 2
51 10298 37 6 9/5/1996 2
52 10299 67 4 9/6/1996 2
53 10300 49 2 9/9/1996 2
54 10301 86 8 9/9/1996 2
55 10302 76 4 9/10/1996 2
56 10303 30 7 9/11/1996 2
57 10304 80 1 9/12/1996 2
58 10305 55 8 9/13/1996 3
59 10306 69 1 9/16/1996 3
60 10307 48 2 9/17/1996 2
61 10308 2 7 9/18/1996 3
62 10309 37 3 9/19/1996 1
63 10310 77 8 9/20/1996 2
64 10311 18 1 9/20/1996 3
65 10312 86 2 9/23/1996 2
66 10313 63 2 9/24/1996 2
67 10314 65 1 9/25/1996 2
68 10315 38 4 9/26/1996 2
69 10316 65 1 9/27/1996 3
70 10317 48 6 9/30/1996 1
71 10318 38 8 10/1/1996 2
72 10319 80 7 10/2/1996 3
73 10320 87 5 10/3/1996 3
74 10321 38 3 10/3/1996 2
75 10322 58 7 10/4/1996 3
76 10323 39 4 10/7/1996 1
77 10324 71 9 10/8/1996 1
78 10325 39 1 10/9/1996 3
79 10326 8 4 10/10/1996 2
80 10327 24 2 10/11/1996 1
81 10328 28 4 10/14/1996 3
82 10329 75 4 10/15/1996 2
83 10330 46 3 10/16/1996 1
84 10331 9 9 10/16/1996 1
85 10332 51 3 10/17/1996 2
86 10333 87 5 10/18/1996 3
87 10334 84 8 10/21/1996 2
88 10335 37 7 10/22/1996 2
89 10336 60 7 10/23/1996 2
90 10337 25 4 10/24/1996 3
91 10338 55 4 10/25/1996 3
92 10339 51 2 10/28/1996 2
93 10340 9 1 10/29/1996 3
94 10341 73 7 10/29/1996 3
95 10342 25 4 10/30/1996 2
96 10343 44 4 10/31/1996 1
97 10344 89 4 11/1/1996 2
98 10345 63 2 11/4/1996 2
99 10346 65 3 11/5/1996 3
100 10347 21 4 11/6/1996 3
101 10348 86 4 11/7/1996 2
102 10349 75 7 11/8/1996 1
103 10350 41 6 11/11/1996 2
104 10351 20 1 11/11/1996 1
105 10352 28 3 11/12/1996 3
106 10353 59 7 11/13/1996 3
107 10354 58 8 11/14/1996 3
108 10355 4 6 11/15/1996 1
109 10356 86 6 11/18/1996 2
110 10357 46 1 11/19/1996 3
111 10358 41 5 11/20/1996 1
112 10359 72 5 11/21/1996 3
113 10360 7 4 11/22/1996 3
114 10361 63 1 11/22/1996 2
115 10362 9 3 11/25/1996 1
116 10363 17 4 11/26/1996 3
117 10364 19 1 11/26/1996 1
118 10365 3 3 11/27/1996 2
119 10366 29 8 11/28/1996 2
120 10367 83 7 11/28/1996 3
121 10368 20 2 11/29/1996 2
122 10369 75 8 12/2/1996 2
123 10370 14 6 12/3/1996 2
124 10371 41 1 12/3/1996 1
125 10372 62 5 12/4/1996 2
126 10373 37 4 12/5/1996 3
127 10374 91 1 12/5/1996 3
128 10375 36 3 12/6/1996 2
129 10376 51 1 12/9/1996 2
130 10377 72 1 12/9/1996 3
131 10378 24 5 12/10/1996 3
132 10379 61 2 12/11/1996 1
133 10380 37 8 12/12/1996 3
134 10381 46 3 12/12/1996 3
135 10382 20 4 12/13/1996 1
136 10383 4 8 12/16/1996 3
137 10384 5 3 12/16/1996 3
138 10385 75 1 12/17/1996 2
139 10386 21 9 12/18/1996 3
140 10387 70 1 12/18/1996 2
141 10388 72 2 12/19/1996 1
142 10389 10 4 12/20/1996 2
143 10390 20 6 12/23/1996 1
144 10391 17 3 12/23/1996 3
145 10392 59 2 12/24/1996 3
146 10393 71 1 12/25/1996 3
147 10394 36 1 12/25/1996 3
148 10395 35 6 12/26/1996 1
149 10396 25 1 12/27/1996 3
150 10397 60 5 12/27/1996 1
151 10398 71 2 12/30/1996 3
152 10399 83 8 12/31/1996 3
153 10400 19 1 1/1/1997 3
154 10401 65 1 1/1/1997 1
155 10402 20 8 1/2/1997 2
156 10403 20 4 1/3/1997 3
157 10404 49 2 1/3/1997 1
158 10405 47 1 1/6/1997 1
159 10406 62 7 1/7/1997 1
160 10407 56 2 1/7/1997 2
161 10408 23 8 1/8/1997 1
162 10409 54 3 1/9/1997 1
163 10410 10 3 1/10/1997 3
164 10411 10 9 1/10/1997 3
165 10412 87 8 1/13/1997 2
166 10413 41 3 1/14/1997 2
167 10414 21 2 1/14/1997 3
168 10415 36 3 1/15/1997 1
169 10416 87 8 1/16/1997 3
170 10417 73 4 1/16/1997 3
171 10418 63 4 1/17/1997 1
172 10419 68 4 1/20/1997 2
173 10420 88 3 1/21/1997 1
174 10421 61 8 1/21/1997 1
175 10422 27 2 1/22/1997 1
176 10423 31 6 1/23/1997 3
177 10424 51 7 1/23/1997 2
178 10425 41 6 1/24/1997 2
179 10426 29 4 1/27/1997 1
180 10427 59 4 1/27/1997 2
181 10428 66 7 1/28/1997 1
182 10429 37 3 1/29/1997 2
183 10430 20 4 1/30/1997 1
184 10431 10 4 1/30/1997 2
185 10432 75 3 1/31/1997 2
186 10433 60 3 2/3/1997 3
187 10434 24 3 2/3/1997 2
188 10435 16 8 2/4/1997 2
189 10436 7 3 2/5/1997 2
190 10437 87 8 2/5/1997 1
191 10438 79 3 2/6/1997 2
192 10439 51 6 2/7/1997 3
193 10440 71 4 2/10/1997 2
194 10441 55 3 2/10/1997 2
195 10442 20 3 2/11/1997 2
196 10443 66 8 2/12/1997 1
sqldf("select * from customers")
CustomerID CustomerName ContactName Address City PostalCode Country
1 1 Alfreds Futterkiste Maria Anders Obere Str. 57 Berlin 12209 Germany
2 2 Ana Trujillo Emparedados y helados Ana Trujillo Avda. de la Constituci\xf3n 2222 M\xe9xico D.F. 5021 Mexico
3 3 Antonio Moreno Taquer\xeda Antonio Moreno Mataderos 2312 M\xe9xico D.F. 5023 Mexico
4 4 Around the Horn Thomas Hardy 120 Hanover Sq. London WA1 1DP UK
5 5 Berglunds snabbk\xf6p Christina Berglund Berguvsv\xe4gen 8 Lule\xe5 S-958 22 Sweden
6 6 Blauer See Delikatessen Hanna Moos Forsterstr. 57 Mannheim 68306 Germany
7 7 Blondel p\xe8re et fils Fr\xe9d\xe9rique Citeaux 24, place Kl\xe9ber Strasbourg 67000 France
8 8 B\xf3lido Comidas preparadas Mart\xedn Sommer C/ Araquil, 67 Madrid 28023 Spain
9 9 Bon app' Laurence Lebihans 12, rue des Bouchers Marseille 13008 France
10 10 Bottom-Dollar Marketse Elizabeth Lincoln 23 Tsawassen Blvd. Tsawassen T2F 8M4 Canada
11 11 B's Beverages Victoria Ashworth Fauntleroy Circus London EC2 5NT UK
12 12 Cactus Comidas para llevar Patricio Simpson Cerrito 333 Buenos Aires 1010 Argentina
13 13 Centro comercial Moctezuma Francisco Chang Sierras de Granada 9993 M\xe9xico D.F. 5022 Mexico
14 14 Chop-suey Chinese Yang Wang Hauptstr. 29 Bern 3012 Switzerland
15 15 Com\xe9rcio Mineiro Pedro Afonso Av. dos Lus\xedadas, 23 S\xe3o Paulo 05432-043 Brazil
16 16 Consolidated Holdings Elizabeth Brown Berkeley Gardens 12 Brewery London WX1 6LT UK
17 17 Drachenblut Delikatessend Sven Ottlieb Walserweg 21 Aachen 52066 Germany
18 18 Du monde entier Janine Labrune 67, rue des Cinquante Otages Nantes 44000 France
19 19 Eastern Connection Ann Devon 35 King George London WX3 6FW UK
20 20 Ernst Handel Roland Mendel Kirchgasse 6 Graz 8010 Austria
21 21 Familia Arquibaldo Aria Cruz Rua Or\xf3s, 92 S\xe3o Paulo 05442-030 Brazil
22 22 FISSA Fabrica Inter. Salchichas S.A. Diego Roel C/ Moralzarzal, 86 Madrid 28034 Spain
23 23 Folies gourmandes Martine Ranc\xe9 184, chauss\xe9e de Tournai Lille 59000 France
24 24 Folk och f\xe4 HB Maria Larsson \xc5kergatan 24 Br\xe4cke S-844 67 Sweden
25 25 Frankenversand Peter Franken Berliner Platz 43 M\xfcnchen 80805 Germany
26 26 France restauration Carine Schmitt 54, rue Royale Nantes 44000 France
27 27 Franchi S.p.A. Paolo Accorti Via Monte Bianco 34 Torino 10100 Italy
28 28 Furia Bacalhau e Frutos do Mar Lino Rodriguez Jardim das rosas n. 32 Lisboa 1675 Portugal
29 29 Galer\xeda del gastr\xf3nomo Eduardo Saavedra Rambla de Catalu\xf1a, 23 Barcelona 8022 Spain
30 30 Godos Cocina T\xedpica Jos\xe9 Pedro Freyre C/ Romero, 33 Sevilla 41101 Spain
31 31 Gourmet Lanchonetes Andr\xe9 Fonseca Av. Brasil, 442 Campinas 04876-786 Brazil
32 32 Great Lakes Food Market Howard Snyder 2732 Baker Blvd. Eugene 97403 USA
33 33 GROSELLA-Restaurante Manuel Pereira 5\xaa Ave. Los Palos Grandes Caracas 1081 Venezuela
34 34 Hanari Carnes Mario Pontes Rua do Pa\xe7o, 67 Rio de Janeiro 05454-876 Brazil
35 35 HILARI\xd3N-Abastos Carlos Hern\xe1ndez Carrera 22 con Ave. Carlos Soublette #8-35 San Crist\xf3bal 5022 Venezuela
36 36 Hungry Coyote Import Store Yoshi Latimer City Center Plaza 516 Main St. Elgin 97827 USA
37 37 Hungry Owl All-Night Grocers Patricia McKenna 8 Johnstown Road Cork Ireland
38 38 Island Trading Helen Bennett Garden House Crowther Way Cowes PO31 7PJ UK
39 39 K\xf6niglich Essen Philip Cramer Maubelstr. 90 Brandenburg 14776 Germany
40 40 La corne d'abondance Daniel Tonini 67, avenue de l'Europe Versailles 78000 France
41 41 La maison d'Asie Annette Roulet 1 rue Alsace-Lorraine Toulouse 31000 France
42 42 Laughing Bacchus Wine Cellars Yoshi Tannamuri 1900 Oak St. Vancouver V3F 2K1 Canada
43 43 Lazy K Kountry Store John Steel 12 Orchestra Terrace Walla Walla 99362 USA
44 44 Lehmanns Marktstand Renate Messner Magazinweg 7 Frankfurt a.M. 60528 Germany
45 45 Let's Stop N Shop Jaime Yorres 87 Polk St. Suite 5 San Francisco 94117 USA
46 46 LILA-Supermercado Carlos Gonz\xe1lez Carrera 52 con Ave. Bol\xedvar #65-98 Llano Largo Barquisimeto 3508 Venezuela
47 47 LINO-Delicateses Felipe Izquierdo Ave. 5 de Mayo Porlamar I. de Margarita 4980 Venezuela
48 48 Lonesome Pine Restaurant Fran Wilson 89 Chiaroscuro Rd. Portland 97219 USA
49 49 Magazzini Alimentari Riuniti Giovanni Rovelli Via Ludovico il Moro 22 Bergamo 24100 Italy
50 50 Maison Dewey Catherine Dewey Rue Joseph-Bens 532 Bruxelles B-1180 Belgium
51 51 M\xe8re Paillarde Jean Fresni\xe8re 43 rue St. Laurent Montr\xe9al H1J 1C3 Canada
52 52 Morgenstern Gesundkost Alexander Feuer Heerstr. 22 Leipzig 4179 Germany
53 53 North/South Simon Crowther South House 300 Queensbridge London SW7 1RZ UK
54 54 Oc\xe9ano Atl\xe1ntico Ltda. Yvonne Moncada Ing. Gustavo Moncada 8585 Piso 20-A Buenos Aires 1010 Argentina
55 55 Old World Delicatessen Rene Phillips 2743 Bering St. Anchorage 99508 USA
56 56 Ottilies K\xe4seladen Henriette Pfalzheim Mehrheimerstr. 369 K\xf6ln 50739 Germany
57 57 Paris sp\xe9cialit\xe9s Marie Bertrand 265, boulevard Charonne Paris 75012 France
58 58 Pericles Comidas cl\xe1sicas Guillermo Fern\xe1ndez Calle Dr. Jorge Cash 321 M\xe9xico D.F. 5033 Mexico
59 59 Piccolo und mehr Georg Pipps Geislweg 14 Salzburg 5020 Austria
60 60 Princesa Isabel Vinhoss Isabel de Castro Estrada da sa\xfade n. 58 Lisboa 1756 Portugal
61 61 Que Del\xedcia Bernardo Batista Rua da Panificadora, 12 Rio de Janeiro 02389-673 Brazil
62 62 Queen Cozinha L\xfacia Carvalho Alameda dos Can\xe0rios, 891 S\xe3o Paulo 05487-020 Brazil
63 63 QUICK-Stop Horst Kloss Taucherstra\xdfe 10 Cunewalde 1307 Germany
64 64 Rancho grande Sergio Guti\xe9rrez Av. del Libertador 900 Buenos Aires 1010 Argentina
65 65 Rattlesnake Canyon Grocery Paula Wilson 2817 Milton Dr. Albuquerque 87110 USA
66 66 Reggiani Caseifici Maurizio Moroni Strada Provinciale 124 Reggio Emilia 42100 Italy
67 67 Ricardo Adocicados Janete Limeira Av. Copacabana, 267 Rio de Janeiro 02389-890 Brazil
68 68 Richter Supermarkt Michael Holz Grenzacherweg 237 Gen\xe8ve 1203 Switzerland
69 69 Romero y tomillo Alejandra Camino Gran V\xeda, 1 Madrid 28001 Spain
70 70 Sant\xe9 Gourmet Jonas Bergulfsen Erling Skakkes gate 78 Stavern 4110 Norway
71 71 Save-a-lot Markets Jose Pavarotti 187 Suffolk Ln. Boise 83720 USA
72 72 Seven Seas Imports Hari Kumar 90 Wadhurst Rd. London OX15 4NB UK
73 73 Simons bistro Jytte Petersen Vinb\xe6ltet 34 K\xf8benhavn 1734 Denmark
74 74 Sp\xe9cialit\xe9s du monde Dominique Perrier 25, rue Lauriston Paris 75016 France
75 75 Split Rail Beer & Ale Art Braunschweiger P.O. Box 555 Lander 82520 USA
76 76 Supr\xeames d\xe9lices Pascale Cartrain Boulevard Tirou, 255 Charleroi B-6000 Belgium
77 77 The Big Cheese Liz Nixon 89 Jefferson Way Suite 2 Portland 97201 USA
78 78 The Cracker Box Liu Wong 55 Grizzly Peak Rd. Butte 59801 USA
79 79 Toms Spezialit\xe4ten Karin Josephs Luisenstr. 48 M\xfcnster 44087 Germany
80 80 Tortuga Restaurante Miguel Angel Paolino Avda. Azteca 123 M\xe9xico D.F. 5033 Mexico
81 81 Tradi\xe7\xe3o Hipermercados Anabela Domingues Av. In\xeas de Castro, 414 S\xe3o Paulo 05634-030 Brazil
82 82 Trail's Head Gourmet Provisioners Helvetius Nagy 722 DaVinci Blvd. Kirkland 98034 USA
83 83 Vaffeljernet Palle Ibsen Smagsl\xf8get 45 \xc5rhus 8200 Denmark
84 84 Victuailles en stock Mary Saveley 2, rue du Commerce Lyon 69004 France
85 85 Vins et alcools Chevalier Paul Henriot 59 rue de l'Abbaye Reims 51100 France
86 86 Die Wandernde Kuh Rita M\xfcller Adenauerallee 900 Stuttgart 70563 Germany
87 87 Wartian Herkku Pirkko Koskitalo Torikatu 38 Oulu 90110 Finland
88 88 Wellington Importadora Paula Parente Rua do Mercado, 12 Resende 08737-363 Brazil
89 89 White Clover Markets Karl Jablonski 305 - 14th Ave. S. Suite 3B Seattle 98128 USA
90 90 Wilman Kala Matti Karttunen Keskuskatu 45 Helsinki 21240 Finland
91 91 Wolski Zbyszek ul. Filtrowa 68 Walla 01-012 Poland
sqldf("select * from north_american_cities")
city country population latitude longitude
1 Guadalajara Mexico 1500800 20.65970 -103.34961
2 Toronto Canada 2795060 43.65323 -79.38318
3 Houston United States 2195914 29.76043 -95.36980
4 New York United States 8405837 40.71278 -74.00594
5 Philadelphia United States 1553165 39.95258 -75.16522
6 Havana Cuba 2106146 23.05407 -82.34519
7 Mexico City Mexico 8555500 19.43261 -99.13321
8 Phoenix United States 1513367 33.44838 -112.07404
9 Los Angeles United States 3884307 34.05223 -118.24368
10 Ecatepec de Morelos Mexico 1742000 19.60184 -99.05067
11 Montreal Canada 1717767 45.50169 -73.56726
12 Chicago United States 2718782 41.87811 -87.62980
15.24.1 Practice - select clause
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# https://sqlbolt.com/lesson/select_queries_introduction
#
# Concepts:
#
# Select query for a specific columns
#
# SELECT column, another_column, .
# FROM mytable;
#
#
# Select query for all columns
#
# SELECT *
# FROM mytable;
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#https://sqlbolt.com/lesson/select_queries_introduction
movies
id title director year length_minutes
1 1 Toy Story John Lasseter 1995 81
2 2 A Bug's Life John Lasseter 1998 95
3 3 Toy Story 2 John Lasseter 1999 93
4 4 Monsters, Inc. Pete Docter 2001 92
5 5 Finding Nemo Andrew Stanton 2003 107
6 6 The Incredibles Brad Bird 2004 116
7 7 Cars John Lasseter 2006 117
8 8 Ratatouille Brad Bird 2007 115
9 9 WALL-E Andrew Stanton 2008 104
10 10 Up Pete Docter 2009 101
11 11 Toy Story 3 Lee Unkrich 2010 103
12 12 Cars 2 John Lasseter 2011 120
13 13 Brave Brenda Chapman 2012 102
14 14 Monsters University Dan Scanlon 2013 110
15 87 WALL-G Brenda Chapman 2042 92
###########################
#Exercise 1 - Tasks
###########################
#Find the title of each film
=
sql "select title
from movies"
sql
[1] "select title\n from movies"
sqldf(sql)
title
1 Toy Story
2 A Bug's Life
3 Toy Story 2
4 Monsters, Inc.
5 Finding Nemo
6 The Incredibles
7 Cars
8 Ratatouille
9 WALL-E
10 Up
11 Toy Story 3
12 Cars 2
13 Brave
14 Monsters University
15 WALL-G
#Find the director of each film
=
sql "select director
from movies"
sqldf(sql)
director
1 John Lasseter
2 John Lasseter
3 John Lasseter
4 Pete Docter
5 Andrew Stanton
6 Brad Bird
7 John Lasseter
8 Brad Bird
9 Andrew Stanton
10 Pete Docter
11 Lee Unkrich
12 John Lasseter
13 Brenda Chapman
14 Dan Scanlon
15 Brenda Chapman
#Find the title and director of each film
=
sql "select title, director
from movies"
sqldf(sql)
title director
1 Toy Story John Lasseter
2 A Bug's Life John Lasseter
3 Toy Story 2 John Lasseter
4 Monsters, Inc. Pete Docter
5 Finding Nemo Andrew Stanton
6 The Incredibles Brad Bird
7 Cars John Lasseter
8 Ratatouille Brad Bird
9 WALL-E Andrew Stanton
10 Up Pete Docter
11 Toy Story 3 Lee Unkrich
12 Cars 2 John Lasseter
13 Brave Brenda Chapman
14 Monsters University Dan Scanlon
15 WALL-G Brenda Chapman
#Find the title and year of each film
=
sql "select title, year
from movies"
sqldf(sql)
title year
1 Toy Story 1995
2 A Bug's Life 1998
3 Toy Story 2 1999
4 Monsters, Inc. 2001
5 Finding Nemo 2003
6 The Incredibles 2004
7 Cars 2006
8 Ratatouille 2007
9 WALL-E 2008
10 Up 2009
11 Toy Story 3 2010
12 Cars 2 2011
13 Brave 2012
14 Monsters University 2013
15 WALL-G 2042
#Find all the information about each film
sqldf("select *
from movies")
id title director year length_minutes
1 1 Toy Story John Lasseter 1995 81
2 2 A Bug's Life John Lasseter 1998 95
3 3 Toy Story 2 John Lasseter 1999 93
4 4 Monsters, Inc. Pete Docter 2001 92
5 5 Finding Nemo Andrew Stanton 2003 107
6 6 The Incredibles Brad Bird 2004 116
7 7 Cars John Lasseter 2006 117
8 8 Ratatouille Brad Bird 2007 115
9 9 WALL-E Andrew Stanton 2008 104
10 10 Up Pete Docter 2009 101
11 11 Toy Story 3 Lee Unkrich 2010 103
12 12 Cars 2 John Lasseter 2011 120
13 13 Brave Brenda Chapman 2012 102
14 14 Monsters University Dan Scanlon 2013 110
15 87 WALL-G Brenda Chapman 2042 92
# The following produces the exact same results as above
sqldf("select id, title, director, year, length_minutes
from movies")
id title director year length_minutes
1 1 Toy Story John Lasseter 1995 81
2 2 A Bug's Life John Lasseter 1998 95
3 3 Toy Story 2 John Lasseter 1999 93
4 4 Monsters, Inc. Pete Docter 2001 92
5 5 Finding Nemo Andrew Stanton 2003 107
6 6 The Incredibles Brad Bird 2004 116
7 7 Cars John Lasseter 2006 117
8 8 Ratatouille Brad Bird 2007 115
9 9 WALL-E Andrew Stanton 2008 104
10 10 Up Pete Docter 2009 101
11 11 Toy Story 3 Lee Unkrich 2010 103
12 12 Cars 2 John Lasseter 2011 120
13 13 Brave Brenda Chapman 2012 102
14 14 Monsters University Dan Scanlon 2013 110
15 87 WALL-G Brenda Chapman 2042 92
Practice - ADDITIONAL QUESTIONS (not on sqlbolt.com)
# QUESTION
# Show the movie name, length_minutes and a third column that you
# calculate which is the length in hours
# ANSWER
#
# when you divide an integer by an integer the result in sql in an integer
# if you divide a floating point number by an integer or vice versa the
# result is floating point (i.e. a number that has values after the decimal point)
# This is a good start but the length_hours column will always show up
# as a whole number - this is NOT what we wanted.
= "select title, length_minutes, length_minutes/60 as length_hours
sql from movies"
sqldf(sql)
title length_minutes length_hours
1 Toy Story 81 1
2 A Bug's Life 95 1
3 Toy Story 2 93 1
4 Monsters, Inc. 92 1
5 Finding Nemo 107 1
6 The Incredibles 116 1
7 Cars 117 1
8 Ratatouille 115 1
9 WALL-E 104 1
10 Up 101 1
11 Toy Story 3 103 1
12 Cars 2 120 2
13 Brave 102 1
14 Monsters University 110 1
15 WALL-G 92 1
# Change 60 to 60.0 to produce a "floating point" value (i.e. a value
# that contains numbers after the decimal point)
= "select title, length_minutes, length_minutes/60.0 as length_hours
sql from movies"
sqldf(sql)
title length_minutes length_hours
1 Toy Story 81 1.350000
2 A Bug's Life 95 1.583333
3 Toy Story 2 93 1.550000
4 Monsters, Inc. 92 1.533333
5 Finding Nemo 107 1.783333
6 The Incredibles 116 1.933333
7 Cars 117 1.950000
8 Ratatouille 115 1.916667
9 WALL-E 104 1.733333
10 Up 101 1.683333
11 Toy Story 3 103 1.716667
12 Cars 2 120 2.000000
13 Brave 102 1.700000
14 Monsters University 110 1.833333
15 WALL-G 92 1.533333
# QUESTION
# show the length in hours, minutes - eg. 81 minutes show as 2 hours and 21 minutes
# ANSWER
=
sql "select title, length_minutes, length_minutes / 60 as hours,
length_minutes - 60 * (length_minutes/60) as minutes
from movies"
sqldf(sql)
title length_minutes hours minutes
1 Toy Story 81 1 21
2 A Bug's Life 95 1 35
3 Toy Story 2 93 1 33
4 Monsters, Inc. 92 1 32
5 Finding Nemo 107 1 47
6 The Incredibles 116 1 56
7 Cars 117 1 57
8 Ratatouille 115 1 55
9 WALL-E 104 1 44
10 Up 101 1 41
11 Toy Story 3 103 1 43
12 Cars 2 120 2 0
13 Brave 102 1 42
14 Monsters University 110 1 50
15 WALL-G 92 1 32
# ANSWER - using sqlite's % operator
=
sql "select title, length_minutes, length_minutes / 60 as hours,
length_minutes % 60 as minutes
from movies"
sqldf(sql)
title length_minutes hours minutes
1 Toy Story 81 1 21
2 A Bug's Life 95 1 35
3 Toy Story 2 93 1 33
4 Monsters, Inc. 92 1 32
5 Finding Nemo 107 1 47
6 The Incredibles 116 1 56
7 Cars 117 1 57
8 Ratatouille 115 1 55
9 WALL-E 104 1 44
10 Up 101 1 41
11 Toy Story 3 103 1 43
12 Cars 2 120 2 0
13 Brave 102 1 42
14 Monsters University 110 1 50
15 WALL-G 92 1 32
# Note that while it is tempting to refer to hours on line 2 of the following
# query, R's version of SQL (ie. sqlite) will not allow for this.
# Some SQL flavors will allow this.
#
# The following WILL NOT WORK IN R's VERSION OF SQL
=
sql "select title, length_minutes, length_minutes / 60 as hours,
length_minutes - 60 * hours as minutes
from movies"
# sqldf(sql) # ERROR - no such column: hours
15.24.2 Practice - where clause
#-------------------------------------------------------------------------
# https://sqlbolt.com/lesson/select_queries_with_constraints
#
# CONCEPTS
#
# Select query with constraints
# SELECT column, another_column, .
# FROM mytable
# WHERE condition
# AND/OR another_condition
# AND/OR .;
#
#
#
# # List of SQL operators (see webpage)
# Operator Condition SQL Example
# =, !=, < <=, >, >= Standard numerical operators col_name != 4
# BETWEEN . AND . Number is within range of two values (inclusive) col_name BETWEEN 1.5 AND 10.5
# NOT BETWEEN . AND . Number is not within range of two values (inclusive) col_name NOT BETWEEN 1 AND 10
# IN (.) Number exists in a list col_name IN (2, 4, 6)
# NOT IN (.) Number does not exist in a list col_name NOT IN (1, 3, 5)
###########################
# Exercise 2 - Tasks
###########################
# Find the movie with a row id of 6
= "
sql select *
from movies
where id=6"
sqldf(sql)
id title director year length_minutes
1 6 The Incredibles Brad Bird 2004 116
###############################################################.
# BETWEEN ... AND ...
#
# and
#
# NOT BETWEEN ... AND ...
###############################################################.
# Find the movies released in the years between 2000 and 2010
= "select *
sql from movies
where year >= 2000 and year <= 2010 "
sqldf(sql)
id title director year length_minutes
1 4 Monsters, Inc. Pete Docter 2001 92
2 5 Finding Nemo Andrew Stanton 2003 107
3 6 The Incredibles Brad Bird 2004 116
4 7 Cars John Lasseter 2006 117
5 8 Ratatouille Brad Bird 2007 115
6 9 WALL-E Andrew Stanton 2008 104
7 10 Up Pete Docter 2009 101
8 11 Toy Story 3 Lee Unkrich 2010 103
# you can combine the above where clause with a different select clause:
# For example:
= "select title, length_minutes, length_minutes / 60 as hours,
sql length_minutes - 60 * (length_minutes/60) as minutes,
year
from movies
where year >= 2000 and year <= 2010 "
sqldf(sql)
title length_minutes hours minutes year
1 Monsters, Inc. 92 1 32 2001
2 Finding Nemo 107 1 47 2003
3 The Incredibles 116 1 56 2004
4 Cars 117 1 57 2006
5 Ratatouille 115 1 55 2007
6 WALL-E 104 1 44 2008
7 Up 101 1 41 2009
8 Toy Story 3 103 1 43 2010
# use the BETWEEN ... AND ...
# to do the same thing
= "select *
sql from movies
where year between 2000 and 2010"
sqldf(sql)
id title director year length_minutes
1 4 Monsters, Inc. Pete Docter 2001 92
2 5 Finding Nemo Andrew Stanton 2003 107
3 6 The Incredibles Brad Bird 2004 116
4 7 Cars John Lasseter 2006 117
5 8 Ratatouille Brad Bird 2007 115
6 9 WALL-E Andrew Stanton 2008 104
7 10 Up Pete Docter 2009 101
8 11 Toy Story 3 Lee Unkrich 2010 103
# Find the movies not released in the years between 2000 and 2010
= "select *
sql from movies
where year < 2000 or year > 2010"
sqldf(sql)
id title director year length_minutes
1 1 Toy Story John Lasseter 1995 81
2 2 A Bug's Life John Lasseter 1998 95
3 3 Toy Story 2 John Lasseter 1999 93
4 12 Cars 2 John Lasseter 2011 120
5 13 Brave Brenda Chapman 2012 102
6 14 Monsters University Dan Scanlon 2013 110
7 87 WALL-G Brenda Chapman 2042 92
# use the NOT BETWEEN ... AND ...
# to do the same thing
= "select *
sql from movies
where year not between 2000 and 2010"
sqldf(sql)
id title director year length_minutes
1 1 Toy Story John Lasseter 1995 81
2 2 A Bug's Life John Lasseter 1998 95
3 3 Toy Story 2 John Lasseter 1999 93
4 12 Cars 2 John Lasseter 2011 120
5 13 Brave Brenda Chapman 2012 102
6 14 Monsters University Dan Scanlon 2013 110
7 87 WALL-G Brenda Chapman 2042 92
15.24.3 Practice - limit clause
###############################################################.
# LIMIT <M> and LIMIT <M> OFFSET <N>
###############################################################.
#
# The LIMIT clause must come at the very end of the SQL
# statement. For example:
#
# SELECT *
# FROM grades
# ORDER BY student
# LIMIT 3
#
# Displayes the first 3 students.
#
# In general:
#
# <SOME SELECT STATEMENT>
# limit M
#
# (where M is an integer) results in only the first
# M rows of data from what would normally have been displayed
# had the LIMIT clause not been specified.
#
# <SOME SELECT STATEMENT>
# limit M offset N
#
# (where both M and N are integers)
# starts the output from the N+1'th row of what would normally
# have been displayed without the limit clause and then
# displays the next M rows of data.
###############################################################.
# Find the first 5 movies and their release year
# We will learn the best way to do this a little later. The
# answer on the sqlbolt website is not ideal.
sqldf("select *
from movies
order by year")
id title director year length_minutes
1 1 Toy Story John Lasseter 1995 81
2 2 A Bug's Life John Lasseter 1998 95
3 3 Toy Story 2 John Lasseter 1999 93
4 4 Monsters, Inc. Pete Docter 2001 92
5 5 Finding Nemo Andrew Stanton 2003 107
6 6 The Incredibles Brad Bird 2004 116
7 7 Cars John Lasseter 2006 117
8 8 Ratatouille Brad Bird 2007 115
9 9 WALL-E Andrew Stanton 2008 104
10 10 Up Pete Docter 2009 101
11 11 Toy Story 3 Lee Unkrich 2010 103
12 12 Cars 2 John Lasseter 2011 120
13 13 Brave Brenda Chapman 2012 102
14 14 Monsters University Dan Scanlon 2013 110
15 87 WALL-G Brenda Chapman 2042 92
# The limit clause allows to retrieve the first several values from an
# ordered query
sqldf("select *
from movies
order by year
limit 5")
id title director year length_minutes
1 1 Toy Story John Lasseter 1995 81
2 2 A Bug's Life John Lasseter 1998 95
3 3 Toy Story 2 John Lasseter 1999 93
4 4 Monsters, Inc. Pete Docter 2001 92
5 5 Finding Nemo Andrew Stanton 2003 107
# Get the next 2 movies
sqldf("select *
from movies
order by year
limit 2 offset 5")
id title director year length_minutes
1 6 The Incredibles Brad Bird 2004 116
2 7 Cars John Lasseter 2006 117
# Find all movies that were released in 2000, 2004 or 2008
=
sql "select *
from movies
where year = 2000 or year = 2004 or year = 2008"
sqldf(sql)
id title director year length_minutes
1 6 The Incredibles Brad Bird 2004 116
2 9 WALL-E Andrew Stanton 2008 104
# Do this again using the IN operator
#
# The IN operators takes a LIST of information
# A LIST is a set of values in parentheses, separated by commans.
=
sql "select *
from movies
where year in (2000,2004,2008)"
sqldf(sql)
id title director year length_minutes
1 6 The Incredibles Brad Bird 2004 116
2 9 WALL-E Andrew Stanton 2008 104
# show the movies that were released in years other than 2000, 2004, 2008
=
sql "select *
from movies
where year != 2000 and year != 2004 and year != 2008"
sqldf(sql)
id title director year length_minutes
1 1 Toy Story John Lasseter 1995 81
2 2 A Bug's Life John Lasseter 1998 95
3 3 Toy Story 2 John Lasseter 1999 93
4 4 Monsters, Inc. Pete Docter 2001 92
5 5 Finding Nemo Andrew Stanton 2003 107
6 7 Cars John Lasseter 2006 117
7 8 Ratatouille Brad Bird 2007 115
8 10 Up Pete Docter 2009 101
9 11 Toy Story 3 Lee Unkrich 2010 103
10 12 Cars 2 John Lasseter 2011 120
11 13 Brave Brenda Chapman 2012 102
12 14 Monsters University Dan Scanlon 2013 110
13 87 WALL-G Brenda Chapman 2042 92
# A student asked if we could change the and's in the previous question
# to be or's instead. The answer is you can but in addition to
# changing the and's to or's you must also modify the logical expression
# in other ways. Specifically, DeMorgan's law describes how to
# rewrite any logical expression that uses and's and or's into an
# equivalent expression.
#
# DeMorgans law says: to convert a logical expression
# step 1: negate it twice (ie. use two not's)
# Step 2: (a) distribute one of the not's over the expression
# (b) change the and's to or's
# (c) change the or's to and's
=
sql "select *
from movies
where not ( year = 2000 or year = 2004 or year = 2008)"
sqldf(sql)
id title director year length_minutes
1 1 Toy Story John Lasseter 1995 81
2 2 A Bug's Life John Lasseter 1998 95
3 3 Toy Story 2 John Lasseter 1999 93
4 4 Monsters, Inc. Pete Docter 2001 92
5 5 Finding Nemo Andrew Stanton 2003 107
6 7 Cars John Lasseter 2006 117
7 8 Ratatouille Brad Bird 2007 115
8 10 Up Pete Docter 2009 101
9 11 Toy Story 3 Lee Unkrich 2010 103
10 12 Cars 2 John Lasseter 2011 120
11 13 Brave Brenda Chapman 2012 102
12 14 Monsters University Dan Scanlon 2013 110
13 87 WALL-G Brenda Chapman 2042 92
# do this with the "not in" operator
=
sql "select *
from movies
where year not in (2000,2004,2008)"
sqldf(sql)
id title director year length_minutes
1 1 Toy Story John Lasseter 1995 81
2 2 A Bug's Life John Lasseter 1998 95
3 3 Toy Story 2 John Lasseter 1999 93
4 4 Monsters, Inc. Pete Docter 2001 92
5 5 Finding Nemo Andrew Stanton 2003 107
6 7 Cars John Lasseter 2006 117
7 8 Ratatouille Brad Bird 2007 115
8 10 Up Pete Docter 2009 101
9 11 Toy Story 3 Lee Unkrich 2010 103
10 12 Cars 2 John Lasseter 2011 120
11 13 Brave Brenda Chapman 2012 102
12 14 Monsters University Dan Scanlon 2013 110
13 87 WALL-G Brenda Chapman 2042 92
# warning - the "in" operator is followed by a list of values.
# The list of values is in parentheses.
#
# if the values are numbers there are no quotes e.g. (2000,2004,2005)
#
# If the values are character values then in SQL you must use
# 'single quotes' around the different values.
#
# See the next question
# Write a SQL SELECT statment to show all the movies that
# were directed by any of the following people:
# Pete Docter , Brad Bird, Dan Scanlon
#
# (a) Write it using the IN operator
# (b) write it without using the IN operator
# ANSWER - part (a)
= "select *
sql from movies
where director IN ('Pete Docter', 'Brad Bird', 'Dan Scanlon')
order by director"
sqldf(sql)
id title director year length_minutes
1 6 The Incredibles Brad Bird 2004 116
2 8 Ratatouille Brad Bird 2007 115
3 14 Monsters University Dan Scanlon 2013 110
4 4 Monsters, Inc. Pete Docter 2001 92
5 10 Up Pete Docter 2009 101
# ANSWER - part (b) - without the IN operator
= "select *
sql from movies
where director='Pete Docter' or director='Brad Bird' or director = 'Dan Scanlon'
order by director"
sqldf(sql)
id title director year length_minutes
1 6 The Incredibles Brad Bird 2004 116
2 8 Ratatouille Brad Bird 2007 115
3 14 Monsters University Dan Scanlon 2013 110
4 4 Monsters, Inc. Pete Docter 2001 92
5 10 Up Pete Docter 2009 101
15.24.4 Practice - “like” operator
######################################################################.
# The "like" operator
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# The like operator in sql uses a "pattern" to match data values
# This is similar to regular expressions. However, the SQL like
# operator uses very different meta characters.
#
# The two standard metacharacters that are used by the sql like
# operator in the pattern are:
#
# % will "match" 0 or more characters (equivalent to .* in regex)
#
# _ i.e. an underscore, will "match" exactly one character (equivalent to . in regex)
#
# Some DBMS software allows for additional metacharacters in LIKE clauses
# For example, MySql allows for the following. (sqlite does not)
#
# [abc] MySQL DBMS software recognizes these "character classes"
# [a-c] similar to regular expressions. Not all DBMS software recognizes
# [^a-c] these. sqlite does not.
######################################################################.
# exercise 3
# Exercise 3 - Tasks
# Find all the Toy Story movies
= "select *
sql from movies
where title like 'Toy Story%'"
sqldf(sql)
id title director year length_minutes
1 1 Toy Story John Lasseter 1995 81
2 3 Toy Story 2 John Lasseter 1999 93
3 11 Toy Story 3 Lee Unkrich 2010 103
# Find all the movies directed by John Lasseter
= "select *
sql from movies
where director = 'John Lasseter'"
sqldf(sql)
id title director year length_minutes
1 1 Toy Story John Lasseter 1995 81
2 2 A Bug's Life John Lasseter 1998 95
3 3 Toy Story 2 John Lasseter 1999 93
4 7 Cars John Lasseter 2006 117
5 12 Cars 2 John Lasseter 2011 120
# Find all the movies (and director) not directed by John Lasseter
= "select *
sql from movies
where director != 'John Lasseter'"
sqldf(sql)
id title director year length_minutes
1 4 Monsters, Inc. Pete Docter 2001 92
2 5 Finding Nemo Andrew Stanton 2003 107
3 6 The Incredibles Brad Bird 2004 116
4 8 Ratatouille Brad Bird 2007 115
5 9 WALL-E Andrew Stanton 2008 104
6 10 Up Pete Docter 2009 101
7 11 Toy Story 3 Lee Unkrich 2010 103
8 13 Brave Brenda Chapman 2012 102
9 14 Monsters University Dan Scanlon 2013 110
10 87 WALL-G Brenda Chapman 2042 92
# The following will work but is NOT recommended (no pun intended)
= "select *
sql from movies
where NOT ( director = 'John Lasseter' ) "
sqldf(sql)
id title director year length_minutes
1 4 Monsters, Inc. Pete Docter 2001 92
2 5 Finding Nemo Andrew Stanton 2003 107
3 6 The Incredibles Brad Bird 2004 116
4 8 Ratatouille Brad Bird 2007 115
5 9 WALL-E Andrew Stanton 2008 104
6 10 Up Pete Docter 2009 101
7 11 Toy Story 3 Lee Unkrich 2010 103
8 13 Brave Brenda Chapman 2012 102
9 14 Monsters University Dan Scanlon 2013 110
10 87 WALL-G Brenda Chapman 2042 92
# Find all the WALL-* movies
= "select *
sql from movies
where title like 'WALL-_'"
sqldf(sql)
id title director year length_minutes
1 9 WALL-E Andrew Stanton 2008 104
2 87 WALL-G Brenda Chapman 2042 92
movies
id title director year length_minutes
1 1 Toy Story John Lasseter 1995 81
2 2 A Bug's Life John Lasseter 1998 95
3 3 Toy Story 2 John Lasseter 1999 93
4 4 Monsters, Inc. Pete Docter 2001 92
5 5 Finding Nemo Andrew Stanton 2003 107
6 6 The Incredibles Brad Bird 2004 116
7 7 Cars John Lasseter 2006 117
8 8 Ratatouille Brad Bird 2007 115
9 9 WALL-E Andrew Stanton 2008 104
10 10 Up Pete Docter 2009 101
11 11 Toy Story 3 Lee Unkrich 2010 103
12 12 Cars 2 John Lasseter 2011 120
13 13 Brave Brenda Chapman 2012 102
14 14 Monsters University Dan Scanlon 2013 110
15 87 WALL-G Brenda Chapman 2042 92
# QUESTION
# Show all movies that start with a "T" and end with an "s"
= "select *
sql from movies
where title like 'T%s'
order by title"
sqldf(sql)
id title director year length_minutes
1 6 The Incredibles Brad Bird 2004 116
# QUESTION
# Show all movies whose title is exactly 6 charcters (including spaces)
= "select *
sql from movies
where title like '______'
order by title"
sqldf(sql)
id title director year length_minutes
1 12 Cars 2 John Lasseter 2011 120
2 9 WALL-E Andrew Stanton 2008 104
3 87 WALL-G Brenda Chapman 2042 92
# QUESTION
# Show all movies whose title starts with A,B or C
# and whose title is 10 characters or longer
="select title
sql from movies
where title BETWEEN 'A' AND 'D' and
title LIKE '__________%'"
sqldf(sql)
title
1 A Bug's Life
# QUESTION
# Show all movies whose title starts with A,B or C
# and whose title is fewer than 10 characters or long
="select title
sql from movies
where title BETWEEN 'A' AND 'D' and
title NOT LIKE '__________%'"
sqldf(sql)
title
1 Cars
2 Cars 2
3 Brave
="select substr(title, 1,1) from movies order by 1"
sqlsqldf(sql)
substr(title, 1,1)
1 A
2 B
3 C
4 C
5 F
6 M
7 M
8 R
9 T
10 T
11 T
12 T
13 U
14 W
15 W
sqldf("select * from movies where substr(title,1,1) = 'M'")
id title director year length_minutes
1 4 Monsters, Inc. Pete Docter 2001 92
2 14 Monsters University Dan Scanlon 2013 110
sqldf("select * from movies where title < 'D'")
id title director year length_minutes
1 2 A Bug's Life John Lasseter 1998 95
2 7 Cars John Lasseter 2006 117
3 12 Cars 2 John Lasseter 2011 120
4 13 Brave Brenda Chapman 2012 102
15.24.5 Practice - select distinct
################################################################.
# SELECT DISTINCT ...
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Depending on which values are selected, a SELECT ... statement
# might return exactly the same row in the output multiple times.
#
# To instead, show only one copy of each row in the output,
# start the statement with SELECT DISTINCT ...
# (instead of just SELECT ...)
#
# See the examples below.
################################################################.
# The output of the following query includes some rows
# that are exactly the same as other rows.
sqldf("select year, honors, major
from grades
where honors = TRUE and major = 'IDS'
order by year")
year honors major
1 fr TRUE IDS
2 fr TRUE IDS
3 ju TRUE IDS
4 se TRUE IDS
5 so TRUE IDS
# The following query is exactly the same as the previous query, except
# that the following query includes the word DISTINCT at the beginning
# of the query.
#
# This version of the query does not include any duplicated rows in the output.
sqldf("select DISTINCT year, honors, major
from grades
where honors = TRUE and major = 'IDS'
order by year, major")
year honors major
1 fr TRUE IDS
2 ju TRUE IDS
3 se TRUE IDS
4 so TRUE IDS
sqldf("select DISTINCT student, year, honors, major
from grades
where honors = TRUE and major = 'IDS'
order by year, major")
student year honors major
1 anne fr TRUE IDS
2 lucricia fr TRUE IDS
3 mike ju TRUE IDS
4 david se TRUE IDS
5 samantha so TRUE IDS
# Exercise 4
# Show just the director column
= "select director
sql from movies"
sqldf(sql)
director
1 John Lasseter
2 John Lasseter
3 John Lasseter
4 Pete Docter
5 Andrew Stanton
6 Brad Bird
7 John Lasseter
8 Brad Bird
9 Andrew Stanton
10 Pete Docter
11 Lee Unkrich
12 John Lasseter
13 Brenda Chapman
14 Dan Scanlon
15 Brenda Chapman
# List all directors of movies, without duplicates
= "select DISTINCT director
sql from movies"
sqldf(sql)
director
1 John Lasseter
2 Pete Docter
3 Andrew Stanton
4 Brad Bird
5 Lee Unkrich
6 Brenda Chapman
7 Dan Scanlon
# List all directors of movies (alphabetically), without duplicates
= "select DISTINCT director
sql from movies
order by director"
sqldf(sql)
director
1 Andrew Stanton
2 Brad Bird
3 Brenda Chapman
4 Dan Scanlon
5 John Lasseter
6 Lee Unkrich
7 Pete Docter
# Both the where clause and the order by clause may refer to information
# that is not actually displayed by the select clause
#
# For example
# List the title and year for all movies directed by John Lasseter.
# Do not John Lasseter's name in the output.
= "select title, year
sql from movies
where director = 'John Lasseter'"
sqldf(sql)
title year
1 Toy Story 1995
2 A Bug's Life 1998
3 Toy Story 2 1999
4 Cars 2006
5 Cars 2 2011
# List the title of all movies in "reverse chronological order".
# In other words, the most recent movie should be listed first and the
# oldest movie listed last.
# Do NOT show the actual year in the output.
= "select title
sql from movies
order by year desc"
sqldf(sql)
title
1 WALL-G
2 Monsters University
3 Brave
4 Cars 2
5 Toy Story 3
6 Up
7 WALL-E
8 Ratatouille
9 Cars
10 The Incredibles
11 Finding Nemo
12 Monsters, Inc.
13 Toy Story 2
14 A Bug's Life
15 Toy Story
# List the last four movies released (ordered from least recent to most recent)
= "select *
sql from movies
order by year"
sqldf(sql)
id title director year length_minutes
1 1 Toy Story John Lasseter 1995 81
2 2 A Bug's Life John Lasseter 1998 95
3 3 Toy Story 2 John Lasseter 1999 93
4 4 Monsters, Inc. Pete Docter 2001 92
5 5 Finding Nemo Andrew Stanton 2003 107
6 6 The Incredibles Brad Bird 2004 116
7 7 Cars John Lasseter 2006 117
8 8 Ratatouille Brad Bird 2007 115
9 9 WALL-E Andrew Stanton 2008 104
10 10 Up Pete Docter 2009 101
11 11 Toy Story 3 Lee Unkrich 2010 103
12 12 Cars 2 John Lasseter 2011 120
13 13 Brave Brenda Chapman 2012 102
14 14 Monsters University Dan Scanlon 2013 110
15 87 WALL-G Brenda Chapman 2042 92
# the following does the same thing
= "select *
sql from movies
order by year asc"
sqldf(sql)
id title director year length_minutes
1 1 Toy Story John Lasseter 1995 81
2 2 A Bug's Life John Lasseter 1998 95
3 3 Toy Story 2 John Lasseter 1999 93
4 4 Monsters, Inc. Pete Docter 2001 92
5 5 Finding Nemo Andrew Stanton 2003 107
6 6 The Incredibles Brad Bird 2004 116
7 7 Cars John Lasseter 2006 117
8 8 Ratatouille Brad Bird 2007 115
9 9 WALL-E Andrew Stanton 2008 104
10 10 Up Pete Docter 2009 101
11 11 Toy Story 3 Lee Unkrich 2010 103
12 12 Cars 2 John Lasseter 2011 120
13 13 Brave Brenda Chapman 2012 102
14 14 Monsters University Dan Scanlon 2013 110
15 87 WALL-G Brenda Chapman 2042 92
# List the last four movies released (ordered from most recent to last)
#
# To order a column in descending order follow the name of the column with desc
= "select *
sql from movies
order by year desc"
sqldf(sql)
id title director year length_minutes
1 87 WALL-G Brenda Chapman 2042 92
2 14 Monsters University Dan Scanlon 2013 110
3 13 Brave Brenda Chapman 2012 102
4 12 Cars 2 John Lasseter 2011 120
5 11 Toy Story 3 Lee Unkrich 2010 103
6 10 Up Pete Docter 2009 101
7 9 WALL-E Andrew Stanton 2008 104
8 8 Ratatouille Brad Bird 2007 115
9 7 Cars John Lasseter 2006 117
10 6 The Incredibles Brad Bird 2004 116
11 5 Finding Nemo Andrew Stanton 2003 107
12 4 Monsters, Inc. Pete Docter 2001 92
13 3 Toy Story 2 John Lasseter 1999 93
14 2 A Bug's Life John Lasseter 1998 95
15 1 Toy Story John Lasseter 1995 81
= "select *
sql from movies
order by year desc
limit 4"
sqldf(sql)
id title director year length_minutes
1 87 WALL-G Brenda Chapman 2042 92
2 14 Monsters University Dan Scanlon 2013 110
3 13 Brave Brenda Chapman 2012 102
4 12 Cars 2 John Lasseter 2011 120
# List all movies sorted alphabetically by title
= "select *
sql from movies
order by title"
sqldf(sql)
id title director year length_minutes
1 2 A Bug's Life John Lasseter 1998 95
2 13 Brave Brenda Chapman 2012 102
3 7 Cars John Lasseter 2006 117
4 12 Cars 2 John Lasseter 2011 120
5 5 Finding Nemo Andrew Stanton 2003 107
6 14 Monsters University Dan Scanlon 2013 110
7 4 Monsters, Inc. Pete Docter 2001 92
8 8 Ratatouille Brad Bird 2007 115
9 6 The Incredibles Brad Bird 2004 116
10 1 Toy Story John Lasseter 1995 81
11 3 Toy Story 2 John Lasseter 1999 93
12 11 Toy Story 3 Lee Unkrich 2010 103
13 10 Up Pete Docter 2009 101
14 9 WALL-E Andrew Stanton 2008 104
15 87 WALL-G Brenda Chapman 2042 92
# List the first five movies sorted alphabetically
= "select *
sql from movies
order by title
limit 5"
sqldf(sql)
id title director year length_minutes
1 2 A Bug's Life John Lasseter 1998 95
2 13 Brave Brenda Chapman 2012 102
3 7 Cars John Lasseter 2006 117
4 12 Cars 2 John Lasseter 2011 120
5 5 Finding Nemo Andrew Stanton 2003 107
# List the next five movies sorted alphabetically
= "select *
sql from movies
order by title
limit 5 offset 5"
sqldf(sql)
id title director year length_minutes
1 14 Monsters University Dan Scanlon 2013 110
2 4 Monsters, Inc. Pete Docter 2001 92
3 8 Ratatouille Brad Bird 2007 115
4 6 The Incredibles Brad Bird 2004 116
5 1 Toy Story John Lasseter 1995 81
# List the next five movies sorted alphabetically
= "select *
sql from movies
order by title
limit 5 offset 10"
sqldf(sql)
id title director year length_minutes
1 3 Toy Story 2 John Lasseter 1999 93
2 11 Toy Story 3 Lee Unkrich 2010 103
3 10 Up Pete Docter 2009 101
4 9 WALL-E Andrew Stanton 2008 104
5 87 WALL-G Brenda Chapman 2042 92
15.24.6 Practice - more practice - north_american_cities
# Review 1 - Tasks
#
# https://sqlbolt.com/lesson/select_queries_review
# See the table (i.e. dataframe)
sqldf("select * north_american_cities")
Error: near "north_american_cities": syntax error
# List all the Canadian cities and their populations
= "select country, city, population
sql from north_american_cities
where country='Canada'"
sqldf(sql)
country city population
1 Canada Toronto 2795060
2 Canada Montreal 1717767
# Order all the cities in the United States by their latitude from north to south
= "
sql select *
from north_american_cities
where country = 'United States'
order by latitude desc"
sqldf(sql)
city country population latitude longitude
1 Chicago United States 2718782 41.87811 -87.62980
2 New York United States 8405837 40.71278 -74.00594
3 Philadelphia United States 1553165 39.95258 -75.16522
4 Los Angeles United States 3884307 34.05223 -118.24368
5 Phoenix United States 1513367 33.44838 -112.07404
6 Houston United States 2195914 29.76043 -95.36980
# List all the cities west of Chicago, ordered from west to east
= "select *
sql from north_american_cities
where longitude < -87.62980
order by longitude
"
sqldf(sql)
city country population latitude longitude
1 Los Angeles United States 3884307 34.05223 -118.24368
2 Phoenix United States 1513367 33.44838 -112.07404
3 Guadalajara Mexico 1500800 20.65970 -103.34961
4 Mexico City Mexico 8555500 19.43261 -99.13321
5 Ecatepec de Morelos Mexico 1742000 19.60184 -99.05067
6 Houston United States 2195914 29.76043 -95.36980
# List the two largest cities in Mexico (by population)
# Start by listing all of the cities in Mexico in order of decreasing population
= "select *
sql from north_american_cities
where country = 'Mexico'
order by population desc"
sqldf(sql)
city country population latitude longitude
1 Mexico City Mexico 8555500 19.43261 -99.13321
2 Ecatepec de Morelos Mexico 1742000 19.60184 -99.05067
3 Guadalajara Mexico 1500800 20.65970 -103.34961
# use the limit clause to limit results to only a specified number of values
# Start by listing all of the cities in Mexico
= "select *
sql from north_american_cities
where country = 'Mexico'
order by population desc
limit 2"
sqldf(sql)
city country population latitude longitude
1 Mexico City Mexico 8555500 19.43261 -99.13321
2 Ecatepec de Morelos Mexico 1742000 19.60184 -99.05067
# The limit clause takes an option "offset"
# Each row in the data has an imaginary number starting with 0 for the 1st row
# and 1 for the 2nd row , etc.
= "select * from north_american_cities
sql order by population desc"
sqldf(sql)
city country population latitude longitude
1 Mexico City Mexico 8555500 19.43261 -99.13321
2 New York United States 8405837 40.71278 -74.00594
3 Los Angeles United States 3884307 34.05223 -118.24368
4 Toronto Canada 2795060 43.65323 -79.38318
5 Chicago United States 2718782 41.87811 -87.62980
6 Houston United States 2195914 29.76043 -95.36980
7 Havana Cuba 2106146 23.05407 -82.34519
8 Ecatepec de Morelos Mexico 1742000 19.60184 -99.05067
9 Montreal Canada 1717767 45.50169 -73.56726
10 Philadelphia United States 1553165 39.95258 -75.16522
11 Phoenix United States 1513367 33.44838 -112.07404
12 Guadalajara Mexico 1500800 20.65970 -103.34961
# Do the same thing but only get the 3 largest cities
= "select * from north_american_cities
sql order by population desc
limit 3"
sqldf(sql)
city country population latitude longitude
1 Mexico City Mexico 8555500 19.43261 -99.13321
2 New York United States 8405837 40.71278 -74.00594
3 Los Angeles United States 3884307 34.05223 -118.24368
# Do the same thing but get the 4th, 5th and 6th largest cities
= "select * from north_american_cities
sql order by population desc
limit 3 offset 3"
sqldf(sql)
city country population latitude longitude
1 Toronto Canada 2795060 43.65323 -79.38318
2 Chicago United States 2718782 41.87811 -87.62980
3 Houston United States 2195914 29.76043 -95.36980
# Do the same thing but get the 7th 8th and 9th largest cities
= "select * from north_american_cities
sql order by population desc
limit 3 offset 6"
sqldf(sql)
city country population latitude longitude
1 Havana Cuba 2106146 23.05407 -82.34519
2 Ecatepec de Morelos Mexico 1742000 19.60184 -99.05067
3 Montreal Canada 1717767 45.50169 -73.56726
# List the third and fourth largest cities (by population) in the United States and their population
# Let's start by showing all USA cities
= "select *
sql from north_american_cities
where country = 'United States'
order by population desc
"
sqldf(sql)
city country population latitude longitude
1 New York United States 8405837 40.71278 -74.00594
2 Los Angeles United States 3884307 34.05223 -118.24368
3 Chicago United States 2718782 41.87811 -87.62980
4 Houston United States 2195914 29.76043 -95.36980
5 Philadelphia United States 1553165 39.95258 -75.16522
6 Phoenix United States 1513367 33.44838 -112.07404
# Use limit 2 offset 3 to just show the 3rd and 4th largest cities
= "select *
sql from north_american_cities
where country = 'United States'
order by population desc
limit 2 offset 2
"
sqldf(sql)
city country population latitude longitude
1 Chicago United States 2718782 41.87811 -87.6298
2 Houston United States 2195914 29.76043 -95.3698
15.25 dynamic SQL - a brief introduction
NOTE:
- “IDS1020-Intro to IDS” students can ignore this section.
########################################################################.
# It is very common for SQL to be "embedded" in another language
# very similar to what we are doing with R. There are ways to
# "embed" SQL in all popular programming languages.
#
# This technique of building a sql statement from different parts of
# info using another language (in this case R) is known
# as "dynamic sql".
########################################################################.
#################################################################.
# Brief intro to an advanced concept: "dynamic SQL"?
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# As we've already learned, the sqldf function takes a
# single character value that contains
# a SQL command, e.g. sqldf("SELECT * FROM SOME_TABLE")
# This allows us to use the SQL language inside our R programs.
# However, you must remember that SQL and R are very different
# languages. As we learn SQL, we will focus primarily on how to write
# various forms of SQL commands. We wont focus much on R
# other than to use the sqldf function to run the SQL code.
#
# However, sometimes it is adventageous to
# use R code (more than just sqldf) to "dynamically" construct a
# SQL statement based on some other information that is
# available to your R code but would not have been available
# at the time you are writing the SQL code.
#
# For example, the following function, mySelect
# takes two arguments:
#
# table - The name of a table
# cols - The names of the columns to display from the table
#
# It displays just the specified columns from the specified table.
= function( table, cols){
mySelect = paste(cols, collapse=",")
colsList = paste0( "select ", colsList, " from ", table)
selectStatement sqldf(selectStatement)
}
# The function can now be called with different tables and
# lists of columns
mySelect("movies", c("title","director"))
title director
1 Toy Story John Lasseter
2 A Bug's Life John Lasseter
3 Toy Story 2 John Lasseter
4 Monsters, Inc. Pete Docter
5 Finding Nemo Andrew Stanton
6 The Incredibles Brad Bird
7 Cars John Lasseter
8 Ratatouille Brad Bird
9 WALL-E Andrew Stanton
10 Up Pete Docter
11 Toy Story 3 Lee Unkrich
12 Cars 2 John Lasseter
13 Brave Brenda Chapman
14 Monsters University Dan Scanlon
15 WALL-G Brenda Chapman
mySelect("grades", c("student", "year", "test1", "test2"))
student year test1 test2
1 joe fr 70 82
2 sue fr 80 80
3 anne fr 90 95
4 frank so 100 95
5 bob so 30 NA
6 samantha so 100 70
7 larry ju 32 80
8 bill se 84 90
9 lucricia fr 80 100
10 margaret se 59 47
11 dalia se 85 92
12 zvi ju 90 98
13 mike ju 90 86
14 david se 90 87
15 dana so 100 93