15  Relational Databases and SQL (using R)

We will spend the next few sections focusing on “Structured Query Language (SQL)”, the language that is used for accessing data in a relational database. Remember that a relational database stores its information as a collection of several “tables”. Each table arranges its data in rows and columns.

15.1 Web resources for learning SQL

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# The following websites contains a basic overview of many SQL 
# commands and concepts. We will be going through the concepts and many 
# of the interactive exercises from these websites
#
# https://www.w3schools.com/sql/
#
# https://sqlbolt.com/
#
# https://www.sqlitetutorial.net/
#
# https://mode.com/sql-tutorial/
#
# https://sqlzoo.com/
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# See this page 
#
#   https://sqlbolt.com/lesson/introduction
# 
# For an overview of :
#
# - What is Structured Query Language (SQL) - pronounced "sequel"
# 
# - What is a "Relational Database"
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

15.2 What software do I need to use SQL?

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# The sqldf package in R.  ####
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# To use SQL you must have access to software that interacts with 
# a relational database. As mentioned above many such software packages
# exist, e.g.
#
#      - Microsoft SQL Server
#      - Oracle Relational Database Management System
#      - MySql
#      - Postgres
#      - MariaDb
#      - many many others
#
# In addition many custom software packages exist that use
# SQL to manipulate relational databases. 
#
# Many of these software packages tend to be overly complex to install. 
# However, the underlying SQL language that is used by all of these 
# different software packages tends to be very, very similar. 
# 
# For this class we will be using R to learn the basic SQL commands.
# The sqldf package in R includes the sqldf function that takes a single
# character value. The value is expected to be a valid SQL command. 
# Instead of referring to database "tables" the sqldf function interprets
# database table names found in the commands as referring to R dataframes.
# This makes sense since R dataframes are very similar in structure to 
# relational database tables.
#
# The version of SQL that is used by default with the sqldf package
# is the "sqlite" version. You can search online for more information
# about sqlite if you want to understand more about the specifics 
# of this "flavor" of SQL. The following website is a good place to
# start: https://www.sqlitetutorial.net/
#
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

15.2.1 The sqldf R package

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Using the sqldf R package ####
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# SQL is NOT an "R" concept ... however we can use SQL in R via 
# the sqldf package.
#
# The "sqldf" package in R allows you to use the sqldf() function
# to run sql statements using R dataframes instead of database tables.
# We will be using this package to introduce the first sql commands
# we will learn. 
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# install and load (i.e. require) the sqldf package 

if(!require(sqldf)){install.packages("sqldf");require(sqldf);}
Loading required package: sqldf
Loading required package: gsubfn
Loading required package: proto
Loading required package: RSQLite

15.3 data for this part of the tutorial

You can download the data for this tutorial from the following files.

  • grades.csv This is a .csv file. You can read the contents of the file into R with the read.csv function as shown below.

  • sqlbolt_tables-v007.RData This is a .RData file. You can read the contents of the file into R with the load function as shown below.

You can load these tables into your R session with the following commands.

grades = read.csv("grades.csv")
load("sqlbolt_tables-v007.RData")

Below are the contents of these tables:

# Tables that we will be using in this presentation.

sqldf("select * from grades")
      id  student year honors test1 test2 test3 major
1  s0100      joe   fr  FALSE    70    82    NA   IDS
2  s0200      sue   fr  FALSE    80    80    79   IDS
3  s0300     anne   fr   TRUE    90    95    92   IDS
4  s0400    frank   so   TRUE   100    95    91   ACC
5  s0500      bob   so  FALSE    30    NA    NA   IDS
6  s0600 samantha   so   TRUE   100    70    97   IDS
7  s0700    larry   ju  FALSE    32    80    NA   FIN
8  s0800     bill   se  FALSE    84    90    92   ACC
9  s0900 lucricia   fr   TRUE    80   100    88   IDS
10 s1000 margaret   se  FALSE    59    47    NA   IDS
11 s1100    dalia   se  FALSE    85    92    93   FIN
12 s1200      zvi   ju   TRUE    90    98    92   ACC
13 s1300     mike   ju   TRUE    90    86    79   IDS
14 s1400    david   se   TRUE    90    87    87   IDS
15 s1500     dana   so  FALSE   100    93    91   FIN
sqldf("select * from movies")
   id               title       director year length_minutes
1   1           Toy Story  John Lasseter 1995             81
2   2        A Bug's Life  John Lasseter 1998             95
3   3         Toy Story 2  John Lasseter 1999             93
4   4      Monsters, Inc.    Pete Docter 2001             92
5   5        Finding Nemo Andrew Stanton 2003            107
6   6     The Incredibles      Brad Bird 2004            116
7   7                Cars  John Lasseter 2006            117
8   8         Ratatouille      Brad Bird 2007            115
9   9              WALL-E Andrew Stanton 2008            104
10 10                  Up    Pete Docter 2009            101
11 11         Toy Story 3    Lee Unkrich 2010            103
12 12              Cars 2  John Lasseter 2011            120
13 13               Brave Brenda Chapman 2012            102
14 14 Monsters University    Dan Scanlon 2013            110
15 87              WALL-G Brenda Chapman 2042             92
sqldf("select * from orders")
    OrderID CustomerID EmployeeID  OrderDate ShipperID
1     10248         90          5   7/4/1996         3
2     10249         81          6   7/5/1996         1
3     10250         34          4   7/8/1996         2
4     10251         84          3   7/8/1996         1
5     10252         76          4   7/9/1996         2
6     10253         34          3  7/10/1996         2
7     10254         14          5  7/11/1996         2
8     10255         68          9  7/12/1996         3
9     10256         88          3  7/15/1996         2
10    10257         35          4  7/16/1996         3
11    10258         20          1  7/17/1996         1
12    10259         13          4  7/18/1996         3
13    10260         55          4  7/19/1996         1
14    10261         61          4  7/19/1996         2
15    10262         65          8  7/22/1996         3
16    10263         20          9  7/23/1996         3
17    10264         24          6  7/24/1996         3
18    10265          7          2  7/25/1996         1
19    10266         87          3  7/26/1996         3
20    10267         25          4  7/29/1996         1
21    10268         33          8  7/30/1996         3
22    10269         89          5  7/31/1996         1
23    10270         87          1   8/1/1996         1
24    10271         75          6   8/1/1996         2
25    10272         65          6   8/2/1996         2
26    10273         63          3   8/5/1996         3
27    10274         85          6   8/6/1996         1
28    10275         49          1   8/7/1996         1
29    10276         80          8   8/8/1996         3
30    10277         52          2   8/9/1996         3
31    10278          5          8  8/12/1996         2
32    10279         44          8  8/13/1996         2
33    10280          5          2  8/14/1996         1
34    10281         69          4  8/14/1996         1
35    10282         69          4  8/15/1996         1
36    10283         46          3  8/16/1996         3
37    10284         44          4  8/19/1996         1
38    10285         63          1  8/20/1996         2
39    10286         63          8  8/21/1996         3
40    10287         67          8  8/22/1996         3
41    10288         66          4  8/23/1996         1
42    10289         11          7  8/26/1996         3
43    10290         15          8  8/27/1996         1
44    10291         61          6  8/27/1996         2
45    10292         81          1  8/28/1996         2
46    10293         80          1  8/29/1996         3
47    10294         65          4  8/30/1996         2
48    10295         85          2   9/2/1996         2
49    10296         46          6   9/3/1996         1
50    10297          7          5   9/4/1996         2
51    10298         37          6   9/5/1996         2
52    10299         67          4   9/6/1996         2
53    10300         49          2   9/9/1996         2
54    10301         86          8   9/9/1996         2
55    10302         76          4  9/10/1996         2
56    10303         30          7  9/11/1996         2
57    10304         80          1  9/12/1996         2
58    10305         55          8  9/13/1996         3
59    10306         69          1  9/16/1996         3
60    10307         48          2  9/17/1996         2
61    10308          2          7  9/18/1996         3
62    10309         37          3  9/19/1996         1
63    10310         77          8  9/20/1996         2
64    10311         18          1  9/20/1996         3
65    10312         86          2  9/23/1996         2
66    10313         63          2  9/24/1996         2
67    10314         65          1  9/25/1996         2
68    10315         38          4  9/26/1996         2
69    10316         65          1  9/27/1996         3
70    10317         48          6  9/30/1996         1
71    10318         38          8  10/1/1996         2
72    10319         80          7  10/2/1996         3
73    10320         87          5  10/3/1996         3
74    10321         38          3  10/3/1996         2
75    10322         58          7  10/4/1996         3
76    10323         39          4  10/7/1996         1
77    10324         71          9  10/8/1996         1
78    10325         39          1  10/9/1996         3
79    10326          8          4 10/10/1996         2
80    10327         24          2 10/11/1996         1
81    10328         28          4 10/14/1996         3
82    10329         75          4 10/15/1996         2
83    10330         46          3 10/16/1996         1
84    10331          9          9 10/16/1996         1
85    10332         51          3 10/17/1996         2
86    10333         87          5 10/18/1996         3
87    10334         84          8 10/21/1996         2
88    10335         37          7 10/22/1996         2
89    10336         60          7 10/23/1996         2
90    10337         25          4 10/24/1996         3
91    10338         55          4 10/25/1996         3
92    10339         51          2 10/28/1996         2
93    10340          9          1 10/29/1996         3
94    10341         73          7 10/29/1996         3
95    10342         25          4 10/30/1996         2
96    10343         44          4 10/31/1996         1
97    10344         89          4  11/1/1996         2
98    10345         63          2  11/4/1996         2
99    10346         65          3  11/5/1996         3
100   10347         21          4  11/6/1996         3
101   10348         86          4  11/7/1996         2
102   10349         75          7  11/8/1996         1
103   10350         41          6 11/11/1996         2
104   10351         20          1 11/11/1996         1
105   10352         28          3 11/12/1996         3
106   10353         59          7 11/13/1996         3
107   10354         58          8 11/14/1996         3
108   10355          4          6 11/15/1996         1
109   10356         86          6 11/18/1996         2
110   10357         46          1 11/19/1996         3
111   10358         41          5 11/20/1996         1
112   10359         72          5 11/21/1996         3
113   10360          7          4 11/22/1996         3
114   10361         63          1 11/22/1996         2
115   10362          9          3 11/25/1996         1
116   10363         17          4 11/26/1996         3
117   10364         19          1 11/26/1996         1
118   10365          3          3 11/27/1996         2
119   10366         29          8 11/28/1996         2
120   10367         83          7 11/28/1996         3
121   10368         20          2 11/29/1996         2
122   10369         75          8  12/2/1996         2
123   10370         14          6  12/3/1996         2
124   10371         41          1  12/3/1996         1
125   10372         62          5  12/4/1996         2
126   10373         37          4  12/5/1996         3
127   10374         91          1  12/5/1996         3
128   10375         36          3  12/6/1996         2
129   10376         51          1  12/9/1996         2
130   10377         72          1  12/9/1996         3
131   10378         24          5 12/10/1996         3
132   10379         61          2 12/11/1996         1
133   10380         37          8 12/12/1996         3
134   10381         46          3 12/12/1996         3
135   10382         20          4 12/13/1996         1
136   10383          4          8 12/16/1996         3
137   10384          5          3 12/16/1996         3
138   10385         75          1 12/17/1996         2
139   10386         21          9 12/18/1996         3
140   10387         70          1 12/18/1996         2
141   10388         72          2 12/19/1996         1
142   10389         10          4 12/20/1996         2
143   10390         20          6 12/23/1996         1
144   10391         17          3 12/23/1996         3
145   10392         59          2 12/24/1996         3
146   10393         71          1 12/25/1996         3
147   10394         36          1 12/25/1996         3
148   10395         35          6 12/26/1996         1
149   10396         25          1 12/27/1996         3
150   10397         60          5 12/27/1996         1
151   10398         71          2 12/30/1996         3
152   10399         83          8 12/31/1996         3
153   10400         19          1   1/1/1997         3
154   10401         65          1   1/1/1997         1
155   10402         20          8   1/2/1997         2
156   10403         20          4   1/3/1997         3
157   10404         49          2   1/3/1997         1
158   10405         47          1   1/6/1997         1
159   10406         62          7   1/7/1997         1
160   10407         56          2   1/7/1997         2
161   10408         23          8   1/8/1997         1
162   10409         54          3   1/9/1997         1
163   10410         10          3  1/10/1997         3
164   10411         10          9  1/10/1997         3
165   10412         87          8  1/13/1997         2
166   10413         41          3  1/14/1997         2
167   10414         21          2  1/14/1997         3
168   10415         36          3  1/15/1997         1
169   10416         87          8  1/16/1997         3
170   10417         73          4  1/16/1997         3
171   10418         63          4  1/17/1997         1
172   10419         68          4  1/20/1997         2
173   10420         88          3  1/21/1997         1
174   10421         61          8  1/21/1997         1
175   10422         27          2  1/22/1997         1
176   10423         31          6  1/23/1997         3
177   10424         51          7  1/23/1997         2
178   10425         41          6  1/24/1997         2
179   10426         29          4  1/27/1997         1
180   10427         59          4  1/27/1997         2
181   10428         66          7  1/28/1997         1
182   10429         37          3  1/29/1997         2
183   10430         20          4  1/30/1997         1
184   10431         10          4  1/30/1997         2
185   10432         75          3  1/31/1997         2
186   10433         60          3   2/3/1997         3
187   10434         24          3   2/3/1997         2
188   10435         16          8   2/4/1997         2
189   10436          7          3   2/5/1997         2
190   10437         87          8   2/5/1997         1
191   10438         79          3   2/6/1997         2
192   10439         51          6   2/7/1997         3
193   10440         71          4  2/10/1997         2
194   10441         55          3  2/10/1997         2
195   10442         20          3  2/11/1997         2
196   10443         66          8  2/12/1997         1
sqldf("select * from customers")
   CustomerID                         CustomerName              ContactName                                           Address             City PostalCode     Country
1           1                  Alfreds Futterkiste             Maria Anders                                     Obere Str. 57           Berlin      12209     Germany
2           2   Ana Trujillo Emparedados y helados             Ana Trujillo                  Avda. de la Constituci\xf3n 2222   M\xe9xico D.F.       5021      Mexico
3           3           Antonio Moreno Taquer\xeda           Antonio Moreno                                    Mataderos 2312   M\xe9xico D.F.       5023      Mexico
4           4                      Around the Horn             Thomas Hardy                                   120 Hanover Sq.           London    WA1 1DP          UK
5           5                Berglunds snabbk\xf6p       Christina Berglund                                 Berguvsv\xe4gen 8         Lule\xe5   S-958 22      Sweden
6           6              Blauer See Delikatessen               Hanna Moos                                    Forsterstr. 57         Mannheim      68306     Germany
7           7              Blondel p\xe8re et fils Fr\xe9d\xe9rique Citeaux                               24, place Kl\xe9ber       Strasbourg      67000      France
8           8         B\xf3lido Comidas preparadas         Mart\xedn Sommer                                    C/ Araquil, 67           Madrid      28023       Spain
9           9                             Bon app'        Laurence Lebihans                              12, rue des Bouchers        Marseille      13008      France
10         10               Bottom-Dollar Marketse        Elizabeth Lincoln                                23 Tsawassen Blvd.        Tsawassen    T2F 8M4      Canada
11         11                        B's Beverages        Victoria Ashworth                                 Fauntleroy Circus           London    EC2 5NT          UK
12         12           Cactus Comidas para llevar         Patricio Simpson                                       Cerrito 333     Buenos Aires       1010   Argentina
13         13           Centro comercial Moctezuma          Francisco Chang                           Sierras de Granada 9993   M\xe9xico D.F.       5022      Mexico
14         14                    Chop-suey Chinese                Yang Wang                                      Hauptstr. 29             Bern       3012 Switzerland
15         15                  Com\xe9rcio Mineiro             Pedro Afonso                           Av. dos Lus\xedadas, 23     S\xe3o Paulo  05432-043      Brazil
16         16                Consolidated Holdings          Elizabeth Brown                       Berkeley Gardens 12 Brewery           London    WX1 6LT          UK
17         17            Drachenblut Delikatessend             Sven Ottlieb                                      Walserweg 21           Aachen      52066     Germany
18         18                      Du monde entier           Janine Labrune                      67, rue des Cinquante Otages           Nantes      44000      France
19         19                   Eastern Connection                Ann Devon                                    35 King George           London    WX3 6FW          UK
20         20                         Ernst Handel            Roland Mendel                                      Kirchgasse 6             Graz       8010     Austria
21         21                   Familia Arquibaldo                Aria Cruz                                   Rua Or\xf3s, 92     S\xe3o Paulo  05442-030      Brazil
22         22 FISSA Fabrica Inter. Salchichas S.A.               Diego Roel                                C/ Moralzarzal, 86           Madrid      28034       Spain
23         23                    Folies gourmandes         Martine Ranc\xe9                       184, chauss\xe9e de Tournai            Lille      59000      France
24         24                    Folk och f\xe4 HB            Maria Larsson                                   \xc5kergatan 24        Br\xe4cke   S-844 67      Sweden
25         25                       Frankenversand            Peter Franken                                 Berliner Platz 43       M\xfcnchen      80805     Germany
26         26                  France restauration           Carine Schmitt                                    54, rue Royale           Nantes      44000      France
27         27                       Franchi S.p.A.            Paolo Accorti                               Via Monte Bianco 34           Torino      10100       Italy
28         28       Furia Bacalhau e Frutos do Mar           Lino Rodriguez                            Jardim das rosas n. 32           Lisboa       1675    Portugal
29         29         Galer\xeda del gastr\xf3nomo         Eduardo Saavedra                         Rambla de Catalu\xf1a, 23        Barcelona       8022       Spain
30         30               Godos Cocina T\xedpica     Jos\xe9 Pedro Freyre                                     C/ Romero, 33          Sevilla      41101       Spain
31         31                  Gourmet Lanchonetes         Andr\xe9 Fonseca                                   Av. Brasil, 442         Campinas  04876-786      Brazil
32         32              Great Lakes Food Market            Howard Snyder                                  2732 Baker Blvd.           Eugene      97403         USA
33         33                 GROSELLA-Restaurante           Manuel Pereira                      5\xaa Ave. Los Palos Grandes          Caracas       1081   Venezuela
34         34                        Hanari Carnes             Mario Pontes                                Rua do Pa\xe7o, 67   Rio de Janeiro  05454-876      Brazil
35         35                  HILARI\xd3N-Abastos      Carlos Hern\xe1ndez        Carrera 22 con Ave. Carlos Soublette #8-35 San Crist\xf3bal       5022   Venezuela
36         36           Hungry Coyote Import Store            Yoshi Latimer                    City Center Plaza 516 Main St.            Elgin      97827         USA
37         37         Hungry Owl All-Night Grocers         Patricia McKenna                                  8 Johnstown Road             Cork                Ireland
38         38                       Island Trading            Helen Bennett                         Garden House Crowther Way            Cowes   PO31 7PJ          UK
39         39                   K\xf6niglich Essen            Philip Cramer                                     Maubelstr. 90      Brandenburg      14776     Germany
40         40                 La corne d'abondance            Daniel Tonini                            67, avenue de l'Europe       Versailles      78000      France
41         41                     La maison d'Asie           Annette Roulet                             1 rue Alsace-Lorraine         Toulouse      31000      France
42         42        Laughing Bacchus Wine Cellars          Yoshi Tannamuri                                      1900 Oak St.        Vancouver    V3F 2K1      Canada
43         43                 Lazy K Kountry Store               John Steel                              12 Orchestra Terrace      Walla Walla      99362         USA
44         44                  Lehmanns Marktstand           Renate Messner                                      Magazinweg 7   Frankfurt a.M.      60528     Germany
45         45                    Let's Stop N Shop             Jaime Yorres                               87 Polk St. Suite 5    San Francisco      94117         USA
46         46                    LILA-Supermercado       Carlos Gonz\xe1lez Carrera 52 con Ave. Bol\xedvar #65-98 Llano Largo     Barquisimeto       3508   Venezuela
47         47                     LINO-Delicateses         Felipe Izquierdo                           Ave. 5 de Mayo Porlamar  I. de Margarita       4980   Venezuela
48         48             Lonesome Pine Restaurant              Fran Wilson                                89 Chiaroscuro Rd.         Portland      97219         USA
49         49         Magazzini Alimentari Riuniti         Giovanni Rovelli                           Via Ludovico il Moro 22          Bergamo      24100       Italy
50         50                         Maison Dewey          Catherine Dewey                               Rue Joseph-Bens 532        Bruxelles     B-1180     Belgium
51         51                    M\xe8re Paillarde        Jean Fresni\xe8re                                43 rue St. Laurent      Montr\xe9al    H1J 1C3      Canada
52         52               Morgenstern Gesundkost          Alexander Feuer                                       Heerstr. 22          Leipzig       4179     Germany
53         53                          North/South           Simon Crowther                      South House 300 Queensbridge           London    SW7 1RZ          UK
54         54         Oc\xe9ano Atl\xe1ntico Ltda.           Yvonne Moncada               Ing. Gustavo Moncada 8585 Piso 20-A     Buenos Aires       1010   Argentina
55         55               Old World Delicatessen            Rene Phillips                                   2743 Bering St.        Anchorage      99508         USA
56         56                Ottilies K\xe4seladen      Henriette Pfalzheim                                Mehrheimerstr. 369          K\xf6ln      50739     Germany
57         57              Paris sp\xe9cialit\xe9s           Marie Bertrand                           265, boulevard Charonne            Paris      75012      France
58         58         Pericles Comidas cl\xe1sicas   Guillermo Fern\xe1ndez                          Calle Dr. Jorge Cash 321   M\xe9xico D.F.       5033      Mexico
59         59                     Piccolo und mehr              Georg Pipps                                       Geislweg 14         Salzburg       5020     Austria
60         60              Princesa Isabel Vinhoss         Isabel de Castro                         Estrada da sa\xfade n. 58           Lisboa       1756    Portugal
61         61                       Que Del\xedcia         Bernardo Batista                           Rua da Panificadora, 12   Rio de Janeiro  02389-673      Brazil
62         62                        Queen Cozinha        L\xfacia Carvalho                      Alameda dos Can\xe0rios, 891     S\xe3o Paulo  05487-020      Brazil
63         63                           QUICK-Stop              Horst Kloss                               Taucherstra\xdfe 10        Cunewalde       1307     Germany
64         64                        Rancho grande      Sergio Guti\xe9rrez                            Av. del Libertador 900     Buenos Aires       1010   Argentina
65         65           Rattlesnake Canyon Grocery             Paula Wilson                                   2817 Milton Dr.      Albuquerque      87110         USA
66         66                   Reggiani Caseifici          Maurizio Moroni                            Strada Provinciale 124    Reggio Emilia      42100       Italy
67         67                   Ricardo Adocicados           Janete Limeira                               Av. Copacabana, 267   Rio de Janeiro  02389-890      Brazil
68         68                   Richter Supermarkt             Michael Holz                                 Grenzacherweg 237        Gen\xe8ve       1203 Switzerland
69         69                     Romero y tomillo         Alejandra Camino                                    Gran V\xeda, 1           Madrid      28001       Spain
70         70                     Sant\xe9 Gourmet         Jonas Bergulfsen                            Erling Skakkes gate 78          Stavern       4110      Norway
71         71                   Save-a-lot Markets           Jose Pavarotti                                   187 Suffolk Ln.            Boise      83720         USA
72         72                   Seven Seas Imports               Hari Kumar                                   90 Wadhurst Rd.           London   OX15 4NB          UK
73         73                        Simons bistro           Jytte Petersen                                   Vinb\xe6ltet 34     K\xf8benhavn       1734     Denmark
74         74           Sp\xe9cialit\xe9s du monde        Dominique Perrier                                 25, rue Lauriston            Paris      75016      France
75         75                Split Rail Beer & Ale       Art Braunschweiger                                      P.O. Box 555           Lander      82520         USA
76         76               Supr\xeames d\xe9lices         Pascale Cartrain                              Boulevard Tirou, 255        Charleroi     B-6000     Belgium
77         77                       The Big Cheese                Liz Nixon                          89 Jefferson Way Suite 2         Portland      97201         USA
78         78                      The Cracker Box                 Liu Wong                               55 Grizzly Peak Rd.            Butte      59801         USA
79         79                Toms Spezialit\xe4ten            Karin Josephs                                     Luisenstr. 48       M\xfcnster      44087     Germany
80         80                  Tortuga Restaurante     Miguel Angel Paolino                                  Avda. Azteca 123   M\xe9xico D.F.       5033      Mexico
81         81         Tradi\xe7\xe3o Hipermercados        Anabela Domingues                        Av. In\xeas de Castro, 414     S\xe3o Paulo  05634-030      Brazil
82         82    Trail's Head Gourmet Provisioners           Helvetius Nagy                                 722 DaVinci Blvd.         Kirkland      98034         USA
83         83                         Vaffeljernet              Palle Ibsen                                  Smagsl\xf8get 45         \xc5rhus       8200     Denmark
84         84                 Victuailles en stock             Mary Saveley                                2, rue du Commerce             Lyon      69004      France
85         85            Vins et alcools Chevalier             Paul Henriot                                59 rue de l'Abbaye            Reims      51100      France
86         86                    Die Wandernde Kuh           Rita M\xfcller                                 Adenauerallee 900        Stuttgart      70563     Germany
87         87                       Wartian Herkku         Pirkko Koskitalo                                       Torikatu 38             Oulu      90110     Finland
88         88               Wellington Importadora            Paula Parente                                Rua do Mercado, 12          Resende  08737-363      Brazil
89         89                 White Clover Markets           Karl Jablonski                       305 - 14th Ave. S. Suite 3B          Seattle      98128         USA
90         90                          Wilman Kala          Matti Karttunen                                     Keskuskatu 45         Helsinki      21240     Finland
91         91                               Wolski                  Zbyszek                                   ul. Filtrowa 68            Walla     01-012      Poland
sqldf("select * from north_american_cities")
                  city       country population latitude  longitude
1          Guadalajara        Mexico    1500800 20.65970 -103.34961
2              Toronto        Canada    2795060 43.65323  -79.38318
3              Houston United States    2195914 29.76043  -95.36980
4             New York United States    8405837 40.71278  -74.00594
5         Philadelphia United States    1553165 39.95258  -75.16522
6               Havana          Cuba    2106146 23.05407  -82.34519
7          Mexico City        Mexico    8555500 19.43261  -99.13321
8              Phoenix United States    1513367 33.44838 -112.07404
9          Los Angeles United States    3884307 34.05223 -118.24368
10 Ecatepec de Morelos        Mexico    1742000 19.60184  -99.05067
11            Montreal        Canada    1717767 45.50169  -73.56726
12             Chicago United States    2718782 41.87811  -87.62980

15.4 SQL SELECT statement

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# SQL select statement                                                     ####
#
# The select statement is used to retrieve information from a relational
# database table (or in our case, from an R dataframe). The select statement
# has several "clauses". The following are the allowed clauses. The ... after
# each clause name represents additional text that needs to be typed. We will
# learn how to use each of these different clauses below.
# 
# The following are the clauses of a SQL select statement:
#
#     select ...
#     from ...
#     where ...
#     group by ..
#     having ...
#     order by ..
#     limit ...        (NOTE: the limit clause is found in many SQL implementations but is not standard)
#
#
# NOTES: 
#
# 1. All of the clauses are optional, except for the select clause.
#
# 2. The clauses that appear in a particular select statement must be 
#    in the order shown above. 
#
# 3. The limit clause is not a standard part of SQL but is commonly used in many flavors
#    of the SQL language.
#
# 4. According to the ANSI SQL standard, SQL is NOT case sensitive.
#    In other words you can type "SELECT ..." or "select ...". 
#
#    Often, programmers will capitalize the clause names in select statements however,
#    this is not required.
#
#    Some SQL products do make SQL case sensitive. This is especially true
#    when it comes to names of tables and names of columns in the tables.
#
#    The sqldf package that we will be using is case-insensitve (i.e. case does
#    NOT matter) for everything EXCEPT for table names (i.e. dataframe names).
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

15.5 How to use the R sqldf() function

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# The sqldf function accepts a single argument that is expected to 
# be a SQL SELECT command. The table names in the SELECT command are expected to 
# be R dataframes.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# See help info about the sqldf package
help(package="sqldf")

# See help about the sqldf function that is part of the sqldf package
?sqldf

15.6 Intro to the SQL SELECT command

# select clause specifies which columns you'd like to 
# see in the results.

sqldf("select student, test1, test2 
      from grades")
    student test1 test2
1       joe    70    82
2       sue    80    80
3      anne    90    95
4     frank   100    95
5       bob    30    NA
6  samantha   100    70
7     larry    32    80
8      bill    84    90
9  lucricia    80   100
10 margaret    59    47
11    dalia    85    92
12      zvi    90    98
13     mike    90    86
14    david    90    87
15     dana   100    93

15.7 SELECT * FROM …..

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# SELECT * FROM ....
#
# You can type * instead of the column names to get all of the columns.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

sqldf("select * from grades")   # display the entire grades table
      id  student year honors test1 test2 test3 major
1  s0100      joe   fr  FALSE    70    82    NA   IDS
2  s0200      sue   fr  FALSE    80    80    79   IDS
3  s0300     anne   fr   TRUE    90    95    92   IDS
4  s0400    frank   so   TRUE   100    95    91   ACC
5  s0500      bob   so  FALSE    30    NA    NA   IDS
6  s0600 samantha   so   TRUE   100    70    97   IDS
7  s0700    larry   ju  FALSE    32    80    NA   FIN
8  s0800     bill   se  FALSE    84    90    92   ACC
9  s0900 lucricia   fr   TRUE    80   100    88   IDS
10 s1000 margaret   se  FALSE    59    47    NA   IDS
11 s1100    dalia   se  FALSE    85    92    93   FIN
12 s1200      zvi   ju   TRUE    90    98    92   ACC
13 s1300     mike   ju   TRUE    90    86    79   IDS
14 s1400    david   se   TRUE    90    87    87   IDS
15 s1500     dana   so  FALSE   100    93    91   FIN

15.8 capturing the output of sqldf in R

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# The output of sqldf is another dataframe   ####
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# In a relational database there is usually a way to 
# capture the output of a select statement in a different table
# however, the actual mechanism is not standard. MANY sql database
# software products use the "into" clause of the select statement
# to do so. I am not going to talk about that here because that
# really depends on the exact sql product you are using.
#
# In R, when using the sqldf function, you can capture the output of sqldf
# in another variable. This simply creates a new R dataframe (which in sqldf
# is analogous to a database table).
#
# Remember though that the actual sqldf() function in R is NOT technically 
# part of the SQL language. It is simply how we can use SQL commands to 
# manipulate R dataframes.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# Example: Capture the output of the sql command in a new dataframe

justTest1 = sqldf("select student, test1, test2
                   from grades")

# Here is the original table
sqldf("select * from grades")
      id  student year honors test1 test2 test3 major
1  s0100      joe   fr  FALSE    70    82    NA   IDS
2  s0200      sue   fr  FALSE    80    80    79   IDS
3  s0300     anne   fr   TRUE    90    95    92   IDS
4  s0400    frank   so   TRUE   100    95    91   ACC
5  s0500      bob   so  FALSE    30    NA    NA   IDS
6  s0600 samantha   so   TRUE   100    70    97   IDS
7  s0700    larry   ju  FALSE    32    80    NA   FIN
8  s0800     bill   se  FALSE    84    90    92   ACC
9  s0900 lucricia   fr   TRUE    80   100    88   IDS
10 s1000 margaret   se  FALSE    59    47    NA   IDS
11 s1100    dalia   se  FALSE    85    92    93   FIN
12 s1200      zvi   ju   TRUE    90    98    92   ACC
13 s1300     mike   ju   TRUE    90    86    79   IDS
14 s1400    david   se   TRUE    90    87    87   IDS
15 s1500     dana   so  FALSE   100    93    91   FIN
# Here is the new table
sqldf("select * from justTest1")
    student test1 test2
1       joe    70    82
2       sue    80    80
3      anne    90    95
4     frank   100    95
5       bob    30    NA
6  samantha   100    70
7     larry    32    80
8      bill    84    90
9  lucricia    80   100
10 margaret    59    47
11    dalia    85    92
12      zvi    90    98
13     mike    90    86
14    david    90    87
15     dana   100    93

15.9 case-sensitivity in SQL

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# A note about case-sensitivity ####
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# NOTE: as mentioned above, sqldf is case-INsensitive regarding parts of the
# command OTHER THAN the table names (i.e. the dataframe names).
# To demonstrate, the following will produce the same results as the 
# command above.

# same results as above
sqldf("SELECT STUDENT, TEST1, TEST2   
      FROM grades")
    student test1 test2
1       joe    70    82
2       sue    80    80
3      anne    90    95
4     frank   100    95
5       bob    30    NA
6  samantha   100    70
7     larry    32    80
8      bill    84    90
9  lucricia    80   100
10 margaret    59    47
11    dalia    85    92
12      zvi    90    98
13     mike    90    86
14    david    90    87
15     dana   100    93
# However, the following command will NOT work since there is no variable
# named GRADES (the name is grades)

# ERROR - should be "grades", not "GRADES"

sqldf("SELECT STUDENT, TEST1, TEST2    
       FROM GRADES")                      # Error: no such table: GRADES
Error: no such table: GRADES

15.10 order by

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ORDER BY                                                     ####
#
# The ORDER BY clause allows you to specify the order that
# the rows should appear in the results.
# Numbers are ordered numerically.
# Character data is ordered alphabetically.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


# Order by test1
sqldf("select student, year, test1, test2
      from grades
      order by test1")
    student year test1 test2
1       bob   so    30    NA
2     larry   ju    32    80
3  margaret   se    59    47
4       joe   fr    70    82
5       sue   fr    80    80
6  lucricia   fr    80   100
7      bill   se    84    90
8     dalia   se    85    92
9      anne   fr    90    95
10      zvi   ju    90    98
11     mike   ju    90    86
12    david   se    90    87
13    frank   so   100    95
14 samantha   so   100    70
15     dana   so   100    93
# Order by student
sqldf("select student, year, test1, test2
      from grades
      order by student")
    student year test1 test2
1      anne   fr    90    95
2      bill   se    84    90
3       bob   so    30    NA
4     dalia   se    85    92
5      dana   so   100    93
6     david   se    90    87
7     frank   so   100    95
8       joe   fr    70    82
9     larry   ju    32    80
10 lucricia   fr    80   100
11 margaret   se    59    47
12     mike   ju    90    86
13 samantha   so   100    70
14      sue   fr    80    80
15      zvi   ju    90    98
# Order by year
sqldf("select student, year, test1, test2
      from grades
      order by year")
    student year test1 test2
1       joe   fr    70    82
2       sue   fr    80    80
3      anne   fr    90    95
4  lucricia   fr    80   100
5     larry   ju    32    80
6       zvi   ju    90    98
7      mike   ju    90    86
8      bill   se    84    90
9  margaret   se    59    47
10    dalia   se    85    92
11    david   se    90    87
12    frank   so   100    95
13      bob   so    30    NA
14 samantha   so   100    70
15     dana   so   100    93
#..........................................................................
# You can specify asc to indicate "ascending" order  (this is the default)
#
# You can specify desc to indicate "descending" order
#..........................................................................

# specify asc for order (i.e. ascending) - this is the default.
# if you dont' specify asc you will get the same results.

sqldf("select student, year, test1, test2
      from grades
      order by test1 asc")
    student year test1 test2
1       bob   so    30    NA
2     larry   ju    32    80
3  margaret   se    59    47
4       joe   fr    70    82
5       sue   fr    80    80
6  lucricia   fr    80   100
7      bill   se    84    90
8     dalia   se    85    92
9      anne   fr    90    95
10      zvi   ju    90    98
11     mike   ju    90    86
12    david   se    90    87
13    frank   so   100    95
14 samantha   so   100    70
15     dana   so   100    93
# reversed - order is descending, ie. desc
sqldf("select student, year, test1, test2
      from grades
      order by test1 desc")
    student year test1 test2
1     frank   so   100    95
2  samantha   so   100    70
3      dana   so   100    93
4      anne   fr    90    95
5       zvi   ju    90    98
6      mike   ju    90    86
7     david   se    90    87
8     dalia   se    85    92
9      bill   se    84    90
10      sue   fr    80    80
11 lucricia   fr    80   100
12      joe   fr    70    82
13 margaret   se    59    47
14    larry   ju    32    80
15      bob   so    30    NA
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# You can specify more than one column in the order by clause
# The 2nd specified column will only have an effect for rows 
# that have the same value in the first specified column.
#
# Notice that there is a separate asc or desc indicator
# for each of the columns in the order by clause
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#.............................
# ordering by a single column
#.............................

# order the data in increasing order of test1 value
# Notice that the year values are NOT in order

sqldf("select student, year, test1
      from grades
      order by test1 asc")
    student year test1
1       bob   so    30
2     larry   ju    32
3  margaret   se    59
4       joe   fr    70
5       sue   fr    80
6  lucricia   fr    80
7      bill   se    84
8     dalia   se    85
9      anne   fr    90
10      zvi   ju    90
11     mike   ju    90
12    david   se    90
13    frank   so   100
14 samantha   so   100
15     dana   so   100
# order the data in increasing order of year value (alphabetical order)
# Notice that the test values are NOT in order

sqldf("select student, year, test1
      from grades
      order by year asc")
    student year test1
1       joe   fr    70
2       sue   fr    80
3      anne   fr    90
4  lucricia   fr    80
5     larry   ju    32
6       zvi   ju    90
7      mike   ju    90
8      bill   se    84
9  margaret   se    59
10    dalia   se    85
11    david   se    90
12    frank   so   100
13      bob   so    30
14 samantha   so   100
15     dana   so   100
#..................................................
# order the data by two different columns
#..................................................

# Order the data alphabetically by year.
# For all of the rows for a particular year, sort the test1 values
# in ascending numeric order.

sqldf("select student, year, test1, test2
      from grades
      order by year asc, test1 asc")
    student year test1 test2
1       joe   fr    70    82
2       sue   fr    80    80
3  lucricia   fr    80   100
4      anne   fr    90    95
5     larry   ju    32    80
6       zvi   ju    90    98
7      mike   ju    90    86
8  margaret   se    59    47
9      bill   se    84    90
10    dalia   se    85    92
11    david   se    90    87
12      bob   so    30    NA
13    frank   so   100    95
14 samantha   so   100    70
15     dana   so   100    93
# Order the data by alphabetically by year.
# For all of the rows for a particular year, sort the test1 values
# in descending numeric order.

sqldf("select student, year, test1, test2
      from grades
      order by year asc, test1 desc")
    student year test1 test2
1      anne   fr    90    95
2       sue   fr    80    80
3  lucricia   fr    80   100
4       joe   fr    70    82
5       zvi   ju    90    98
6      mike   ju    90    86
7     larry   ju    32    80
8     david   se    90    87
9     dalia   se    85    92
10     bill   se    84    90
11 margaret   se    59    47
12    frank   so   100    95
13 samantha   so   100    70
14     dana   so   100    93
15      bob   so    30    NA
# Order the data by test1.
# For all the rows that have the same value for test1 sort the rows by test2


sqldf("select student, test1, test2
      from grades
      order by test1 desc, test2 desc")
    student test1 test2
1     frank   100    95
2      dana   100    93
3  samantha   100    70
4       zvi    90    98
5      anne    90    95
6     david    90    87
7      mike    90    86
8     dalia    85    92
9      bill    84    90
10 lucricia    80   100
11      sue    80    80
12      joe    70    82
13 margaret    59    47
14    larry    32    80
15      bob    30    NA
# Show the rows in descending order of the test1 grades.
# If several students got the same grade on test1 then show 
# those rows in alphabetical order of the student names.

sqldf("select student, test1, test2
      from grades
      order by test1 desc, student asc")
    student test1 test2
1      dana   100    93
2     frank   100    95
3  samantha   100    70
4      anne    90    95
5     david    90    87
6      mike    90    86
7       zvi    90    98
8     dalia    85    92
9      bill    84    90
10 lucricia    80   100
11      sue    80    80
12      joe    70    82
13 margaret    59    47
14    larry    32    80
15      bob    30    NA
#..................................................
# order the data by three different columns
#..................................................

# Order all of the rows by year 
#
# among the rows that have the same year
# order within those rows by test1
#
# among the rows that have the same year and test1 values
# order within those rows by test2

sqldf("select student, year, test1, test2
      from grades
      order by year, test1 desc, test2 desc")
    student year test1 test2
1      anne   fr    90    95
2  lucricia   fr    80   100
3       sue   fr    80    80
4       joe   fr    70    82
5       zvi   ju    90    98
6      mike   ju    90    86
7     larry   ju    32    80
8     david   se    90    87
9     dalia   se    85    92
10     bill   se    84    90
11 margaret   se    59    47
12    frank   so   100    95
13     dana   so   100    93
14 samantha   so   100    70
15      bob   so    30    NA

15.11 limit clause

###############################################################.
# LIMIT <M>       and     LIMIT <M> OFFSET <N>
###############################################################.
#
# The LIMIT clause must come at the very end of the SQL 
# statement. You can use it to display only some of the 
# rows that would otherwise have been displayed.
###############################################################.

For example:

# Display the first 3 students
sqldf("SELECT *
      FROM grades
      ORDER BY student
      LIMIT 3")
     id student year honors test1 test2 test3 major
1 s0300    anne   fr   TRUE    90    95    92   IDS
2 s0800    bill   se  FALSE    84    90    92   ACC
3 s0500     bob   so  FALSE    30    NA    NA   IDS
###############################################################.
# In general the limit clause has two different forms: 
#
#      <SOME SELECT STATEMENT>
#      limit M
#
# (where M is an integer) results in only the first
# M rows of data from what would normally have been displayed 
# had the LIMIT clause not been specified.
# 
#      <SOME SELECT STATEMENT>
#      limit M offset N
#
# (where both M and N are integers) 
# starts the output from the N+1'th row of what would normally 
# have been displayed without the limit clause and then 
# displays the next M rows of data.
#
# LIMIT is often used in combination with the ORDER BY clause.
###############################################################.

For example:

# Display the first 3 students
sqldf("SELECT *
      FROM grades
      ORDER BY student
      LIMIT 3")
     id student year honors test1 test2 test3 major
1 s0300    anne   fr   TRUE    90    95    92   IDS
2 s0800    bill   se  FALSE    84    90    92   ACC
3 s0500     bob   so  FALSE    30    NA    NA   IDS
# Display the next 3 students
sqldf("SELECT *
      FROM grades
      ORDER BY student
      LIMIT 3 OFFSET 3")
     id student year honors test1 test2 test3 major
1 s1100   dalia   se  FALSE    85    92    93   FIN
2 s1500    dana   so  FALSE   100    93    91   FIN
3 s1400   david   se   TRUE    90    87    87   IDS
# Display the next 3 students
sqldf("SELECT *
      FROM grades
      ORDER BY student
      LIMIT 3 OFFSET 6")
     id student year honors test1 test2 test3 major
1 s0400   frank   so   TRUE   100    95    91   ACC
2 s0100     joe   fr  FALSE    70    82    NA   IDS
3 s0700   larry   ju  FALSE    32    80    NA   FIN

15.12 Using calculations in the select clause

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Using calculations in the select clause    ####
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# You can use math or SQL functions to modify the values
# that are returned. We will discuss SQL functions later.
# For now, let's demonstrate the concept using math.
# See the questions and answers below:
#/////////////////////////////////////////////////////////////////////////
# QUESTION
#
# Display the student, test1 and test2 columns from the grades table.
# In addition, display a column named that contains
# the values from the test2 column with 10 points added to each number.
#/////////////////////////////////////////////////////////////////////////

# ANSWER

sqldf("select student, test1, test2, test2 + 10 
      from grades")
    student test1 test2 test2 + 10
1       joe    70    82         92
2       sue    80    80         90
3      anne    90    95        105
4     frank   100    95        105
5       bob    30    NA         NA
6  samantha   100    70         80
7     larry    32    80         90
8      bill    84    90        100
9  lucricia    80   100        110
10 margaret    59    47         57
11    dalia    85    92        102
12      zvi    90    98        108
13     mike    90    86         96
14    david    90    87         97
15     dana   100    93        103

15.13 Renaming columns

#/////////////////////////////////////////////////////////////////////////
# QUESTION
#
# Answer the same question, but this time change the name of the new
# column to "test2_curved".
#/////////////////////////////////////////////////////////////////////////

# ANSWER

# Use AS to change the name of the column in the output

sqldf("select student, test1, test2, test2 + 10 as test2_curved
      from grades")
    student test1 test2 test2_curved
1       joe    70    82           92
2       sue    80    80           90
3      anne    90    95          105
4     frank   100    95          105
5       bob    30    NA           NA
6  samantha   100    70           80
7     larry    32    80           90
8      bill    84    90          100
9  lucricia    80   100          110
10 margaret    59    47           57
11    dalia    85    92          102
12      zvi    90    98          108
13     mike    90    86           96
14    david    90    87           97
15     dana   100    93          103
# You don't actually need to write "AS" 

sqldf("select student, test1, test2, test2 + 10 test2_curved
      from grades")
    student test1 test2 test2_curved
1       joe    70    82           92
2       sue    80    80           90
3      anne    90    95          105
4     frank   100    95          105
5       bob    30    NA           NA
6  samantha   100    70           80
7     larry    32    80           90
8      bill    84    90          100
9  lucricia    80   100          110
10 margaret    59    47           57
11    dalia    85    92          102
12      zvi    90    98          108
13     mike    90    86           96
14    david    90    87           97
15     dana   100    93          103

15.14 ‘Quoting’ column names that contain spaces (or other special characters)

In general, column names and table names should only contain letters and numbers. If the name of a column or a table needs to contain spaces or other “special characters” (e.g. !@#$%^&*()_+~~ etc.) you need to ‘quote’ the name using ‘apostrophes’ or “quotes”. See the following example which contains a space in the new column name.

# You must 'quote column names that contain spaces' 

sqldf("select student, test1, test2, test2 + 10 'test2 curved'
      from grades")
    student test1 test2 test2 curved
1       joe    70    82           92
2       sue    80    80           90
3      anne    90    95          105
4     frank   100    95          105
5       bob    30    NA           NA
6  samantha   100    70           80
7     larry    32    80           90
8      bill    84    90          100
9  lucricia    80   100          110
10 margaret    59    47           57
11    dalia    85    92          102
12      zvi    90    98          108
13     mike    90    86           96
14    david    90    87           97
15     dana   100    93          103

Note that in the SQLite dialect of SQL (which is what is being used in R with sqldf) you can use ‘apostrophes’ in place of quotes. We used ‘apostrophes’ instead of “quotes” to quote the new column name to avoid issues that would arise if we used “quotes” since we are already using “quotes” to surround the entire SQL statement. See the next section for more info.

15.15 ‘single quotes’ vs “double quotes”

In many areas of technology, ‘apostrophes’ are used to ‘quote’ text instead of “quotation marks”. Apostrophes are used in this way, are referred to as ‘single quotes’ whereas quotation marks are referred to as “double quotes” (since an apostophe has one short line while a single quotation mark is comprised of two short lines).

In SQL, there are two different situations where something needs to be quoted. One situation is as shown above to create column headings that contain spaces or special characters. Another situation is where you want to include some literal text (see the description of “concatenation” below).

In this presentation we are using ‘single quotes’ for most purposes inside our SQL statements. This is because SQLite (i.e. the dialect of SQL we are using here) allows for the use of either ‘single quotes’ or “double quotes” wherever you need to quote something. Using ‘single quotes’ allows us to sidestep issues that would arise with “double quotes”. Using double quotes inside the SQL statement would interfere with the “double quotes” that we are using to “quote” the entire sql statement in our call to sqldf(” … “). Other SQL envirnoments that do not rely on sqldf() do not have this issue.

It should be noted though that the ANSI SQL standard, calls for using double quotes for column names (as in the example above) and for using single quotes to quote textual values (as in the concatenation example below).

(see https://www.googlecloudcommunity.com/gc/Technical-Tips-Tricks/In-SQL-do-I-use-double-quotes-or-single-quotes-for-string/ta-p/584590)

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# 'single quotes' (ie. 'apostrophes')   vs     "double quotes" (ie. "quotes")  ####
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Explicit textual values need to be 'quoted'.
#
# Standard SQL uses single quotes for quoting values.
#
# Many sql software packages allow for "double quotes" however, that is not 
# officially part of the standard.
#
# The sqlite dialect of SQL (which is the default dialect used by sqldf)
# DOES recognize "double quotes" too. However, we will try to stick to 
# 'single quotes' in order to adhere to the standard SQL notation.
#
# The following example combines the id and the student name into a single
# value with a dash (i.e. - ) between them. Notice the 'single quotes' around
# the dash.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

15.16 Concatenation: ‘abc’ || ‘def’ is ‘abcdef’

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# To "concatenate" two values means to "paste" them together 
# into a single character value.
# ANSI SQL defines the  || as the "concatenation" operator.
# This is similar to the & operator in Excel.
#
# Sqlite (which is used by R's sqldf package) uses the ANSI standard || operator.
#
# See the examples below.
#
# NOTE: Some other database software products use non-standard 
#       operators for concatenation. For example:
#
#   * MySql does not have a concatenation operator. MySql uses || for logical or. 
#     Rather, it uses the function, concat(a, b), to concatenate a with b.
#
#   * Microsoft SQL Server uses + for concatenation.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#///////////////////////////////////////////////////////////////
# QUESTION
#
# Show a column named id_student that contains the values from the 
# the id and student columns concatenated together. Also show the 
# test1, test2 and test3 columns.
#///////////////////////////////////////////////////////////////

# ANSWER 

sqldf("select id || student as id_student, test1, test2, test3
      from grades")
      id_student test1 test2 test3
1       s0100joe    70    82    NA
2       s0200sue    80    80    79
3      s0300anne    90    95    92
4     s0400frank   100    95    91
5       s0500bob    30    NA    NA
6  s0600samantha   100    70    97
7     s0700larry    32    80    NA
8      s0800bill    84    90    92
9  s0900lucricia    80   100    88
10 s1000margaret    59    47    NA
11    s1100dalia    85    92    93
12      s1200zvi    90    98    92
13     s1300mike    90    86    79
14    s1400david    90    87    87
15     s1500dana   100    93    91
#///////////////////////////////////////////////////////////////
# QUESTION
#
# Modify the answer to the previous question so that the
# "id_student" column contains a dash between the 
# id and the name of the student.
#///////////////////////////////////////////////////////////////

# ANSWER:
# 
# You need to use 'quotes' around the '-' dash.
# First concatenate the id with '-' (don't forget the 'single quotes').
# Then concatenate the student's name.

sqldf("select id || '-' || student as id_student, test1, test2, test3
      from grades")
       id_student test1 test2 test3
1       s0100-joe    70    82    NA
2       s0200-sue    80    80    79
3      s0300-anne    90    95    92
4     s0400-frank   100    95    91
5       s0500-bob    30    NA    NA
6  s0600-samantha   100    70    97
7     s0700-larry    32    80    NA
8      s0800-bill    84    90    92
9  s0900-lucricia    80   100    88
10 s1000-margaret    59    47    NA
11    s1100-dalia    85    92    93
12      s1200-zvi    90    98    92
13     s1300-mike    90    86    79
14    s1400-david    90    87    87
15     s1500-dana   100    93    91

15.17 Rules for naming tables and columns

NOTE:

  • “IDS1020-Intro to IDS” students can ignore this section.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Rules for naming tables and columns:
#
# What happens if you use non-standard characters in the names.
#
#   - ANSI standard SQL - use "double quotes" for non-standard column or
#     table names ( NOT 'single quotes' )
#
#   - some other flavors of SQL may use other symbols for this purpose.
#     For example:
#       o Microsoft Access uses [square brackets] and 
#       o MySql uses `grave accent` characters (AKA `backticks`)
#         (https://en.wikipedia.org/wiki/Grave_accent)
#     
#   - sqlite (i.e. the default version of SQL for sqldf) allows all of the 
#     above, i.e. "double quotes", `back ticks` and [square brackets]
#     to surround non-standard names. However, it is best to use 
#     "double quotes" as that is the ANSI standard.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Every version of SQL has its own rules for which characters are allowed
# to be used in table names and column names. To be safe that your code will work
# on any version of SQL the best bet is to stick to the following rules for 
# both table names and for column names:
# 
#   1. only use letters, numbers and underscores
#   2. start the name with a letter
#
# However, many SQL versions allow for non-standard characters to be included
# in a table name or column name. However, if you use a non-standard character
# then you must 'quote' the name of the table or name of the column
# in SQL commands using the quoting rules provided by the version of SQL
# you are using. When using sqldf, if you use a non-standard character 
# (e.g. a period, a space, etc) then you should surround the name of the 
# table or column with "double quotes" (`backticks` and [square brackets] also
# work but "double quotes" are preferred as that is the ANSI standard).
# You cannot use 'single quotes' for this purpose.
#
# To demonstrate we can use the iris dataframe that is built into R.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

15.17.1 WARNING: In SQLite, you must use “double quotes” for non standard table or column names.

NOTE:

  • “IDS1020-Intro to IDS” students can ignore this section.

  • “IDS2460-Data Management” you should be aware of the following issue.

This doesn’t come up too often but if you are using sqldf to work with R dataframes you should be aware of the following issue. R dataframes often use periods in the column names. This is non-standard for SQL.

# Show the first 10 rows of the iris dataframe that comes built into R.
head(iris,10)
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
#//////////////////////////////////////////////////////////////////////////
# QUESTION
#
# Use sqldf to display just the Sepal.Length, Sepal.Width and Species columns 
# from the "iris" dataframe.
#//////////////////////////////////////////////////////////////////////////


# ANSWER
#
# This dataframe has column names that contain periods. A period
# is NOT a standard character for table and column names. Therefore
# we must use "double quotes" or `back ticks` to quote these column names.
#
# This does NOT work because of the periods in the column names

# sqldf("select Sepal.Length, Sepal.Width, Species 
#         from iris")                              # ERROR: no such column Sepal.Length

# This works - i.e. use "double quotes". Note that we used 
# 'single quotes' to quote the entire select statement.
sqldf('select "Sepal.Length", "Sepal.Width", Species from iris')
    Sepal.Length Sepal.Width    Species
1            5.1         3.5     setosa
2            4.9         3.0     setosa
3            4.7         3.2     setosa
4            4.6         3.1     setosa
5            5.0         3.6     setosa
6            5.4         3.9     setosa
7            4.6         3.4     setosa
8            5.0         3.4     setosa
9            4.4         2.9     setosa
10           4.9         3.1     setosa
11           5.4         3.7     setosa
12           4.8         3.4     setosa
13           4.8         3.0     setosa
14           4.3         3.0     setosa
15           5.8         4.0     setosa
16           5.7         4.4     setosa
17           5.4         3.9     setosa
18           5.1         3.5     setosa
19           5.7         3.8     setosa
20           5.1         3.8     setosa
21           5.4         3.4     setosa
22           5.1         3.7     setosa
23           4.6         3.6     setosa
24           5.1         3.3     setosa
25           4.8         3.4     setosa
26           5.0         3.0     setosa
27           5.0         3.4     setosa
28           5.2         3.5     setosa
29           5.2         3.4     setosa
30           4.7         3.2     setosa
31           4.8         3.1     setosa
32           5.4         3.4     setosa
33           5.2         4.1     setosa
34           5.5         4.2     setosa
35           4.9         3.1     setosa
36           5.0         3.2     setosa
37           5.5         3.5     setosa
38           4.9         3.6     setosa
39           4.4         3.0     setosa
40           5.1         3.4     setosa
41           5.0         3.5     setosa
42           4.5         2.3     setosa
43           4.4         3.2     setosa
44           5.0         3.5     setosa
45           5.1         3.8     setosa
46           4.8         3.0     setosa
47           5.1         3.8     setosa
48           4.6         3.2     setosa
49           5.3         3.7     setosa
50           5.0         3.3     setosa
51           7.0         3.2 versicolor
52           6.4         3.2 versicolor
53           6.9         3.1 versicolor
54           5.5         2.3 versicolor
55           6.5         2.8 versicolor
56           5.7         2.8 versicolor
57           6.3         3.3 versicolor
58           4.9         2.4 versicolor
59           6.6         2.9 versicolor
60           5.2         2.7 versicolor
61           5.0         2.0 versicolor
62           5.9         3.0 versicolor
63           6.0         2.2 versicolor
64           6.1         2.9 versicolor
65           5.6         2.9 versicolor
66           6.7         3.1 versicolor
67           5.6         3.0 versicolor
68           5.8         2.7 versicolor
69           6.2         2.2 versicolor
70           5.6         2.5 versicolor
71           5.9         3.2 versicolor
72           6.1         2.8 versicolor
73           6.3         2.5 versicolor
74           6.1         2.8 versicolor
75           6.4         2.9 versicolor
76           6.6         3.0 versicolor
77           6.8         2.8 versicolor
78           6.7         3.0 versicolor
79           6.0         2.9 versicolor
80           5.7         2.6 versicolor
81           5.5         2.4 versicolor
82           5.5         2.4 versicolor
83           5.8         2.7 versicolor
84           6.0         2.7 versicolor
85           5.4         3.0 versicolor
86           6.0         3.4 versicolor
87           6.7         3.1 versicolor
88           6.3         2.3 versicolor
89           5.6         3.0 versicolor
90           5.5         2.5 versicolor
91           5.5         2.6 versicolor
92           6.1         3.0 versicolor
93           5.8         2.6 versicolor
94           5.0         2.3 versicolor
95           5.6         2.7 versicolor
96           5.7         3.0 versicolor
97           5.7         2.9 versicolor
98           6.2         2.9 versicolor
99           5.1         2.5 versicolor
100          5.7         2.8 versicolor
101          6.3         3.3  virginica
102          5.8         2.7  virginica
103          7.1         3.0  virginica
104          6.3         2.9  virginica
105          6.5         3.0  virginica
106          7.6         3.0  virginica
107          4.9         2.5  virginica
108          7.3         2.9  virginica
109          6.7         2.5  virginica
110          7.2         3.6  virginica
111          6.5         3.2  virginica
112          6.4         2.7  virginica
113          6.8         3.0  virginica
114          5.7         2.5  virginica
115          5.8         2.8  virginica
116          6.4         3.2  virginica
117          6.5         3.0  virginica
118          7.7         3.8  virginica
119          7.7         2.6  virginica
120          6.0         2.2  virginica
121          6.9         3.2  virginica
122          5.6         2.8  virginica
123          7.7         2.8  virginica
124          6.3         2.7  virginica
125          6.7         3.3  virginica
126          7.2         3.2  virginica
127          6.2         2.8  virginica
128          6.1         3.0  virginica
129          6.4         2.8  virginica
130          7.2         3.0  virginica
131          7.4         2.8  virginica
132          7.9         3.8  virginica
133          6.4         2.8  virginica
134          6.3         2.8  virginica
135          6.1         2.6  virginica
136          7.7         3.0  virginica
137          6.3         3.4  virginica
138          6.4         3.1  virginica
139          6.0         3.0  virginica
140          6.9         3.1  virginica
141          6.7         3.1  virginica
142          6.9         3.1  virginica
143          5.8         2.7  virginica
144          6.8         3.2  virginica
145          6.7         3.3  virginica
146          6.7         3.0  virginica
147          6.3         2.5  virginica
148          6.5         3.0  virginica
149          6.2         3.4  virginica
150          5.9         3.0  virginica
# Same thing but we used "double quotes" to quote the entire select statement.
# Therefore we used \" for every double quote that is inside the command.
sqldf("select \"Sepal.Length\", \"Sepal.Width\", Species from iris")
    Sepal.Length Sepal.Width    Species
1            5.1         3.5     setosa
2            4.9         3.0     setosa
3            4.7         3.2     setosa
4            4.6         3.1     setosa
5            5.0         3.6     setosa
6            5.4         3.9     setosa
7            4.6         3.4     setosa
8            5.0         3.4     setosa
9            4.4         2.9     setosa
10           4.9         3.1     setosa
11           5.4         3.7     setosa
12           4.8         3.4     setosa
13           4.8         3.0     setosa
14           4.3         3.0     setosa
15           5.8         4.0     setosa
16           5.7         4.4     setosa
17           5.4         3.9     setosa
18           5.1         3.5     setosa
19           5.7         3.8     setosa
20           5.1         3.8     setosa
21           5.4         3.4     setosa
22           5.1         3.7     setosa
23           4.6         3.6     setosa
24           5.1         3.3     setosa
25           4.8         3.4     setosa
26           5.0         3.0     setosa
27           5.0         3.4     setosa
28           5.2         3.5     setosa
29           5.2         3.4     setosa
30           4.7         3.2     setosa
31           4.8         3.1     setosa
32           5.4         3.4     setosa
33           5.2         4.1     setosa
34           5.5         4.2     setosa
35           4.9         3.1     setosa
36           5.0         3.2     setosa
37           5.5         3.5     setosa
38           4.9         3.6     setosa
39           4.4         3.0     setosa
40           5.1         3.4     setosa
41           5.0         3.5     setosa
42           4.5         2.3     setosa
43           4.4         3.2     setosa
44           5.0         3.5     setosa
45           5.1         3.8     setosa
46           4.8         3.0     setosa
47           5.1         3.8     setosa
48           4.6         3.2     setosa
49           5.3         3.7     setosa
50           5.0         3.3     setosa
51           7.0         3.2 versicolor
52           6.4         3.2 versicolor
53           6.9         3.1 versicolor
54           5.5         2.3 versicolor
55           6.5         2.8 versicolor
56           5.7         2.8 versicolor
57           6.3         3.3 versicolor
58           4.9         2.4 versicolor
59           6.6         2.9 versicolor
60           5.2         2.7 versicolor
61           5.0         2.0 versicolor
62           5.9         3.0 versicolor
63           6.0         2.2 versicolor
64           6.1         2.9 versicolor
65           5.6         2.9 versicolor
66           6.7         3.1 versicolor
67           5.6         3.0 versicolor
68           5.8         2.7 versicolor
69           6.2         2.2 versicolor
70           5.6         2.5 versicolor
71           5.9         3.2 versicolor
72           6.1         2.8 versicolor
73           6.3         2.5 versicolor
74           6.1         2.8 versicolor
75           6.4         2.9 versicolor
76           6.6         3.0 versicolor
77           6.8         2.8 versicolor
78           6.7         3.0 versicolor
79           6.0         2.9 versicolor
80           5.7         2.6 versicolor
81           5.5         2.4 versicolor
82           5.5         2.4 versicolor
83           5.8         2.7 versicolor
84           6.0         2.7 versicolor
85           5.4         3.0 versicolor
86           6.0         3.4 versicolor
87           6.7         3.1 versicolor
88           6.3         2.3 versicolor
89           5.6         3.0 versicolor
90           5.5         2.5 versicolor
91           5.5         2.6 versicolor
92           6.1         3.0 versicolor
93           5.8         2.6 versicolor
94           5.0         2.3 versicolor
95           5.6         2.7 versicolor
96           5.7         3.0 versicolor
97           5.7         2.9 versicolor
98           6.2         2.9 versicolor
99           5.1         2.5 versicolor
100          5.7         2.8 versicolor
101          6.3         3.3  virginica
102          5.8         2.7  virginica
103          7.1         3.0  virginica
104          6.3         2.9  virginica
105          6.5         3.0  virginica
106          7.6         3.0  virginica
107          4.9         2.5  virginica
108          7.3         2.9  virginica
109          6.7         2.5  virginica
110          7.2         3.6  virginica
111          6.5         3.2  virginica
112          6.4         2.7  virginica
113          6.8         3.0  virginica
114          5.7         2.5  virginica
115          5.8         2.8  virginica
116          6.4         3.2  virginica
117          6.5         3.0  virginica
118          7.7         3.8  virginica
119          7.7         2.6  virginica
120          6.0         2.2  virginica
121          6.9         3.2  virginica
122          5.6         2.8  virginica
123          7.7         2.8  virginica
124          6.3         2.7  virginica
125          6.7         3.3  virginica
126          7.2         3.2  virginica
127          6.2         2.8  virginica
128          6.1         3.0  virginica
129          6.4         2.8  virginica
130          7.2         3.0  virginica
131          7.4         2.8  virginica
132          7.9         3.8  virginica
133          6.4         2.8  virginica
134          6.3         2.8  virginica
135          6.1         2.6  virginica
136          7.7         3.0  virginica
137          6.3         3.4  virginica
138          6.4         3.1  virginica
139          6.0         3.0  virginica
140          6.9         3.1  virginica
141          6.7         3.1  virginica
142          6.9         3.1  virginica
143          5.8         2.7  virginica
144          6.8         3.2  virginica
145          6.7         3.3  virginica
146          6.7         3.0  virginica
147          6.3         2.5  virginica
148          6.5         3.0  virginica
149          6.2         3.4  virginica
150          5.9         3.0  virginica
# This also works - use `back ticks`
sqldf('select `Sepal.Length`, `Sepal.Width`, Species from iris')
    Sepal.Length Sepal.Width    Species
1            5.1         3.5     setosa
2            4.9         3.0     setosa
3            4.7         3.2     setosa
4            4.6         3.1     setosa
5            5.0         3.6     setosa
6            5.4         3.9     setosa
7            4.6         3.4     setosa
8            5.0         3.4     setosa
9            4.4         2.9     setosa
10           4.9         3.1     setosa
11           5.4         3.7     setosa
12           4.8         3.4     setosa
13           4.8         3.0     setosa
14           4.3         3.0     setosa
15           5.8         4.0     setosa
16           5.7         4.4     setosa
17           5.4         3.9     setosa
18           5.1         3.5     setosa
19           5.7         3.8     setosa
20           5.1         3.8     setosa
21           5.4         3.4     setosa
22           5.1         3.7     setosa
23           4.6         3.6     setosa
24           5.1         3.3     setosa
25           4.8         3.4     setosa
26           5.0         3.0     setosa
27           5.0         3.4     setosa
28           5.2         3.5     setosa
29           5.2         3.4     setosa
30           4.7         3.2     setosa
31           4.8         3.1     setosa
32           5.4         3.4     setosa
33           5.2         4.1     setosa
34           5.5         4.2     setosa
35           4.9         3.1     setosa
36           5.0         3.2     setosa
37           5.5         3.5     setosa
38           4.9         3.6     setosa
39           4.4         3.0     setosa
40           5.1         3.4     setosa
41           5.0         3.5     setosa
42           4.5         2.3     setosa
43           4.4         3.2     setosa
44           5.0         3.5     setosa
45           5.1         3.8     setosa
46           4.8         3.0     setosa
47           5.1         3.8     setosa
48           4.6         3.2     setosa
49           5.3         3.7     setosa
50           5.0         3.3     setosa
51           7.0         3.2 versicolor
52           6.4         3.2 versicolor
53           6.9         3.1 versicolor
54           5.5         2.3 versicolor
55           6.5         2.8 versicolor
56           5.7         2.8 versicolor
57           6.3         3.3 versicolor
58           4.9         2.4 versicolor
59           6.6         2.9 versicolor
60           5.2         2.7 versicolor
61           5.0         2.0 versicolor
62           5.9         3.0 versicolor
63           6.0         2.2 versicolor
64           6.1         2.9 versicolor
65           5.6         2.9 versicolor
66           6.7         3.1 versicolor
67           5.6         3.0 versicolor
68           5.8         2.7 versicolor
69           6.2         2.2 versicolor
70           5.6         2.5 versicolor
71           5.9         3.2 versicolor
72           6.1         2.8 versicolor
73           6.3         2.5 versicolor
74           6.1         2.8 versicolor
75           6.4         2.9 versicolor
76           6.6         3.0 versicolor
77           6.8         2.8 versicolor
78           6.7         3.0 versicolor
79           6.0         2.9 versicolor
80           5.7         2.6 versicolor
81           5.5         2.4 versicolor
82           5.5         2.4 versicolor
83           5.8         2.7 versicolor
84           6.0         2.7 versicolor
85           5.4         3.0 versicolor
86           6.0         3.4 versicolor
87           6.7         3.1 versicolor
88           6.3         2.3 versicolor
89           5.6         3.0 versicolor
90           5.5         2.5 versicolor
91           5.5         2.6 versicolor
92           6.1         3.0 versicolor
93           5.8         2.6 versicolor
94           5.0         2.3 versicolor
95           5.6         2.7 versicolor
96           5.7         3.0 versicolor
97           5.7         2.9 versicolor
98           6.2         2.9 versicolor
99           5.1         2.5 versicolor
100          5.7         2.8 versicolor
101          6.3         3.3  virginica
102          5.8         2.7  virginica
103          7.1         3.0  virginica
104          6.3         2.9  virginica
105          6.5         3.0  virginica
106          7.6         3.0  virginica
107          4.9         2.5  virginica
108          7.3         2.9  virginica
109          6.7         2.5  virginica
110          7.2         3.6  virginica
111          6.5         3.2  virginica
112          6.4         2.7  virginica
113          6.8         3.0  virginica
114          5.7         2.5  virginica
115          5.8         2.8  virginica
116          6.4         3.2  virginica
117          6.5         3.0  virginica
118          7.7         3.8  virginica
119          7.7         2.6  virginica
120          6.0         2.2  virginica
121          6.9         3.2  virginica
122          5.6         2.8  virginica
123          7.7         2.8  virginica
124          6.3         2.7  virginica
125          6.7         3.3  virginica
126          7.2         3.2  virginica
127          6.2         2.8  virginica
128          6.1         3.0  virginica
129          6.4         2.8  virginica
130          7.2         3.0  virginica
131          7.4         2.8  virginica
132          7.9         3.8  virginica
133          6.4         2.8  virginica
134          6.3         2.8  virginica
135          6.1         2.6  virginica
136          7.7         3.0  virginica
137          6.3         3.4  virginica
138          6.4         3.1  virginica
139          6.0         3.0  virginica
140          6.9         3.1  virginica
141          6.7         3.1  virginica
142          6.9         3.1  virginica
143          5.8         2.7  virginica
144          6.8         3.2  virginica
145          6.7         3.3  virginica
146          6.7         3.0  virginica
147          6.3         2.5  virginica
148          6.5         3.0  virginica
149          6.2         3.4  virginica
150          5.9         3.0  virginica
# *** WARNING!!! ***
#
# BE CAREFUL - don't use 'single quotes' here. Single quotes will NOT
# work to quote column names or table names.
#
# The following will simply display the words 'Sepal.Length' and 'Sepal.Width'
# for every row in the output.
sqldf("select 'Sepal.Length', 'Sepal.Width', Species from iris")
    'Sepal.Length' 'Sepal.Width'    Species
1     Sepal.Length   Sepal.Width     setosa
2     Sepal.Length   Sepal.Width     setosa
3     Sepal.Length   Sepal.Width     setosa
4     Sepal.Length   Sepal.Width     setosa
5     Sepal.Length   Sepal.Width     setosa
6     Sepal.Length   Sepal.Width     setosa
7     Sepal.Length   Sepal.Width     setosa
8     Sepal.Length   Sepal.Width     setosa
9     Sepal.Length   Sepal.Width     setosa
10    Sepal.Length   Sepal.Width     setosa
11    Sepal.Length   Sepal.Width     setosa
12    Sepal.Length   Sepal.Width     setosa
13    Sepal.Length   Sepal.Width     setosa
14    Sepal.Length   Sepal.Width     setosa
15    Sepal.Length   Sepal.Width     setosa
16    Sepal.Length   Sepal.Width     setosa
17    Sepal.Length   Sepal.Width     setosa
18    Sepal.Length   Sepal.Width     setosa
19    Sepal.Length   Sepal.Width     setosa
20    Sepal.Length   Sepal.Width     setosa
21    Sepal.Length   Sepal.Width     setosa
22    Sepal.Length   Sepal.Width     setosa
23    Sepal.Length   Sepal.Width     setosa
24    Sepal.Length   Sepal.Width     setosa
25    Sepal.Length   Sepal.Width     setosa
26    Sepal.Length   Sepal.Width     setosa
27    Sepal.Length   Sepal.Width     setosa
28    Sepal.Length   Sepal.Width     setosa
29    Sepal.Length   Sepal.Width     setosa
30    Sepal.Length   Sepal.Width     setosa
31    Sepal.Length   Sepal.Width     setosa
32    Sepal.Length   Sepal.Width     setosa
33    Sepal.Length   Sepal.Width     setosa
34    Sepal.Length   Sepal.Width     setosa
35    Sepal.Length   Sepal.Width     setosa
36    Sepal.Length   Sepal.Width     setosa
37    Sepal.Length   Sepal.Width     setosa
38    Sepal.Length   Sepal.Width     setosa
39    Sepal.Length   Sepal.Width     setosa
40    Sepal.Length   Sepal.Width     setosa
41    Sepal.Length   Sepal.Width     setosa
42    Sepal.Length   Sepal.Width     setosa
43    Sepal.Length   Sepal.Width     setosa
44    Sepal.Length   Sepal.Width     setosa
45    Sepal.Length   Sepal.Width     setosa
46    Sepal.Length   Sepal.Width     setosa
47    Sepal.Length   Sepal.Width     setosa
48    Sepal.Length   Sepal.Width     setosa
49    Sepal.Length   Sepal.Width     setosa
50    Sepal.Length   Sepal.Width     setosa
51    Sepal.Length   Sepal.Width versicolor
52    Sepal.Length   Sepal.Width versicolor
53    Sepal.Length   Sepal.Width versicolor
54    Sepal.Length   Sepal.Width versicolor
55    Sepal.Length   Sepal.Width versicolor
56    Sepal.Length   Sepal.Width versicolor
57    Sepal.Length   Sepal.Width versicolor
58    Sepal.Length   Sepal.Width versicolor
59    Sepal.Length   Sepal.Width versicolor
60    Sepal.Length   Sepal.Width versicolor
61    Sepal.Length   Sepal.Width versicolor
62    Sepal.Length   Sepal.Width versicolor
63    Sepal.Length   Sepal.Width versicolor
64    Sepal.Length   Sepal.Width versicolor
65    Sepal.Length   Sepal.Width versicolor
66    Sepal.Length   Sepal.Width versicolor
67    Sepal.Length   Sepal.Width versicolor
68    Sepal.Length   Sepal.Width versicolor
69    Sepal.Length   Sepal.Width versicolor
70    Sepal.Length   Sepal.Width versicolor
71    Sepal.Length   Sepal.Width versicolor
72    Sepal.Length   Sepal.Width versicolor
73    Sepal.Length   Sepal.Width versicolor
74    Sepal.Length   Sepal.Width versicolor
75    Sepal.Length   Sepal.Width versicolor
76    Sepal.Length   Sepal.Width versicolor
77    Sepal.Length   Sepal.Width versicolor
78    Sepal.Length   Sepal.Width versicolor
79    Sepal.Length   Sepal.Width versicolor
80    Sepal.Length   Sepal.Width versicolor
81    Sepal.Length   Sepal.Width versicolor
82    Sepal.Length   Sepal.Width versicolor
83    Sepal.Length   Sepal.Width versicolor
84    Sepal.Length   Sepal.Width versicolor
85    Sepal.Length   Sepal.Width versicolor
86    Sepal.Length   Sepal.Width versicolor
87    Sepal.Length   Sepal.Width versicolor
88    Sepal.Length   Sepal.Width versicolor
89    Sepal.Length   Sepal.Width versicolor
90    Sepal.Length   Sepal.Width versicolor
91    Sepal.Length   Sepal.Width versicolor
92    Sepal.Length   Sepal.Width versicolor
93    Sepal.Length   Sepal.Width versicolor
94    Sepal.Length   Sepal.Width versicolor
95    Sepal.Length   Sepal.Width versicolor
96    Sepal.Length   Sepal.Width versicolor
97    Sepal.Length   Sepal.Width versicolor
98    Sepal.Length   Sepal.Width versicolor
99    Sepal.Length   Sepal.Width versicolor
100   Sepal.Length   Sepal.Width versicolor
101   Sepal.Length   Sepal.Width  virginica
102   Sepal.Length   Sepal.Width  virginica
103   Sepal.Length   Sepal.Width  virginica
104   Sepal.Length   Sepal.Width  virginica
105   Sepal.Length   Sepal.Width  virginica
106   Sepal.Length   Sepal.Width  virginica
107   Sepal.Length   Sepal.Width  virginica
108   Sepal.Length   Sepal.Width  virginica
109   Sepal.Length   Sepal.Width  virginica
110   Sepal.Length   Sepal.Width  virginica
111   Sepal.Length   Sepal.Width  virginica
112   Sepal.Length   Sepal.Width  virginica
113   Sepal.Length   Sepal.Width  virginica
114   Sepal.Length   Sepal.Width  virginica
115   Sepal.Length   Sepal.Width  virginica
116   Sepal.Length   Sepal.Width  virginica
117   Sepal.Length   Sepal.Width  virginica
118   Sepal.Length   Sepal.Width  virginica
119   Sepal.Length   Sepal.Width  virginica
120   Sepal.Length   Sepal.Width  virginica
121   Sepal.Length   Sepal.Width  virginica
122   Sepal.Length   Sepal.Width  virginica
123   Sepal.Length   Sepal.Width  virginica
124   Sepal.Length   Sepal.Width  virginica
125   Sepal.Length   Sepal.Width  virginica
126   Sepal.Length   Sepal.Width  virginica
127   Sepal.Length   Sepal.Width  virginica
128   Sepal.Length   Sepal.Width  virginica
129   Sepal.Length   Sepal.Width  virginica
130   Sepal.Length   Sepal.Width  virginica
131   Sepal.Length   Sepal.Width  virginica
132   Sepal.Length   Sepal.Width  virginica
133   Sepal.Length   Sepal.Width  virginica
134   Sepal.Length   Sepal.Width  virginica
135   Sepal.Length   Sepal.Width  virginica
136   Sepal.Length   Sepal.Width  virginica
137   Sepal.Length   Sepal.Width  virginica
138   Sepal.Length   Sepal.Width  virginica
139   Sepal.Length   Sepal.Width  virginica
140   Sepal.Length   Sepal.Width  virginica
141   Sepal.Length   Sepal.Width  virginica
142   Sepal.Length   Sepal.Width  virginica
143   Sepal.Length   Sepal.Width  virginica
144   Sepal.Length   Sepal.Width  virginica
145   Sepal.Length   Sepal.Width  virginica
146   Sepal.Length   Sepal.Width  virginica
147   Sepal.Length   Sepal.Width  virginica
148   Sepal.Length   Sepal.Width  virginica
149   Sepal.Length   Sepal.Width  virginica
150   Sepal.Length   Sepal.Width  virginica

15.17.2 Aside: using R “raw strings” (only for those learning R)

NOTE:

  • “IDS1020-Intro to IDS” students can ignore this section.

If you are learning SQL but are not interested in learning more about R right now you can safely skip this section.

# RAW STRINGS in R  - i.e.    r"( ... )"
#
# You can also use a "raw string" - introduced in R 4.0
# Any string (ie. character value) that appears between r"( .... )"
# is quoted as exactly the value that appears between the r"( and )"
# symbols. For detailed info about R's new raw string feature see this page:
#
#    https://r4ds.hadley.nz/strings.html#sec-raw-strings
#
# You can use an R "raw string" to quote anything without
# resorting to backslahes or other techniques.
#
# To create a "raw string" in R place the text that you want to quote
# between    r"(      and      )"
# The r stands for raw, not "R". 
# The text being quoted can safely include any characters you like.
# For example, the following "strangeValue" gets displayed just fine.

strangeValue = r"( Backslash: \  Quote: "  Apostrophe: ' )"

cat(strangeValue)    # Backslash: \  Quote: "  Apostrophe: ' 
 Backslash: \  Quote: "  Apostrophe: ' 
# You can use r"(raw strings)" to very simply quote any SQL command
# that you want to pass to sqldf, regardless of what is in the SQL.

sqldf( r"(select "Sepal.Length", "Sepal.Width", Species from iris)" )
    Sepal.Length Sepal.Width    Species
1            5.1         3.5     setosa
2            4.9         3.0     setosa
3            4.7         3.2     setosa
4            4.6         3.1     setosa
5            5.0         3.6     setosa
6            5.4         3.9     setosa
7            4.6         3.4     setosa
8            5.0         3.4     setosa
9            4.4         2.9     setosa
10           4.9         3.1     setosa
11           5.4         3.7     setosa
12           4.8         3.4     setosa
13           4.8         3.0     setosa
14           4.3         3.0     setosa
15           5.8         4.0     setosa
16           5.7         4.4     setosa
17           5.4         3.9     setosa
18           5.1         3.5     setosa
19           5.7         3.8     setosa
20           5.1         3.8     setosa
21           5.4         3.4     setosa
22           5.1         3.7     setosa
23           4.6         3.6     setosa
24           5.1         3.3     setosa
25           4.8         3.4     setosa
26           5.0         3.0     setosa
27           5.0         3.4     setosa
28           5.2         3.5     setosa
29           5.2         3.4     setosa
30           4.7         3.2     setosa
31           4.8         3.1     setosa
32           5.4         3.4     setosa
33           5.2         4.1     setosa
34           5.5         4.2     setosa
35           4.9         3.1     setosa
36           5.0         3.2     setosa
37           5.5         3.5     setosa
38           4.9         3.6     setosa
39           4.4         3.0     setosa
40           5.1         3.4     setosa
41           5.0         3.5     setosa
42           4.5         2.3     setosa
43           4.4         3.2     setosa
44           5.0         3.5     setosa
45           5.1         3.8     setosa
46           4.8         3.0     setosa
47           5.1         3.8     setosa
48           4.6         3.2     setosa
49           5.3         3.7     setosa
50           5.0         3.3     setosa
51           7.0         3.2 versicolor
52           6.4         3.2 versicolor
53           6.9         3.1 versicolor
54           5.5         2.3 versicolor
55           6.5         2.8 versicolor
56           5.7         2.8 versicolor
57           6.3         3.3 versicolor
58           4.9         2.4 versicolor
59           6.6         2.9 versicolor
60           5.2         2.7 versicolor
61           5.0         2.0 versicolor
62           5.9         3.0 versicolor
63           6.0         2.2 versicolor
64           6.1         2.9 versicolor
65           5.6         2.9 versicolor
66           6.7         3.1 versicolor
67           5.6         3.0 versicolor
68           5.8         2.7 versicolor
69           6.2         2.2 versicolor
70           5.6         2.5 versicolor
71           5.9         3.2 versicolor
72           6.1         2.8 versicolor
73           6.3         2.5 versicolor
74           6.1         2.8 versicolor
75           6.4         2.9 versicolor
76           6.6         3.0 versicolor
77           6.8         2.8 versicolor
78           6.7         3.0 versicolor
79           6.0         2.9 versicolor
80           5.7         2.6 versicolor
81           5.5         2.4 versicolor
82           5.5         2.4 versicolor
83           5.8         2.7 versicolor
84           6.0         2.7 versicolor
85           5.4         3.0 versicolor
86           6.0         3.4 versicolor
87           6.7         3.1 versicolor
88           6.3         2.3 versicolor
89           5.6         3.0 versicolor
90           5.5         2.5 versicolor
91           5.5         2.6 versicolor
92           6.1         3.0 versicolor
93           5.8         2.6 versicolor
94           5.0         2.3 versicolor
95           5.6         2.7 versicolor
96           5.7         3.0 versicolor
97           5.7         2.9 versicolor
98           6.2         2.9 versicolor
99           5.1         2.5 versicolor
100          5.7         2.8 versicolor
101          6.3         3.3  virginica
102          5.8         2.7  virginica
103          7.1         3.0  virginica
104          6.3         2.9  virginica
105          6.5         3.0  virginica
106          7.6         3.0  virginica
107          4.9         2.5  virginica
108          7.3         2.9  virginica
109          6.7         2.5  virginica
110          7.2         3.6  virginica
111          6.5         3.2  virginica
112          6.4         2.7  virginica
113          6.8         3.0  virginica
114          5.7         2.5  virginica
115          5.8         2.8  virginica
116          6.4         3.2  virginica
117          6.5         3.0  virginica
118          7.7         3.8  virginica
119          7.7         2.6  virginica
120          6.0         2.2  virginica
121          6.9         3.2  virginica
122          5.6         2.8  virginica
123          7.7         2.8  virginica
124          6.3         2.7  virginica
125          6.7         3.3  virginica
126          7.2         3.2  virginica
127          6.2         2.8  virginica
128          6.1         3.0  virginica
129          6.4         2.8  virginica
130          7.2         3.0  virginica
131          7.4         2.8  virginica
132          7.9         3.8  virginica
133          6.4         2.8  virginica
134          6.3         2.8  virginica
135          6.1         2.6  virginica
136          7.7         3.0  virginica
137          6.3         3.4  virginica
138          6.4         3.1  virginica
139          6.0         3.0  virginica
140          6.9         3.1  virginica
141          6.7         3.1  virginica
142          6.9         3.1  virginica
143          5.8         2.7  virginica
144          6.8         3.2  virginica
145          6.7         3.3  virginica
146          6.7         3.0  virginica
147          6.3         2.5  virginica
148          6.5         3.0  virginica
149          6.2         3.4  virginica
150          5.9         3.0  virginica
# NOTE - R's new "raw string" syntax is very useful for 
# writing regular expression patterns in R
# For example:
#
#   without raw strings - you NEED \\double-backslashes

pattern = "\\d+\\.\\d+" 
cat(pattern)   # \d+\.\d+
\d+\.\d+
#   with raw strings - you DON'T need the \\double-backslashes
pattern = r"(\d+\.\d)"      
cat(pattern)   # \d+\.\d+
\d+\.\d
charNums = c("one", "1.593", "278.123", "999")
grep(pattern, charNums, value=TRUE)
[1] "1.593"   "278.123"

15.18 where clause

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# where clause   ####
#
# The where clause allows you to specify the rows that you want.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# The where clause include a logical expression. The expression is evaluated
# separately for each row in the table (i.e. dataframe in our case). 
# The data from the row is substituted into the logical expression. If the
# logical expression results in TRUE then information from the row becomes
# encorporated into the output. If the logical expression results in FALSE 
# for the row then the row does not appear in the output.

# Only the rows that have a TRUE result are displayed

sqldf("select student, test1, test2 
      from grades
      where test1 > test2")
   student test1 test2
1    frank   100    95
2 samantha   100    70
3 margaret    59    47
4     mike    90    86
5    david    90    87
6     dana   100    93
# You can get all of the columns by using the * instead of the column names
sqldf("select *
      from grades
      where test1 > test2")
     id  student year honors test1 test2 test3 major
1 s0400    frank   so   TRUE   100    95    91   ACC
2 s0600 samantha   so   TRUE   100    70    97   IDS
3 s1000 margaret   se  FALSE    59    47    NA   IDS
4 s1300     mike   ju   TRUE    90    86    79   IDS
5 s1400    david   se   TRUE    90    87    87   IDS
6 s1500     dana   so  FALSE   100    93    91   FIN
sqldf("select student, test1, test2 
      from grades
      where test1 >= test2")
   student test1 test2
1      sue    80    80
2    frank   100    95
3 samantha   100    70
4 margaret    59    47
5     mike    90    86
6    david    90    87
7     dana   100    93
sqldf("select student, honors, test1, test2 
      from grades 
      where honors == TRUE and test1 < 90 and test2 < 90")
[1] student honors  test1   test2  
<0 rows> (or 0-length row.names)
sqldf("select student, honors, test1, test2 
      from grades 
      where honors != TRUE and test1 > 90 and test2 > 90")
  student honors test1 test2
1    dana  FALSE   100    93
# NOTE: 
#
# When using the relational operators >  <   >=   <=
# with character data, values that would appear earlier in a dictionary
# are considered "less than" values that would appear later in the dictionary.

15.19 SQL operators

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
# SQL operators
#
# Different flavors of SQL sometimes use different operators.
# Look up the documentation for your particular flavor of SQL.
# See the SQLite operators here
# https://www.tutorialspoint.com/sqlite/sqlite_operators.htm
# They are summarized below.
#
#                        R operator        sqlite operator
#                        -----------       ---------------
#
#   ARITHMETIC OPERATORS
#   addition                +                 same as R
#   subtraction             -                 same as R
#   multiplication          *                 same as R
#   regular division        /                 /     depends on the type of data
#   integer division        %/%               /     depends on the type of data
#   modulus (remainder)     %%                %
#
#   LOGICAL OPERATORS
#   equal to                ==                = or == (they both work)
#   not equal to            !=                != or <> (they both work)
#   greater than            >                 same as R
#   less than               <                 same as R
#   greater or equal        >=                same as R
#   less or equal           <=                same as R
#
#   logical not             !                 not
#   logical and             &&                and
#   logical or              ||                or
#
#   CHARACTER OPERATORS
#   concatenation           paste0(a,b)       a || b
#
# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
#
#   NOTE - the following operators exist in sqlite but mean different 
#          things than what they mean in R.
#          We will probably not cover the details of these operators
#          (unless we have extra time) but you should be aware that
#          they do NOT do the same thing as in R.
#
#                        R operator      sqlite operator
#                        -----------     ---------------
#   BITWISE OPERATORS - we will probably not cover these unless we have extra time.
#
#   bitwise "and"        not avaialble        &
#   bitwise "or"         not avaialble        |
#   bitwise "not"        not avaialble        ~
#   left shift bits      not avaialble        <<
#   right shift bits     not avaialble        >>
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 

15.20 Performing calculations with SQL

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
# Using SQL to perform calculations  ####
#
# In the sqlite flavor of SQL you can perform calculations by simply
# using SELECT without a FROM clause.
#
# Some other versions of SQL (e.g. Oracle) require a FROM clause in every select.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 

# RESULTS of following query
#
#     3+2   23/5    23.0/5
#       5      4       4.6

sqldf("select 3+2, 23/5, 23.0 / 5")   # 5   4   4.6 (headings show calculations)
  3+2 23/5 23.0 / 5
1   5    4      4.6
# Same result but headings show Sum, IntegerDivision, RealDivision
#
#     Sum   IntegerDivision   RealDivision
#       5                 4            4.6

sqldf("select 3+2 as Sum, 
      23/5 as IntegerDivision, 
      23.0 / 5 as RealDivision")   # 5   4   4.6   (same results with differnet headings)
  Sum IntegerDivision RealDivision
1   5               4          4.6
# You don't actullay need the word "as" but it makes the code more readable.
# The following will produce the same results as above.

sqldf("select 3+2 Sum, 
      23/5 IntegerDivision, 
      23.0 / 5 RealDivision")   # 5   4   4.6   (same results with differnet headings)
  Sum IntegerDivision RealDivision
1   5               4          4.6
# The column heading must only use legal column heading characters.
# If you want to add non-legal column-heading characters, you can put the
# column heading in "double quotes".
# The following adds spaces to some of the column headings.

sqldf('select 3+2 "Sum", 
      23/5 "Integer Division", 
      23.0 / 5 "Real Division"')   # 5   4   4.6   (same results with differnet headings)
  Sum Integer Division Real Division
1   5                4           4.6
# If you perform calculations that do not refer to the columns in a table,
# you will get back the result of the calculation for every row of the table.
# 
# The following will return multiple rows of the answers, one row for each
# of the grades table.
# 
# You generally don't want to do this, which is why we left off the FROM 
# clause in our earlier examples.

sqldf('select 3+2 "Sum", 
      23/5 "Integer Division", 
      23.0 / 5 "Real Division"
      from grades')                # you probably dont want the from clause here
   Sum Integer Division Real Division
1    5                4           4.6
2    5                4           4.6
3    5                4           4.6
4    5                4           4.6
5    5                4           4.6
6    5                4           4.6
7    5                4           4.6
8    5                4           4.6
9    5                4           4.6
10   5                4           4.6
11   5                4           4.6
12   5                4           4.6
13   5                4           4.6
14   5                4           4.6
15   5                4           4.6
sqldf("select 3 as curve, student, test1, test1 + 3 as curved_test1 from grades")
   curve  student test1 curved_test1
1      3      joe    70           73
2      3      sue    80           83
3      3     anne    90           93
4      3    frank   100          103
5      3      bob    30           33
6      3 samantha   100          103
7      3    larry    32           35
8      3     bill    84           87
9      3 lucricia    80           83
10     3 margaret    59           62
11     3    dalia    85           88
12     3      zvi    90           93
13     3     mike    90           93
14     3    david    90           93
15     3     dana   100          103

15.21 Reminder - how NA works in R

NOTE:

  • “IDS1020-Intro to IDS” students can ignore this section.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Remember how NA works in R  ####
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# ANY expression that contains NA results in NA

franksSalary = 100
tinasSalary = 200
nancysSalary = NA
nedsSalary = NA

# all of the following result in NA

franksSalary > nancysSalary  # NA
[1] NA
nancysSalary == nedsSalary   # NA
[1] NA
franksSalary + nancysSalary  # NA
[1] NA
# The only way to test for NA in R is with the is.na function

salarys = c(100,200,NA,NA) 
salarys == NA     # NA NA NA NA
[1] NA NA NA NA
is.na(salarys)    # FALSE FALSE TRUE TRUE
[1] FALSE FALSE  TRUE  TRUE
# In R, you can check for non-na values using not operator ( ! )

!is.na(salarys)   # TRUE TRUE FALSE FALSE
[1]  TRUE  TRUE FALSE FALSE
#..................................................................
# Remember in R that when you perform any operation with NA
# the result is NA. For example:
#
#    > NA + 10
#    [1] NA
#
#    > NA == NA  
#    [1] NA
#
#    > NA != NA  
#    [1] NA
#   
# SQL has the same approach to NULL values. Any operation with NULL
# results in NULL.
#..................................................................

15.22 NULL in SQL

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# NULL values in SQL  ####
#
# In relational database tables, the value NULL is used to record 
# a value that is "not known". For example a salary value of NULL
# is treated as a salary that hasn't been recorded in the database yet. 
# This is very different from a salary of 0 which would be the case if 
# someone is volunteering or donating their time.
#
# Note that when using sqldf to run SQL commands in R
# the data is stored in an R dataframe. Therefore these types of values
# appear in the dataframe as NA but the SQL statements use the term NULL.
# In an actual relational database these values would also appear in the 
# table as NULL (not as NA)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#...............................................................
# Check for values that are NULL in the following way:
#     
#     WHERE COLUMN_NAME IS NULL
#     (see examples below)
#
# NOTE: DO NOT USE "== NULL". The == operator will NOT work with NULL
#...............................................................

# show all rows for which the test3 grade is NULL 
# 
# NOTE: Remember that R does not understand NULL, R uses NA.
#       Remember that SQL understands NULL and does not understand NA.
#       Since sqldf uses R dataframes, the sqldf function automatically 
#       translates NULL into NA and vice versa when using SQL code to 
#       access data in R dataframes.

sqldf("select *
      from grades
      where test3 is NULL")
     id  student year honors test1 test2 test3 major
1 s0100      joe   fr  FALSE    70    82    NA   IDS
2 s0500      bob   so  FALSE    30    NA    NA   IDS
3 s0700    larry   ju  FALSE    32    80    NA   FIN
4 s1000 margaret   se  FALSE    59    47    NA   IDS
# DON'T DO THE FOLLOWING !!! IT DOESN'T WORK!!!

sqldf("select *
      from grades
      where test3 = NULL")  # use IS NULL!!!
[1] id      student year    honors  test1   test2   test3   major  
<0 rows> (or 0-length row.names)
#...............................................................
# Check for values that aren't NULL in the following way:
#     
#     WHERE COLUMN_NAME IS NOT NULL
#     (see examples below)
#
# NOTE: DO NOT USE "!= NULL". The != operator will NOT work with NULL
#...............................................................

# show all rows for which the test3 grade is NOT NULL (i.e. isn't NA)

sqldf("select *
      from grades
      where test3 is NOT NULL")
      id  student year honors test1 test2 test3 major
1  s0200      sue   fr  FALSE    80    80    79   IDS
2  s0300     anne   fr   TRUE    90    95    92   IDS
3  s0400    frank   so   TRUE   100    95    91   ACC
4  s0600 samantha   so   TRUE   100    70    97   IDS
5  s0800     bill   se  FALSE    84    90    92   ACC
6  s0900 lucricia   fr   TRUE    80   100    88   IDS
7  s1100    dalia   se  FALSE    85    92    93   FIN
8  s1200      zvi   ju   TRUE    90    98    92   ACC
9  s1300     mike   ju   TRUE    90    86    79   IDS
10 s1400    david   se   TRUE    90    87    87   IDS
11 s1500     dana   so  FALSE   100    93    91   FIN
# DON'T DO THE FOLLOWING !!! IT DOESN'T WORK!!!

sqldf("select *
      from grades
      where test3 != NULL")   # use IS NOT NULL !!!
[1] id      student year    honors  test1   test2   test3   major  
<0 rows> (or 0-length row.names)

15.23 Practice

# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
# QUESTION
#
# curve all test3 grades by 10 points (NULLs remain NULL)
#
# (remember that since we are using R, the NULL values
#  appear as NA)
# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

sqldf("select student, test3, test3 + 10 as curved_test3
       from grades
       order by student")
    student test3 curved_test3
1      anne    92          102
2      bill    92          102
3       bob    NA           NA
4     dalia    93          103
5      dana    91          101
6     david    87           97
7     frank    91          101
8       joe    NA           NA
9     larry    NA           NA
10 lucricia    88           98
11 margaret    NA           NA
12     mike    79           89
13 samantha    97          107
14      sue    79           89
15      zvi    92          102
# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
# QUESTION
#
# Show rows where both test2 and test3 are NULL
# REMEMBER when looking for NULL values: 
#
#    USE:        value IS NULL      # correct
#    DON'T use:  value = NULL       # wrong
#
#    USE:        value IS NOT NULL  # correct
#    DON'T use:  value != NULL      # wrong
# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

sqldf("select student, test1, test2, test3
       from grades
       where test2 IS NULL and test3 IS NULL  
       order by student")
  student test1 test2 test3
1     bob    30    NA    NA
# DON'T DO THE FOLLOWING !!! IT DOESN'T WORK!!!

sqldf("select student, test1, test2, test3
       from grades
       where test2 == NULL and test3 == NULL  
       order by student")
[1] student test1   test2   test3  
<0 rows> (or 0-length row.names)

15.24 Tutorials sqlbolt.com and w3chools.com

Please see the tutorials at the following locations.

The file <sqlbolt_tables-v007.RData> contains the dataframes with the information for each table that are used on these websites. These dataframes can be loaded into R from this file with the following command.

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# sqlbolt.com    and    https://www.w3schools.com/sql/sql_join.asp    ####
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
load("sqlbolt_tables-v007.RData")

Below are the contents of these tables:

sqldf("select * from movies")
   id               title       director year length_minutes
1   1           Toy Story  John Lasseter 1995             81
2   2        A Bug's Life  John Lasseter 1998             95
3   3         Toy Story 2  John Lasseter 1999             93
4   4      Monsters, Inc.    Pete Docter 2001             92
5   5        Finding Nemo Andrew Stanton 2003            107
6   6     The Incredibles      Brad Bird 2004            116
7   7                Cars  John Lasseter 2006            117
8   8         Ratatouille      Brad Bird 2007            115
9   9              WALL-E Andrew Stanton 2008            104
10 10                  Up    Pete Docter 2009            101
11 11         Toy Story 3    Lee Unkrich 2010            103
12 12              Cars 2  John Lasseter 2011            120
13 13               Brave Brenda Chapman 2012            102
14 14 Monsters University    Dan Scanlon 2013            110
15 87              WALL-G Brenda Chapman 2042             92
sqldf("select * from orders")
    OrderID CustomerID EmployeeID  OrderDate ShipperID
1     10248         90          5   7/4/1996         3
2     10249         81          6   7/5/1996         1
3     10250         34          4   7/8/1996         2
4     10251         84          3   7/8/1996         1
5     10252         76          4   7/9/1996         2
6     10253         34          3  7/10/1996         2
7     10254         14          5  7/11/1996         2
8     10255         68          9  7/12/1996         3
9     10256         88          3  7/15/1996         2
10    10257         35          4  7/16/1996         3
11    10258         20          1  7/17/1996         1
12    10259         13          4  7/18/1996         3
13    10260         55          4  7/19/1996         1
14    10261         61          4  7/19/1996         2
15    10262         65          8  7/22/1996         3
16    10263         20          9  7/23/1996         3
17    10264         24          6  7/24/1996         3
18    10265          7          2  7/25/1996         1
19    10266         87          3  7/26/1996         3
20    10267         25          4  7/29/1996         1
21    10268         33          8  7/30/1996         3
22    10269         89          5  7/31/1996         1
23    10270         87          1   8/1/1996         1
24    10271         75          6   8/1/1996         2
25    10272         65          6   8/2/1996         2
26    10273         63          3   8/5/1996         3
27    10274         85          6   8/6/1996         1
28    10275         49          1   8/7/1996         1
29    10276         80          8   8/8/1996         3
30    10277         52          2   8/9/1996         3
31    10278          5          8  8/12/1996         2
32    10279         44          8  8/13/1996         2
33    10280          5          2  8/14/1996         1
34    10281         69          4  8/14/1996         1
35    10282         69          4  8/15/1996         1
36    10283         46          3  8/16/1996         3
37    10284         44          4  8/19/1996         1
38    10285         63          1  8/20/1996         2
39    10286         63          8  8/21/1996         3
40    10287         67          8  8/22/1996         3
41    10288         66          4  8/23/1996         1
42    10289         11          7  8/26/1996         3
43    10290         15          8  8/27/1996         1
44    10291         61          6  8/27/1996         2
45    10292         81          1  8/28/1996         2
46    10293         80          1  8/29/1996         3
47    10294         65          4  8/30/1996         2
48    10295         85          2   9/2/1996         2
49    10296         46          6   9/3/1996         1
50    10297          7          5   9/4/1996         2
51    10298         37          6   9/5/1996         2
52    10299         67          4   9/6/1996         2
53    10300         49          2   9/9/1996         2
54    10301         86          8   9/9/1996         2
55    10302         76          4  9/10/1996         2
56    10303         30          7  9/11/1996         2
57    10304         80          1  9/12/1996         2
58    10305         55          8  9/13/1996         3
59    10306         69          1  9/16/1996         3
60    10307         48          2  9/17/1996         2
61    10308          2          7  9/18/1996         3
62    10309         37          3  9/19/1996         1
63    10310         77          8  9/20/1996         2
64    10311         18          1  9/20/1996         3
65    10312         86          2  9/23/1996         2
66    10313         63          2  9/24/1996         2
67    10314         65          1  9/25/1996         2
68    10315         38          4  9/26/1996         2
69    10316         65          1  9/27/1996         3
70    10317         48          6  9/30/1996         1
71    10318         38          8  10/1/1996         2
72    10319         80          7  10/2/1996         3
73    10320         87          5  10/3/1996         3
74    10321         38          3  10/3/1996         2
75    10322         58          7  10/4/1996         3
76    10323         39          4  10/7/1996         1
77    10324         71          9  10/8/1996         1
78    10325         39          1  10/9/1996         3
79    10326          8          4 10/10/1996         2
80    10327         24          2 10/11/1996         1
81    10328         28          4 10/14/1996         3
82    10329         75          4 10/15/1996         2
83    10330         46          3 10/16/1996         1
84    10331          9          9 10/16/1996         1
85    10332         51          3 10/17/1996         2
86    10333         87          5 10/18/1996         3
87    10334         84          8 10/21/1996         2
88    10335         37          7 10/22/1996         2
89    10336         60          7 10/23/1996         2
90    10337         25          4 10/24/1996         3
91    10338         55          4 10/25/1996         3
92    10339         51          2 10/28/1996         2
93    10340          9          1 10/29/1996         3
94    10341         73          7 10/29/1996         3
95    10342         25          4 10/30/1996         2
96    10343         44          4 10/31/1996         1
97    10344         89          4  11/1/1996         2
98    10345         63          2  11/4/1996         2
99    10346         65          3  11/5/1996         3
100   10347         21          4  11/6/1996         3
101   10348         86          4  11/7/1996         2
102   10349         75          7  11/8/1996         1
103   10350         41          6 11/11/1996         2
104   10351         20          1 11/11/1996         1
105   10352         28          3 11/12/1996         3
106   10353         59          7 11/13/1996         3
107   10354         58          8 11/14/1996         3
108   10355          4          6 11/15/1996         1
109   10356         86          6 11/18/1996         2
110   10357         46          1 11/19/1996         3
111   10358         41          5 11/20/1996         1
112   10359         72          5 11/21/1996         3
113   10360          7          4 11/22/1996         3
114   10361         63          1 11/22/1996         2
115   10362          9          3 11/25/1996         1
116   10363         17          4 11/26/1996         3
117   10364         19          1 11/26/1996         1
118   10365          3          3 11/27/1996         2
119   10366         29          8 11/28/1996         2
120   10367         83          7 11/28/1996         3
121   10368         20          2 11/29/1996         2
122   10369         75          8  12/2/1996         2
123   10370         14          6  12/3/1996         2
124   10371         41          1  12/3/1996         1
125   10372         62          5  12/4/1996         2
126   10373         37          4  12/5/1996         3
127   10374         91          1  12/5/1996         3
128   10375         36          3  12/6/1996         2
129   10376         51          1  12/9/1996         2
130   10377         72          1  12/9/1996         3
131   10378         24          5 12/10/1996         3
132   10379         61          2 12/11/1996         1
133   10380         37          8 12/12/1996         3
134   10381         46          3 12/12/1996         3
135   10382         20          4 12/13/1996         1
136   10383          4          8 12/16/1996         3
137   10384          5          3 12/16/1996         3
138   10385         75          1 12/17/1996         2
139   10386         21          9 12/18/1996         3
140   10387         70          1 12/18/1996         2
141   10388         72          2 12/19/1996         1
142   10389         10          4 12/20/1996         2
143   10390         20          6 12/23/1996         1
144   10391         17          3 12/23/1996         3
145   10392         59          2 12/24/1996         3
146   10393         71          1 12/25/1996         3
147   10394         36          1 12/25/1996         3
148   10395         35          6 12/26/1996         1
149   10396         25          1 12/27/1996         3
150   10397         60          5 12/27/1996         1
151   10398         71          2 12/30/1996         3
152   10399         83          8 12/31/1996         3
153   10400         19          1   1/1/1997         3
154   10401         65          1   1/1/1997         1
155   10402         20          8   1/2/1997         2
156   10403         20          4   1/3/1997         3
157   10404         49          2   1/3/1997         1
158   10405         47          1   1/6/1997         1
159   10406         62          7   1/7/1997         1
160   10407         56          2   1/7/1997         2
161   10408         23          8   1/8/1997         1
162   10409         54          3   1/9/1997         1
163   10410         10          3  1/10/1997         3
164   10411         10          9  1/10/1997         3
165   10412         87          8  1/13/1997         2
166   10413         41          3  1/14/1997         2
167   10414         21          2  1/14/1997         3
168   10415         36          3  1/15/1997         1
169   10416         87          8  1/16/1997         3
170   10417         73          4  1/16/1997         3
171   10418         63          4  1/17/1997         1
172   10419         68          4  1/20/1997         2
173   10420         88          3  1/21/1997         1
174   10421         61          8  1/21/1997         1
175   10422         27          2  1/22/1997         1
176   10423         31          6  1/23/1997         3
177   10424         51          7  1/23/1997         2
178   10425         41          6  1/24/1997         2
179   10426         29          4  1/27/1997         1
180   10427         59          4  1/27/1997         2
181   10428         66          7  1/28/1997         1
182   10429         37          3  1/29/1997         2
183   10430         20          4  1/30/1997         1
184   10431         10          4  1/30/1997         2
185   10432         75          3  1/31/1997         2
186   10433         60          3   2/3/1997         3
187   10434         24          3   2/3/1997         2
188   10435         16          8   2/4/1997         2
189   10436          7          3   2/5/1997         2
190   10437         87          8   2/5/1997         1
191   10438         79          3   2/6/1997         2
192   10439         51          6   2/7/1997         3
193   10440         71          4  2/10/1997         2
194   10441         55          3  2/10/1997         2
195   10442         20          3  2/11/1997         2
196   10443         66          8  2/12/1997         1
sqldf("select * from customers")
   CustomerID                         CustomerName              ContactName                                           Address             City PostalCode     Country
1           1                  Alfreds Futterkiste             Maria Anders                                     Obere Str. 57           Berlin      12209     Germany
2           2   Ana Trujillo Emparedados y helados             Ana Trujillo                  Avda. de la Constituci\xf3n 2222   M\xe9xico D.F.       5021      Mexico
3           3           Antonio Moreno Taquer\xeda           Antonio Moreno                                    Mataderos 2312   M\xe9xico D.F.       5023      Mexico
4           4                      Around the Horn             Thomas Hardy                                   120 Hanover Sq.           London    WA1 1DP          UK
5           5                Berglunds snabbk\xf6p       Christina Berglund                                 Berguvsv\xe4gen 8         Lule\xe5   S-958 22      Sweden
6           6              Blauer See Delikatessen               Hanna Moos                                    Forsterstr. 57         Mannheim      68306     Germany
7           7              Blondel p\xe8re et fils Fr\xe9d\xe9rique Citeaux                               24, place Kl\xe9ber       Strasbourg      67000      France
8           8         B\xf3lido Comidas preparadas         Mart\xedn Sommer                                    C/ Araquil, 67           Madrid      28023       Spain
9           9                             Bon app'        Laurence Lebihans                              12, rue des Bouchers        Marseille      13008      France
10         10               Bottom-Dollar Marketse        Elizabeth Lincoln                                23 Tsawassen Blvd.        Tsawassen    T2F 8M4      Canada
11         11                        B's Beverages        Victoria Ashworth                                 Fauntleroy Circus           London    EC2 5NT          UK
12         12           Cactus Comidas para llevar         Patricio Simpson                                       Cerrito 333     Buenos Aires       1010   Argentina
13         13           Centro comercial Moctezuma          Francisco Chang                           Sierras de Granada 9993   M\xe9xico D.F.       5022      Mexico
14         14                    Chop-suey Chinese                Yang Wang                                      Hauptstr. 29             Bern       3012 Switzerland
15         15                  Com\xe9rcio Mineiro             Pedro Afonso                           Av. dos Lus\xedadas, 23     S\xe3o Paulo  05432-043      Brazil
16         16                Consolidated Holdings          Elizabeth Brown                       Berkeley Gardens 12 Brewery           London    WX1 6LT          UK
17         17            Drachenblut Delikatessend             Sven Ottlieb                                      Walserweg 21           Aachen      52066     Germany
18         18                      Du monde entier           Janine Labrune                      67, rue des Cinquante Otages           Nantes      44000      France
19         19                   Eastern Connection                Ann Devon                                    35 King George           London    WX3 6FW          UK
20         20                         Ernst Handel            Roland Mendel                                      Kirchgasse 6             Graz       8010     Austria
21         21                   Familia Arquibaldo                Aria Cruz                                   Rua Or\xf3s, 92     S\xe3o Paulo  05442-030      Brazil
22         22 FISSA Fabrica Inter. Salchichas S.A.               Diego Roel                                C/ Moralzarzal, 86           Madrid      28034       Spain
23         23                    Folies gourmandes         Martine Ranc\xe9                       184, chauss\xe9e de Tournai            Lille      59000      France
24         24                    Folk och f\xe4 HB            Maria Larsson                                   \xc5kergatan 24        Br\xe4cke   S-844 67      Sweden
25         25                       Frankenversand            Peter Franken                                 Berliner Platz 43       M\xfcnchen      80805     Germany
26         26                  France restauration           Carine Schmitt                                    54, rue Royale           Nantes      44000      France
27         27                       Franchi S.p.A.            Paolo Accorti                               Via Monte Bianco 34           Torino      10100       Italy
28         28       Furia Bacalhau e Frutos do Mar           Lino Rodriguez                            Jardim das rosas n. 32           Lisboa       1675    Portugal
29         29         Galer\xeda del gastr\xf3nomo         Eduardo Saavedra                         Rambla de Catalu\xf1a, 23        Barcelona       8022       Spain
30         30               Godos Cocina T\xedpica     Jos\xe9 Pedro Freyre                                     C/ Romero, 33          Sevilla      41101       Spain
31         31                  Gourmet Lanchonetes         Andr\xe9 Fonseca                                   Av. Brasil, 442         Campinas  04876-786      Brazil
32         32              Great Lakes Food Market            Howard Snyder                                  2732 Baker Blvd.           Eugene      97403         USA
33         33                 GROSELLA-Restaurante           Manuel Pereira                      5\xaa Ave. Los Palos Grandes          Caracas       1081   Venezuela
34         34                        Hanari Carnes             Mario Pontes                                Rua do Pa\xe7o, 67   Rio de Janeiro  05454-876      Brazil
35         35                  HILARI\xd3N-Abastos      Carlos Hern\xe1ndez        Carrera 22 con Ave. Carlos Soublette #8-35 San Crist\xf3bal       5022   Venezuela
36         36           Hungry Coyote Import Store            Yoshi Latimer                    City Center Plaza 516 Main St.            Elgin      97827         USA
37         37         Hungry Owl All-Night Grocers         Patricia McKenna                                  8 Johnstown Road             Cork                Ireland
38         38                       Island Trading            Helen Bennett                         Garden House Crowther Way            Cowes   PO31 7PJ          UK
39         39                   K\xf6niglich Essen            Philip Cramer                                     Maubelstr. 90      Brandenburg      14776     Germany
40         40                 La corne d'abondance            Daniel Tonini                            67, avenue de l'Europe       Versailles      78000      France
41         41                     La maison d'Asie           Annette Roulet                             1 rue Alsace-Lorraine         Toulouse      31000      France
42         42        Laughing Bacchus Wine Cellars          Yoshi Tannamuri                                      1900 Oak St.        Vancouver    V3F 2K1      Canada
43         43                 Lazy K Kountry Store               John Steel                              12 Orchestra Terrace      Walla Walla      99362         USA
44         44                  Lehmanns Marktstand           Renate Messner                                      Magazinweg 7   Frankfurt a.M.      60528     Germany
45         45                    Let's Stop N Shop             Jaime Yorres                               87 Polk St. Suite 5    San Francisco      94117         USA
46         46                    LILA-Supermercado       Carlos Gonz\xe1lez Carrera 52 con Ave. Bol\xedvar #65-98 Llano Largo     Barquisimeto       3508   Venezuela
47         47                     LINO-Delicateses         Felipe Izquierdo                           Ave. 5 de Mayo Porlamar  I. de Margarita       4980   Venezuela
48         48             Lonesome Pine Restaurant              Fran Wilson                                89 Chiaroscuro Rd.         Portland      97219         USA
49         49         Magazzini Alimentari Riuniti         Giovanni Rovelli                           Via Ludovico il Moro 22          Bergamo      24100       Italy
50         50                         Maison Dewey          Catherine Dewey                               Rue Joseph-Bens 532        Bruxelles     B-1180     Belgium
51         51                    M\xe8re Paillarde        Jean Fresni\xe8re                                43 rue St. Laurent      Montr\xe9al    H1J 1C3      Canada
52         52               Morgenstern Gesundkost          Alexander Feuer                                       Heerstr. 22          Leipzig       4179     Germany
53         53                          North/South           Simon Crowther                      South House 300 Queensbridge           London    SW7 1RZ          UK
54         54         Oc\xe9ano Atl\xe1ntico Ltda.           Yvonne Moncada               Ing. Gustavo Moncada 8585 Piso 20-A     Buenos Aires       1010   Argentina
55         55               Old World Delicatessen            Rene Phillips                                   2743 Bering St.        Anchorage      99508         USA
56         56                Ottilies K\xe4seladen      Henriette Pfalzheim                                Mehrheimerstr. 369          K\xf6ln      50739     Germany
57         57              Paris sp\xe9cialit\xe9s           Marie Bertrand                           265, boulevard Charonne            Paris      75012      France
58         58         Pericles Comidas cl\xe1sicas   Guillermo Fern\xe1ndez                          Calle Dr. Jorge Cash 321   M\xe9xico D.F.       5033      Mexico
59         59                     Piccolo und mehr              Georg Pipps                                       Geislweg 14         Salzburg       5020     Austria
60         60              Princesa Isabel Vinhoss         Isabel de Castro                         Estrada da sa\xfade n. 58           Lisboa       1756    Portugal
61         61                       Que Del\xedcia         Bernardo Batista                           Rua da Panificadora, 12   Rio de Janeiro  02389-673      Brazil
62         62                        Queen Cozinha        L\xfacia Carvalho                      Alameda dos Can\xe0rios, 891     S\xe3o Paulo  05487-020      Brazil
63         63                           QUICK-Stop              Horst Kloss                               Taucherstra\xdfe 10        Cunewalde       1307     Germany
64         64                        Rancho grande      Sergio Guti\xe9rrez                            Av. del Libertador 900     Buenos Aires       1010   Argentina
65         65           Rattlesnake Canyon Grocery             Paula Wilson                                   2817 Milton Dr.      Albuquerque      87110         USA
66         66                   Reggiani Caseifici          Maurizio Moroni                            Strada Provinciale 124    Reggio Emilia      42100       Italy
67         67                   Ricardo Adocicados           Janete Limeira                               Av. Copacabana, 267   Rio de Janeiro  02389-890      Brazil
68         68                   Richter Supermarkt             Michael Holz                                 Grenzacherweg 237        Gen\xe8ve       1203 Switzerland
69         69                     Romero y tomillo         Alejandra Camino                                    Gran V\xeda, 1           Madrid      28001       Spain
70         70                     Sant\xe9 Gourmet         Jonas Bergulfsen                            Erling Skakkes gate 78          Stavern       4110      Norway
71         71                   Save-a-lot Markets           Jose Pavarotti                                   187 Suffolk Ln.            Boise      83720         USA
72         72                   Seven Seas Imports               Hari Kumar                                   90 Wadhurst Rd.           London   OX15 4NB          UK
73         73                        Simons bistro           Jytte Petersen                                   Vinb\xe6ltet 34     K\xf8benhavn       1734     Denmark
74         74           Sp\xe9cialit\xe9s du monde        Dominique Perrier                                 25, rue Lauriston            Paris      75016      France
75         75                Split Rail Beer & Ale       Art Braunschweiger                                      P.O. Box 555           Lander      82520         USA
76         76               Supr\xeames d\xe9lices         Pascale Cartrain                              Boulevard Tirou, 255        Charleroi     B-6000     Belgium
77         77                       The Big Cheese                Liz Nixon                          89 Jefferson Way Suite 2         Portland      97201         USA
78         78                      The Cracker Box                 Liu Wong                               55 Grizzly Peak Rd.            Butte      59801         USA
79         79                Toms Spezialit\xe4ten            Karin Josephs                                     Luisenstr. 48       M\xfcnster      44087     Germany
80         80                  Tortuga Restaurante     Miguel Angel Paolino                                  Avda. Azteca 123   M\xe9xico D.F.       5033      Mexico
81         81         Tradi\xe7\xe3o Hipermercados        Anabela Domingues                        Av. In\xeas de Castro, 414     S\xe3o Paulo  05634-030      Brazil
82         82    Trail's Head Gourmet Provisioners           Helvetius Nagy                                 722 DaVinci Blvd.         Kirkland      98034         USA
83         83                         Vaffeljernet              Palle Ibsen                                  Smagsl\xf8get 45         \xc5rhus       8200     Denmark
84         84                 Victuailles en stock             Mary Saveley                                2, rue du Commerce             Lyon      69004      France
85         85            Vins et alcools Chevalier             Paul Henriot                                59 rue de l'Abbaye            Reims      51100      France
86         86                    Die Wandernde Kuh           Rita M\xfcller                                 Adenauerallee 900        Stuttgart      70563     Germany
87         87                       Wartian Herkku         Pirkko Koskitalo                                       Torikatu 38             Oulu      90110     Finland
88         88               Wellington Importadora            Paula Parente                                Rua do Mercado, 12          Resende  08737-363      Brazil
89         89                 White Clover Markets           Karl Jablonski                       305 - 14th Ave. S. Suite 3B          Seattle      98128         USA
90         90                          Wilman Kala          Matti Karttunen                                     Keskuskatu 45         Helsinki      21240     Finland
91         91                               Wolski                  Zbyszek                                   ul. Filtrowa 68            Walla     01-012      Poland
sqldf("select * from north_american_cities")
                  city       country population latitude  longitude
1          Guadalajara        Mexico    1500800 20.65970 -103.34961
2              Toronto        Canada    2795060 43.65323  -79.38318
3              Houston United States    2195914 29.76043  -95.36980
4             New York United States    8405837 40.71278  -74.00594
5         Philadelphia United States    1553165 39.95258  -75.16522
6               Havana          Cuba    2106146 23.05407  -82.34519
7          Mexico City        Mexico    8555500 19.43261  -99.13321
8              Phoenix United States    1513367 33.44838 -112.07404
9          Los Angeles United States    3884307 34.05223 -118.24368
10 Ecatepec de Morelos        Mexico    1742000 19.60184  -99.05067
11            Montreal        Canada    1717767 45.50169  -73.56726
12             Chicago United States    2718782 41.87811  -87.62980

15.24.1 Practice - select clause

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# https://sqlbolt.com/lesson/select_queries_introduction
#
# Concepts:
#
#  Select query for a specific columns
#
#   SELECT column, another_column, .
#   FROM mytable;
#
#
#  Select query for all columns
#   
#    SELECT * 
#    FROM mytable;
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#https://sqlbolt.com/lesson/select_queries_introduction

movies  
   id               title       director year length_minutes
1   1           Toy Story  John Lasseter 1995             81
2   2        A Bug's Life  John Lasseter 1998             95
3   3         Toy Story 2  John Lasseter 1999             93
4   4      Monsters, Inc.    Pete Docter 2001             92
5   5        Finding Nemo Andrew Stanton 2003            107
6   6     The Incredibles      Brad Bird 2004            116
7   7                Cars  John Lasseter 2006            117
8   8         Ratatouille      Brad Bird 2007            115
9   9              WALL-E Andrew Stanton 2008            104
10 10                  Up    Pete Docter 2009            101
11 11         Toy Story 3    Lee Unkrich 2010            103
12 12              Cars 2  John Lasseter 2011            120
13 13               Brave Brenda Chapman 2012            102
14 14 Monsters University    Dan Scanlon 2013            110
15 87              WALL-G Brenda Chapman 2042             92
###########################
#Exercise 1 - Tasks
###########################

#Find the title of each film

sql = 
  "select title
   from movies"

sql
[1] "select title\n   from movies"
sqldf(sql)
                 title
1            Toy Story
2         A Bug's Life
3          Toy Story 2
4       Monsters, Inc.
5         Finding Nemo
6      The Incredibles
7                 Cars
8          Ratatouille
9               WALL-E
10                  Up
11         Toy Story 3
12              Cars 2
13               Brave
14 Monsters University
15              WALL-G
#Find the director of each film

sql = 
  "select director
   from movies"

sqldf(sql)
         director
1   John Lasseter
2   John Lasseter
3   John Lasseter
4     Pete Docter
5  Andrew Stanton
6       Brad Bird
7   John Lasseter
8       Brad Bird
9  Andrew Stanton
10    Pete Docter
11    Lee Unkrich
12  John Lasseter
13 Brenda Chapman
14    Dan Scanlon
15 Brenda Chapman
#Find the title and director of each film

sql = 
  "select title, director
   from movies"

sqldf(sql)
                 title       director
1            Toy Story  John Lasseter
2         A Bug's Life  John Lasseter
3          Toy Story 2  John Lasseter
4       Monsters, Inc.    Pete Docter
5         Finding Nemo Andrew Stanton
6      The Incredibles      Brad Bird
7                 Cars  John Lasseter
8          Ratatouille      Brad Bird
9               WALL-E Andrew Stanton
10                  Up    Pete Docter
11         Toy Story 3    Lee Unkrich
12              Cars 2  John Lasseter
13               Brave Brenda Chapman
14 Monsters University    Dan Scanlon
15              WALL-G Brenda Chapman
#Find the title and year of each film

sql = 
  "select title, year
   from movies"

sqldf(sql)
                 title year
1            Toy Story 1995
2         A Bug's Life 1998
3          Toy Story 2 1999
4       Monsters, Inc. 2001
5         Finding Nemo 2003
6      The Incredibles 2004
7                 Cars 2006
8          Ratatouille 2007
9               WALL-E 2008
10                  Up 2009
11         Toy Story 3 2010
12              Cars 2 2011
13               Brave 2012
14 Monsters University 2013
15              WALL-G 2042
#Find all the information about each film

sqldf("select *
       from movies")
   id               title       director year length_minutes
1   1           Toy Story  John Lasseter 1995             81
2   2        A Bug's Life  John Lasseter 1998             95
3   3         Toy Story 2  John Lasseter 1999             93
4   4      Monsters, Inc.    Pete Docter 2001             92
5   5        Finding Nemo Andrew Stanton 2003            107
6   6     The Incredibles      Brad Bird 2004            116
7   7                Cars  John Lasseter 2006            117
8   8         Ratatouille      Brad Bird 2007            115
9   9              WALL-E Andrew Stanton 2008            104
10 10                  Up    Pete Docter 2009            101
11 11         Toy Story 3    Lee Unkrich 2010            103
12 12              Cars 2  John Lasseter 2011            120
13 13               Brave Brenda Chapman 2012            102
14 14 Monsters University    Dan Scanlon 2013            110
15 87              WALL-G Brenda Chapman 2042             92
# The following produces the exact same results as above

sqldf("select id, title, director, year, length_minutes
       from movies")
   id               title       director year length_minutes
1   1           Toy Story  John Lasseter 1995             81
2   2        A Bug's Life  John Lasseter 1998             95
3   3         Toy Story 2  John Lasseter 1999             93
4   4      Monsters, Inc.    Pete Docter 2001             92
5   5        Finding Nemo Andrew Stanton 2003            107
6   6     The Incredibles      Brad Bird 2004            116
7   7                Cars  John Lasseter 2006            117
8   8         Ratatouille      Brad Bird 2007            115
9   9              WALL-E Andrew Stanton 2008            104
10 10                  Up    Pete Docter 2009            101
11 11         Toy Story 3    Lee Unkrich 2010            103
12 12              Cars 2  John Lasseter 2011            120
13 13               Brave Brenda Chapman 2012            102
14 14 Monsters University    Dan Scanlon 2013            110
15 87              WALL-G Brenda Chapman 2042             92

Practice - ADDITIONAL QUESTIONS (not on sqlbolt.com)

# QUESTION
# Show the movie name, length_minutes and a third column that you 
# calculate which is the length in hours

# ANSWER
#
# when you divide an integer by an integer the result in sql in an integer
# if you divide a floating point number by an integer or vice versa the 
# result is floating point (i.e. a number that has values after the decimal point)

# This is a good start but the length_hours column will always show up 
# as a whole number - this is NOT what we wanted.

sql = "select title, length_minutes, length_minutes/60 as length_hours
       from movies"
sqldf(sql)
                 title length_minutes length_hours
1            Toy Story             81            1
2         A Bug's Life             95            1
3          Toy Story 2             93            1
4       Monsters, Inc.             92            1
5         Finding Nemo            107            1
6      The Incredibles            116            1
7                 Cars            117            1
8          Ratatouille            115            1
9               WALL-E            104            1
10                  Up            101            1
11         Toy Story 3            103            1
12              Cars 2            120            2
13               Brave            102            1
14 Monsters University            110            1
15              WALL-G             92            1
# Change 60 to 60.0 to produce a "floating point" value (i.e. a value
# that contains numbers after the decimal point)

sql = "select title, length_minutes, length_minutes/60.0 as length_hours
       from movies"
sqldf(sql)
                 title length_minutes length_hours
1            Toy Story             81     1.350000
2         A Bug's Life             95     1.583333
3          Toy Story 2             93     1.550000
4       Monsters, Inc.             92     1.533333
5         Finding Nemo            107     1.783333
6      The Incredibles            116     1.933333
7                 Cars            117     1.950000
8          Ratatouille            115     1.916667
9               WALL-E            104     1.733333
10                  Up            101     1.683333
11         Toy Story 3            103     1.716667
12              Cars 2            120     2.000000
13               Brave            102     1.700000
14 Monsters University            110     1.833333
15              WALL-G             92     1.533333
# QUESTION
# show the length in hours, minutes - eg. 81 minutes show as 2 hours and 21 minutes

# ANSWER
sql =
  "select title, length_minutes, length_minutes / 60 as hours,
                  length_minutes - 60 * (length_minutes/60) as minutes
   from movies"
sqldf(sql)
                 title length_minutes hours minutes
1            Toy Story             81     1      21
2         A Bug's Life             95     1      35
3          Toy Story 2             93     1      33
4       Monsters, Inc.             92     1      32
5         Finding Nemo            107     1      47
6      The Incredibles            116     1      56
7                 Cars            117     1      57
8          Ratatouille            115     1      55
9               WALL-E            104     1      44
10                  Up            101     1      41
11         Toy Story 3            103     1      43
12              Cars 2            120     2       0
13               Brave            102     1      42
14 Monsters University            110     1      50
15              WALL-G             92     1      32
# ANSWER - using sqlite's % operator
sql =
  "select title, length_minutes, length_minutes / 60 as hours,
                  length_minutes % 60 as minutes
   from movies"
sqldf(sql)
                 title length_minutes hours minutes
1            Toy Story             81     1      21
2         A Bug's Life             95     1      35
3          Toy Story 2             93     1      33
4       Monsters, Inc.             92     1      32
5         Finding Nemo            107     1      47
6      The Incredibles            116     1      56
7                 Cars            117     1      57
8          Ratatouille            115     1      55
9               WALL-E            104     1      44
10                  Up            101     1      41
11         Toy Story 3            103     1      43
12              Cars 2            120     2       0
13               Brave            102     1      42
14 Monsters University            110     1      50
15              WALL-G             92     1      32
# Note that while it is tempting to refer to hours on line 2 of the following
# query, R's version of SQL (ie. sqlite) will not allow for this.
# Some SQL flavors will allow this.
#
# The following WILL NOT WORK IN R's VERSION OF SQL
sql =
  "select title, length_minutes, length_minutes / 60 as hours,
                  length_minutes - 60 * hours as minutes
   from movies"

# sqldf(sql)       # ERROR - no such column: hours

15.24.2 Practice - where clause

#-------------------------------------------------------------------------
# https://sqlbolt.com/lesson/select_queries_with_constraints
# 
# CONCEPTS
#
# Select query with constraints
# SELECT column, another_column, .
# FROM mytable
# WHERE condition
# AND/OR another_condition
# AND/OR .;
# 
# 
# 
# # List of SQL operators (see webpage) 
# Operator  Condition   SQL Example
# =, !=, < <=, >, >=    Standard numerical operators    col_name != 4
# BETWEEN . AND .   Number is within range of two values (inclusive)    col_name BETWEEN 1.5 AND 10.5
# NOT BETWEEN . AND .   Number is not within range of two values (inclusive)    col_name NOT BETWEEN 1 AND 10
# IN (.)    Number exists in a list     col_name IN (2, 4, 6)
# NOT IN (.)    Number does not exist in a list     col_name NOT IN (1, 3, 5)



###########################
# Exercise 2 - Tasks
###########################
 
# Find the movie with a row id of 6

sql = "
  select *
  from movies
  where id=6"

sqldf(sql)
  id           title  director year length_minutes
1  6 The Incredibles Brad Bird 2004            116
###############################################################.
#  BETWEEN ... AND ...
#
#    and 
#
#  NOT BETWEEN ... AND ...
###############################################################.

# Find the movies released in the years between 2000 and 2010

sql = "select * 
       from movies
       where year >= 2000 and year <= 2010 "
sqldf(sql)
  id           title       director year length_minutes
1  4  Monsters, Inc.    Pete Docter 2001             92
2  5    Finding Nemo Andrew Stanton 2003            107
3  6 The Incredibles      Brad Bird 2004            116
4  7            Cars  John Lasseter 2006            117
5  8     Ratatouille      Brad Bird 2007            115
6  9          WALL-E Andrew Stanton 2008            104
7 10              Up    Pete Docter 2009            101
8 11     Toy Story 3    Lee Unkrich 2010            103
# you can combine the above where clause with a different select clause:
# For example:

sql = "select title, length_minutes, length_minutes / 60 as hours,
                  length_minutes - 60 * (length_minutes/60) as minutes,
                  year
       from movies
       where year >= 2000 and year <= 2010 "
sqldf(sql)
            title length_minutes hours minutes year
1  Monsters, Inc.             92     1      32 2001
2    Finding Nemo            107     1      47 2003
3 The Incredibles            116     1      56 2004
4            Cars            117     1      57 2006
5     Ratatouille            115     1      55 2007
6          WALL-E            104     1      44 2008
7              Up            101     1      41 2009
8     Toy Story 3            103     1      43 2010
# use the BETWEEN ... AND ...
# to do the same thing

sql = "select * 
       from movies
       where year between 2000 and 2010"

sqldf(sql)
  id           title       director year length_minutes
1  4  Monsters, Inc.    Pete Docter 2001             92
2  5    Finding Nemo Andrew Stanton 2003            107
3  6 The Incredibles      Brad Bird 2004            116
4  7            Cars  John Lasseter 2006            117
5  8     Ratatouille      Brad Bird 2007            115
6  9          WALL-E Andrew Stanton 2008            104
7 10              Up    Pete Docter 2009            101
8 11     Toy Story 3    Lee Unkrich 2010            103
# Find the movies not released in the years between 2000 and 2010

sql = "select * 
       from movies
       where year < 2000 or year > 2010"
sqldf(sql)
  id               title       director year length_minutes
1  1           Toy Story  John Lasseter 1995             81
2  2        A Bug's Life  John Lasseter 1998             95
3  3         Toy Story 2  John Lasseter 1999             93
4 12              Cars 2  John Lasseter 2011            120
5 13               Brave Brenda Chapman 2012            102
6 14 Monsters University    Dan Scanlon 2013            110
7 87              WALL-G Brenda Chapman 2042             92
# use the NOT BETWEEN ... AND ...
# to do the same thing

sql = "select * 
       from movies
       where year not between 2000 and 2010"

sqldf(sql)
  id               title       director year length_minutes
1  1           Toy Story  John Lasseter 1995             81
2  2        A Bug's Life  John Lasseter 1998             95
3  3         Toy Story 2  John Lasseter 1999             93
4 12              Cars 2  John Lasseter 2011            120
5 13               Brave Brenda Chapman 2012            102
6 14 Monsters University    Dan Scanlon 2013            110
7 87              WALL-G Brenda Chapman 2042             92

15.24.3 Practice - limit clause

###############################################################.
# LIMIT <M>       and     LIMIT <M> OFFSET <N>
###############################################################.
#
# The LIMIT clause must come at the very end of the SQL 
# statement. For example: 
#   
#      SELECT *
#      FROM grades
#      ORDER BY student
#      LIMIT 3
# 
# Displayes the first 3 students.
#
# In general:
#
#      <SOME SELECT STATEMENT>
#      limit M
#
# (where M is an integer) results in only the first
# M rows of data from what would normally have been displayed 
# had the LIMIT clause not been specified.
# 
#      <SOME SELECT STATEMENT>
#      limit M offset N
#
# (where both M and N are integers) 
# starts the output from the N+1'th row of what would normally 
# have been displayed without the limit clause and then 
# displays the next M rows of data.
###############################################################.


# Find the first 5 movies and their release year

# We will learn the best way to do this a little later. The
# answer on the sqlbolt website is not ideal.

sqldf("select * 
      from movies
      order by year")
   id               title       director year length_minutes
1   1           Toy Story  John Lasseter 1995             81
2   2        A Bug's Life  John Lasseter 1998             95
3   3         Toy Story 2  John Lasseter 1999             93
4   4      Monsters, Inc.    Pete Docter 2001             92
5   5        Finding Nemo Andrew Stanton 2003            107
6   6     The Incredibles      Brad Bird 2004            116
7   7                Cars  John Lasseter 2006            117
8   8         Ratatouille      Brad Bird 2007            115
9   9              WALL-E Andrew Stanton 2008            104
10 10                  Up    Pete Docter 2009            101
11 11         Toy Story 3    Lee Unkrich 2010            103
12 12              Cars 2  John Lasseter 2011            120
13 13               Brave Brenda Chapman 2012            102
14 14 Monsters University    Dan Scanlon 2013            110
15 87              WALL-G Brenda Chapman 2042             92
# The limit clause allows to retrieve the first several values from an 
# ordered query

sqldf("select * 
      from movies
      order by year
      limit 5")
  id          title       director year length_minutes
1  1      Toy Story  John Lasseter 1995             81
2  2   A Bug's Life  John Lasseter 1998             95
3  3    Toy Story 2  John Lasseter 1999             93
4  4 Monsters, Inc.    Pete Docter 2001             92
5  5   Finding Nemo Andrew Stanton 2003            107
# Get the next 2 movies

sqldf("select * 
      from movies
      order by year
      limit 2 offset 5")
  id           title      director year length_minutes
1  6 The Incredibles     Brad Bird 2004            116
2  7            Cars John Lasseter 2006            117
# Find all movies that were released in 2000, 2004 or 2008

sql = 
  "select * 
  from movies
  where year = 2000 or year = 2004 or year = 2008"

sqldf(sql)
  id           title       director year length_minutes
1  6 The Incredibles      Brad Bird 2004            116
2  9          WALL-E Andrew Stanton 2008            104
# Do this again using the IN operator
#
# The IN operators takes a LIST of information
# A LIST is a set of values in parentheses, separated by commans.

sql = 
  "select * 
  from movies
  where year in (2000,2004,2008)"
sqldf(sql)
  id           title       director year length_minutes
1  6 The Incredibles      Brad Bird 2004            116
2  9          WALL-E Andrew Stanton 2008            104
# show the movies that were released in years other than 2000, 2004, 2008

sql = 
  "select * 
  from movies
  where year != 2000 and year != 2004 and year != 2008"

sqldf(sql)
   id               title       director year length_minutes
1   1           Toy Story  John Lasseter 1995             81
2   2        A Bug's Life  John Lasseter 1998             95
3   3         Toy Story 2  John Lasseter 1999             93
4   4      Monsters, Inc.    Pete Docter 2001             92
5   5        Finding Nemo Andrew Stanton 2003            107
6   7                Cars  John Lasseter 2006            117
7   8         Ratatouille      Brad Bird 2007            115
8  10                  Up    Pete Docter 2009            101
9  11         Toy Story 3    Lee Unkrich 2010            103
10 12              Cars 2  John Lasseter 2011            120
11 13               Brave Brenda Chapman 2012            102
12 14 Monsters University    Dan Scanlon 2013            110
13 87              WALL-G Brenda Chapman 2042             92
# A student asked if we could change the and's in the previous question
# to be or's instead. The answer is you can but in addition to 
# changing the and's to or's you must also modify the logical expression
# in other ways. Specifically, DeMorgan's law describes how to 
# rewrite any logical expression that uses and's and or's into an
# equivalent expression.
#
# DeMorgans law says: to convert a logical expression 
# step 1: negate it twice (ie. use two not's)
# Step 2: (a) distribute one of the not's over the expression
#         (b) change the and's to or's 
#         (c) change the or's to and's
sql = 
  "select * 
  from movies
  where not ( year = 2000 or year = 2004 or year = 2008)"

sqldf(sql)
   id               title       director year length_minutes
1   1           Toy Story  John Lasseter 1995             81
2   2        A Bug's Life  John Lasseter 1998             95
3   3         Toy Story 2  John Lasseter 1999             93
4   4      Monsters, Inc.    Pete Docter 2001             92
5   5        Finding Nemo Andrew Stanton 2003            107
6   7                Cars  John Lasseter 2006            117
7   8         Ratatouille      Brad Bird 2007            115
8  10                  Up    Pete Docter 2009            101
9  11         Toy Story 3    Lee Unkrich 2010            103
10 12              Cars 2  John Lasseter 2011            120
11 13               Brave Brenda Chapman 2012            102
12 14 Monsters University    Dan Scanlon 2013            110
13 87              WALL-G Brenda Chapman 2042             92
# do this with the "not in" operator

sql = 
  "select * 
  from movies
  where year not in (2000,2004,2008)"
sqldf(sql)
   id               title       director year length_minutes
1   1           Toy Story  John Lasseter 1995             81
2   2        A Bug's Life  John Lasseter 1998             95
3   3         Toy Story 2  John Lasseter 1999             93
4   4      Monsters, Inc.    Pete Docter 2001             92
5   5        Finding Nemo Andrew Stanton 2003            107
6   7                Cars  John Lasseter 2006            117
7   8         Ratatouille      Brad Bird 2007            115
8  10                  Up    Pete Docter 2009            101
9  11         Toy Story 3    Lee Unkrich 2010            103
10 12              Cars 2  John Lasseter 2011            120
11 13               Brave Brenda Chapman 2012            102
12 14 Monsters University    Dan Scanlon 2013            110
13 87              WALL-G Brenda Chapman 2042             92
# warning - the "in" operator is followed by a list of values.
# The list of values is in parentheses.
#
# if the values are numbers there are no quotes e.g. (2000,2004,2005)
#
# If the values are character values then in SQL you must use
# 'single quotes' around the different values.
#
# See the next question


# Write a SQL SELECT statment to show all the movies that
# were directed by any of the following people:
#     Pete Docter , Brad Bird, Dan Scanlon
#
# (a) Write it using the IN operator
# (b) write it without using the IN operator

# ANSWER - part (a)
sql = "select * 
       from movies
       where director IN ('Pete Docter', 'Brad Bird', 'Dan Scanlon')
       order by director"

sqldf(sql)
  id               title    director year length_minutes
1  6     The Incredibles   Brad Bird 2004            116
2  8         Ratatouille   Brad Bird 2007            115
3 14 Monsters University Dan Scanlon 2013            110
4  4      Monsters, Inc. Pete Docter 2001             92
5 10                  Up Pete Docter 2009            101
# ANSWER - part (b) - without the IN operator
sql = "select * 
       from movies
       where director='Pete Docter' or  director='Brad Bird' or director = 'Dan Scanlon'
       order by director"

sqldf(sql)
  id               title    director year length_minutes
1  6     The Incredibles   Brad Bird 2004            116
2  8         Ratatouille   Brad Bird 2007            115
3 14 Monsters University Dan Scanlon 2013            110
4  4      Monsters, Inc. Pete Docter 2001             92
5 10                  Up Pete Docter 2009            101

15.24.4 Practice - “like” operator

######################################################################.
# The "like" operator
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# The like operator in sql uses a "pattern" to match data values
# This is similar to regular expressions. However, the SQL like 
# operator uses very different meta characters. 
#
# The two standard metacharacters that are used by the sql like 
# operator in the pattern are:
#
#   %    will "match" 0 or more characters  (equivalent to .* in regex)
#
#   _    i.e. an underscore, will "match" exactly one character (equivalent to . in regex)
#
# Some DBMS software allows for additional metacharacters in LIKE clauses
# For example, MySql allows for the following. (sqlite does not)
#
#   [abc]  MySQL DBMS software recognizes these "character classes" 
#   [a-c]  similar to regular expressions. Not all DBMS software recognizes
#   [^a-c] these. sqlite does not.
######################################################################.


# exercise 3

# Exercise 3 - Tasks
# Find all the Toy Story movies
sql = "select *
       from movies
       where title like 'Toy Story%'"
sqldf(sql)
  id       title      director year length_minutes
1  1   Toy Story John Lasseter 1995             81
2  3 Toy Story 2 John Lasseter 1999             93
3 11 Toy Story 3   Lee Unkrich 2010            103
# Find all the movies directed by John Lasseter

sql = "select *
       from movies
       where director = 'John Lasseter'"
sqldf(sql)
  id        title      director year length_minutes
1  1    Toy Story John Lasseter 1995             81
2  2 A Bug's Life John Lasseter 1998             95
3  3  Toy Story 2 John Lasseter 1999             93
4  7         Cars John Lasseter 2006            117
5 12       Cars 2 John Lasseter 2011            120
# Find all the movies (and director) not directed by John Lasseter
sql = "select * 
       from movies 
       where director != 'John Lasseter'"
sqldf(sql)
   id               title       director year length_minutes
1   4      Monsters, Inc.    Pete Docter 2001             92
2   5        Finding Nemo Andrew Stanton 2003            107
3   6     The Incredibles      Brad Bird 2004            116
4   8         Ratatouille      Brad Bird 2007            115
5   9              WALL-E Andrew Stanton 2008            104
6  10                  Up    Pete Docter 2009            101
7  11         Toy Story 3    Lee Unkrich 2010            103
8  13               Brave Brenda Chapman 2012            102
9  14 Monsters University    Dan Scanlon 2013            110
10 87              WALL-G Brenda Chapman 2042             92
# The following will work but is NOT recommended (no pun intended)
sql = "select * 
       from movies 
       where NOT ( director = 'John Lasseter' ) "
sqldf(sql)
   id               title       director year length_minutes
1   4      Monsters, Inc.    Pete Docter 2001             92
2   5        Finding Nemo Andrew Stanton 2003            107
3   6     The Incredibles      Brad Bird 2004            116
4   8         Ratatouille      Brad Bird 2007            115
5   9              WALL-E Andrew Stanton 2008            104
6  10                  Up    Pete Docter 2009            101
7  11         Toy Story 3    Lee Unkrich 2010            103
8  13               Brave Brenda Chapman 2012            102
9  14 Monsters University    Dan Scanlon 2013            110
10 87              WALL-G Brenda Chapman 2042             92
# Find all the WALL-* movies
sql = "select *
       from movies
       where title like 'WALL-_'"
sqldf(sql)
  id  title       director year length_minutes
1  9 WALL-E Andrew Stanton 2008            104
2 87 WALL-G Brenda Chapman 2042             92
movies
   id               title       director year length_minutes
1   1           Toy Story  John Lasseter 1995             81
2   2        A Bug's Life  John Lasseter 1998             95
3   3         Toy Story 2  John Lasseter 1999             93
4   4      Monsters, Inc.    Pete Docter 2001             92
5   5        Finding Nemo Andrew Stanton 2003            107
6   6     The Incredibles      Brad Bird 2004            116
7   7                Cars  John Lasseter 2006            117
8   8         Ratatouille      Brad Bird 2007            115
9   9              WALL-E Andrew Stanton 2008            104
10 10                  Up    Pete Docter 2009            101
11 11         Toy Story 3    Lee Unkrich 2010            103
12 12              Cars 2  John Lasseter 2011            120
13 13               Brave Brenda Chapman 2012            102
14 14 Monsters University    Dan Scanlon 2013            110
15 87              WALL-G Brenda Chapman 2042             92
# QUESTION 
# Show all movies that start with a "T" and end with an "s"

sql = "select * 
       from movies
       where title like 'T%s'
       order by title"
sqldf(sql)
  id           title  director year length_minutes
1  6 The Incredibles Brad Bird 2004            116
# QUESTION 
# Show all movies whose title is exactly 6 charcters (including spaces)

sql = "select * 
       from movies
       where title like '______'
       order by title"
sqldf(sql)
  id  title       director year length_minutes
1 12 Cars 2  John Lasseter 2011            120
2  9 WALL-E Andrew Stanton 2008            104
3 87 WALL-G Brenda Chapman 2042             92
# QUESTION
# Show all movies whose title starts with A,B or C
# and whose title is 10 characters or longer

sql ="select title
      from movies
      where title BETWEEN 'A' AND 'D' and
            title LIKE '__________%'"
sqldf(sql)
         title
1 A Bug's Life
# QUESTION
# Show all movies whose title starts with A,B or C
# and whose title is fewer than 10 characters or long

sql ="select title
      from movies
      where title BETWEEN 'A' AND 'D' and
            title NOT LIKE '__________%'"
sqldf(sql)
   title
1   Cars
2 Cars 2
3  Brave
sql="select substr(title, 1,1) from movies order by 1"
sqldf(sql)
   substr(title, 1,1)
1                   A
2                   B
3                   C
4                   C
5                   F
6                   M
7                   M
8                   R
9                   T
10                  T
11                  T
12                  T
13                  U
14                  W
15                  W
sqldf("select * from movies where substr(title,1,1) = 'M'")
  id               title    director year length_minutes
1  4      Monsters, Inc. Pete Docter 2001             92
2 14 Monsters University Dan Scanlon 2013            110
sqldf("select * from movies where title < 'D'")
  id        title       director year length_minutes
1  2 A Bug's Life  John Lasseter 1998             95
2  7         Cars  John Lasseter 2006            117
3 12       Cars 2  John Lasseter 2011            120
4 13        Brave Brenda Chapman 2012            102

15.24.5 Practice - select distinct

################################################################.
# SELECT DISTINCT ...
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Depending on which values are selected, a SELECT ... statement
# might return exactly the same row in the output multiple times. 
#
# To instead, show only one copy of each row in the output,
# start the statement with  SELECT DISTINCT ... 
# (instead of just SELECT ...)
#
# See the examples below.
################################################################.


# The output of the following query includes some rows
# that are exactly the same as other rows.

sqldf("select year, honors, major 
      from grades
      where honors = TRUE and major = 'IDS'
      order by year")
  year honors major
1   fr   TRUE   IDS
2   fr   TRUE   IDS
3   ju   TRUE   IDS
4   se   TRUE   IDS
5   so   TRUE   IDS
# The following query is exactly the same as the previous query, except
# that the following query includes the word DISTINCT at the beginning
# of the query.
#
# This version of the query does not include any duplicated rows in the output.

sqldf("select DISTINCT year, honors, major 
      from grades
      where honors = TRUE and major = 'IDS'
      order by year, major")
  year honors major
1   fr   TRUE   IDS
2   ju   TRUE   IDS
3   se   TRUE   IDS
4   so   TRUE   IDS
sqldf("select DISTINCT student, year, honors, major 
      from grades
      where honors = TRUE and major = 'IDS'
      order by year, major")
   student year honors major
1     anne   fr   TRUE   IDS
2 lucricia   fr   TRUE   IDS
3     mike   ju   TRUE   IDS
4    david   se   TRUE   IDS
5 samantha   so   TRUE   IDS
# Exercise 4


# Show just the director column
sql = "select director 
       from movies"
sqldf(sql)
         director
1   John Lasseter
2   John Lasseter
3   John Lasseter
4     Pete Docter
5  Andrew Stanton
6       Brad Bird
7   John Lasseter
8       Brad Bird
9  Andrew Stanton
10    Pete Docter
11    Lee Unkrich
12  John Lasseter
13 Brenda Chapman
14    Dan Scanlon
15 Brenda Chapman
# List all directors of movies, without duplicates

sql = "select DISTINCT director 
       from movies"
sqldf(sql)
        director
1  John Lasseter
2    Pete Docter
3 Andrew Stanton
4      Brad Bird
5    Lee Unkrich
6 Brenda Chapman
7    Dan Scanlon
# List all directors of movies (alphabetically), without duplicates
sql = "select DISTINCT director 
       from movies
       order by director"
sqldf(sql)
        director
1 Andrew Stanton
2      Brad Bird
3 Brenda Chapman
4    Dan Scanlon
5  John Lasseter
6    Lee Unkrich
7    Pete Docter
# Both the where clause and the order by clause may refer to information
# that is not actually displayed by the select clause
# 
# For example

# List the title and year for all movies directed by John Lasseter.
# Do not John Lasseter's name in the output.
sql = "select title, year 
       from movies
       where director = 'John Lasseter'"
sqldf(sql)
         title year
1    Toy Story 1995
2 A Bug's Life 1998
3  Toy Story 2 1999
4         Cars 2006
5       Cars 2 2011
# List the title of all movies in "reverse chronological order". 
# In other words, the most recent movie should be listed first and the 
# oldest movie listed last.
# Do NOT show the actual year in the output.
sql = "select title 
       from movies
       order by year desc"
sqldf(sql)
                 title
1               WALL-G
2  Monsters University
3                Brave
4               Cars 2
5          Toy Story 3
6                   Up
7               WALL-E
8          Ratatouille
9                 Cars
10     The Incredibles
11        Finding Nemo
12      Monsters, Inc.
13         Toy Story 2
14        A Bug's Life
15           Toy Story
# List the last four movies released (ordered from least recent to most recent)
sql = "select * 
       from movies
       order by year"
sqldf(sql)
   id               title       director year length_minutes
1   1           Toy Story  John Lasseter 1995             81
2   2        A Bug's Life  John Lasseter 1998             95
3   3         Toy Story 2  John Lasseter 1999             93
4   4      Monsters, Inc.    Pete Docter 2001             92
5   5        Finding Nemo Andrew Stanton 2003            107
6   6     The Incredibles      Brad Bird 2004            116
7   7                Cars  John Lasseter 2006            117
8   8         Ratatouille      Brad Bird 2007            115
9   9              WALL-E Andrew Stanton 2008            104
10 10                  Up    Pete Docter 2009            101
11 11         Toy Story 3    Lee Unkrich 2010            103
12 12              Cars 2  John Lasseter 2011            120
13 13               Brave Brenda Chapman 2012            102
14 14 Monsters University    Dan Scanlon 2013            110
15 87              WALL-G Brenda Chapman 2042             92
# the following does the same thing
sql = "select * 
       from movies
       order by year asc"
sqldf(sql)
   id               title       director year length_minutes
1   1           Toy Story  John Lasseter 1995             81
2   2        A Bug's Life  John Lasseter 1998             95
3   3         Toy Story 2  John Lasseter 1999             93
4   4      Monsters, Inc.    Pete Docter 2001             92
5   5        Finding Nemo Andrew Stanton 2003            107
6   6     The Incredibles      Brad Bird 2004            116
7   7                Cars  John Lasseter 2006            117
8   8         Ratatouille      Brad Bird 2007            115
9   9              WALL-E Andrew Stanton 2008            104
10 10                  Up    Pete Docter 2009            101
11 11         Toy Story 3    Lee Unkrich 2010            103
12 12              Cars 2  John Lasseter 2011            120
13 13               Brave Brenda Chapman 2012            102
14 14 Monsters University    Dan Scanlon 2013            110
15 87              WALL-G Brenda Chapman 2042             92
# List the last four movies released (ordered from most recent to last)
#
# To order a column in descending order follow the name of the column with desc
sql = "select * 
       from movies
       order by year desc"
sqldf(sql)
   id               title       director year length_minutes
1  87              WALL-G Brenda Chapman 2042             92
2  14 Monsters University    Dan Scanlon 2013            110
3  13               Brave Brenda Chapman 2012            102
4  12              Cars 2  John Lasseter 2011            120
5  11         Toy Story 3    Lee Unkrich 2010            103
6  10                  Up    Pete Docter 2009            101
7   9              WALL-E Andrew Stanton 2008            104
8   8         Ratatouille      Brad Bird 2007            115
9   7                Cars  John Lasseter 2006            117
10  6     The Incredibles      Brad Bird 2004            116
11  5        Finding Nemo Andrew Stanton 2003            107
12  4      Monsters, Inc.    Pete Docter 2001             92
13  3         Toy Story 2  John Lasseter 1999             93
14  2        A Bug's Life  John Lasseter 1998             95
15  1           Toy Story  John Lasseter 1995             81
sql = "select * 
       from movies
       order by year desc
       limit 4"
sqldf(sql)
  id               title       director year length_minutes
1 87              WALL-G Brenda Chapman 2042             92
2 14 Monsters University    Dan Scanlon 2013            110
3 13               Brave Brenda Chapman 2012            102
4 12              Cars 2  John Lasseter 2011            120
# List all movies sorted alphabetically by title

sql = "select * 
       from movies
       order by title"
sqldf(sql)
   id               title       director year length_minutes
1   2        A Bug's Life  John Lasseter 1998             95
2  13               Brave Brenda Chapman 2012            102
3   7                Cars  John Lasseter 2006            117
4  12              Cars 2  John Lasseter 2011            120
5   5        Finding Nemo Andrew Stanton 2003            107
6  14 Monsters University    Dan Scanlon 2013            110
7   4      Monsters, Inc.    Pete Docter 2001             92
8   8         Ratatouille      Brad Bird 2007            115
9   6     The Incredibles      Brad Bird 2004            116
10  1           Toy Story  John Lasseter 1995             81
11  3         Toy Story 2  John Lasseter 1999             93
12 11         Toy Story 3    Lee Unkrich 2010            103
13 10                  Up    Pete Docter 2009            101
14  9              WALL-E Andrew Stanton 2008            104
15 87              WALL-G Brenda Chapman 2042             92
# List the first five movies sorted alphabetically
sql = "select * 
       from movies
       order by title
       limit 5"
sqldf(sql)
  id        title       director year length_minutes
1  2 A Bug's Life  John Lasseter 1998             95
2 13        Brave Brenda Chapman 2012            102
3  7         Cars  John Lasseter 2006            117
4 12       Cars 2  John Lasseter 2011            120
5  5 Finding Nemo Andrew Stanton 2003            107
# List the next five movies sorted alphabetically

sql = "select * 
       from movies
       order by title
       limit 5 offset 5"
sqldf(sql)
  id               title      director year length_minutes
1 14 Monsters University   Dan Scanlon 2013            110
2  4      Monsters, Inc.   Pete Docter 2001             92
3  8         Ratatouille     Brad Bird 2007            115
4  6     The Incredibles     Brad Bird 2004            116
5  1           Toy Story John Lasseter 1995             81
# List the next five movies sorted alphabetically

sql = "select * 
       from movies
       order by title
       limit 5 offset 10"
sqldf(sql)
  id       title       director year length_minutes
1  3 Toy Story 2  John Lasseter 1999             93
2 11 Toy Story 3    Lee Unkrich 2010            103
3 10          Up    Pete Docter 2009            101
4  9      WALL-E Andrew Stanton 2008            104
5 87      WALL-G Brenda Chapman 2042             92

15.24.6 Practice - more practice - north_american_cities

# Review 1 - Tasks
#
# https://sqlbolt.com/lesson/select_queries_review


# See the table (i.e. dataframe)
sqldf("select * north_american_cities")
Error: near "north_american_cities": syntax error
# List all the Canadian cities and their populations

sql = "select country, city, population
       from north_american_cities
       where country='Canada'"
sqldf(sql)
  country     city population
1  Canada  Toronto    2795060
2  Canada Montreal    1717767
# Order all the cities in the United States by their latitude from north to south
sql = "
select *
from north_american_cities
where country = 'United States'
order by latitude desc"
sqldf(sql)
          city       country population latitude  longitude
1      Chicago United States    2718782 41.87811  -87.62980
2     New York United States    8405837 40.71278  -74.00594
3 Philadelphia United States    1553165 39.95258  -75.16522
4  Los Angeles United States    3884307 34.05223 -118.24368
5      Phoenix United States    1513367 33.44838 -112.07404
6      Houston United States    2195914 29.76043  -95.36980
# List all the cities west of Chicago, ordered from west to east

sql = "select *
from north_american_cities
where longitude < -87.62980
order by longitude
"
sqldf(sql)
                 city       country population latitude  longitude
1         Los Angeles United States    3884307 34.05223 -118.24368
2             Phoenix United States    1513367 33.44838 -112.07404
3         Guadalajara        Mexico    1500800 20.65970 -103.34961
4         Mexico City        Mexico    8555500 19.43261  -99.13321
5 Ecatepec de Morelos        Mexico    1742000 19.60184  -99.05067
6             Houston United States    2195914 29.76043  -95.36980
# List the two largest cities in Mexico (by population)

# Start by listing all of the cities in Mexico in order of decreasing population
sql = "select *
       from north_american_cities
       where country = 'Mexico'
       order by population desc"
sqldf(sql)
                 city country population latitude  longitude
1         Mexico City  Mexico    8555500 19.43261  -99.13321
2 Ecatepec de Morelos  Mexico    1742000 19.60184  -99.05067
3         Guadalajara  Mexico    1500800 20.65970 -103.34961
# use the limit clause to limit results to only a specified number of values
# Start by listing all of the cities in Mexico
sql = "select *
       from north_american_cities
       where country = 'Mexico'
       order by population desc
       limit 2"
sqldf(sql)
                 city country population latitude longitude
1         Mexico City  Mexico    8555500 19.43261 -99.13321
2 Ecatepec de Morelos  Mexico    1742000 19.60184 -99.05067
# The limit clause takes an option "offset"
# Each row in the data has an imaginary number starting with 0 for the 1st row
# and 1 for the 2nd row , etc.

sql = "select * from north_american_cities
       order by population desc"
sqldf(sql)
                  city       country population latitude  longitude
1          Mexico City        Mexico    8555500 19.43261  -99.13321
2             New York United States    8405837 40.71278  -74.00594
3          Los Angeles United States    3884307 34.05223 -118.24368
4              Toronto        Canada    2795060 43.65323  -79.38318
5              Chicago United States    2718782 41.87811  -87.62980
6              Houston United States    2195914 29.76043  -95.36980
7               Havana          Cuba    2106146 23.05407  -82.34519
8  Ecatepec de Morelos        Mexico    1742000 19.60184  -99.05067
9             Montreal        Canada    1717767 45.50169  -73.56726
10        Philadelphia United States    1553165 39.95258  -75.16522
11             Phoenix United States    1513367 33.44838 -112.07404
12         Guadalajara        Mexico    1500800 20.65970 -103.34961
# Do the same thing but only get the 3 largest cities
sql = "select * from north_american_cities
       order by population desc
       limit 3"
sqldf(sql)
         city       country population latitude  longitude
1 Mexico City        Mexico    8555500 19.43261  -99.13321
2    New York United States    8405837 40.71278  -74.00594
3 Los Angeles United States    3884307 34.05223 -118.24368
# Do the same thing but get the 4th, 5th and 6th largest cities
sql = "select * from north_american_cities
       order by population desc
       limit 3 offset 3"
sqldf(sql)
     city       country population latitude longitude
1 Toronto        Canada    2795060 43.65323 -79.38318
2 Chicago United States    2718782 41.87811 -87.62980
3 Houston United States    2195914 29.76043 -95.36980
# Do the same thing but get the 7th 8th and 9th largest cities
sql = "select * from north_american_cities
       order by population desc
       limit 3 offset 6"
sqldf(sql)
                 city country population latitude longitude
1              Havana    Cuba    2106146 23.05407 -82.34519
2 Ecatepec de Morelos  Mexico    1742000 19.60184 -99.05067
3            Montreal  Canada    1717767 45.50169 -73.56726
# List the third and fourth largest cities (by population) in the United States and their population

# Let's start by showing all USA cities
sql = "select * 
       from north_american_cities
       where country = 'United States'
       order by population desc
       "
sqldf(sql)
          city       country population latitude  longitude
1     New York United States    8405837 40.71278  -74.00594
2  Los Angeles United States    3884307 34.05223 -118.24368
3      Chicago United States    2718782 41.87811  -87.62980
4      Houston United States    2195914 29.76043  -95.36980
5 Philadelphia United States    1553165 39.95258  -75.16522
6      Phoenix United States    1513367 33.44838 -112.07404
# Use limit 2 offset 3 to just show the 3rd and 4th largest cities
sql = "select * 
       from north_american_cities
       where country = 'United States'
       order by population desc
       limit 2 offset 2
       "
sqldf(sql)
     city       country population latitude longitude
1 Chicago United States    2718782 41.87811  -87.6298
2 Houston United States    2195914 29.76043  -95.3698

15.25 dynamic SQL - a brief introduction

NOTE:

  • “IDS1020-Intro to IDS” students can ignore this section.
########################################################################.
# It is very common for SQL to be "embedded" in another language
# very similar to what we are doing with R. There are ways to 
# "embed" SQL in all popular programming languages.
#
# This technique of building a sql statement  from different parts of
# info using another language (in this case R) is known
# as "dynamic sql".
########################################################################.

#################################################################.
# Brief intro to an advanced concept: "dynamic SQL"?
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# As we've already learned, the sqldf function takes a
# single character value that contains
# a SQL command, e.g. sqldf("SELECT * FROM SOME_TABLE")
# This allows us to use the SQL language inside our R programs.
# However, you must remember that SQL and R are very different 
# languages. As we learn SQL, we will focus primarily on how to write
# various forms of SQL commands. We wont focus much on R
# other than to use the sqldf function to run the SQL code.
#
# However, sometimes it is adventageous to 
# use R code (more than just sqldf) to "dynamically" construct a
# SQL statement based on some other information that is 
# available to your R code but would not have been available
# at the time you are writing the SQL code.
#
# For example, the following function, mySelect
# takes two arguments: 
#
#    table - The name of a table 
#    cols - The names of the columns to display from the table
#
# It displays just the specified columns from the specified table.

mySelect = function( table, cols){
  colsList = paste(cols, collapse=",")
  selectStatement = paste0( "select ", colsList, " from ", table)
  sqldf(selectStatement)
}

# The function can now be called with different tables and 
# lists of columns

mySelect("movies", c("title","director"))
                 title       director
1            Toy Story  John Lasseter
2         A Bug's Life  John Lasseter
3          Toy Story 2  John Lasseter
4       Monsters, Inc.    Pete Docter
5         Finding Nemo Andrew Stanton
6      The Incredibles      Brad Bird
7                 Cars  John Lasseter
8          Ratatouille      Brad Bird
9               WALL-E Andrew Stanton
10                  Up    Pete Docter
11         Toy Story 3    Lee Unkrich
12              Cars 2  John Lasseter
13               Brave Brenda Chapman
14 Monsters University    Dan Scanlon
15              WALL-G Brenda Chapman
mySelect("grades", c("student", "year", "test1", "test2"))
    student year test1 test2
1       joe   fr    70    82
2       sue   fr    80    80
3      anne   fr    90    95
4     frank   so   100    95
5       bob   so    30    NA
6  samantha   so   100    70
7     larry   ju    32    80
8      bill   se    84    90
9  lucricia   fr    80   100
10 margaret   se    59    47
11    dalia   se    85    92
12      zvi   ju    90    98
13     mike   ju    90    86
14    david   se    90    87
15     dana   so   100    93