Welcome back on my journey of learning SQL with DQLab!
Let’s explore more !
Suppose we got a sales transaction report of kotak pensil and flashdisk 32 GB for branch X. However, we found that the sales data table did not contain a product name column, so I was very confused to identify each product category.
If you are already proficient in SQL this will be easy. To get a product name column, you only need to combine the sales data table with another table in the database that contains product details including nama_produk, this merging mechanism is called JOIN.
With JOIN, several tables in a database can be combined with each other. This capability is actually the hallmark and strength that differentiates RDMS from other database systems.
Table 1: ms_item_warna
The table above consists of two columns containing information on the name of the fruit or vegetable with their color, and has eight rows of data.
If we want to know information not only about the color, but also the type / category of each item, whether it is a vegetable or fruit, then we can not get that information if only based on ms_item_warna table. Therefore, we need to jointhis table to another table that contains this information, which is the ms_item_kategory table.
Table 2: ms_item_kategori
This table consists of two columns containing information on plant names with their categories, and has eight rows of data.
Just like the problem in the previous table, if we only have the ms_item_kategori table then we can’t get information about the plant’s name and color. Therefore, we can combine the two tables into one new table that contains complete information about the name, color and category of each item / item.
First, try typing a query in the code editor to see the entire contents of the ms_item_kategori and ms_item_warna table columns.
We can see that the two tables have a relationship / relationship, which is nama_barang column (ms_item_warna table) and the nama_item column (ms_item_kategori table) because there are many matches of the data contents.
These two columns are hereinafter referred to as key columns.
Combine Table by Key Columns
So, let’s combine ms_item_kategori table and ms_item_warna using key columns. Here is the query:
You can see the results of combining the two tables, is a new table with four columns and six rows of data. Please note that this merge is temporary, meaning that the original table in the database does not undergo any changes, and the new merged table is also not stored in the database automatically.
The Explanation of Joining Table Query
Merging two tables into one new table using the SELECT query in the previous section is done in the following way:
- Type two table names to be used separated by comma (,)
- Write pairs of key columns with the conjunctive operator equal to (=) in the filter or condition section.
Here is the detail:
This way of joining is called a join using the comma operator.
Problem: the joining tables produces six rows of data, whereas in the two origin tables each has eight rows of data.
The two tables at the top, which are the ms_item_kategory table and the ms_item_warna table, are the data source tables, and the bottom part is the result table of the combination of the two tables.
From the combined table, it can be seen that the data rows that appear are only data rows whose data contents are in the two key columns. Data whose contents are available for both tables are only six out of a total of eight data rows. Meanwhile, data rows containing belimbing, jamur, apel dan daun bawang, are not included in both tables, so this data row will be excluded and will not appear in the combined table.
The Explanation of the Process of Joining Table
The magenta arrows indicate the matching and joining process. For example: The first data line with the contents of “bayam” in the name_item key column (ms_item_kategori table), will look for the same contents in the nama_barang key column (ms_item_warna table), and is found on the second row and then linked into one row in the new merged table.
Matches with this search are based on the conditional clause. In the query below, this condition is marked in yellow.
After all data is found with this condition, and if you do not specify a specific column, then all data columns from both tables will be displayed with the same contents / rows of data and contained in both key columns combined into one row in the new table.
The data contents marked in red are data that do not have a match and therefore cannot be combined, and thus do not have combined data in the final result.
The matching and joining method is called an INNER JOIN — where the contents of the key column of the two tables must match each other before they can be merged.
What if the Table Order is Changed?
If you are going to change the order of the tables in the FROM section of the inner join query, only the column order changes but the data content and the amount of data do not change.
Try to change the query
If we type the second query. We will get this result:
You can see that the amount of data generated is still 6 rows of data, but with a different column order.
Where the first two columns are from ms_item_warna table, and the next two columns are from ms_item_barang table. This is in accordance with the order in which the table names are typed after FROM.
Using Tables’s Name Prefix
As explained before, if we use wildcard (*), and do not specify the column name that will be displayed in the SELECT section, then by default the order of the columns starts with the column from the table that is stated first in the FROM section.
However, it is also possible to take advantage of wildcards by adding a table name prefix.
By specifying this table name prefix, it is possible to determine the column order of which table appears first.
For example, if you want to combine the two tables, first by stating the ms_item_warna table in the FROM section, but what you want to display the column at the beginning of the result is the one with ms_item_kategori, then the query will be as follows.
Joining without Any Conditions
This joining process can also be done using the comma method and without using relationship conditions between columns.
It can be seen from the query above, we only include the names of the two tables that the data will be retrieved, but there is no information on how the two tables relate to one another through the key column.
Here is the result:
You can see a lot of results, this is because every row of data in the two tables will be linked to each other — without any relationship.
The total of sixty-four rows of data is the product of the sum of the data from the two tables, each of which has eight rows of data. This method of joining the two tables is called a cross join mechanism.
INNER JOIN Syntax
The use of prefixes in table_1.nama_kolom and table_2.name_kolom is intended so that there is clarity of which column in the table is used as a reference in the INNER JOIN process. Of course, using this prefix is very useful if you want to join multiple tables.
If each of these tables is analogous to two sets then the INNER JOIN process is equivalent to INTERSECTION between the two sets.
INNER JOIN process has combination concept for records in both tables which have the same value.
Now, type this query:
You will get this result:
First, let’s display all data of tr_penjualan table and ms_produk table
Here is the query:
What is the query to have this result table?
We can see that kode_produk from tr_penjualan and kode_produk from ms_produk are the key column. The first column is belong to tr_penjualan, so this is the first table that we need to do the inner join.
So, here is the query:
We need to write the prefix of each table in ON section, so it is not ambigious.
Understand Query Process and Results with INNER JOIN
What is the difference between the table before merging with INNER JOIN and after merging ?
Now all columns from both tables are in one table and contain all columns from both tables. So, I can easily identify the product name of each product code purchased by the customer
Now, compare the kode_produk column in the query results with the INNER JOIN above, with the kode_produk column in the ms_produk table. What do you think is different ?
We cannot find prod-06 in the query results table, even though in the ms_produk table there was prod-06 data, which is multifunctional pen + DQLAB laser
Next, compare it with the tr_penjualan table, do you find the prod-06 data in the tr_penjualan table ?
There is no transaction with kode_produksi prod-06 in the tr_penjualan table.
So, what can we conclude ?
INNER JOIN will match the tr_penjualan table column key with the ms_produk column key, if the value of the column key is the same in both tables, then that row will be returned as a query result and form a table containing all the columns of the two tables, whereas if the value key column is only in one column, so this row will not be displayed. prod-06 is in the table in ms_produk but not in the sales data, so when merging the table with the INNER JOIN, the prod-06 data from the ms_produk table will not appear in the query results.
INNER JOIN Syntax — Selecting some columns what want to be displayed
This part of the query requires selecting which columns from the two tables will be combined. Here, it is necessary to use the table name prefix.
This part of the query is used to merge tabel1 with tabel2
Last part of the query is reference for combining table1 and table2 based on the column that has a high level of match. The two columns with a high level of match in each table are also called key columns.
It should be noted that if you display a column with the same name in both tables, then in the Select section, you can not only type in the column name, but also must be preceded by a prefix of the table name where the column originates to avoid errors due to ambiguity.
Now, try to combine the tr_penjualan and ms_produk table. Columns displayed from the tr_produk table are kode_transaksi, kode_pelanggan, kode_produk, qty. For ms_produk table, display nama_produk dan harga.
Here is the expected result table:
So, we need to form a new column which is total, which is the result of multiplying each row in the harga column in the ms_produk table with the qty column in the tr_produk table.
Here is the query:
Here, we select the kode_transaksi, kode_pelanggan, kode_produk from tr_penjualan table, then nama_produk column from ms_produk, next we state qty column fromtr_penjualan table. Last we multiply qty and harga from tr_penjualan table with alias Total.
The first column is from tr_penjualan, so we type FROM tr_penjualan and do inner join with ms_produk.
We combine both table based on kode_produk from tr_penjualan and kode_produk from ms_produk.
Suppose we have data of the sales of a store. However, when we try to check the database, there are 2 sales tables, which are:
- Table A consists of transactions for transaction code ‘tr-001’ to ‘tr-003’
- Tabel B consists of transactions for transaction code ‘tr-004’ to ‘tr-006’
What is the effective way to make analysis of the two tables using SQL? Can I use the JOIN method as we learned before ?
First, SELECT all the column from Table A
Then, SELECT all the column from Table B
Conditions for UNION statement:
- the number of tabel_A and tabel_B columns is the same
- columns in table_A and table_B have the same data type
- the columns in table_A and table_B have the same position order
Both table_A and table_B already have the same number of columns, and also the order of the column positions is the same, so we can immediately combine the two columns by adding UNION.
Using UNION with WHERE Clause
What if we just want to combine the table consists of sales data for kode_produk prod-4 ?
Just adding WHERE for both of SELECT Statement
Here is the example:
Try to combine the table consists of sales data for kode_pelanggan ‘dqlabcust03'
Here is the query:
It can be seen that only the sales data kode_pelanggan ‘dqlabcust03’ are selected.
Aligning the Column
It happens that both tables A & B have the same number of columns, position, and column names. What if the column positions of the two tables are not the same? Can’t you UNION it?
Of course we can.
See the table below:
The number of columns from both tables is the same which are 7 columns, but the column position of the ContactName column from the two tables is not the same. In the Customer table, the position of the ContactName column is in 3rd Column , while in the supplier table it is in the 2nd column.
If we union it directly, we will get this result:
Of course, this is an unwanted UNION result, therefore, sort the position of the column in the SELECT-Statement and also select the column you want to combine, so you don’t need all columns from both tables to be UNION-ed, as follows:
If there is a difference in the column names between the first SELECT-statement and the second SELECT-statements, by default the column names from the first SELECT-statement will be used.
Using UNION and Aligning the Column
Here is the syntax to see the result of UNION query and aligning the columns
Here is the result:
Here, we see that CustomerName from Customers Table is combined with SupplierName from Suppliers Table with CustomerName as the name of the column.
What is the difference between UNION and JOIN ?
UNION and JOIN are used to combine data from two or more tables. But what makes the difference is how the tables are combined. We use JOIN when we are joining tables horizontally, so the join will contain the columns of the two or more combined tables.
The following is an illustration of merging tables with the JOIN method:
In the JOIN method, merging is done based on certain keys / columns contained in the tables to be combined and these keys / columns have values that are interrelated. As seen in the figure, Column A and Column E are interrelated keys / columns so that the two tables can be combined by matching the values of these two columns. The JOIN process cannot be performed if there are no related keys / columns in the two or more tables to be combined.
For UNION as already explained, it is used when you want to join tables vertically, that is, join rows from two or more tables. Unlike JOIN, for merging with UNION, there is no need for an associated key / column but UNION requires that the number of columns from the tables to be combined is the same and is in the same position.
In the UNION merging process, there are no additional columns but the number of rows will be increased.
The basic difference between JOIN and UNION is that JOIN combines 2 or more tables based on related rows whereas UNION joins 2 tables vertically.
INNER JOIN MINI PROJECT
In the database, there is a ms_pelanggan table which contains data for customers who buy products and tr_penjualan table which contains data on purchase transactions in a store.
One day, the marketing & promotion department asked for help querying the data of customers who bought Kotak Pensil DQLab, Flashdisk DQLab 32 GB, and Sticky Notes DQLab 500 sheets.
Make a query using ms_pelanggan and tr_penjualan tables to get the data requested by marketing, namely kode_pelanggan, nama_customer, alamat.
NB: Gunakan SELECT DISTINCT untuk menghilangkan duplikasi, jika diperlukan.
First, we need to use SELECT DISTINCT to remove any duplicates on kode_pelanggan, nama_customer, and alamat.
Then, we make the query using ms_pelanggan and tr_penjualan table with FROM dan INNER JOIN based on key column which is kode_pelanggan
The marketing & promotion department asked for data of customers who bought Kotak Pensil DQLab, Flashdisk DQLab 32 GB, and Sticky Notes DQLab 500 sheets. So, this is the conditions to be met. So, we use WHERE with operand OR.
Here is the result:
If we use SELECT, not SELECT DISTINCT, we will get this result:
UNION MINI PROJECT
Prepare catalog data regarding the names of products sold in a store. The data will be used in the meeting to review which products will continue to be sold and which will be discontinued.
Prepare only product data with a price below 100K for kode_produk ‘prod-1’ to ‘prod-5’; and below 50K for kode_produk ‘prod-6’ to ‘prod-10’.
When checking the product data on database, there are tow tables
which both contain catalog data:
First we select data from ms_produk_1 which met the condition of price below 100000.
Then, we UNION with the data from ms_produk_2 which met the condition of price below 50000.
Use WHERE clause to meet the condition for both table.
We see that it will only display prod-1 to prod-5 with price below 100000 and prod-6 to prod-10 with price below 50000.
Wow, we’ve finished the Fundamental SQL module Using INNER JOIN and UNION. While studying with this module, we have been able to understand and be able to practice:
- Merging of two tables using WHERE clause and cross join technique.
- Merging of two tables using INNER JOIN
- Merge two tables vertically using UNION.
- Working on a mini project which is an integration of the whole material and of course the materials in the previous modules to solve business problems.
With this capability, we can be confident in processing data with SQL. This skill alone is 60% of the initial activity an analyst will undertake.
See you on the next one!
‘Without data, you’re just another person with an opinion’
- W. Edwards Deming