Fundamental SQL Using Select Statement
Hello all! I am starting my journey on exploring more about data analyst.
Let’s start with SQL!
What is SQL ?
SQL is the abbreviation of Structured Query Language, which is a standard computer language used to interact with a database system — or rather a Relational Database Management System (RDBMS).
So, user can insert, alter, update, searching, and delete data from database system using SQL
There are two categories of SQL Interaction:
- Data Definition Language (DDL) -> various commands that have function to manipulate database structures, such as create (CREATE), change (ALTER), and delete (DROP) data storage structures, namely databases, tables, columns and data types.
- Data Manipulation Language (DML) -> various commands that have function insert data (INSERT), retrieve data or query (SELECT), change data (UPDATE) and delete data (DELETE).
Here is the query for selecting data:
With that query, database system will understand that it must display data ‘nama_produk’ from a table named ‘ms_produk’ as follows:
We can see that there are 10 office supplies with DQLab label displayed. This show that interaction between SQL and database system runs correctly.
TIDAK BISA SCREENSHOOT
Why Do We Need to Learn SQL ?
Basically, every company has a data storage system, especially for companies that have IT systems. This storage system is not on a computer or laptop in the form of files or folders, but in a database system. Well, this database system is usually accessed using SQL. As an analyst, our job is not only to analyze the data that is already available but also to be able to retrieve, modify and access the data from the data source, which is the database.
Mastery of SQL will also help companies in the following areas:
- Management needs reports with increasingly diverse information, such as: month-to-month sales trends, customer growth, whether the company has reached its target, and others. And this requires good SQL skills.
- Programmers who build application systems almost certainly always depend on the SQL database system so that their applications run well. Thus, mastery of SQL is absolute.
- Can improve company performance because rich information can be generated via SQL.
Where is SQL used ?
Companies that have implemented IT systems must have a database system and store the data in a database. For example, technology-based companies, such as e-commerce, store data, be it user profile data, purchase and sale transaction data, product data and traffic data from user visits to website pages in a database system — or more precisely database management system (DBMS). ).
All information or analysis required by management is generally sourced and processed from this DBMS. And in companies, database systems are usually not one, two, three or even tens. Therefore, SQL plays a very important role here, because using SQL can meet these management needs. Without mastery of SQL, it will be difficult to obtain the required data, and it will be difficult to analyze and produce information needed by management and the company.
However, please note that not all database systems support SQL. Only database systems based on the relational database management system (RDBMS) support this language. The RDMBS itself will be explained later.
SQL is only used in database systems based on Relational Database Management System.
First, let’s learn about RDBMS!
Relational Database Management System which is commonly abbreviated as RDMBS, is a program that allows to create, update, and manage a relational database. Generally, RDMBS use SQL to access the database.
Relational database itself is a type of database where data is generally stored a structured form in the form of tables (rows and columns) and each table/data contained in the database is relational to one another, as shown in the following image:
Relational databases are very popular and widely used by companies because this type of database is easy to manage especially if you have a lot of data or information that needs to be stored, scalable and flexible.
- The relational database is quite easy to manage. Each table / data can be updated or modified without disturbing other tables / data.
- Flexible: if you need to update your data, you only need to do it once — so there’s no need to change multiple files one by one. Apart from that, the relational database is also quite easy to extend. For example, when there is more and more data, can easily increase the capacity of the database.
RDBMS Products
Apart from MySQL, there are other RDBMS products, both proprietary and open source. Here are some of the products that are quite popular on the market:
- MySQL -> the open-source SQL database is quite popular. Generally used for web application development.
- PostgreSQL -> open-source RDBMS product, and also commonly used for web application development. However, in terms of performance, PostgreSQL is slower than MySQL.
- Oracle DB -> RDBMS product which is owned by Oracle Corporation and this product is proprietary or not open source. Oracle DB is commonly used in the banking industry.
- Microsoft SQL Server -> QL Server is an RDBMS product that is owned by Microsoft. Just like Oracle DB, SQL Server is proprietary or not open source, SQL Server is generally used in large-scale companies that also use other Microsoft products.
- SQLite -> Open source RDBMS, generally used as a database on cellphones, MP3 players, and other devices.
In addition, there is also MariaDB which is also free or open source, IBM DB2, Microsoft Access, and many others.
Generally RDBMS uses SQL to access database and RDBMS products are not only one but there are various kinds of products, so the SQL syntax can be slightly different for each of these products. Here’s a comparison of MySQL, Oracle, and SQLSERVER to display multiple rows of data from a table:
Storage Structure of RDBMS
As a data store, a relational database system has a hierarchical data storage structure as follows:
- Database
- Table
- Column or Field
Each database can contain multiple tables, and each table can consist of multiple columns. In every database, tables and columns have their own names to identify them. These tables and columns will be filled with data which then forms a row (row of data).
Table and Column
This picture is an example of a table in database. Every table in the database has a name, the name of this table is ms_product.
ms_produk table has four columns, each with the following name:
- no_urut
- kode_produk
- nama_produk
- harga
In the table, there are 10 rows of data with varying data contents, for example the data content for the column “nama_produk” on the fifth row is “Gift Voucher DQLab 250 rb”.
Conclusion
- Relational database management system (RDBMS) is the most popular database system in the world today and uses the SQL language for processing.
- Some of the well-known RDBMS products include Microsoft SQL Server, Oracle, MySQL, PostgreSQL, IBM DB2, and many others.
- The data storage structure in RDBMS uses a hierarchy as follows:
- Database
- Table
- Column
4. Data is inputted to the table in the form of data rows.
SELECT
Selecting All Columns from a Table
To access data in a database, we use SELECT statement. In SELECT statement, we declare which columns we want to display from a table in the database. SELECT statement is not independent. After declaring which columns we want to display, we continue with FROM. From this form, we state from which table the data that we want to display is located.
SELECT…FROM… is the simplest statement in SQL, and is the main part of the query. We can query the data without this statement.
Here is the basic and simple query of SELECT statement that have function to display all the column:
- The first word, SELECT, is used to inform the system that we want to select the data.
- The * (asterisk) means that all columns need to be retrieved from the referenced table. This sign is often referred to as a wildcard.
- FROM [NAMA_TABLE], it means that the table in which the data will be retrieved.
- ; (semicolon) is a sign that represents the end of another SELECT or SQL command.
Let’s try to display data from ms_produk table !
If you run the query, you will get this result:
Selecting One Column from a Table
Generally, here is the syntax for selecting just one column from a table:
Here is the query to select just nama_produk column:
Result:
Selecting More Than One Column from a Table
ms_produk table has more than one column. If we want to select anpther column, we just need to write each column that we want to display seperated by a comma.
Using ms_produk table, I use this query of SELECT statement to display two column, kode_produk and nama_produk
Result:
Try to select nama_produk and harga column from ms_produk table:
Result:
Limiting the Selecting the Numbers of Row Data
Besides limiting the column, we could limit the number of data rows we select. As it is said before, each product of RDMBS has a different way. For MySQL and PostgreSQL, we can use LIMIT. Here is the syntax:
For example, we can use LIMIT statement to limit the selection of the data from ms_produk for three rows:
Result:
It can be seen only three first rows are displayed out of a total of ten rows of data.
Try to display the first five rows of nama_produk and harga column.
Result:
SELECT DISTINT Statement
Here is the table of ms_pelanggan:
From the table, we see duplicate data in nama_customer and alamat for sequence number 3 & 11, and 5 & 12 which are exactly the same with different customer_code.
To remove the duplicate data, we can use SELECT DISTINCT statement. With SELECT DISTINCT, any duplicate data will be eliminated and only unique data will be displayed.
Here is the syntax:
Try to display nama_customer and alamat from ms_pelanggan table and remove any duplicate data.
Result:
If we count the number of rows of different data, we get 10 different rows of data from the ms_customer table.
Using Prefix on Column Name
Basically, using a complete column name needs to include the table name in front of the column name, with a hyphen in the form of a period. Generally, if we only select columns from one table, this prefix is rarely used because it is clear which table the column came from. But when we take data from two tables, for example by combining 2 tables using JOIN, and there are 2 columns with the same name, then the use of a prefix is important to avoid errors due to ambiguity. Here is the basic syntax:
To display nama_produk column from ms_produk table with table’s name prefix, we use this syntax:
Result:
The use of this prefix will be very useful when you want to identify data from multiple tables, which we will learn in the SQL course for the JOIN topic.
Try to display kode_produk column from ms_produk:
Result:
Using Alias on Column Name
Apart from the prefix, I can change the identity of the column names taken with SELECT using the AS keyword. This is called an alias. Changes in table names are temporary, meaning that they only change when retrieving / querying data, while column names in tables in the database will not change.
Here is the example to change kode_produk column name to product_code from ms_product table
Result:
We can see that kode_produk has change to product_code.
Note: alias cannot be used for wildcard(*)
Let’s try changing the column name with this details:
- no_urut becomes nomor
- nama_produk becomes nama
Result:
Remove ‘AS’ Keyword
The ‘AS’ keyword that is used as an alias marker in a column can be removed with the syntax:
Here we change kode_produk column name to product_code from ms_product table without ‘AS’ Keyword.
We get the same result as before:
Let’s try changing the column name without ‘AS’:
- no_urut becomes nomor
- nama_produk becomes nama
Result:
Combining Prefix and Alias
Prefix and alias can also be used at the same time.
Let’s implement it with the ms_produk table, using the table name prefix and alias to change nama_produk to nama.
Result:
Try to display harga column as harga_jual from ms_produk table complete with the prefix
Result:
Using Alias for The Table
Apart from columns, aliase can also be used for tables using the AS keyword after the table name. And, this keyword can also be used or not. Generally we use aliase on tables if the table name is long enough and appears or is referenced several times in a query. So by using alias fortables, it can save time in writing queries, especially for queries that are quite complicated, long and, involve many tables.
Here is the example of using alias t1 for the ms_produk table
Result:
Try to change the name of ms_produk table to t2, and display all the content without using AS keyword
Result:
Prefix with Table Alias
If we use table alias, the prefix name used for columns is the table alias. Using the original table name as a prefix will cause an error when the query is executed because with the use of an alias, the table name has been temporarily replaced by the alias.
Try to display kode_produk and nama_produk column by using prefix of the table alias.
Result:
Change the command in the code editor with the alias t2 — without using the AS keyword — for the ms_produk table and display the nama_produk and harga columns, complete with the alias prefix.
Result:
Sometimes we only want to display data based on certain conditions, so we don’t need all the data from the table. If that is the case, it means that we only want to retrieve product data with a specific product name. Well, what do you do in SQL?
For cases where we want to retrieve data based only on certain conditions, we can use filters. SQL has a filter function using the WHERE clause. If the WHERE condition is met, the query results will only display filtered data.
WHERE
WHERE clause from SELECT is used for filtering data based on certain conditions. Here is the syntax:
This is the format of the simplest condition:
Suppose we want to display data in which nama_produk consist of ‘Gantungan Kunci DQlab’. Here is the syntax:
Result:
It can be seen that only one row of data displayed from a total of ten, namely data where the nama_produk column contains the value ‘
It can be seen that only one row of data has been removed from a total of ten, namely data where the product_name column contains the value ‘DQLab Key Chain’.
How about we try to select data in which nama_produk consist of ‘Tas Travel. Organizer DQLab’
Result:
Using OR Operand
In the previous section, we used simple text filter to display data with the product_name ‘Gantungan Kunci DQLab’ and ‘Tas Travel Organizer DQLab’ respectively.
How about if we want to display both of them ?
We can use OR operand.
To display query result which contains product data with nama_produk ‘Gantungan Kunci DQLab’ and ‘Tas Travel Organizer DQLab’, we can use a simple logic, t means we need to select data with the condition that the product_name is ‘Gantungan Kunci DQLab’ OR ‘Tas Travel Organizer DQLab’. This logic can be denoted using the logic OR.
So, using OR logic, we can combine two or more conditions to filter the data. So, to display data with the condition that the nama_produk column is ‘Gantungan Kunci DQLab’ OR ‘Tas Travel Organizer DQLab’, you can use the following syntax:
Result:
It can be seen that not only one row of data is displayed, but there are two rows of data because it mets the condition that is given to SELECT statement
Add one more nama_produk ‘Flashdisk DQLab 64 GB’ to the table
Result:
Filter for Numbers
Before, we’ve already filtering based on text. However, WHERE is not limited to string, but generally for number.
Here is an example of filter where harga column must have value below 50000
Result:
We can see that all the data selected has met the condition in which the price is below 50000
Try to display the product data in which harga column is above 50000
Result:
Using AND Operand
Previously, we studied the OR operand, we can also use the AND operand so that two or more conditions are all met. If either of the conditions is not met, the data will not be retrieved.
Here is an example when two conditions are used with AND operand
When you run the syntax, there will be no output. Why ??
It is because there is no nama_produk column consists of “Gantungan Kunci DQLab” and harga column with the value above 50,000. That’s why there is no output as the both conditions are not met.
How about we change so that the price is below 50000, with the sama nama_produk “Gantungan Kunci DQLab”
Here will be the output
MINI PROJECT
Project from Branch A
“So, can you prepare sales transaction data with total revenue> = IDR 100,000?”
The data format that you will display is: kode_pelanggan, nama_produk, qty, harga, dan total, sorted by the largest total revenue.
If this is the case, it means I need to query the data from the tr_penjualan fin the company database.
This is what I can do:
- Multiplication between the qty and the price column to get the total revenue for each customer code that is stated in the total column
- Use “ORDER BY total DESC” at the end of the query to sort the data
Here we go !
It can be seen from the query above,
- We select kode_pelanggan, nama_produk, qty, harga, and also qty*harga, and we name it with total alias.
- We get this data from tr_penjualan table.
- We must met the condition of total revenue is greater or equal to 100000, so we type WHERE qty*harga> =100000.
- Last, we sort it by the largest total revenue, so we use ORDER BY total DESC
- Don’t forget to put ; to represent the end of another SELECT or SQL command.
Result:
From this lessons, we have understood and been able to practice:
- SQL Concept:
- Relational Database Management System (RDBMS)
- RDBMS storage structure consisting of database, tables, columns and rows.
- Introduction to the SELECT statement for retrieving data from tables.
2. SELECT Technique:
- Selecting certain columns
- Selecting specific amount of data
- Using prefix and alias
- Using filter
With this capability, we are eady to retrieve and process data simply. This skill alone is 60% of the initial activity an analyst will undertake.
KEEP LEARNING!
Source: https://dqlab.id/