PMsquare

View Original

Learning SQL

You have probably heard the term “SQL” and may even know what it is used for, but if you don’t know how to write or use it, or simply aren’t using it, this article will help you start your journey into learning SQL (Structured Query Language).

 

While SQL can be used to perform a variety of tasks against a database, the most common is to return rows and columns of data.  This article will focus on the basics in this realm, and will assume the source of data is an SQL supported relational database (RDBMS), with the most common of these being Microsoft’s SQL Server and Oracle. To execute the example SQL statements in this article, the user will need a login with security giving access to the database that has the samples from IBM Cognos Analytics and a tool/program that SQL statements can be executed from, along with some basic knowledge of a RDBMS.

Why is understanding SQL important? Business Analytics/reporting applications ultimately convert the report definitions to SQL that is submitted to the source of the data when the source is an RDBMS. This holds true for IBM’s Cognos Analytics (and all prior versions of Cognos).

History of SQL via Wikipedia

SEQUEL was developed in the 1970s by a team from IBM for accessing data from its System R database. Later renamed SQL (to avoid a trademark infringement), it became a standard of the American National Standards Institute (ANSI) in 1986 and of the International Organization for Standardization (ISO) in 1987. Since then, the standard has been revised to include a larger set of features. Despite the existence of standards, most SQL code requires at least some changes before being ported to different database systems.

Understanding Terms as They Related to Databases and SQL

  • Row in an SQL result set is synonymous with a record in a database

  • Column in an SQL result is synonymous with a column in a database

  • Tables in a database contain have 1 or more columns, and contain 0 or more records

  • Databases typically contain many tables that have some sort of relationship between them (hence the common designator of “Relational Database Management System” or RDBMS)

Most Basic SQL Statements

As mentioned, the most common use of SQL is to retrieve data from a database and that SQL is commonly referred to as a “SELECT statement.”  While the SELECT statement uses the keyword SELECT, there is more to a “SELECT statement” than just the SELECT keyword. Let’s briefly define 3 of the more common keywords used in a typical SELECT statement.

  • SELECT column1, column2, columnN…

    • This is typically the start of an SQL statement. SELECT tells the database I want to retrieve data from the columns that follow it.

    • The multiple columns are separated by a comma

    • The columns can use a 2-part reference, Table.Column

    • To get all columns, a “SELECT *” can be used instead of specifying each column

  • FROM table1, table2, tableN…

    • The FROM clause lists the table(s) that the data is to be retrieved from

  • WHERE expression1 and/or expression2 and/or expressionN…

    • The WHERE clause is used to set filters to constrain the results

    • It can also be used to indicate how 2 tables are related, or JOINed

  • Some quick examples explained:

    • SELECT * FROM table1→ will return all rows and columns from table1

    • SELECT col1, col2 FROM table1 → will return all rows and col1 and col2 from table1

    • SELECT * FROM table1 WHERE col1 = ‘value’ → will return all rows that have col1 = ‘value’

Going Beyond Basic

Now that you have the idea of how a basic SELECT statement is built and works, let’s explore how to enhance it to perform desired activities.

Keywords/Clauses

  • ORDER BY - This is used to order (sort) the rows based on specified criteria. Multiple columns can be specified separated by a comma

  • GROUP BY - This is used to summarize/aggregate columns “By” columns included in this clause, again multiple columns can be grouped, separated by a comma.

  • Summary clauses - used in conjunction with GROUP BY… indicates how the non-grouped column(s) should be summarized

    • SUM - SUM or Total the values (numeric values only)

    • MIN - Use the Minimum value within the grouping

    • MAX - Use the Maximum value within the grouping

    • FIRST - Use the First value within the grouping

    • LAST - Use the Last value within the grouping

    • others

  • INNER JOIN - Used within the FROM clause to define the relationship/join between 2 tables (instead of doing so in the WHERE clause). INNER keyword indicates the record must exist in both tables. Rows with no match in the other table will be excluded from the result

  • <left or right> OUTER JOIN - Similar to INNER JOIN except the LEFT or RIGHT designator indicates that a match is required on the indicated table. This preserves records from that table even if there are no matching criteria on the other table

  • ON clause is used with the JOIN clauses to indicate the criteria the join must meet

Beyond basic examples (Notice I have a new line when the section of SQL changes keyword):

  • SELECT col1, col2, col3, col4 
     FROM table1 
     WHERE col1 = ‘value’ 
     ORDER BY col2 ASC, col3 DESC 

This will return 4 columns, that meet the criteria, and be sorted by col2 in ascending order and then col3 in descending order within the col2 values.

  • SELECT col1, col2, sum(col3) 
     FROM table1 
     WHERE col1 = ‘value’ 
     GROUP BY col1, col2

This will sum up all col3 values for each unique grouping (all combinations of col1 and col2). SUM is probably the most common aggregation/summary function

  • SELECT T1.col1, T1.col2, T1.col3, T2.col1
    FROM table1 T1 
    INNER JOIN table2 T2 ON T1.col4 = T2.col2

Here we are getting 4 columns from 2 tables, using the INNER JOIN clause with ON to set the relationship between the tables.  In this example, I am also utilizing aliases for the tables in the FROM clause… this is done with a space following the table, followed by the alias: table1 T1 with T1 being the alias which is now used in the SELECT and INNER JOIN clauses.

Conclusion

SQL is probably one of the more useful languages to know in the Data/Analytics world. Understanding SQL and what is happening behind the scenes with applications like IBM Cognos Analytics, will ultimately help you get to the next level in your Business Analytics journey. Learning the basic SELECT statements and how to use all the various clauses will open up a world of possibilities for what SQL can do for you

In Cognos Analytics, within Report authoring, once you have a report defined, you can see the SQL that Cognos Analytics is producing to send to the database.

In report authoring/edit mode, click the more button and select Show generated SQL/MDX

This is from the sample report Examine coffee sales hourly by store

Next Steps

We hope you found this article informative. If you have any questions regarding SQL or how to use it, reach out to us! Be sure to also subscribe to our newsletter to have PMsquare original articles, updates, and insights delivered directly to your inbox.

See this gallery in the original post