Blogs

Join Query Optimization in Data Modules

Sonya Fournier, May 8, 2020

Get the Best Solution for
Your Business Today!

Improve Performance

Optimization of joined queries in Cognos Analytics is a useful way to improve performance by reducing the data returned by the database. Joins with one-to-many or many-to-one cardinality use key values from the table on the one side of the join to filter rows in the table on the many sides of the join. Like a developer who uses IN, BETWEEN, or a subquery in their custom SQL query, so can the Cognos query engine.

Join optimization in models and reports was introduced in 10.2.1. In Report Studio, it was available in the Query Explorer in the properties of a join:

In Framework Manager, you could set it on the join in Properties > (DQM) Filter Types:

Framework Manager, Filter Type in Properties

It is also a Filter Type property in a Cognos Analytics report:

Cognos Analytics Report Filter Type Property

Join optimization in Data Modules is defined in the relationship between two tables, and a few more options are included:

What’s the difference?

The three types of filters are (in FM, Report Studio, and report authoring in CA):

IN: the generated filter in the SQL is an IN predicate comprised of a set of values of the join keys from the one side.
BETWEEN: the generated filter is a BETWEEN predicate comprised of the minimum and maximum of the join key values from the one side.
TABLE: the generated filter is an IN predicate containing a list of join keys from the one side, but the list is compiled by a subquery.

In data modules, the names have changed but they work the same way.

No filtering is the same as (Default) or None.
The Unique values option is the same as IN.
The Range of values option is the same as BETWEEN.
Unique values in a subquery setting are the same as TABLE.

To illustrate this I built a data module that contains two tables, Product and Order Details. They have a relationship defined that looks like this:

Data Modules - Product and Order Details Relationship

The join keys for these two tables are Product.Product Number and Order Details.Product Number.

No filtering

If I do not set the join optimization filter, this is the SQL I get when I pull in Product.Product Type Code and Order Details.Quantity into a list.

SELECT

    "PRODUCT0"."PRODUCT_TYPE_CODE" AS "Product_Type_Code",

    SUM("ORDER_DETAILS0"."QUANTITY") AS "Quantity"

FROM

    "GOSALES"."gosales"."PRODUCT" "PRODUCT0"

INNER JOIN "GOSALES"."gosales"."ORDER_DETAILS" "ORDER_DETAILS0"

    ON "PRODUCT0"."PRODUCT_NUMBER" = "ORDER_DETAILS0"."PRODUCT_NUMBER"

GROUP BY

    "PRODUCT0"."PRODUCT_TYPE_CODE"

Unique values (IN)

With the Unique values setting, a list of unique join key values from the query the one side is used to filter the query on the many side. (In the case of a one-to-one relationship the filter is applied to the second table.)

Using the example above, if I change the filter optimization in the join in the data module to Unique values, the SQL is very different. We are filtering the many side of the join (Order Details) by every unique value for Product Number (the join key) in the Products table.

WITH

"ORDER_DETAILS0" AS

    SELECT

 "ORDER_DETAILS"."PRODUCT_NUMBER" AS "PRODUCT_NUMBER",

 "ORDER_DETAILS"."QUANTITY" AS "QUANTITY",

 "ORDER_DETAILS"."ORDER_DETAIL_CODE" AS "ORDER_DETAIL_CODE",

 "ORDER_DETAILS"."ORDER_NUMBER" AS "ORDER_NUMBER",

 "ORDER_DETAILS"."SHIP_DATE" AS "SHIP_DATE",

 "ORDER_DETAILS"."PROMOTION_CODE" AS "PROMOTION_CODE",

 "ORDER_DETAILS"."UNIT_COST" AS "UNIT_COST",

 "ORDER_DETAILS"."UNIT_PRICE" AS "UNIT_PRICE",

 "ORDER_DETAILS"."UNIT_SALE_PRICE" AS "UNIT_SALE_PRICE"

    FROM

 "GOSALES"."gosales"."ORDER_DETAILS" "ORDER_DETAILS"

    WHERE

 "ORDER_DETAILS"."PRODUCT_NUMBER" IN (

     1110, 2110, 3110, 4110, 5110, 6110, 7110, 8110, 9110, 10110, 11110, 12110,  13110, 14110, 15110, 16110, 17110, 18110, 19110, 20110, 21110, 22110, 23110, 24110, 25110, 26110, 27110, 28110, 29110, 30110, 31110, 32110, 33110, 34110, 35110, 36110, 37110, 38110, 39110, 40110, 41110, 42110, 43110, 44110, 45110, 46110, 47110, 48110, 49110, 50110, 51110, 52110, 53110, 54110, 55110, 56110, 57110, 58110, 59110, 60110, 61110, 62110, 63110, 63130, 63140, 64110, 65110, 65120, 66110, 67110, 68110, 68120, 68190, 68200, 68220, 68230, 68240, 68250, 69110, 69120, 70110, 70120, 70140, 70160, 70200, 70240, 70250, 71110, 72110, 73110, 74110, 75110, 76110, 77110, 78110, 79110, 80110, 81110, 82110, 83110, 84110, 85110, 86110, 87110, 88110, 89110, 90110, 91110, 92110, 93110, 94110, 95110, 96110, 97110, 98110, 99110, 100110, 101110, 102110, 103110, 104110, 105110, 106110, 107110, 108110, 109110, 110110, 111110, 112110, 113110, 114110, 115110, 122110, 122120, 122130, 122140, 122150, 123110, 123120, 123130, 123140, 123150, 123160, 124110, 124120, 124130, 124140, 124150, 124160, 124170, 124180, 124190, 125110, 125120, 125130, 125140, 125150, 125160, 125170, 126110, 126120, 126130, 126140, 126150, 127110, 127120, 127130, 127140, 127150, 128110, 128120, 128130, 128140, 128150, 128160, 128170, 128180, 128190, 128200, 128210, 129110, 129120, 129130, 129140, 129150, 129160, 129170, 129180, 130110, 130120, 130130, 130140, 130150, 131110, 131120, 131130, 132110, 132120, 132130, 132140, 132150, 132160, 132170, 133110, 134110, 134120, 134130, 134140, 135110, 135120, 135130, 136110, 136120, 136130, 136140, 136150, 140110, 141110, 142110, 143110, 143120, 143130, 143140, 143150, 144110, 144120, 144130, 144140, 144150, 144160, 144170, 144180, 144190, 144200, 145110, 145120, 145130, 145140, 145150, 145160, 145170, 145180, 146110, 146120, 146130, 146140, 147110, 149150, 149160, 151110, 151120, 151130, 152110, 153110,  154110, 154120, 154130, 154140, 154150 )

),

"ORDER_DETAILS_1" AS

    SELECT

 "ORDER_DETAILS0"."PRODUCT_NUMBER" AS "PRODUCT_NUMBER",

 "ORDER_DETAILS0"."QUANTITY" AS "QUANTITY"

    FROM

 "ORDER_DETAILS0"

SELECT

    "PRODUCT_1"."PRODUCT_TYPE_CODE" AS "Product_Type_Code",

    SUM("ORDER_DETAILS_1"."QUANTITY") AS "Quantity"

FROM

    "GOSALES"."gosales"."PRODUCT" "PRODUCT_1"

 INNER JOIN "ORDER_DETAILS_1"

 ON "PRODUCT_1"."PRODUCT_NUMBER" = "ORDER_DETAILS_1"."PRODUCT_NUMBER"

GROUP BY

    "PRODUCT_1"."PRODUCT_TYPE_CODE"

In this example, because we are pulling from the same source system, we are probably not going to see much of a difference in run time for the query. As this is a sample source of data, all the products are represented in the Order Details table. The filter will not make a big difference (and could actually slow it down).

When would I use Unique values?

Where this type of filter optimization will really help is if we are using different sources of data. Let’s say for example we have an Excel sheet where we have selected our highest revenue-generating products for each product line. We bring it into a data module and join it to our billion-row Order Details table in the data warehouse. Because the sources are different, when querying for a report that joins the two sources, Cognos will run two separate queries – one for the spreadsheet and one for the enormous orders table. If we use a Unique values filter, it will take the unique product numbers from the product spreadsheet and use them to filter the orders warehouse table by those values.

Cognos Analytics Top Products and Order Details Relationship

Excel query:

SELECT

    'Page1_10'.'Product_number',

    'Page1_10'.'Product_line',

    'Page1_10'.'Product_type',

    'Page1_10'.'Product',

    'Page1_10'.'Revenue'

FROM

      'GO_sales_for_Joins_-_top_5_products_for_each_line_xlsx_Page1_1' 'Page1_10'

Order Details table query:

SELECT

    "ORDER_DETAILS"."PRODUCT_NUMBER" AS "PRODUCT_NUMBER",

    "ORDER_DETAILS"."QUANTITY" AS "QUANTITY",

    "ORDER_DETAILS"."UNIT_SALE_PRICE" AS "UNIT_SALE_PRICE"

FROM

    "GOSALES"."gosales"."ORDER_DETAILS" "ORDER_DETAILS"

WHERE

    "ORDER_DETAILS"."PRODUCT_NUMBER" IN (

 13110, 26110, 106110, 44110, 90110, 86110, 97110, 144180, 85110, 11110, 12110, 105110, 102110, 107110, 62110, 48110, 45110, 94110, 73110, 145170 )

In this situation, we should see a faster query return from the Order Details query since we are only focusing on a few products.

Range of values (BETWEEN)

The Range of values filters the many side with a BETWEEN operator using the minimum value and maximum value applied to the join keys from the one side.

So, in our first example of Product and Order Details from the same source, when the SQL is generated, we should see this (remember – Product is on the one side and Order Details is on the many side):

WITH

"ORDER_DETAILS0" AS

    SELECT

        "ORDER_DETAILS"."PRODUCT_NUMBER" AS "PRODUCT_NUMBER",

        "ORDER_DETAILS"."QUANTITY" AS "QUANTITY",

        "ORDER_DETAILS"."ORDER_DETAIL_CODE" AS "ORDER_DETAIL_CODE",

        "ORDER_DETAILS"."ORDER_NUMBER" AS "ORDER_NUMBER",

        "ORDER_DETAILS"."SHIP_DATE" AS "SHIP_DATE",

        "ORDER_DETAILS"."PROMOTION_CODE" AS "PROMOTION_CODE",

        "ORDER_DETAILS"."UNIT_COST" AS "UNIT_COST",

        "ORDER_DETAILS"."UNIT_PRICE" AS "UNIT_PRICE",

        "ORDER_DETAILS"."UNIT_SALE_PRICE" AS "UNIT_SALE_PRICE"

    FROM

        "GOSALES"."gosales"."ORDER_DETAILS" "ORDER_DETAILS"

    WHERE

        "ORDER_DETAILS"."PRODUCT_NUMBER" BETWEEN 1110 AND 154150

),

"ORDER_DETAILS_2" AS

    SELECT

        "ORDER_DETAILS0"."PRODUCT_NUMBER" AS "PRODUCT_NUMBER",

        "ORDER_DETAILS0"."QUANTITY" AS "QUANTITY"

    FROM

        "ORDER_DETAILS0"

SELECT

    "PRODUCT_2"."PRODUCT_TYPE_CODE" AS "Product_Type_Code",

    SUM("ORDER_DETAILS_2"."QUANTITY") AS "Quantity"

FROM

    "GOSALES"."gosales"."PRODUCT" "PRODUCT_2"

        INNER JOIN "ORDER_DETAILS_2"

        ON "PRODUCT_2"."PRODUCT_NUMBER" = "ORDER_DETAILS_2"."PRODUCT_NUMBER"

GROUP BY

    "PRODUCT_2"."PRODUCT_TYPE_CODE"

In this case, we are filtering on the range of product numbers between 1110 and 154150.

Again, with the same source, this is not particularly important to do and will likely not make a huge difference in performance.

If we look at the Excel example, we will see this:

SELECT

    'Page1_10'.'Product_number',

    'Page1_10'.'Product_line',

    'Page1_10'.'Product_type',

    'Page1_10'.'Product',

    'Page1_10'.'Revenue'

FROM

    'ft2656444632_GO_sales_for_Joins_-_top_5_products_for_each_line_xlsx_Page1_1' 'Page1_10'

SELECT

    "ORDER_DETAILS"."PRODUCT_NUMBER" AS "PRODUCT_NUMBER",

    "ORDER_DETAILS"."QUANTITY" AS "QUANTITY",

    "ORDER_DETAILS"."UNIT_SALE_PRICE" AS "UNIT_SALE_PRICE"

FROM

    "GOSALES"."gosales"."ORDER_DETAILS" "ORDER_DETAILS"

WHERE

    "ORDER_DETAILS"."PRODUCT_NUMBER" BETWEEN 11110 AND 145170

This one isn’t going to help us with performance since the range is so broad and we are really only focusing on a few products.

When would I use Range of values?

We would probably use Range of values when there are a lot of unique values for a key column and the sources are different and where there are a lot of key values in the many-side query that are outside of the range of values in the one-side table.

It doesn’t make any real sense to use Range of values with our previous two examples but if we were joining on time, then it makes sense that we may want to use Range of values.

In this example, we have a relationship between the Go Time dimension (from the Great outdoors data warehouse) and Order Details from the transactional data source.

Cognos Relationship - Go To Time dimension and Order Details

In this data module, there is a filter on the GO Time Dim that filters on just two years (2012 and 2013).

If we use a Range of values filter, we will see the following SQL:

SELECT

    "GO_TIME_DIM0"."DAY_DATE" AS "DAY_DATE",

    "GO_TIME_DIM0"."MONTH_EN" AS "MONTH_EN"

FROM

    "GOSALESDW"."gosalesdw"."GO_TIME_DIM" "GO_TIME_DIM0"

WHERE

    "GO_TIME_DIM0"."CURRENT_YEAR" IN (

        2012,

        2013 )

SELECT

    "ORDER_DETAILS"."SHIP_DATE" AS "SHIP_DATE",

    "ORDER_DETAILS"."QUANTITY" AS "QUANTITY"

FROM

    "GOSALES"."gosales"."ORDER_DETAILS" "ORDER_DETAILS"

WHERE

    "ORDER_DETAILS"."SHIP_DATE" BETWEEN CONVERT(DATETIME2, '2012-01-01 00:00:00') AND CONVERT(DATETIME2, '2013-12-31 00:00:00')

We are limiting the Order Details query to just the two years we are filtering on in the GO Time dimension.

We should see an optimization in the query using this filter.

Other options:

The other two options for filtering, Unique values in a subquery (Table) and Unique or range of values, work the same way. Unique values in a subquery works like Unique values but uses a subquery to define the values for the IN predicate. You may not see this in the native SQL but if you look at IBM Cognos SQL you will see that same IN statement.

Cognos Analytics Unique values in a subquery

Unique or range of values option is unusual: while it ignores cardinality it can use either the either a list of unique values (IN) or a range of values (BETWEEN) from the one side to filter values from the many side.

Cognos Analytics Unique or range of values

Conclusion

I hope this clears up (even slightly) a fairly complicated topic. I will admit that I have been skimming over that part of the Relationship dialog when I’m teaching Data Modules. I’m so glad I took the time to get a deeper idea of how this all works. Please let me know in the comments if you have any questions or comments.

Next Steps

We hope you found this article informative. Be sure to subscribe to our newsletter for data and analytics news, updates, and insights that are delivered directly to your inbox.

If you have any questions or would like PMsquare to provide guidance and support for your analytics solution, contact us today.

Tags:

Ideation Session

GenAI Assessment

Health Check

Legacy Systems Transformation

Digital Application Modernization

Financial Operations

Data Analytics Optimization

Application Modernization

Staff Augmentation

SaaS Anywhere

Rapid Application Prototyping

Proof of Value

Financial Services

Health Care and Life Sciences

Manufacturing

Media and Entertainment

State and Local Government

Code Assist

Automatey.ai

Status Crow

CogBox

Thrive

Incorta Integrations

Customer Churn

Who We Are

Technology Partners

Training and Enablement

Careers

Blogs

Events

Success Stories

Blogs

Join Query Optimization in Data Modules

Share

Sonya Fournier, May 8, 2020

Improve Performance

What’s the difference?

No filtering

Unique values (IN)

When would I use Unique values?

Range of values (BETWEEN)

When would I use Range of values?

Other options:

Conclusion

You may also like..

Financial
Operations

Staff
Augmentation