Standardizing Values in the Data Warehouse - Pentaho Data Integration ETL Tutorial

in #utopian-io8 years ago (edited)

Pentaho Data Integration ETL tool has many built in capabilities that assist the programmer in building business intelligent data for the organization.

What Will I Learn?

  • You will learn Converting Values to Characters or Strings using ETL
  • You will learn Converting Data to Integer
  • You will learn Setting All Numeric Values in the Same Format

Requirements

Pentaho Data Integration ( ETL) - also called Kettle

Difficulty

  • Intermediate

Tutorial Contents

  • How to Convert Values to Characters or Strings
  • Incoming Values as Dates
  • The Value Converted to a Decimal Numeric
  • Need a Value in Hex
  • Converting to Integer
  • All Numeric Values in the Same Format

Standardizing Values in the Data Warehouse - Pentaho Data Integration ETL Tutorial

There are multiple was to generate intelligence from data. A programmer can derive intelligence from an individual field, multiple fields, decision constructs, and subsets of field contents. Using the elements from the extract step the programmer can do many things utilizing function expressions.

How to Convert Values to Characters or Strings

ToChar returns the value of a date or a number as a character representation of the value. Using the ToChar on a date field requires the use of the format parameter. An example includes myage=46 would be converted as ToChar(myage) returning a value of '46'.

Converting dates is a bit trickier. Consider MyDate = July 13 1963, the example ToChar(MyDate, 'MM/DD/YY') would return '07/13/63' but ToChar(MyDate,'YYYYMMDD') would return 19630713 . It is important that the date formats used as a parameter are carefully considered to follow the accepted format of the data warehouse.

Additionally, a TRUE or FALSE boolean value can be returned as ToChar(1=2) would return FALSE.

image.png

Incoming Values as Dates

ToDate returns the value of a character, date, or integer value as a date representation. Again the parameter of the date format must be carefully considered. NewDate = '18/01/2018'. If the parameter is set incorrectly as in ToDate(NewDate,'DD/MM/YYYY') then the value returned would be January 18 2018 instead of December 01 2018.

If an integer value is used as the input value the date format must be Julian (the number of days since Dec.1, 4713BC). This function expression can also be used to apply date formatting to any value coming in as a date.

Suppose your invoicing system in Europe uses a different date format than your invoicing system in the United States. Using the ToDate function allows that all dates entering the warehouse can be easily standardized.

image.png

The Value Converted to a Decimal Numeric

ToDouble converts a value to a double precision, floating point number. When the incoming value is already a numeric value it simply returns the incoming value. When the value is a character the returned value represents the string from the left to the first character that is not a digit or a decimal point.

When the value is a date, ToDouble returns the Julian representation of the value, and for BOOLEAN values it returns an integer value of 1 or 0. Where Invalue = 123.45, ToDouble(Invalue) would return 123.45. In the case of Invalue = '123..45' then ToDouble(Invalue) would return 123.00 because when it found the second period in the value it would stop processing.

Utilizing ToDouble(1=2) would cause a FALSE to be generated and a value of 0 returned.

image.png

Need a Value in Hex

ToHex will convert any value incoming to the function into its hex string representation. While this is not very useful to a business user, if there tends to be data quality problems this function can be used to assist the programmer in finding and resolving the issue. ToHex('ABC') would return 0x41424300.

image.png

Converting to Integer

ToInteger will taking an incoming value and return a number as a type integer. For input values of type integer it returns the value unchanged. When the input value is a decimal or floating point number the returned value is the whole number with the decimal positions truncated.

If using this function on a date, the equivalent returned value is a Julian date. Finally, if the value being acted upon is a BOOLEAN type, the result will be a 1 or a 0. ToInteger(123.45) would return 123 and ToInteger(1=1) would return 1.

image.png

All Numeric Values in the Same Format

ToNumber is used to incorporate numeric formatting. The format of the function requires the programmer to predetermine the maximum number of digits the number can contain and the maximum number of digits to the right of the decimal position.

This can be useful when working with values that absolute precision is not necessary and causes overhead that is difficult to maintain. ToNumber(8.544444,15,2) would truncate the value to 8.54, losing the decimal precision.

While standardizing the values in the data warehouse can seem intimidating at first the tool has predefined functionality to allow the programmer to do bring the data into line regardless of the source system the data initiates from.



Posted on Utopian.io - Rewarding Open Source Contributors

Sort:  

Thank you for the contribution. It has been approved.

You can contact us on Discord.
[utopian-moderator]

Hey @yissakhar I am @utopian-io. I have just upvoted you!

Achievements

  • You have less than 500 followers. Just gave you a gift to help you succeed!
  • Seems like you contribute quite often. AMAZING!

Suggestions

  • Contribute more often to get higher and higher rewards. I wish to see you often!
  • Work on your followers to increase the votes/rewards. I follow what humans do and my vote is mainly based on that. Good luck!

Get Noticed!

  • Did you know project owners can manually vote with their own voting power or by voting power delegated to their projects? Ask the project owner to review your contributions!

Community-Driven Witness!

I am the first and only Steem Community-Driven Witness. Participate on Discord. Lets GROW TOGETHER!

mooncryption-utopian-witness-gif

Up-vote this comment to grow my power and help Open Source contributions like this one. Want to chat? Join me on Discord https://discord.gg/Pc8HG9x

Coin Marketplace

STEEM 0.04
TRX 0.32
JST 0.074
BTC 64559.21
ETH 1683.31
USDT 1.00
SBD 0.41