Clean up your data using SQL for effective analysis
Clean up your data using SQL for effective analysis
03 March 2022
Clean and reliable data is very essential for an effective analysis. There are a number of tools with which you can do that. SQL is one such tool with which you can clean your data. There are various functions in SQL with which you can carry that out.
LEFT
LEFT can be used to pull out a certain number of characters from a string and provide it as a separate string.
Syntax:
LEFT(string, number of characters)
Example:
SELECT employee_id, fullname, LEFT(joining_date, 10) AS cleaned_joining_date FROM employees
The above query will return the first ten characters from the strings as separate strings from the column full_name.
RIGHT
RIGHT performs a similar operation as LEFT but from the right side. It pulls a certain number of characters from the right side of the string.
Syntax:
RIGHT(string, number of characters)
Example:
SELECT purchase_id, RIGHT(purchase_date, 10) AS cleaned_purchase_time FROM purchases
The above query will return the last ten characters from the strings as separate strings from the column full_name.
TRIM
TRIM function in SQL is used to remove characters from the beginning and end of a string. It takes 3 arguments
Syntax:
TRIM([ LEADING | TRAILING | BOTH ] trim_character FROM source_string);
Examples:
If you want to remove the leading and trailing characters from a string:
SELECT store_address, TRIM(both '()' FROM store_address) FROM stores;
The TRIM function takes three arguments. First and foremost, you have to specify whether you want to remove characters from the leading side, the trailing side, or both. Next you must specify that all characters to be trimmed. Any characters included in the single quotes will be removed from leading, trailing or both sides of the string. Finally, you must specify that the text you want to trim using FROM.
LENGTH
LENGTH helps in returning the length of the string. This means that it can be combined with LEFT and RIGHT while cleaning up the strings in our table
You can use length in the following way
SELECT purchase_id, date, LEFT(date, 10) AS purchased_date, RIGHT(date, LENGTH(date) - 11) AS purchased_time FROM purchase;
SUBSTR
LEFT and RIGHT can create substrings of a string but only from the end or the beginning of a string. If you want to extract a substring from the middle of a string, you need to work with the SUBSTR function.
Syntax:
SUBSTR(*string*, *starting position of the character*, *number of characters*);
Example:
SELECT prchase_id, date, SUBSTR(date, 4, 2) AS day FROM purchase;
CONCAT
CONCAT function comes in handy when you want to combine the strings from multiple columns.
You can separate these combined values with a character or symbol of your choice.
Syntax:
CONCAT(string_1, string_2, string3, ...);
Example:
SELECT customer_id, first_name, last_name CONCAT(first_name, ' ', last_name) AS full_name FROM customers;
In the above example the first_name and the last_name are separated by a space.
UPPER and LOWER
The UPPER and LOWER functions can be used to change the strings into upper and lower cases.
Syntax:
UPPER(string); LOWER(string);
Example:
SELECT shop_id, shop_address, UPPER(address) AS address_upper, LOWER(address) AS address_lower FROM shops;
COALESCE
There are instances where there could be null value in your table and you’d like to add another value in their place. It can be done by using the COALESCE function. COALESCE does not change the values in the original table, it just returns a temporary view of the table with the values changed
Syntax:
COALESCE(column, value to be added)
Example:
SELECT fname, COALESCE (address, 'Address unknown') AS address FROM employees;