Tag Archives: howto

How to count nulls and hard-coded text that signifies null in a Pandas DataFrame

Use case: sometimes files are processed were nulls are represented with text like “NULL” meaning the field is not actually empty or null.

Below are some python functions and a test demonstrating functionality.

def getListOfMissingValues():
    """
    desc: List of common words used to represent null that are often found in files as text
    """
    lst = ['NaN', 'NAN', 'nan', 'null', 'NULL', 'nul', 'NUL', 'none', 'NONE', '', ' ', '	']
    return lst
	
def advanceMissingValues(df):
    """
    desc: Count nulls and hardcoded text that represents nulls
    param p1: DataFrame name
    return: DataFrame of field names and count values
    """
    lstMissingVals = getListOfMissingValues()
    col_list = getListOfFieldNames(df)
    output = pd.DataFrame(col_list)
    output.rename(columns = {0:'FieldName'}, inplace = True)
    output['Count'] = ''
    
    #For each field name count nulls and other null type values
    for col in col_list:
        nullCnt = df[col].isnull().sum(axis=0)
        #For each missing value perform count on column
        missValCnt = 0
        for missVal in lstMissingVals:
            missValCnt = missValCnt + len(df[(df[col]==missVal)])
 
        cntTotal = nullCnt + missValCnt
        output.loc[output['FieldName'] == col, 'Count'] = cntTotal

    return output

#Test Setup
lst = ['NaN', 'NAN', 'nan', 'null', 'NULL', 'nul', 'NUL', 'none', 'NONE', '', ' ', '	' ,None]
mdf = pd.DataFrame(lst)
mdf.rename(columns = {0:'NullTypes'}, inplace = True)
print(mdf)

#Run Test
chk = advanceMissingValues(mdf)
chk

Sample output:

How to convert Panda DataFrame headers to snake case

# Python code demonstrate 
# Make headers snake case
 
import pandas as pd
 
# initialise data of lists.
data = {'First Name':['Tom', 'nick', 'krish', 'jack'], 'Age of Person':[20, 21, 19, 18]}
 
# Create DataFrame
df = pd.DataFrame(data)
 
# Print the output.
print(df)

# Make headers snake case
df.columns = [x.lower() for x in df.columns]
df.columns = df.columns.str.replace("[ ]", "_", regex=True)

# Print the output.
print(df)

How to drop a Spark Delta table and associated files using Spark SQL and cmd

#Step 1
#Find and replace schemaName
#Find and replace tableName

#Step 2 
#Find the table 
#Via Databricks run the Spark SQL query below
#default is schema, change as needed
DESC FORMATTED schemaName.tableName;

#Step 3
#From the table returned scroll down to "location" and copy the field value
#Find and replace locationFieldValue

#Step 5
#Via Databricks using Spark SQL drop the table
DROP TABLE tableName

#Step 6
#Find and replace locationFieldValue
#By the means you use to interact with Databricks File System (dbfs), e.g. cmd python virtual environment
#Run command below
dbfs rm -r "locationFieldValue"

How to dynamically pivot a SQL Server table using dynamic T-SQL

A dynamic pivot table means you do not need to define hard coded column names as a dynamic query will fetch the field values from a column and use them as the column names while pivoting the source table.

Sounds complicated?

It is!

Good thing there are some code examples below you can just steal and alter as you need.

The first example will just return as a SELECT, the second example will write the results to a global temp table called ##Result.

A use case for this might be a continuous requirement to pivot a table however the column name requirements keep changing as field values change.

Example 1: Return as SELECT

/*Mock Table*/
IF OBJECT_ID('tempdb.dbo.#Fruits', 'U') IS NOT NULL
	DROP TABLE #Fruits;

CREATE TABLE #Fruits (
	Fruit VARCHAR(255)
	,Quantity INT
	,DateOf DATETIME
	);

INSERT INTO #Fruits (
	Fruit
	,Quantity
	,DateOf
	)
VALUES 
('Apple', 10	,GETDATE())
,('Orange', 10	,GETDATE())
,('Banana', 10, GETDATE())
,('Apple', 11, DATEADD(DAY, - 1, GETDATE()))
,('Orange', 11, DATEADD(DAY, - 1, GETDATE()))
,('Banana', 11, DATEADD(DAY, - 1, GETDATE()))
,('Apple', 12, DATEADD(DAY, - 2, GETDATE()))
,('Orange', 12, DATEADD(DAY, - 2, GETDATE()))
,('Banana', 12, DATEADD(DAY, - 2, GETDATE()))
,('Apple', 13, DATEADD(DAY, - 3, GETDATE()))
,('Orange', 13, DATEADD(DAY, - 3, GETDATE()))
,('Banana', 13, DATEADD(DAY, - 3, GETDATE()));

/*Demo Mock table*/
SELECT *
FROM #Fruits;

/*Logic to dynamically pivot table*/
DECLARE @cols AS NVARCHAR(MAX)
	,@query AS NVARCHAR(MAX);

SELECT @cols = STUFF((
			SELECT DISTINCT QUOTENAME(f.[Fruit]) + ', '
			FROM #Fruits AS f
			FOR XML PATH('')
				,TYPE
			).value('.', 'NVARCHAR(MAX)'), 1, 1, '');

/*Add missing square bracket to start of string*/
SET @cols = '[' + @cols;
/*Remove last comma from string*/
SET @cols = SUBSTRING(@cols, 1, (LEN(@cols) - 1));
SET @query = 'SELECT [DateOf], ' + @cols + ' FROM 
             (
              SELECT *
			  FROM #Fruits
            ) x
            pivot 
            (
                min(Quantity)
                for [Fruit] in (' + @cols + ')
            ) p ORDER BY RIGHT([DateOf], 4) ASC
			,LEFT(RIGHT([DateOf], 7), 2) ASC
			,LEFT([DateOf], 2) ASC';

EXECUTE (@query);

DROP TABLE #Fruits;

Example 2: Write output to a table

IF OBJECT_ID('tempdb.dbo.##Result', 'U') IS NOT NULL
	DROP TABLE ##Result;
/*Mock Table*/
IF OBJECT_ID('tempdb.dbo.#Fruits', 'U') IS NOT NULL
	DROP TABLE #Fruits;

CREATE TABLE #Fruits (
	Fruit VARCHAR(255)
	,Quantity INT
	,DateOf DATETIME
	);

INSERT INTO #Fruits (
	Fruit
	,Quantity
	,DateOf
	)
VALUES 
('Apple', 10	,GETDATE())
,('Orange', 10	,GETDATE())
,('Banana', 10, GETDATE())
,('Apple', 11, DATEADD(DAY, - 1, GETDATE()))
,('Orange', 11, DATEADD(DAY, - 1, GETDATE()))
,('Banana', 11, DATEADD(DAY, - 1, GETDATE()))
,('Apple', 12, DATEADD(DAY, - 2, GETDATE()))
,('Orange', 12, DATEADD(DAY, - 2, GETDATE()))
,('Banana', 12, DATEADD(DAY, - 2, GETDATE()))
,('Apple', 13, DATEADD(DAY, - 3, GETDATE()))
,('Orange', 13, DATEADD(DAY, - 3, GETDATE()))
,('Banana', 13, DATEADD(DAY, - 3, GETDATE()));

/*Demo Mock table*/
SELECT *
FROM #Fruits;

/*Logic to dynamically pivot table*/
DECLARE @cols AS NVARCHAR(MAX)
	,@query AS NVARCHAR(MAX);

SELECT @cols = STUFF((
			SELECT DISTINCT QUOTENAME(f.[Fruit]) + ', '
			FROM #Fruits AS f
			FOR XML PATH('')
				,TYPE
			).value('.', 'NVARCHAR(MAX)'), 1, 1, '');

/*Add missing square bracket to start of string*/
SET @cols = '[' + @cols;
/*Remove last comma from string*/
SET @cols = SUBSTRING(@cols, 1, (LEN(@cols) - 1));
SET @query = 'SELECT [DateOf], ' + @cols + ' INTO ##Result FROM 
             (
              SELECT *
			  FROM #Fruits
            ) x
            pivot 
            (
                min(Quantity)
                for [Fruit] in (' + @cols + ')
            ) p ORDER BY RIGHT([DateOf], 4) ASC
			,LEFT(RIGHT([DateOf], 7), 2) ASC
			,LEFT([DateOf], 2) ASC';

EXECUTE (@query);

SELECT * FROM ##Result;

DROP TABLE ##Result;

DROP TABLE #Fruits;

How to create a Spark SQL table with a SELECT statement

The following is a code snippet that would create a table in a “sales” schema called customer.

If no reference to a schema is given the table will be created in the default Spark location.

CREATE TABLE sales.customer USING DELTA AS (SELECT 'John' AS fn, 'Smith' AS sn, 55 AS age)

How to run a PowerShell script

So you have a PowerShell script and you just want to run it without messing around with permissions, policies, signing it or any other crap. (Yes yes I know all those things are vital for system wide security but you’re in a hurry damn it!)

Right click PowerShell and run as administrator.


When the terminal is open run the following line:


Set-ExecutionPolicy RemoteSigned


When prompted type the letter A and press Enter (if applicable).


Run the below including “&” at start of line with reference to your script, i.e.


& “C:\YourTestDirectory\YourTestFile.ps1”

How to create a directory structure from file creation dates using PowerShell

Below is example PowerShell code to create a directory structure from file creation dates.

A use case for this code might be a need to organize a collection of files stored on an SFTP server or NAS. The code can be modified to loop through the files putting each one into sub directories corresponding to nested folders organized into Year\Month\Day folders.

To use the logic first create a test folder on your computer.
Then copy the logic below to a text file editor and find and replace the following with reference to your test folder location:

C:\YourTestDirectory

Then save the code below as a PowerShell file called “DirFromDate.ps1” into your test folder.

To then run the file open up PowerShell and paste the following line:

Set-ExecutionPolicy RemoteSigned

Hit return and then enter the letter A if requested.

Then enter the following replacing “YourTestDirectory” with your actual test directory location:

& “C:\YourTestDirectory\DirFromDate.ps1”

Hit return.

<#
.SYNOPSIS
   <A brief description of the script>
.DESCRIPTION
   <A detailed description of the script>
.PARAMETER <paramName>
   <Description of script parameter>
.EXAMPLE
   <An example of using the script>
#>

#FIND AND REPLACE THE FOLLOWING WITH A DIRECTORY
C:\YourTestDirectory

#How Run In Powershell
#Open Powershell terminal and run the folling line:
#Set-ExecutionPolicy RemoteSigned
#Type A and press Enter (if applicable).
#Run the below including "&" at start of line
#& "C:\YourTestDirectory\DirFromDate.ps1"

#VARIABLES
#Define folder variables
$workingFolder = "C:\YourTestDirectory\"
#Define file variable
$testFile = "\ThisIsATestFile.txt"
$testFile = $workingFolder + $testFile


###############################################
#(!!DO NOT ALTER SCRIPT PASSED THIS POINT!!)
###############################################
#Write variables to terminal
Write-Host "Filepath: $workingFolder";

#Create test File

New-Item $testFile

#Define source directory variable
$files = Get-ChildItem -Path $workingFolder 

#3 steps: move raw files to azure, zip file, move zipped file to azure
foreach ($file in $files)
{
		#FileName
		Write-Host "File: $file";
		#Get file creation date from file name
		$fileDate = $file.CreationTime.ToString("ddMMyyyy")
		#Get file creation date (last time data was written to file) from file name
		#$fileDate = $file.LastWriteTime.ToString("ddMMyyyy")
		Write-Host "Filedate: $fileDate";
		$fileDate = $fileDate.Substring($fileDate.Length - 4) + "\" + $fileDate.Substring(3, 2) + "\" + $fileDate.Substring(0, 2)
		Write-Host "Filedate: $fileDate";
		$DirectoryStructure = $workingFolder + "\" + $fileDate
		Write-Host "DirStructure: $DirectoryStructure";

}
md $DirectoryStructure

Move-Item -Path $testFile -Destination $DirectoryStructure

How to check SQL Server schema user permissions

The script below will, providing the login you are using has adequate permissions, return the schema permissions a user has.

Commented out at the end of the script are examples of the types of permission you can assign, again providing the login you are using has adequate permissions.

SELECT ClassDescription
	,StateDescription
	,PermissionName
	,SchemaName
	,UserName
FROM (
	SELECT class_desc AS ClassDescription
		,state_desc AS StateDescription
		,permission_name AS PermissionName
		,SCHEMA_NAME(major_id) AS SchemaName
		,USER_NAME(grantee_principal_id) AS UserName
	FROM sys.database_permissions AS PERM
	JOIN sys.database_principals AS Prin ON PERM.major_ID = Prin.principal_id
		AND class_desc = 'SCHEMA'
	) AS schemaPermissions
WHERE 1=1 
/*Uncomment below to check permissions on a specific schema and/or specific user*/
--	AND SchemaName = 'dbo'
--	AND UserName = 'SomeGuy'
ORDER BY UserName ASC
,SchemaName ASC
GO



/*
--Grant schema permission examples
GRANT SELECT ON SCHEMA::dbo TO SomeGuy;
GRANT UPDATE ON SCHEMA::dbo TO SomeGuy;
GRANT ALTER ON SCHEMA::dbo TO SomeGuy;
GRANT DELETE ON SCHEMA::dbo TO SomeGuy;
*/

How to update/replace or remove an email address from all SSRS subscriptions with T-SQL

Updating/replacing or removing an email address from SSRS subscriptions manually is far too time consuming and tedious. Use the below script instead to either update/replace an email address with a new one across all subscriptions or remove the email address from all subscriptions.

To exclude subscriptions, i.e. keep the email address active for a particular subscription, find the subscription Id for that subscription and include it in the WHERE clause. Remember to uncomment that line in order for the clause to be active.

If you’re worried about messing anything up then back up the table before running the script!

Create backup:

SELECT *
INTO [dbo].[Subscriptions_bk]
FROM [dbo].[Subscriptions]

Update/replace or remove an email address:

/*
To replace an email address with another email address:
Find and replace the following email addresses (Ctrl+H) with the new email address*/
/*
Email address to update/replace:
replaceSomeGuy@someCompany.com

Email address replacement:
newGuy@someCompany.com
*/
UPDATE [dbo].[Subscriptions]
SET ExtensionSettings = REPLACE(CONVERT(VARCHAR(MAX), ExtensionSettings), 'replaceSomeGuy@someCompany.com', 'newGuy@someCompany.com')
WHERE CHARINDEX('replaceSomeGuy@someCompany.com', ExtensionSettings) <> 0
--AND SubscriptionID NOT IN ('FindTheSubscriptionId')
;

/*
To remove an email address:
NOTE:Run both of the following scripts as an email address may or may not end with ";"

Find and replace the following email address (Ctrl+H)
*/
/*
Email address to remove
removeSomeGuy@otherCompany.com
*/
UPDATE [dbo].[Subscriptions]
SET ExtensionSettings = REPLACE(CONVERT(VARCHAR(MAX), ExtensionSettings), 'removeSomeGuy@otherCompany.com;', '')
WHERE CHARINDEX('removeSomeGuy@otherCompany.com;', ExtensionSettings) <> 0
--AND SubscriptionID NOT IN ('FindTheSubscriptionId')
;


UPDATE [dbo].[Subscriptions]
SET ExtensionSettings = REPLACE(CONVERT(VARCHAR(MAX), ExtensionSettings), 'removeSomeGuy@otherCompany.com', '')
WHERE CHARINDEX('removeSomeGuy@otherCompany.com', ExtensionSettings) <> 0
--AND SubscriptionID NOT IN ('FindTheSubscriptionId')
;