How to count nulls and hard-coded text that signifies null in a Pandas DataFrame

🧪 Validating Non-Empty Fields in Python

When working with data validation—especially in web forms, APIs, or data pipelines—it’s common to check whether a field is empty or null. But sometimes, a field might appear empty at first glance, yet still contain whitespace, hidden characters, or default values that make it technically non-null.

Let’s explore how to determine whether a field is actually empty or null, and how to handle it properly in Python.

🔍 What Does “Not Empty or Null” Really Mean?

A field is considered not empty or null if:

It is not None
It is not an empty string ("")
It does not consist solely of whitespace (" ")
It is not an empty container (like [], {}, or ())

These subtle distinctions are important when validating user input or cleaning data.

🧰 Python Functions for Validation

Here are some Python functions that help determine whether a field is truly non-empty:

def getListOfMissingValues():
    """
    desc: List of common words used to represent null that are often found in files as text
    """
    lst = ['NaN', 'NAN', 'nan', 'null', 'NULL', 'nul', 'NUL', 'none', 'NONE', '', ' ', '	']
    return lst
	
def advanceMissingValues(df):
    """
    desc: Count nulls and hardcoded text that represents nulls
    param p1: DataFrame name
    return: DataFrame of field names and count values
    """
    lstMissingVals = getListOfMissingValues()
    col_list = getListOfFieldNames(df)
    output = pd.DataFrame(col_list)
    output.rename(columns = {0:'FieldName'}, inplace = True)
    output['Count'] = ''
    
    #For each field name count nulls and other null type values
    for col in col_list:
        nullCnt = df[col].isnull().sum(axis=0)
        #For each missing value perform count on column
        missValCnt = 0
        for missVal in lstMissingVals:
            missValCnt = missValCnt + len(df[(df[col]==missVal)])
 
        cntTotal = nullCnt + missValCnt
        output.loc[output['FieldName'] == col, 'Count'] = cntTotal

    return output

#Test Setup
lst = ['NaN', 'NAN', 'nan', 'null', 'NULL', 'nul', 'NUL', 'none', 'NONE', '', ' ', '	' ,None]
mdf = pd.DataFrame(lst)
mdf.rename(columns = {0:'NullTypes'}, inplace = True)
print(mdf)

#Run Test
chk = advanceMissingValues(mdf)
chk

Sample output:

Tidbytez

Delivering Bite-Sized Tech Knowledge

How to count nulls and hard-coded text that signifies null in a Pandas DataFrame

🧪 Validating Non-Empty Fields in Python

🔍 What Does “Not Empty or Null” Really Mean?

🧰 Python Functions for Validation

Leave a comment Cancel reply

🧪 Validating Non-Empty Fields in Python

🔍 What Does “Not Empty or Null” Really Mean?

🧰 Python Functions for Validation

Share this:

Related

Leave a comment Cancel reply