🧪 Validating Non-Empty Fields in Python
When working with data validation—especially in web forms, APIs, or data pipelines—it’s common to check whether a field is empty or null. But sometimes, a field might appear empty at first glance, yet still contain whitespace, hidden characters, or default values that make it technically non-null.
Let’s explore how to determine whether a field is actually empty or null, and how to handle it properly in Python.
🔍 What Does “Not Empty or Null” Really Mean?
A field counts as neither empty nor null only if all of the following hold:
- It is not None
- It is not an empty string ("")
- It does not consist solely of whitespace (" ")
- It is not an empty container (like [], {}, or ())
These subtle distinctions are important when validating user input or cleaning data.
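The four conditions above can be sketched as a single check on a plain Python value. This is a minimal illustration, not a definitive implementation: the name `is_present` is hypothetical, and treating numbers and booleans as always present is an assumption you may want to change.

```python
from collections.abc import Sized


def is_present(value):
    """Return True if value is not None, not blank text, and not an empty container."""
    if value is None:
        return False
    if isinstance(value, str):
        # str.strip() removes leading/trailing whitespace, so "   " becomes ""
        return value.strip() != ""
    if isinstance(value, Sized):
        # Lists, tuples, dicts, sets and other sized containers
        return len(value) > 0
    # Assumption: numbers, booleans and other scalars count as present, including 0 and False
    return True


samples = [None, "", "   ", [], {}, (), "text", 0, False, [0]]
results = [is_present(s) for s in samples]
print(results)
```

A plain truthiness test such as `bool(value)` would be shorter, but it would also reject `0` and `False`, which are often legitimate field values. It would also accept `"   "`, which is why the whitespace case is handled explicitly.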
🧰 Python Functions for Validation
Here are some Python functions that count, for each column of a pandas DataFrame, the values that are either null or a text placeholder commonly used to represent null:
import pandas as pd


def getListOfMissingValues():
    """
    desc: List of common words used to represent null that are often found in files as text
    """
    lst = ['NaN', 'NAN', 'nan', 'null', 'NULL', 'nul', 'NUL', 'none', 'NONE', '', ' ', ' ']
    return lst


def getListOfFieldNames(df):
    """
    desc: Return the column names of a DataFrame
    note: this helper is not shown in the original; it is assumed to behave like this
    """
    return list(df.columns)


def advanceMissingValues(df):
    """
    desc: Count nulls and hardcoded text that represents nulls
    param p1: DataFrame name
    return: DataFrame of field names and count values
    """
    lstMissingVals = getListOfMissingValues()
    col_list = getListOfFieldNames(df)
    output = pd.DataFrame(col_list)
    output.rename(columns={0: 'FieldName'}, inplace=True)
    output['Count'] = 0

    # For each field name count nulls and other null type values
    for col in col_list:
        nullCnt = df[col].isnull().sum()
        # isin() counts each row once, even if the list repeats a value
        missValCnt = df[col].isin(lstMissingVals).sum()
        cntTotal = nullCnt + missValCnt
        output.loc[output['FieldName'] == col, 'Count'] = cntTotal

    return output


# Test Setup
lst = ['NaN', 'NAN', 'nan', 'null', 'NULL', 'nul', 'NUL', 'none', 'NONE', '', ' ', ' ', None]
mdf = pd.DataFrame(lst)
mdf.rename(columns={0: 'NullTypes'}, inplace=True)
print(mdf)

# Run Test
chk = advanceMissingValues(mdf)
print(chk)
Sample output:

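The functions above match placeholder strings exactly, so a value such as " NULL " or a lone tab would slip through. One possible variation, sketched here under stated assumptions, normalizes each value (trim whitespace, lower-case) before comparing it. The name `count_missing_normalized` and the `NULL_TOKENS` set are hypothetical and not part of the original code.

```python
import pandas as pd

# Hypothetical placeholder set, lower-cased; extend it to match your data
NULL_TOKENS = {"nan", "null", "nul", "none", ""}


def count_missing_normalized(df):
    """Count real nulls plus null-like text per column, ignoring case and surrounding whitespace."""
    counts = {}
    for col in df.columns:
        series = df[col]
        is_null = series.isnull()
        # Normalize only the non-null values: cast to str, trim whitespace, lower-case
        normalized = series[~is_null].astype(str).str.strip().str.lower()
        counts[col] = int(is_null.sum() + normalized.isin(NULL_TOKENS).sum())
    return pd.DataFrame({"FieldName": list(counts), "Count": list(counts.values())})


demo = pd.DataFrame({
    "name": ["Ann", " NULL ", "\t", None, "n/a"],
    "age": [31, None, 0, 45, 27],
})
result = count_missing_normalized(demo)
print(result)
```

Because whitespace-only strings collapse to an empty string after trimming, the token set needs only one entry for them. Markers that are not in the set, such as "n/a" in the demo data, are still treated as real values until you add them.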