How to add Android as a separate platform in Daijishou

Copy the text below and save it as Android.json

{
    "databaseVersion": 8,
    "platform": {
        "name": "Android",
        "uniqueId": "android",
        "shortname": "android",
        "description": null,
        "acceptedFilenameRegex": "^.*$",
        "scraperSourceList": [
            "RAW:Android"
        ],
        "boxArtAspectRatioId": 0,
        "useCustomBoxArtAspectRatio": false,
        "customBoxArtAspectRatio": null,
        "screenAspectRatioId": 0,
        "useCustomScreenAspectRatio": false,
        "customScreenAspectRatio": null,
        "retroAchievementsAlias": null,
        "extra": ""
    },
    "playerList": [
        {
            "name": "android - activity component player",
            "description": "Android activity component player",
            "acceptedFilenameRegex": "^$",
            "amStartArguments": "-n {android.activity}\n",
            "killPackageProcesses": false,
            "killPackageProcessesWarning": true,
            "extra": ""
        }
    ]
}

Open Daijishou > Settings > (Under All settings) Library > Import Platform > Select the Android.json file.

Now go to the Android Platform > Path > Sync

Note: This is not an official platform. To flag whether an app is a game, go to the Daijishou apps section, long press on the app and mark it as a game or not a game. Apps marked as games will show up in this Android platform after syncing. By default the emulators themselves will likely be wrongly flagged as games.

A generic python based ETL pipeline solution for Databricks

Below is the code necessary to create a Databricks notebook source file that can be imported into Databricks. This file can act as a template for creating ETL logic to build tables in Databricks. Once the notebook is prepared it can be set to run by a Databricks workflow job.

The template is parameterized. This means the developer just needs to provide the destination database, the destination schema, the destination table and the SQL logic.

(Note: this simple example is a full load solution, not an incremental load solution. An incremental load can be achieved by writing sufficiently robust SQL that is use case specific.)

The SQL is provided as a variable and the table name or names are stored in a list, allowing a large degree of flexibility for creating a single pipeline that builds multiple database objects.

Another important feature of the code is that it compensates for the fact that Databricks does not enforce primary key constraints. A list of primary key fields can be provided, and if any of those fields contain nulls, or the combination of fields is not distinct, the code will throw an error.

The code will also assign metadata fields to each record created including the job run id as the ETL id, the created date and the updated date.

# Databricks notebook source
# MAGIC %md
# MAGIC https://tidbytez.com/<br />
# MAGIC This is an ETL notebook.<br />

# COMMAND ----------

# Libraries
import os
from pyspark.sql import functions as F
from pyspark.sql.types import StringType
from pyspark.sql.functions import lit, concat_ws, isnan, when, count, col
from datetime import datetime, timedelta

# COMMAND ----------

# Functions

# Generate ETL ID
def get_etl_id():
    try:
        run_id = (
            dbutils.notebook.entry_point.getDbutils()
            .notebook()
            .getContext()
            .currentRunId()
            .toString()
        )
        # A numeric run id is used as the ETL id; anything else falls back to 1
        if run_id.isdigit():
            etl_id = int(run_id)
            return etl_id
        else:
            etl_id = 1
            return etl_id
    except Exception:
        print("Could not return an etl_id number")


# Build database object
def build_object(dest_db_name, schema_name, table_name, pk_fields, sql_query):

    # Destination Database and table
    table_location = dest_db_name + "." + table_name
    # External table file location
    file_location = "/mnt/" + schema_name + "/" + table_name

    # Create Dataframe
    df = sql_query

    # Count nulls in Primary Key (summed across every Primary Key field)
    pk_null_counts = df.select(
        [count(when(isnan(c) | col(c).isNull(), c)).alias(c) for c in pk_fields]
    ).collect()[0]
    cnt_pk_nulls = sum(pk_null_counts)
    # Dataframe record count
    cnt_rows = df.count()
    # Primary Key distinct count
    cnt_dist = df.dropDuplicates(pk_fields).count()
    # Error message
    message = ""

    # Join metadata to dataframe
    global meta_df
    meta = meta_df
    df = df.withColumn("key", lit(1))

    # inner join on two dataframes
    df = df.join(meta, df.key == meta.key, "inner").drop(df.key).drop(meta.key)

    # Write dataframe to table
    if cnt_pk_nulls == 0:
        if cnt_rows == cnt_dist:
            df.write.mode("overwrite").format("delta").option(
                "mergeSchema", "false"
            ).option("path", file_location).saveAsTable(table_location)
        else:
            message = "Primary Key is not unique"
    else:
        message = "Primary Key contains nulls"

    if message != "":
        raise Exception(message)


# COMMAND ----------

# Variables

# Destinations

# File location
schema_name = "YOUR_SCHEMA_NAME_HERE"

# Database location
dest_db_name = "YOUR_DEST_DATABASE_NAME_HERE"

# PK fields
pk_fields = ["EXAMPLE_ID", "EXAMPLE_LOCATION"]

# Metadata
etl_id = get_etl_id()
t = datetime.utcnow()

# Create metadata dataFrame
data = [(1, etl_id, t, t)]
columns = ["key", "ETL_ID", "CREATED_DATE", "UPDATED_DATE"]
meta_df = spark.createDataFrame(data, columns)
# Job run ids can be larger than the int range, so ETL_ID is stored as bigint
meta_df = meta_df.withColumn("ETL_ID", meta_df["ETL_ID"].cast("bigint"))

# COMMAND ----------

# Table name variable list
table_list = [
    {"table_name": "EXAMPLE_TABLE"}
]

# COMMAND ----------

# Iterate through table variables
for table in table_list:

    table_name = table.get("table_name")

    # SQL query
    # (Spark SQL reads double quotes as string literals by default,
    # so the column aliases are left unquoted)
    sql_query = spark.sql(
        f"""
        SELECT 1 AS EXAMPLE_ID,
        'TEXAS' AS EXAMPLE_LOCATION
        """
    )
    build_object(dest_db_name, schema_name, table_name, pk_fields, sql_query)

How to fix Dolphin GameCube controller button mappings and keep them from being overwritten by RetroBat or Emulation Station

RetroBat and Emulation Station do a great job of mapping controller buttons straight out of the box, but sometimes these settings do not map correctly onto specific emulators.

Dolphin, the GameCube emulator, is one such emulator where the buttons seem to get jumbled.

If you have ever tried to fix the button mappings in Dolphin directly, you might have been frustrated that your manual settings did not stick: the next time you run a game you are back to the same wrong button layout.

This is because the RetroBat or Emulation Station front-end settings take precedence over the individual emulator settings, i.e. the expectation is that you will set the emulator settings via these front-ends, not in the emulators individually. Mostly this works great. However, some of the more detailed settings cannot be set via the front-ends, and the front-ends overwrite the emulator's settings with incorrect defaults.

To solve the GameCube button mapping problem do the following:

Open Dolphin directly. There are various ways to do this, one being via the RetroBat settings.

Click on the Controller icon.

For each port (controller you have connected) click on the “Configure” icon.

In the “Device” drop down select your controller.

In the GameCube Controller options set the buttons as follows:

For a PlayStation controller:

A : Cross

B : Square

X : Circle

Y : Triangle

Z : R1

Start: Start (Options)

For an Xbox controller:

A : A

B : X

X : B

Y : Y

Z : Right Bumper

Start: Start

Now save the settings as a profile.

Reopen RetroBat/Emulation Station and press Start on your controller.

From the Main Menu:

Game Settings > Per System Advanced Configuration > GameCube > Autoconfigure Controllers = “OFF”

This should resolve the problem going forward.

How to replace multiple words within a string at once using python

Below is a quick code snippet you can reuse to replace multiple words within a string using Python. The replacements are applied one after another, in the order listed.

s = "The quick brown fox jumps over the lazy dog"
print(s)
for r in (("brown", "red"), ("lazy", "quick")):
    s = s.replace(*r)
print(s)
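Because the loop above applies each replacement in turn, a later pair can re-replace text produced by an earlier pair, which matters if you want to swap two words. The following is a minimal sketch of a single-pass alternative using the standard re module; replace_many is a hypothetical helper name, not something from the snippet above.

```python
import re

def replace_many(text, replacements):
    """Replace every key of `replacements` found in `text` in a single pass."""
    # Longest keys first so that overlapping keys prefer the longer match
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Each match is looked up once, so replaced text is never replaced again
    return pattern.sub(lambda m: replacements[m.group(0)], text)

s = "The quick brown fox jumps over the lazy dog"
swapped = replace_many(s, {"quick": "lazy", "lazy": "quick"})
print(swapped)  # The lazy brown fox jumps over the quick dog
```

The simple loop is fine when the replacement words never overlap with the words being searched for; the single pass is only needed when they do.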

PlayStation 1 not showing up as an option in “Consoles” Tab of GarlicOS

If you have populated your RG35XX PS folder with games yet GarlicOS does not present PlayStation as a console option, the likely cause is that each of your games has a dedicated folder and GarlicOS cannot read subfolders. For GarlicOS to see your games, they must all be directly in the console folder.

However, PS games are typically in .bin format and are saved in folders because even single disk games have at least two associated files, i.e. the .bin file and the .cue file. For multi disk games, where there is a .bin file and a .cue file for each disk, and potentially a .m3u file to handle multi disk operation, the problem is exacerbated.

One solution is to convert your PS games to the .chd format. Converting the PS “disks”, i.e. pairs of .cue and .bin files, to the .chd format results in a single file per disk, which is also compressed and takes up much less space.

To convert “disks” to .chd download the zip of the software “CHDMAN” below:

https://archive.org/details/chdman

Unzipping the file will create a folder CHDMAN.

In this folder open the batch file called “Cue or GDI to CHD” with a text editor and replace the line:

for /r %%i in (*.cue, *.gdi) do chdman createcd -i "%%i" -o "%%~ni.chd"

with:

for /r %%i in (*.cue, *.gdi, *.iso) do chdman createcd -i "%%i" -o "%%~ni.chd"

This update allows the batch file to work with ISO files too.

Now to convert “disks” simply drag and drop the .cue and .bin files into the CHDMAN folder and then double click the batch file “Cue or GDI to CHD” to run it.

This will produce a single .chd file you can then save to the PS folder of your GarlicOS games directory.
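If your games are spread across many subfolders, the same chdman command line could also be generated from a script. The following is a minimal Python sketch and not part of CHDMAN: build_chdman_commands is a hypothetical helper, the folder names are placeholders, and it only builds the commands (with the same createcd -i and -o arguments as the batch file) without running them.

```python
import tempfile
from pathlib import Path

def build_chdman_commands(source_dir, dest_dir):
    """Return one chdman command (as an argument list) per .cue file under source_dir."""
    commands = []
    for cue in sorted(Path(source_dir).rglob("*.cue")):
        # Output goes straight into dest_dir, so no per-game subfolders are created
        out_file = Path(dest_dir) / (cue.stem + ".chd")
        commands.append(["chdman", "createcd", "-i", str(cue), "-o", str(out_file)])
    return commands

# Dry run against a throwaway folder that mimics one game in its own subfolder
with tempfile.TemporaryDirectory() as tmp:
    game_dir = Path(tmp) / "Some Game"
    game_dir.mkdir()
    (game_dir / "Some Game.cue").touch()
    demo_commands = build_chdman_commands(tmp, "PS")

print(len(demo_commands), demo_commands[0][:2])
```

Each returned list could then be passed to subprocess.run(cmd, check=True), assuming chdman is on your PATH.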

Comparing two tables for equality with Spark SQL

A quick way of comparing two tables to determine whether they are exactly the same is to calculate a hash for every row of each table, sum the hashes per table and then compare the two sums. The benefit of this technique is that no matter how many fields there are, and no matter what data types the fields may be, you can use the following queries to do the comparison. Matching sums strongly suggest the tables hold the same rows, although hash collisions mean a match is not absolute proof.

SELECT SUM(HASH(*)) FROM t1;
SELECT SUM(HASH(*)) FROM t2;

Of course if the schemas of the two tables are different this will by default produce different hash values.
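To illustrate why summing row hashes works regardless of row order, here is a minimal plain-Python sketch. It is an illustration of the idea only: it uses hashlib rather than the hash function Spark uses, and row_hash, table_hash_sum and the sample rows are all hypothetical.

```python
import hashlib

def row_hash(row):
    """Deterministic 64-bit hash of one row (a tuple of values)."""
    digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def table_hash_sum(rows):
    """Sum of per-row hashes; addition is commutative, so row order does not matter."""
    return sum(row_hash(r) for r in rows)

t1 = [(1, "TEXAS"), (2, "OHIO")]
t2 = [(2, "OHIO"), (1, "TEXAS")]  # same rows, different order
t3 = [(1, "TEXAS"), (2, "IOWA")]  # one value differs

same_as_t2 = table_hash_sum(t1) == table_hash_sum(t2)
same_as_t3 = table_hash_sum(t1) == table_hash_sum(t3)
print(same_as_t2, same_as_t3)
```

The first comparison matches because only the order differs, while the second does not because one row hashes to a different value.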

How to insert a record with Spark SQL

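This post was written on the basis that INSERT INTO with the VALUES option, as used in other SQL variants, was not supported in Spark SQL. Current Spark SQL documentation does list INSERT INTO ... VALUES, so check what your version supports. For single record inserts that avoid VALUES, the example below provides two options: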

--CREATE test table
CREATE TABLE TestSchema.InsertTest USING DELTA AS (SELECT 1 AS row_id, 'value1' AS field_1, 'value2' AS field_2);

--INSERT INTO test table
INSERT INTO TestSchema.InsertTest SELECT t.* FROM (SELECT 2, 'value3', 'value4') t;

--INSERT INTO test table while aliasing field names
INSERT INTO TestSchema.InsertTest SELECT t.* FROM (SELECT 3 AS row_id, 'value5' AS field_1, 'value6' AS field_2) t;

--Confirm insert
SELECT * FROM TestSchema.InsertTest;

How to count nulls and hard-coded text that signifies null in a Pandas DataFrame

🧪 Validating Non-Empty Fields in Python

When working with data validation—especially in web forms, APIs, or data pipelines—it’s common to check whether a field is empty or null. But sometimes, a field might appear empty at first glance, yet still contain whitespace, hidden characters, or default values that make it technically non-null.

Let’s explore how to determine whether a field is actually empty or null, and how to handle it properly in Python.

🔍 What Does “Not Empty or Null” Really Mean?

A field is considered not empty or null if:

  • It is not None
  • It is not an empty string ("")
  • It does not consist solely of whitespace (" ")
  • It is not an empty container (like [], {}, or ())

These subtle distinctions are important when validating user input or cleaning data.
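The four conditions above can be sketched as a single check on one value. This is a minimal illustration, and is_blank is a hypothetical helper name; it treats only None, blank strings and empty containers as empty, so values such as 0 or False still count as present.

```python
def is_blank(value):
    """Return True if value is None, an empty or whitespace-only string, or an empty container."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, (list, tuple, dict, set)):
        return len(value) == 0
    return False

checks = [None, "", "   ", [], {}, (), "text", 0, [0]]
results = [is_blank(v) for v in checks]
print(results)
# [True, True, True, True, True, True, False, False, False]
```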

🧰 Python Functions for Validation

Here are some Python functions that count, for each field of a Pandas DataFrame, the nulls and the hard-coded text values that signify null:

import pandas as pd

def getListOfMissingValues():
    """
    desc: List of common words used to represent null that are often found in files as text
    """
    lst = ['NaN', 'NAN', 'nan', 'null', 'NULL', 'nul', 'NUL', 'none', 'NONE', '', ' ', '\t']
    return lst

def advanceMissingValues(df):
    """
    desc: Count nulls and hardcoded text that represents nulls
    param p1: DataFrame name
    return: DataFrame of field names and count values
    """
    lstMissingVals = getListOfMissingValues()
    # getListOfFieldNames is not defined in this snippet, so read the field names directly
    col_list = list(df.columns)
    output = pd.DataFrame(col_list)
    output.rename(columns = {0:'FieldName'}, inplace = True)
    output['Count'] = ''
    
    #For each field name count nulls and other null type values
    for col in col_list:
        nullCnt = df[col].isnull().sum(axis=0)
        #For each missing value perform count on column
        missValCnt = 0
        for missVal in lstMissingVals:
            missValCnt = missValCnt + len(df[(df[col]==missVal)])
 
        cntTotal = nullCnt + missValCnt
        output.loc[output['FieldName'] == col, 'Count'] = cntTotal

    return output

#Test Setup
lst = ['NaN', 'NAN', 'nan', 'null', 'NULL', 'nul', 'NUL', 'none', 'NONE', '', ' ', '\t', None]
mdf = pd.DataFrame(lst)
mdf.rename(columns = {0:'NullTypes'}, inplace = True)
print(mdf)

#Run Test
chk = advanceMissingValues(mdf)
chk

How to convert Pandas DataFrame headers to snake case

# Python code to demonstrate
# making headers snake case
 
import pandas as pd
 
# initialise data of lists.
data = {'First Name':['Tom', 'nick', 'krish', 'jack'], 'Age of Person':[20, 21, 19, 18]}
 
# Create DataFrame
df = pd.DataFrame(data)
 
# Print the output.
print(df)

# Make headers snake case
df.columns = [x.lower() for x in df.columns]
df.columns = df.columns.str.replace("[ ]", "_", regex=True)

# Print the output.
print(df)
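The two lines above handle headers that are separated by spaces. For headers that also mix in camel case, hyphens or stray whitespace, a small helper could be used instead. This is a minimal sketch using only the standard re module; to_snake_case is a hypothetical helper name and the sample headers are made up.

```python
import re

def to_snake_case(name):
    """Convert a header such as 'First Name' or 'AgeOfPerson' to snake case."""
    # Insert an underscore at each lower-case/digit to upper-case boundary
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Collapse every run of non-alphanumeric characters into one underscore
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    return name.strip("_").lower()

headers = ["First Name", "AgeOfPerson", " Home-Town "]
snake_headers = [to_snake_case(h) for h in headers]
print(snake_headers)
# ['first_name', 'age_of_person', 'home_town']
```

With a DataFrame this would be applied as df.columns = [to_snake_case(c) for c in df.columns].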

How to drop a Spark Delta table and associated files using Spark SQL and cmd

🧹 How to Drop a Spark Delta Table and Clean Up Associated Files in Databricks

When working with Delta Lake tables in Databricks, it’s not enough to simply drop the table from the metastore—you also need to ensure that the underlying data files are removed to prevent clutter and maintain a clean data lake. This process is especially important when dealing with external Delta tables, where Spark does not automatically manage file deletion.

The following steps outline a reliable method to fully remove a Delta table and its associated files using Spark SQL and command-line tools.

🔹 Step 1: Identify the Schema and Table

Begin by locating the schema and table you want to delete. Replace placeholder values like schemaName and tableName with the actual names used in your environment. This ensures you’re targeting the correct table throughout the process.

🔹 Step 2: Inspect the Table Metadata

Using Spark SQL within Databricks, run a query to describe the table. This will return detailed metadata, including the location of the table’s data files in DBFS (Databricks File System). If you’re using the default schema, it may be named default, but adjust as needed.

🔹 Step 3: Locate the Storage Path

In the metadata output, scroll down to find the Location field. This value points to the directory where the table’s data files are stored. Copy this path—it will be used later to manually delete the files if necessary.

🔹 Step 4: Drop the Table from the Metastore

Execute a Spark SQL command to drop the table. This removes the table’s metadata from the catalog. If the table is managed, this step may also delete the associated files. However, for external tables, the files will remain and must be deleted manually.

🔹 Step 5: Delete the Data Files from DBFS

Using your preferred method of interacting with DBFS—whether through the command line, a Python script, or a Databricks notebook—delete the directory identified earlier. This ensures that all data files associated with the table are removed from storage.

✅ Why This Matters

Delta tables support ACID transactions and maintain a transaction log. Improper deletion—such as manually removing files without dropping the table—can corrupt the log and lead to inconsistent behavior. By following this structured approach, you ensure both the metadata and physical files are properly cleaned up.

This method is especially useful when:

  • Decommissioning obsolete datasets
  • Resetting environments for testing
  • Automating cleanup in CI/CD pipelines

#Step 1
#Find and replace schemaName
#Find and replace tableName

#Step 2
#Inspect the table metadata
#Via Databricks run the Spark SQL query below
#If the table is in the default schema, schemaName is default; change as needed
DESC FORMATTED schemaName.tableName;

#Step 3
#From the table returned scroll down to "Location" and copy the field value
#Find and replace locationFieldValue

#Step 4
#Via Databricks using Spark SQL drop the table
DROP TABLE schemaName.tableName;

#Step 5
#By the means you use to interact with Databricks File System (dbfs), e.g. cmd python virtual environment
#Run command below
dbfs rm -r "locationFieldValue"
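
Copying the location value by hand could also be scripted. The following is a minimal Python sketch under some assumptions: it assumes the metadata query returns rows of three values (col_name, data_type, comment) with the path held in the second value of the "Location" row, find_location is a hypothetical helper, and the sample rows and path are made up.

```python
def find_location(describe_rows):
    """Return the storage path from DESCRIBE-style rows of (col_name, data_type, comment)."""
    location = None
    for col_name, data_type, _comment in describe_rows:
        # Keep the last match, since the table detail section follows the column list
        if col_name.strip().lower() == "location":
            location = data_type
    return location

sample_rows = [
    ("row_id", "int", None),
    ("# Detailed Table Information", "", ""),
    ("Location", "dbfs:/mnt/schemaName/tableName", ""),
]
location = find_location(sample_rows)
print(location)  # dbfs:/mnt/schemaName/tableName
```

In a Databricks notebook the rows would likely come from spark.sql("DESC FORMATTED schemaName.tableName").collect(), and the returned path could be removed with dbutils.fs.rm(location, True) after the table has been dropped.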