Tag Archives: cmd

How to drop a Spark Delta table and associated files using Spark SQL and cmd

🧹 How to Drop a Spark Delta Table and Clean Up Associated Files in Databricks

When working with Delta Lake tables in Databricks, it’s not enough to simply drop the table from the metastore—you also need to ensure that the underlying data files are removed to prevent clutter and maintain a clean data lake. This process is especially important when dealing with external Delta tables, where Spark does not automatically manage file deletion.

The following steps outline a reliable method to fully remove a Delta table and its associated files using Spark SQL and command-line tools.

🔹 Step 1: Identify the Schema and Table

Begin by locating the schema and table you want to delete. Replace placeholder values like schemaName and tableName with the actual names used in your environment. This ensures you’re targeting the correct table throughout the process.

🔹 Step 2: Inspect the Table Metadata

Using Spark SQL within Databricks, run a query to describe the table. This will return detailed metadata, including the location of the table’s data files in DBFS (Databricks File System). If you’re using the default schema, it may be named default, but adjust as needed.

🔹 Step 3: Locate the Storage Path

In the metadata output, scroll down to find the Location field. This value points to the directory where the table’s data files are stored. Copy this path—it will be used later to manually delete the files if necessary.

🔹 Step 4: Drop the Table from the Metastore

Execute a Spark SQL command to drop the table. This removes the table’s metadata from the catalog. If the table is managed, this step may also delete the associated files. However, for external tables, the files will remain and must be deleted manually.

🔹 Step 5: Delete the Data Files from DBFS

Using your preferred method of interacting with DBFS—whether through the command line, a Python script, or a Databricks notebook—delete the directory identified earlier. This ensures that all data files associated with the table are removed from storage.

✅ Why This Matters

Delta tables support ACID transactions and maintain a transaction log. Improper deletion—such as manually removing files without dropping the table—can corrupt the log and lead to inconsistent behavior. By following this structured approach, you ensure both the metadata and physical files are properly cleaned up.

This method is especially useful when:

  • Decommissioning obsolete datasets
  • Resetting environments for testing
  • Automating cleanup in CI/CD pipelines

Let me know if you’d like help turning this into a reusable script or integrating it with your workflow.

#Step 1
#Find and replace schemaName
#Find and replace tableName

#Step 2 
#Find the table 
#Via Databricks run the Spark SQL query below
#default is schema, change as needed
DESC FORMATTED schemaName.tableName;

#Step 3
#From the table returned scroll down to "location" and copy the field value
#Find and replace locationFieldValue

#Step 5
#Via Databricks using Spark SQL drop the table
DROP TABLE tableName

#Step 6
#Find and replace locationFieldValue
#By the means you use to interact with Databricks File System (dbfs), e.g. cmd python virtual environment
#Run command below
dbfs rm -r "locationFieldValue"

An icon depicting a calendar and clock

How to create a datetime stamp within a batch script in the format YYYY-MM-DD

Below is a batch script that populates the variable Timestamp with a datetime value in the format YYYY-MM-DD_HH:MM:SS.

This technique can then be used when creating files etc.

To try out the example below copy the text and paste it into a text editor like notepad. Save the file as timestamp with the extension .bat and then double click on the filestamp.bat file.

This code example will display the current datetime in windows command shell. It will also create a blank file called Test along with the current datetime in the format yyyymmddhhmmss in the same directory the batch script is run from.

@ECHO off
CLS
ECHO Date format = %date%

REM Breaking down the format 
FOR /f "tokens=2 delims==" %%G in ('wmic os get localdatetime /value') do set datetime=%%G
ECHO dd = %datetime:~6,2%
ECHO mth = %datetime:~4,2% 
ECHO yyyy = %datetime:~0,4%
ECHO/
ECHO Time format = %time%
ECHO hh = %time:~0,2%
ECHO mm = %time:~3,2%
ECHO ss = %time:~6,2%
ECHO/

REM Variable format 1
SET Timestamp=%date:~6,8%-%date:~3,2%-%date:~0,2%_%time:~0,2%:%time:~3,2%:%time:~6,2%
ECHO New Format 1: %Timestamp%
ECHO/
REM Variable Format 2
SET Timestamp=%date:~6,8%%date:~3,2%%date:~0,2%%time:~0,2%%time:~3,2%%time:~6,2%
ECHO New Format 2: %Timestamp%
ECHO/
REM Building a timestamp from variables
SET "dd=%datetime:~6,2%"
SET "mth=%datetime:~4,2%"
SET "yyyy=%datetime:~0,4%"
SET "Date=%yyyy%%mth%%dd%"
ECHO Built Date from variables: %Date%
ECHO/

REM Write Timestamp into file name
REM BREAK>Test%Timestamp%.txt 
PAUSE

 

How to enable and disable xp_cmdshell

xp_cmdshell is an extended stored procedure provided by Microsoft and stored in the master database. This procedure allows you to issue operating system commands directly to the Windows command shell via T-SQL code.

By default, the xp_cmdshell option is disabled on new installations. Along with other unwanted behavior malicious users can elevate their privileges with the use of xp_cmdshell. For this reason it is generally recommend to leave xp_cmdshell disabled. It can be enabled by using the Policy-Based Management or by running the sp_configure system stored procedure as shown in the following code example:

Use Master
GO

EXEC master.dbo.sp_configure 'show advanced options', 1
RECONFIGURE WITH OVERRIDE
GO

EXEC master.dbo.sp_configure 'xp_cmdshell', 1
RECONFIGURE WITH OVERRIDE
GO

 

To disable xp_cmdshell use the following code example. Note: The following example also sets show advanced options server configuration option to 0. It is best practice to leave this option in the default state of 0 to stop the enabling of features.

Use Master
GO

EXEC master.dbo.sp_configure 'xp_cmdshell', 0
RECONFIGURE WITH OVERRIDE
GO

EXEC master.dbo.sp_configure 'show advanced options', 0
RECONFIGURE WITH OVERRIDE
GO

How to copy only files from a folder and respective subfolders using a batch script

Ok so here’s a nifty little batch script that will loop through all the directories in a directory to gather all the subsequent files into a new directory. (Code at bottom of the page)

(For those of you that don’t know what a batch script is here’s a quick definition. A batch file is a text file that contains a sequence of commands for a computer operating system. It’s called a batch file because it batches (bundles or packages) into a single file a set of commands that would otherwise have to be presented to the system interactively from a keyboard one at a time.)

Here’s a diagram explaining how the batch script provided at the bottom of the page works. (Note that the folders aren’t copied just the files.)

diagram of process

So why would you use this?

Lets say you’re dealing with a system that has outputted thousands of folders into a single folder and each folder contains files.

You could be looking at millions of files to be gathered, a process that cannot be done manually.

Or a simpler use case might be you’ve downloaded thousands of movies and each was put into a separate folder. Now you just want the media files in one location.

How to use:

Open the program notepad.

Copy the code below and past it into notepad saving the file with the extension .bat

Place the batch file in the parent folder, i.e. the folder all the other folders are in.

Double click the batch file.

The batch file will create a folder one directory level above where the parent folder is located and copy all the files to this location.

An example of how to use the file is as follows:

If you had the parent folder “My Movies” on your desktop.

Pasting the batch file into this folder, and double clicking on it, would copy every movie into a folder called “Copied from My Movies” on your desktop.

And here’s the code,
Adiós.


@ECHO OFF
TITLE WAIT !
:: ASSIGN THE FILE PATH OF BATCH FILE TO A VARIABLE
SET "sourceDir=%CD%"
:: GET THE NAME OF THE FOLDER WHICH THE BATCH FILE IS IN
FOR %%a IN (.) DO SET currentFolder=%%~na
:: GO UP ONE DIRECTORY
CD ..
:: MAKE A DYNAMIC FOLDER NAME
::SET folderName=Copied From %currentFolder%
SET "folderName=Copied from %currentFolder%"
:: CREATE A FOLDER TO PUT THE COPIED FILES IN
:: IF FOLDER ALREADY EXISTS DELETE IT
IF EXIST "%folderName%" RMDIR "%folderName%" /S /Q
MKDIR "%folderName%"
:: ASSIGN DESTINATION FOLDER TO A VARIABLE
SET "destinationFolder=%CD%\%folderName%"
:: CREATE A LOG FILE IN DESTINATION FOLDER
SET "_report=%destinationFolder%\logxcopy.txt"
:: CREATE ERROR MESSAGE
IF NOT EXIST "%sourceDir%" (ECHO.Could not find %sourceDir% &GoTo:DONE)
:: OVERWRITE PREVIOUS LOG
>"%_report%" (
echo.%date% – %time%
echo.—————————————————
echo.
)
:: COPY FILES
FOR /F "Delims=" %%! IN ('DIR "%sourceDir%\" /b /s /a-d 2^>NUL') DO (
@ECHO.%%! &(
@XCOPY "%%!" "%destinationFolder%\" /i /y /h /f /c >>"%_report%",2>&1)
)
:DONE
TITLE,Done…….
ECHO.&PAUSE>NUL

view raw

batch copy file

hosted with ❤ by GitHub