Category Archives: Data

Why you (probably) don’t need a Data Lake

Summary

Among data specialist that do not work in the field of Big Data there can be confusion surrounding the term Data Lake. This is because there is apparent overlap in terms of role and function between Data Lakes and, the more traditional, Data Warehouses the likes of which data professionals will be more familiar with. This confusion is not helped by the term Data Lake itself being overloaded which will be discussed later in this article. However despite this overlap Data Lakes do occupy their own distinct role and perform functions Data Warehouses cannot.

Data Lakes have tremendous utility but damagingly there is also a mass of literature surrounding Data Lakes pushing the concept as a cure-all that coincidentally will also require you to migrate your organizations Business Intelligence center into the cloud. The following statements will hopefully dispel some of the associated hucksterism.

  • Data Lakes are not Data Warehouses 2.0, i.e. they are not the evolution of a Data Warehouse.
  • Data Lakes have not replaced Data Warehouses in performing the role of housing aggregated data.
  • Data Lakes will not free you from the burden of developing ETLs or establishing robust Data Architecture and strong Data Governance.

Introduction

It is important to first clarify that both Data Warehouses and Data Lakes are abstract concepts independent of any particular software or vendor. A Data Warehouse can be created in any database engine such as SQL Server, PostgreSQL, Oracle or MySql. Similarly a Data Lake can be deployed across any suitably large data storage platform, i.e. an on-site data center or hosted in the cloud.

In basic terms both Data Warehouses and Data Lakes can be thought of as the place where all data relevant to an organization’s goals is pulled together from various sources both internal and external (increasingly external). They both exist to facilitate an all encompassing view of an organization and how will it performs or provide a greater understanding of the organization’s environment, opportunities (e.g. customer preferences and habits) and threats. However they differ in terms of the data they are optimized to handle and are therefore better suited to different use cases.

 

What is a Data Warehouse?

A Data Warehouse is a method for storing and organising data that is optimized to support Business Intelligence (BI) activities such as analytics. To put it another way they solely exist and are constructed in a manner to best answer big questions efficiently. For this reason they typically hold vast quantities of historical data. The data within a data warehouse is usually derived from a wide range of sources such as application log files but primarily transaction applications (Oracle, 2019). However in contrast to a transactional database were each transaction is a separate record, the relevant entries in a Data Warehouse are typically aggregated although they can also hold transaction records for archival purposes.

Figure 1: Typical Data Warehouse architecture of an SME (Databricks, 2019)

 

Single transaction records on their own are not typically very insightful to an organization, trying to identify consumer trends for example. Aggregating data based on facts and dimensions, e.g. the number of sales (fact) for a particular store (dimension), saves disk space and allows queries looking for that specific answer to be returned quickly. Data Warehouses mostly contain numeric data which is easily manipulated. As an example store sales might be the summation of thousands of rows of data to a single row.

 

Figure 2: Simplified example of a Data Warehouse internal structure (BIDataPro, 2018)

 

Data Warehouses also solve the problem of trying to derive information when there are too many sources, e.g. a multinational with thousands of store locations and subsidiaries, by creating a “single source of truth”. Effectively this means pulling all the information to one central location, transforming the data for uniformity and storing like for like data together. For example this could mean gathering all sales data from multiple locations and converting the currency to dollars. All of the data in one place together allows for different sources, which serve different purposes, to be combined via a single query. For example a report that links sales data and logistical data, coming from POS and SCM systems respectively, may not be possible with a single query if the systems are not linked. If best practices regarding I.T. security are being followed they certainly should not be.

Data Warehouses are fed from source systems using an extract, transform and load (ETL) solution. This means data is extracted from a source system, transformed to meet schema and business requirements of the Data Warehouse and then loaded. This is a data delivery method independent of any particular software vendor. There are various software to accomplish ETLs including the option to create a custom application. A variation of this process is extract, load and transform (ELT) in which the data is landed into tables raw and later transformed to meet the schema and business requirements of their intended final table. This method allows for greater auditability which could aid in regulatory compliance or post-mortems if the transformation process fails.

Once set up the Data Warehouse can facilitate statistical analysis, reporting, data mining and more sophisticated analytical applications that generate actionable information by applying machine learning and artificial intelligence (AI) algorithms (Oracle, 2019).

For an organization a single source of truth which will eliminate inconsistencies in reporting, establish a single set of global metrics and allow everyone in the organization to “sing from the same hymn sheet” is very important due to how beneficial the information provided is in directing informed decisions.

So if Data Warehouses have proven such an excellent platform for generating information why are alternatives needed? Well by design only a subset of the attributes are examined, so only pre-determined questions can be answered (Dixon, 2010). Also the data is aggregated so visibility into the lowest levels is lost (Dixon, 2010). The final major factor is that some of the most vital sources of information are no longer simply numerical in nature and generated by an organizations internal transactional system. So what has changed?

 

The Digital Universe

The data landscape has changed drastically in just a few short years. Like the physical universe, the digital universe is large and growing fast. It was estimated that by 2020 there would be nearly as many digital bits as there are stars in the observable universe (Turner, 2014). That estimate is somewhere in the region of 44 zettabytes, or 44 trillion gigabytes (Turner, 2014). Even though this quantity of data is already beyond human comprehension the rate of growth is probably the more impressive feat. For context there is over 10 times more data now than there was in 2013 when the digital universe was an estimated 4.4 zettabytes (Turner, 2014). The data we create and copy annually is estimated to reach 175 zettabytes by 2025 (Coughlin, 2018).

 

Where is all this data coming from?

The short answer is predominately us and the systems that service our needs. In the not too distant past the only entities to have computers generating and storing data were businesses, governments and other institutions. Now everyone has a computer of some description and with the advent of social media mass consumers became mass creators. When you stop to think of how many interactions a person has with electronic devices every day, directly or indirectly, you soon get a picture of how much data is actually being generated.

As an example of this endless generation of data the following are average social media usage stats over the course of one minute from 2018 (Marr, 2018):

  • Twitter users sent 473,400 tweets
  • Snapchat users shared 2 million photos
  • Instagram users posted 49,380 pictures
  • LinkedIn gained 120 new users

Other extraordinary data stats include (Marr, 2018):

  • Google processes more than 40,000 searches every second or 3.5 billion searches a day.
  • 5 billion people are active on Facebook every day. That’s one-fifth of the world’s population.
  • Two-thirds of the world’s population now owns a mobile phone.

Our way of life has become increasingly digitized with no better example than the effective global lockdown during the 2020 pandemic. Hundreds of millions of employees from around the world managed to continue working from home and did so effectively (Earley, 2020). This would have been unimaginable even by the late nineties. And yet as digitized as our world has become it is only the start. With emerging technologies such as self-driving cars, IoT smart devices and ever increasingly sophisticated robots entering our homes the 175 zettabytes of data by 2025 maybe a conservative estimate.

With so much of the stuff you would be forgiven for thinking all of this data is just a by-product but it is anything but. The data generated is an incredibly valuable asset if it can be analyzed properly and transformed into business relevant information.

 

What types of data are there?

The state of data within the digital universe can be summarized as structured, semi-structured and unstructured (Hammer, 2018).

The following is a non-exhaustive list of data types (Hammer, 2018):

  • CRM
  • POS
  • Financial
  • Loyalty card
  • Incident ticket
  • Email
  • PDF
  • Spreadsheet
  • Word processing
  • GPS
  • Log
  • Images
  • Social media
  • XML/JSON
  • Click stream
  • Forums
  • Blogs
  • Web content
  • RSS feed
  • Audio
  • Transcripts

Only the data types above in bold are suitable for aggregation (Hammer, 2018). The rest of the data types are typical of what now makes up a large proportion of the digital universe, and despite their value as data assets they are not suitable for storage or analysis within a Data Warehouse. This is because data needs to meet the predefined structure of a Data Warehouse in order for it to be accepted and aggregating these raw unstructured files, e.g. video and audio files etc., is not possible. So how are these types of valuable data turned into actionable information?

 

What is a Data Lake?

Data Warehouses have been utilized by data specialists for decades but the concept of Data Lakes is much more contemporary and much better suited to deal with storage, analysis and analytics of the semi-structured and unstructured data listed above. By design storage within a Data Lake of these kinds of data does not require files to be transformed as the file is kept in a raw state. Files can be simply copied from one file structure to another. Data Lakes also allow for working off the files directly which means the data can be used effectively immediately, i.e. as soon as it lands, rather than waiting weeks for the Data Warehouse developers to massage the data into a format that the data warehouse can accept if that is even possible (Hammer, 2018). Working with this type of data has become synonymous with the field of Big Data, which is defined by high velocity, high volume and high variability. As such the two methodologies of Data Warehouses and Data Lakes are not necessarily in competition with each other either, in fact depending on their definition (Data Lake is somewhat of an overloaded term (Bethke, 2017)) they could be argued to resolve difference problems and can complement each other when deployed within the same architecture.

There is some contention as to the definition of a Data Lake. Some would argue that original meaning  implied the Lake was a raw data reservoir solely (Bethke, 2017). By this definition the Data Lake is not too dissimilar to a staging area or Operational Data Store (ODS) in a data warehouse were raw copies of data from source systems are landed (Bethke, 2017). This would coincide with an ELT process as opposed to an ETL process. The transform and integration of the data happens later downstream during the populating of the data warehouse (Bethke, 2017). This understanding of a Data Lake still persists today in the minds of many data specialist as can be seen below in the overly simplified illustration.

Figure 3: Overly simplified illustration of a Data Lake architecture  (Hammer, 2018)

(Note: no indication of analysis being performed on the lake directly, the lake services the warehouse solely)

However it is an inaccurate understanding as the person who is credited with coining the term, James Dixon, used the following analogy when he explained a Data Lake:

“If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.” (Dixon, 2010)

By stating “various users of the lake can come to examine, dive in, or take samples” Dixon is clearly implying that a feature of the Data Lake is that it is accessible prior to the data being transformed and made available in a Data Warehouse.

This is where Data Lakes and Data Warehouses take an opposing strategy on applying structure to data which is perhaps why they are often mistaken as alternative competing concepts to each other. A Data Warehouse requires Schema on Write whereas a Data Lake uses Schema on Read.

With schema on write all of the relevant data structure needs to be prepared in advance which means all of the relevant business questions need to be thought of in advance. This rarely results in a situation where all the relevant stakeholders have their needs met and if they do it will not be for very long.  This scenario is workable by an organization looking to aggregate finance data they are very familiar with but it is especially difficult when dealing with Big Data were the questions are unknown.

With schema on read the schema is only applied when the data is read allowing for a schema that is adaptable to the queries being issued. This means you are not stuck with a predetermined one-size-fits-all schema (Pasqua, 2014). This allows for the storage of unstructured data and since it is not necessary to define the schema before storing the data it makes it easier to bring in new data sources on the fly. The exploding growth of unstructured data and overhead of ETL for storing data in RDBMS is the offered as a leading reason for the shift to schema on read (Henson, 2016).

When dealing with Big Data the problem of a predefined schema can be so burdensome that it can sink a data project or increase the time-to-value past the point of relevance (Pasqua, 2014). Using a schema on read approach on data as-is means getting value from it right away (Pasqua, 2014). The flexibility of Data Lakes in this regard allows them to surpass Data Warehouses in terms of scalability while making data accessible for analysis sooner.

 

Data Lakes Scalability

By using schema on read the constraint on scale is virtually removed. The threat of a bottleneck still exists but now in the form of physical constraints in terms of the hardware available. This is why online cloud offerings such as Amazon S3 and Azure Data Lake from Microsoft have become so popular. Of course on-site data centers are also an option with Hadoop being a very popular solution which combines a Data Lake structure with analytically capabilities.  This level of scalability also safe guards against Data Silos. A Data Silo is an undesirable situation where only one group or a limited number of people in an organization have access to a source of data that has a broader relevance to people across an organization (Plixer, 2018).

Data Lakes are intended by design and philosophy to be an antithesis to Data Silos where all an organizations data is stored together in one lake. However centrally storing all data is not without significant security concerns and losing sight of what customer data is on hand can run afoul of numerous legal requirements such as GDPR.

 

Data Lakes Analysis & Analytics

A defining feature of Big Data analytics is the concept of bringing the analytics to the data rather than the data to the analytics. Traditionally analytics was carried out by feeding single flat files into an algorithm with the time taken to prepare these files being significant. Although accessing the raw files directly is potentially a failing as it has the potential to break the principle of a single source of truth and therefore runs the risk of introducing inconsistencies between reports and other forms of analysis. As you can imagine this is complex and disciplined work which is why Data Lakes, at this point in their maturity, are best suited to Data Scientists and advanced Data Analysts (Hammer, 2018). However this goes against the Data Lake ethos of “data for all” as it only allows the very skilled to have access. This creates the problem Data Lakes were meant to solve by imposing restrictions or “data for the select few”. With Data scientists acting as the gatekeepers an organizations stakeholders can lose sight of the useful data available to them. Worse still is that valuable data may come from external sources with stakeholders have no visibly of it prior to it landing in the Data Lake. This may leave stakeholders with no option but to take action based on an analysis produced by a Data Scientist with accuracy of the analysis being a matter of fate because the stakeholder has no data to say otherwise.  In comparison the creation of a Data Warehouse is usually a collaboration of stakeholders, familiar with internal sources systems and data, and developers. Once a Data Warehouse is created, far less skilled (and cheaper) Data Analysts will have the ability to navigate the internal structure and compile valuable reports.

Despite the obvious concerns the significance of scalability and direct raw data analysis cannot be overlooked. The sooner an organization is informed the sooner it can act. In real world terms this could save millions of dollars, save thousands of jobs or stop the organizations itself from going under. However the benefits of scalability and earlier data access are not without risks as poorly managed Data Lakes have the potential to turn into Data Swamps. Data Swamps are poorly managed Data Lakes that become a dumping ground for data. Though the data may be unstructured the method in which it is stored must not be or visibility of what is stored and where it is stored will be lost. Failure to catalogue the data letting users know what is available while making the attributes of the data known will overwhelm users and result in the garbage results (Hammer, 2018). Successful implementation of a Data Lake is complex and requires ongoing commitment to maintain but for a large organization that needs to make better use of the wider range of data available in the digital universe a Data Lake is a necessity.

 

Conclusion

A Data Lake is not a replacement for a Data Warehouse. Data Lakes are better equipped to solve the different problems associated with dealing with semi-structured to unstructured data. Their flexibility in this regard allows them to surpass Data Warehouses in terms of scalability while making data accessible for analysis sooner. However Data Lakes are not without their drawbacks. They require highly skilled and expensive staff to develop and maintain. They potentially run a greater risk of failing spectacularly by devolving into a Data Swamp and could potentially become a serious liability from a regulatory standpoint if this was to happen. Organisations can also be left at the mercy of Data Scientists in how accurate they are in analyzing data and producing correct reports as stakeholders may not have the expertise to retrieve data from the Data Lake themselves. 

Thankfully Data Warehouses are still perfectly suited for dealing with numeric data and for organizations that still predominately use their own internal transactional systems in the creation of actionable information these organisations have no immediate need to utilize any alternatives.

 

 

References:

Bethke, U. (2017) ‘Are Data Lakes Fake News?’, Sonra, 8 August. Available at: http://www.kdnuggets.com/2017/09/data-lakes-fake-news.html (Accessed: 4 July 2020).

BIDataPro (2018) ‘What is Fact Table in Data Warehouse’, BIDataPro, 23 April. Available at: https://bidatapro.net/2018/04/23/what-is-fact-table-in-data-warehouse/ (Accessed: 4 July 2020).

Coughlin, T. (2018) 175 Zettabytes By 2025, Forbes. Available at: https://www.forbes.com/sites/tomcoughlin/2018/11/27/175-zettabytes-by-2025/ (Accessed: 4 July 2020).

Databricks (2019) ‘Unified Data Warehouse’, Databricks, 8 February. Available at: https://databricks.com/glossary/unified-data-warehouse (Accessed: 4 July 2020).

Dixon, J. (2010) ‘Pentaho, Hadoop, and Data Lakes’, James Dixon’s Blog, 14 October. Available at: https://jamesdixon.wordpress.com/2010/10/14/pentaho-hadoop-and-data-lakes/ (Accessed: 4 July 2020).

Earley, K. (2020) Google and Facebook extend work-from-home policies to 2021, Silicon Republic. Available at: https://www.siliconrepublic.com/companies/google-facebook-remote-work-until-2021 (Accessed: 5 July 2020).

Hammer, D. (2018) What is a data lake? – The Hammer | The Hammer. Available at: https://www.sqlhammer.com/what-is-a-data-lake/ (Accessed: 4 July 2020).

Henson, T. (2016) ‘Schema On Read vs. Schema On Write Explained’, Thomas Henson, 14 November. Available at: https://www.thomashenson.com/schema-read-vs-schema-write-explained/ (Accessed: 6 July 2020).

Marr, B. (2018) How Much Data Do We Create Every Day? The Mind-Blowing Stats Everyone Should Read, Forbes. Available at: https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/ (Accessed: 5 July 2020).

Oracle (2019) What Is a Data Warehouse | Oracle Ireland. Available at: https://www.oracle.com/ie/database/what-is-a-data-warehouse/ (Accessed: 5 July 2020).

Pasqua, J. (2014) Schema-on-Read vs Schema-on-Write, MarkLogic. Available at: https://www.marklogic.com/blog/schema-on-read-vs-schema-on-write/ (Accessed: 6 July 2020).

Plixer (2018) What is a Data Silo and Why is It Bad for Your Organization? Available at: https://www.plixer.com/blog/data-silo-what-is-it-why-is-it-bad/ (Accessed: 6 July 2020). Turner, V. (2014) The Digital Universe of Opportunities. Available at: https://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm (Accessed: 5 July 2020).

How to handle a Database creation request

If you are working as a DBA you may find that developers will ask you to create a database having given little thought to what the database will be used for or the impact the database could have to the resources or security of the hosting environment.

If you find yourself in that situation I would suggest you walk the requester through the questionnaire from the previous article “How to determine where a new database should be deployed“.

Once you have completed that process I would then suggest that an official request to deploy a new database be made using the DATABASE REQUEST FORM provided here link. If you have a change request process I would still suggest you use this form. Having a database specific request form covers more relevant and vital information.

This is a fairly high level request form with most of the technical details still to be defined by the DBA but it provides documentation of the request and states the requester’s initial expectations and requirements.

Following the database deployment if the actual footprint of the database does not match up with what was agreed the form will confirm if the requested resources were under specced or misleading.

The form is outlined as below.

DATABASE REQUEST FORM image

Some important points the form clarifies:

There’s a difference between requester and owner. If the database runs into any problems you don’t want to be contacting the temporary intern that requested it instead of say the department head.

The application the database supports. Most of the time the database name will have some tie-in to the application name but maybe it does not. For instance the database could be named something generic like Reporting which could be the back end for really anything.

The form asks the requester to prepare a profile for the database. I could have named this section “who is your daddy and what does he do?”. If the requester states they are looking for a reporting database but operationally it’s running as a transnational database, that could make a big difference in terms of the resources provided for the database and underlying hardware.

Possibly most importantly the form helps to establish the likely impact of the new database with the Resource Impact Estimation section. For example if a requester asks for 10 Gb of space and states they expect space usage to increase by 5 Gb a year but the disk has lost a terabyte in the first few months the form will clarify who got their numbers wrong.

The user and groups section will clarify who should have access to the database. Effectively everything related to data and data access should be okayed by a compliance officer to confirm everything is GDPR compliant. This form will assist the compliance officer in establishing that.

The Business continuity & Upkeep section is really the domain of the DBA but it helps to get requester input on these matters. For instance establishing maintenance windows.

If you have any additional questions you feel should be on the form please feel free to contact me and I’ll add them.

 

 

How to assess a SQL Server instance for GDPR compliance by writing every table and column to Excel

This post uses the script I had written before here LINK modified slightly to include a count of each table. If you are a DBA you’re likely assisting compliance offers to find personal identifiable data within the databases at this time. This script will allow you to provide them with a record of every database, table and column on an entire instance. The compliance offer can then sieve through all the columns and highlight any columns that look like they contain personal data for further investigation.

Below is a SQL query that will return the following metadata about each table from each database located on a SQL server database server:

  • ServerName
  • DatabaseName
  • SchemaName
  • TableName
  • CountOfRows
  • ColumnName
  • KeyType

The output is provided in a table format with these additional formatted lines of text which can be used as queries or as part of queries.

  • A Select table query
  • A Count table row columns
  • A Select column query
  • Each column bracketed
  • Each table and column bracketed

The table returned by the query can be exported to excel. Using excels filter option applied to the columns of the table makes finding and selecting specific tables and columns very easy.

This process can be repeated for every SQL Server instance used by the business to generate a single mapped servers master excel file allowing the user to find any table or column available to the organization quickly.

Applying some colour coding like below adds to the ease of use.

Image of excel file with mapped database server structure

How to use:

Simply open SQL Server Management Studio and from object explorer right click on the server name and select new query. This will open a window set to the master database of the server. Copy and paste the SQL below into this SQL Server window and execute. When the query is finished you will have created the table above.

/*
SCRIPT UPDATED
20180316
*/

USE [master]
GO

/*DROP TEMP TABLES IF THEY EXIST*/
IF OBJECT_ID('tempdb..#DatabaseList') IS NOT NULL
	DROP TABLE #DatabaseList;

IF OBJECT_ID('tempdb..#TableStructure') IS NOT NULL
	DROP TABLE #TableStructure;

IF OBJECT_ID('tempdb..#ErrorTable') IS NOT NULL
	DROP TABLE #ErrorTable;

IF OBJECT_ID('tempdb..#MappedServer') IS NOT NULL
	DROP TABLE #MappedServer;
	
IF OBJECT_ID('tempdb..#TableCount') IS NOT NULL
	DROP TABLE #TableCount;
	
IF OBJECT_ID('tempdb..#Count') IS NOT NULL
	DROP TABLE #Count;

DECLARE @ServerName AS SYSNAME
DECLARE @Count INT

SET @ServerName = @@SERVERNAME

CREATE TABLE #DatabaseList (
	Id INT NOT NULL IDENTITY(1, 1) PRIMARY KEY
	,ServerName SYSNAME
	,DbName SYSNAME
	);

CREATE TABLE [#TableStructure] (
	[DbName] SYSNAME
	,[SchemaName] SYSNAME
	,[TableName] SYSNAME
	,[ColumnName] SYSNAME
	,[KeyType] CHAR(7)
	) ON [PRIMARY];
	
CREATE TABLE [#TableCount] (
	[Id_TableCount] INT NOT NULL IDENTITY(1, 1) PRIMARY KEY
	,[ServerName] SYSNAME
	,[DatabaseName] SYSNAME
	,[SchemaName] SYSNAME
	,[TableName] SYSNAME
	,[SQLCommand] VARCHAR(MAX)
	,[TableCount] INT
	);

CREATE TABLE #Count (ReturnedCount INT);

/*THE ERROR TABLE WILL STORE THE DYNAMIC SQL THAT DID NOT WORK*/
CREATE TABLE [#ErrorTable] ([SqlCommand] VARCHAR(MAX)) ON [PRIMARY];

/*
A LIST OF DISTINCT DATABASE NAMES IS CREATED
THESE TWO COLUMNS ARE STORED IN THE #DatabaseList TEMP TABLE
THIS TABLE IS USED IN A FOR LOOP TO GET EACH DATABASE NAME
*/
INSERT INTO #DatabaseList (
	ServerName
	,DbName
	)
SELECT @ServerName
	,NAME AS DbName
FROM master.dbo.sysdatabases WITH (NOLOCK)
WHERE NAME <> 'tempdb'
ORDER BY NAME ASC

/*VARIABLES ARE DECLARED FOR USE IN THE FOLLOWING FOR LOOP*/
DECLARE @sqlCommand AS VARCHAR(MAX)
DECLARE @DbName AS SYSNAME
DECLARE @i AS INT
DECLARE @z AS INT

SET @i = 1
SET @z = (
		SELECT COUNT(*) + 1
		FROM #DatabaseList
		)

/*WHILE 1 IS LESS THAN THE NUMBER OF DATABASE NAMES IN #DatabaseList*/
WHILE @i < @z
BEGIN
	/*GET NEW DATABASE NAME*/
	SET @DbName = (
			SELECT [DbName]
			FROM #DatabaseList
			WHERE Id = @i
			)
	/*CREATE DYNAMIC SQL TO GET EACH TABLE NAME AND COLUMN NAME FROM EACH DATABASE*/
	SET @sqlCommand = 'USE [' + @DbName + '];' + '

INSERT INTO [#TableStructure]
SELECT DISTINCT ' + '''' + @DbName + '''' + ' AS DbName
	,SCHEMA_NAME(SCHEMA_ID) AS SchemaName
	,T.NAME AS TableName	
	,C.NAME AS ColumnName
	,CASE 
		WHEN OBJECTPROPERTY(OBJECT_ID(iskcu.CONSTRAINT_NAME), ''IsPrimaryKey'') = 1 
			THEN ''Primary'' 
		WHEN OBJECTPROPERTY(OBJECT_ID(iskcu.CONSTRAINT_NAME), ''IsForeignKey'') = 1 
			THEN ''Foreign''
		ELSE NULL 
		END AS ''KeyType''
FROM SYS.TABLES AS t WITH (NOLOCK)
INNER JOIN SYS.COLUMNS C ON T.OBJECT_ID = C.OBJECT_ID
LEFT JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS iskcu WITH (NOLOCK) 
ON SCHEMA_NAME(SCHEMA_ID) = iskcu.TABLE_SCHEMA 
	AND T.NAME = iskcu.TABLE_NAME
	AND C.NAME = iskcu.COLUMN_NAME
ORDER BY SchemaName ASC
	,TableName ASC
	,ColumnName ASC;
';

	/*ERROR HANDLING*/
	BEGIN TRY
		EXEC (@sqlCommand)
	END TRY

	BEGIN CATCH
		INSERT INTO #ErrorTable
		SELECT (@sqlCommand)
	END CATCH

	SET @i = @i + 1
END

/*
JOIN THE TEMP TABLES TOGETHER TO CREATE A MAPPED STRUCTURE OF THE SERVER
ADDITIONAL FIELDS ARE ADDED TO MAKE SELECTING TABLES AND FIELDS EASIER
*/
SELECT DISTINCT @@SERVERNAME AS ServerName
	,DL.DbName
	,TS.SchemaName
	,TS.TableName
	,TS.ColumnName
	,TS.[KeyType]
	,',' + QUOTENAME(TS.ColumnName) AS BracketedColumn
	,',' + QUOTENAME(TS.TableName) + '.' + QUOTENAME(TS.ColumnName) AS BracketedTableAndColumn
	,'SELECT * FROM ' + QUOTENAME(DL.DbName) + '.' + QUOTENAME(TS.SchemaName) + '.' + QUOTENAME(TS.TableName) + '--WHERE --GROUP BY --HAVING --ORDER BY' AS [SelectTable]
	,'SELECT ' + QUOTENAME(TS.TableName) + '.' + QUOTENAME(TS.ColumnName) + ' FROM ' + QUOTENAME(DL.DbName) + '.' + QUOTENAME(TS.SchemaName) + '.' + QUOTENAME(TS.TableName) + '--WHERE --GROUP BY --HAVING --ORDER BY' AS [SelectColumn]
	,'SELECT COUNT(*) FROM ' + QUOTENAME(DL.DbName) + '.' + QUOTENAME(TS.SchemaName) + '.' + QUOTENAME(TS.TableName) + 'WITH (NOLOCK)' AS [PerformTableCount]
INTO #MappedServer
FROM [#DatabaseList] AS DL
INNER JOIN [#TableStructure] AS TS ON DL.DbName = TS.DbName
ORDER BY DL.DbName ASC
	,TS.SchemaName ASC
	,TS.TableName ASC
	,TS.ColumnName ASC

/*
HOUSE KEEPING
*/
IF OBJECT_ID('tempdb..#DatabaseList') IS NOT NULL
	DROP TABLE #DatabaseList;

IF OBJECT_ID('tempdb..#TableStructure') IS NOT NULL
	DROP TABLE #TableStructure;
	
INSERT INTO #TableCount (
	[ServerName]
	,[DatabaseName]
	,[SchemaName]
	,[TableName]
	,[SQLCommand]
	)
SELECT DISTINCT [ServerName]
	,[DbName]
	,[SchemaName]
	,[TableName]
	,[PerformTableCount]
FROM #MappedServer
ORDER BY [ServerName] ASC
	,[DbName] ASC
	,[SchemaName] ASC
	,[TableName] ASC

SET @i = 1
SET @z = (
		SELECT COUNT(*) + 1
		FROM #TableCount
		)
	
WHILE @i < @z
BEGIN
	SET @SQLCommand = (
			SELECT SQLCommand
			FROM #TableCount
			WHERE Id_TableCount = @i
			)

	--ERROR HANDLING
	BEGIN TRY
		INSERT INTO #Count
		EXEC (@SqlCommand)
	END TRY

	BEGIN CATCH
		INSERT INTO #ErrorTable
		SELECT (@sqlCommand)
	END CATCH

	SET @Count = (
			SELECT ReturnedCount
			FROM #Count
			)

	TRUNCATE TABLE #Count

	UPDATE #TableCount
	SET TableCount = @Count
	WHERE Id_TableCount = @i;

	SET @i = @i + 1
END

SELECT *
FROM #ErrorTable;

IF OBJECT_ID('tempdb..#ErrorTable') IS NOT NULL
	DROP TABLE #ErrorTable;

/*
THE DATA RETURNED CAN NOW BE EXPORTED TO EXCEL
USING A FILTERED SEARCH WILL NOW MAKE FINDING FIELDS A VERY EASY PROCESS
*/
SELECT DISTINCT ms.ServerName
	,ms.DbName
	,ms.SchemaName
	,ms.TableName
	,ts.TableCount AS CountOfRows
	,ms.ColumnName
	,ms.KeyType
	,ms.BracketedColumn
	,ms.BracketedTableAndColumn
	,ms.SelectColumn
	,ms.SelectTable
FROM #MappedServer AS ms
LEFT JOIN #TableCount AS ts ON ms.ServerName = ts.ServerName
AND ms.DbName = ts.DatabaseName
AND ms.SchemaName = ts.SchemaName
AND ms.TableName = ts.TableName
ORDER BY ms.DbName ASC
	,ms.SchemaName ASC
	,ms.TableName ASC
	,ms.ColumnName ASC;
	
IF OBJECT_ID('tempdb..#MappedServer') IS NOT NULL
	DROP TABLE #MappedServer;

IF OBJECT_ID('tempdb..#TableCount') IS NOT NULL
	DROP TABLE #TableCount;
	
IF OBJECT_ID('tempdb..#Count') IS NOT NULL
	DROP TABLE #Count;

 

How to determine where a new database should be deployed

Below I’ve listed 22 questions to consider when deciding where a new SQL Server database should be deployed given the four main options of:

  1. Add Database to existing Instance
  2. Create separate Instance for Database
  3. Create separate VM for Instance and Database
  4. New physical machine needed

There’s no hard and fast rule for determining where a database should go however there are often a lot of limitations dictating your choices. I’ve started the list with a big limitation, money.

If you want to use the list like a score card I’ve prepared one called “New Database Hosting Environment Assessment Form”, I know not the snappiest of titles.

New Database Hosting Environment Assessment Form

Use: I’d recommend stepping through the questions with the requester while gathering feedback. You can then fill in the option scoring with weighted figures or simply use the boxes to record notes. The main deliverable of the form is you will have some documentation that demonstrates the location for the database was assessed and the logic and reasoning for the location chosen has been recorded.

Of course consider adding additional questions relevant to your case to the form. If you think your additional questions would be relevant to everyone deploying a new database free to forward your questions on to me and I’ll include them in the form.

The form can be downloaded here link.

Question 1: Can the business afford a new physical machine?

Considerations: If the financial resources are not available option four
can be eliminated.

Question 2: Are there OS or SQL Server Licensing restrictions?

Considerations: If there are no restrictions to a new VM with it’s own instance this will likely offer the most scalable solution and produce the best performance. If restrictions exist options are limited to 1 & 2.

Question 3: Is the Database the back end for a stand-alone application?

Considerations: The back end for a stand-alone application should be isolated as much as possible from other impacting factors. The higher the number of the option chosen the better.

Question 4: What is the primary purpose of the Database?

Considerations: What is the business use\s and in what capacity will the database function i.e. Transactional Db, Reporting Db, Datastore?

Question 5: Do you want\need to be able to limit access to hardware resources?

Considerations: Access to resources, ram etc. can only be limited at an instance level. Option 1 cannot be used.

Question 6: Are there any SQL Server Service Pack dependencies?

Considerations: SQL Server can have different SPs on different instances but cannot have different SPs for different databases within an instance.

Question 7: What is the current excess capacity of the hardware?

Considerations: Can the Hardware handle the additional workload? If not either reduce resource consumption of resources on the hardware, add RAM and/or cores or choose option 4.

Question 8: Is there a VM capacity restraint on the hardware?

Considerations: Can the hardware actually host a new VM without a trade off in resources or a decrease in performance? If restrained option 3 cannot be used.

Question 9: What is the VM capacity?

Considerations: Is the OS already maxed out on resources? Can the VM handle the additional workload?

Question 10: Is there an expected increase in size and usage in the Database over time?

Considerations: If known and minimal option 1 can be used. If unknown or unlimited the higher the number of the option chosen the better.

Question 11: Is the resource usage of the new Database known?

Considerations: Benchmarking RAM, CPU and bandwidth usage should be carried out prior to installation.

Question 12: What are the disaster recovery requirements?

Considerations: Should the databases that share common dependencies be hosted on
the same server?

Question 13: What is the required operational up time? 24/7 etc.

Considerations: Does this operational run time tie in with the rest of the databases
on the instance, or the instances hosted on a VM or physical server?

Question 14: What are the Maintenance requirements?

Considerations: Will new index rebuilds etc. take a long time and effect the schedule of the instance overall?

Question 15: What are the Backup requirements?

Considerations: Will the backup schedule of the other databases be impacted
by the addition?

Question 16: Is the Database functionally similar or supporting databases currently on the instance?

Considerations: Does the new Database logically fit in with the database/s currently running on an instance?

Question 17: Have server security concerns been addressed?

Considerations: Will people with administrative access to the server have access to a Database and data they should not be able to see?

Question 18: Does hosting particular databases together compound the potential
damage If an unauthorised person was to gain access to the server?

Considerations: Will an unauthorised person have access to data that represents a major security breach if the data is combined from different databases?

Question 19: Does a vendor have control over the instance or server?

Considerations: Will putting a Database on a particular instance leave you at the mercy of a vendor?

Question 20: Will stacking instances make tracking down performance issues across
the VM difficult?

Considerations: Will this create a server level noisy neighbour problem?

Question 21: Will packing databases too densely make tracking down performance issues across the Instance difficult?

Considerations: Will this create an instance level noisy neighbour problem?

Question 22: Will moving the Database to be hosted somewhere else in the
future be possible?

Considerations: Does this decision need to be gotten right the first time?

How to pass arguments from command line to a console application written in C#

This is a simple tutorial on passing arguments or parameter values from command line to a console application written in C#. Using the example below you should be able to edit and expand on the logic to fit your own needs.

First you’ll need to create a new Visual Studio C# console application, to do so follow these steps:

To create and run a console application

  1. Start Visual Studio.

  2. On the menu bar, choose FileNewProject.
  3. Expand Installed, expand Templates, expand Visual C#, and then choose Console Application.
  4. In the Name box, specify a name for your project, and then choose the OK button.
  5. If Program.cs isn’t open in the Code Editor, open the shortcut menu for Program.cs in Solution Explorer, and then choose View Code.
  6. Replace the contents of Program.cs with the following code.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace TestArgsInput
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length == 0)
            {
				// Display message to user to provide parameters.
                System.Console.WriteLine("Please enter parameter values.");
                Console.Read();
            }
            else
            {
                // Loop through array to list args parameters.
                for (int i = 0; i < args.Length; i++)
                {
                    Console.Write(args[i] + Environment.NewLine);
                    
                }
                // Keep the console window open after the program has run.
                Console.Read();
            }
        }
    }
}

 

The Main method is the entry point of a C# application. When the application is started, the Main method is the first method that is invoked.

The parameter of the Main method is a String array that represents the command-line arguments. Usually you determine whether arguments exist by testing the Length property as in the example above.

When run the example above will list out the parameters you have provided to the command window. The delimiter for command line separating arguments or parameter values is a single space. For example the following would be interpreted as two arguments or parameter values:

“This is parameter 1” “This is parameter 2”

If the arguments were not enclosed by double quotes each word would be considered an argument.

To pass arguments to the console application when testing the application logic the arguments can be written into the debug section of the project properties as shown below.

TestArgs

So if the app is run with the command line arguments provided as above in the image the command window will list:
Parameter 1
Parameter 2
If you would like to know how to create a console application in Visual Studio that won’t open a command window when it runs see this tutorial link.
If you would like to know how to create a csv file with C# see this tutorial link.

How to copy a large result set from SQL Server Management Studio to Excel

So you’ve tried copying and pasting the results of a query into an excel file only to get the out of memory exception. Now the fun starts!

There are multiple options to achieve copying data from SQL Server Management Studio to excel however most of them are a pain.

You’ve tried the save as option but the csv and text files have jumbled up content. You could use the Management Studio export function, but this is intended for physical tables not results sets and you might not have the permissions to create tables in the environment. You could use the bcp Utility but you’ve probably read leaving this option turned on represents a security risk. You could create an SSIS package . . . yeah that’s an efficient and effective option when you just want the damn results of an ad hoc query!!!

I’d suggest splitting the result set into chunks. You can then copy and paste the chunks into the excel file without running out of memory. Sure it’s kinda manual but trust me it’ll take less time than the options above.

I’d wager you probably only need the result set split into two, so you’ve to copy and paste twice rather than once. Not that big a deal right? I’ve even provided some code below that will really move things along.

Start by writing your query results into a temporary tablet called #QueryResult, for example SELECT * INTO #QueryResult FROM TableName.

Then all you need to do is determine how many segments you need. NTILE(n) is a function that allocates your output into n segments, each of the same size (give or take rounding when the number of rows isn’t divisible by n).

So this produces an output like:

Id Name Ntile
1 Mickey 1
2 Leo 1
3 Raph 2
4 Donnie 2

Start by leaving n set to the default of 2. Once the data is written to the table #QueryResult run the code below in the same SSMS window the temp table was created in. Running the code should produce the same number of returned result sets as the n value you provided. Use a higher n number to create more segments if you still run out of memory when you try to copy and paste the first segment.

/*
Write your query results to a temp table here
i.e. SELECT * INTO #QueryResult FROM TableName
*/
DECLARE @n INT
DECLARE @i INT

/*
Set n to how many segments/results set returned you need
*/
SET @n = 2
SET @i = 1

SELECT *
	,NTILE(@n) OVER (
		ORDER BY RowNum
		) AS NtileGroup
INTO #Export
FROM (
	SELECT ROW_NUMBER() OVER (
			ORDER BY (
					SELECT NULL
					)
			) AS RowNum
		,*
	FROM #QueryResult
	) AS a

WHILE @i <= @n
BEGIN
	SELECT *
	FROM #Export
	WHERE NtileGroup = @i
	ORDER BY RowNum ASC

	SET @i = @i + 1
END

DROP TABLE #QueryResult

DROP TABLE #Export

 

So that’s it, you should now be able to copy and paste your results. Maybe someday in the future Microsoft will add the option of saving results directly to excel . . .

How to assess T-SQL code quickly

I’m sure you’re an excellent SQL coder writing beautiful efficient queries, but your predecessor . . . well they might have just been lucky to have a job.

Going through someone else’s bad code is usually tiresome, tedious and often very confusing.

I’ve created the T-SQL Assessor excel file to help in this task.

DOWNLOAD (Dropbox link)

The assessor will colour code the sql to highlight the lines of importance. With the Key Word column you can then simply filter to words like INSERT, UPDATE, MERGE and EXEC to see where the data is going or filter the column by the word FROM to see where the data has come from.

To use the T-SQL Assessor file you will first have to format your code using Poor Man’s T-SQL Formatter. This excellent tool can be installed in Visual Studio, SQL Server Management Studio or Notepad++.

http://architectshack.com/PoorMansTSqlFormatter.ashx

You can also use the online option:

http://poorsql.com/

Poor Man’s T-SQL Formatter makes text that contains a SQL command a new line, so you can’t have INSERT and FROM on one line. This is what allows Excel formula’s to highlight the lines with key words as each line can only contain one key word, excluding comments.

Once the code is formatted simply paste it into the first sheet of the file, “SQL”.

That’s it, all the work is then done for you on the second sheet of the file, “SQL Assessed”

T-SQL Assessor is also great at preparing a report from a schema compare script created by Visual Studio. It’s very annoying Microsoft didn’t provide a way of exporting the comparison directly into excel the way Redgate did but this will help. Simply filter the file to only include the keywords.

DOWNLOAD (Dropbox link)

How to archive tables that contain a keyword from one database to another using T-Sql

In the previous article I wrote about how to identify and remove unwanted tables. Link

In that tutorial I suggested prefixing the tables you want to remove with _DELETE_.

I also suggested you may want to archive the tables in some manner before you delete them. This could be because there is a chance someone might come looking for the data that was in a deleted table or something might break by removing the table and you might want to put it back asap.

If you have only a few tables you could script the tables out but if you have a lot of tables that becomes a little unmanageable. Also if the tables total in size to 10 Gb the script to recreate the tables will be a lot lot larger.

An alternative method is to create an Archive database, copy the tables across to this target database and then delete the tables in the source database. You can then backup and drop the Archive database saving the .bak file somewhere cheaper.

The script below will allow you to do just that. It prints the T-Sql to do the job, it doesn’t carry out the job, so it’s completely safe to execute and review.

To use the script below create a target database.

Use the target database name for the variable value @TargetDb

Use the source database name for the variable value @SourceDb

The @KeyWord variable is used to gather all the tables that contain the string of choice, in the example below _DELETE_.

The @RemoveKeyWord variable is a flag that will remove the keyword string from the target database table name, e.g. _DELETE_Sales will become Sales.

IF OBJECT_ID('tempdb..#Table') IS NOT NULL
	DROP TABLE #Table
GO

DECLARE @RemoveKeyWord BIT
DECLARE @Id AS INT
DECLARE @KeyWord AS VARCHAR(256)
DECLARE @SourceDb AS SYSNAME
DECLARE @TargetDb AS SYSNAME
DECLARE @TableName AS SYSNAME
DECLARE @SchemaName AS SYSNAME
DECLARE @Sql AS VARCHAR(MAX)

SET @RemoveKeyWord = 1
SET @KeyWord = '_DELETE_'
SET @TargetDb = 'Archive'
SET @SourceDb = 'Source'
SET NOCOUNT ON

CREATE TABLE #Table (
	Id_Table INT IDENTITY(1, 1)
	,SchemaName SYSNAME
	,TableName SYSNAME
	);

SET @Sql = '
INSERT INTO #Table (
	SchemaName
	,TableName
	)
SELECT s.NAME
	,so.NAME
FROM ' + QUOTENAME(@SourceDb) + '.sys.tables AS so
LEFT JOIN ' + QUOTENAME(@SourceDb) + '.sys.schemas AS s ON so.schema_id = s.schema_id
WHERE so.NAME LIKE ' + '''' + '%' + @KeyWord + '%' + '''' + '
ORDER BY s.NAME ASC'

EXEC (@Sql)

SET @SchemaName = ''

WHILE @SchemaName IS NOT NULL
BEGIN
	SET @SchemaName = (
			SELECT MIN(SchemaName)
			FROM #Table
			WHERE SchemaName > @SchemaName
				AND SchemaName <> 'dbo'
			)

	PRINT 'USE ' + QUOTENAME(@TargetDb) + ';
GO
	
IF NOT EXISTS (SELECT * FROM ' + QUOTENAME(@TargetDb) + '.sys.schemas WHERE name = ' + '''' + @SchemaName + '''' + ')
  BEGIN
    EXEC (' + '''' + 'CREATE SCHEMA ' + QUOTENAME(@SchemaName) + ';' + '''' + ');
  END;
  
'
END

SET @Id = 1

WHILE @Id IS NOT NULL
BEGIN
	SELECT @TableName = TableName
		,@SchemaName = SchemaName
	FROM #Table
	WHERE Id_Table = @Id

	IF @RemoveKeyWord = 1
	BEGIN
		PRINT '
SELECT *
INTO ' + QUOTENAME(@TargetDb) + '.' + QUOTENAME(@SchemaName) + '.' + QUOTENAME(REPLACE(@TableName, @KeyWord, '')) + ' 
FROM ' + QUOTENAME(@SourceDb) + '.' + QUOTENAME(@SchemaName) + '.' + QUOTENAME(@TableName)
	END
	ELSE
	BEGIN
		PRINT '
SELECT *
INTO ' + QUOTENAME(@TargetDb) + '.' + QUOTENAME(@SchemaName) + '.' + QUOTENAME(@TableName) + ' 
FROM ' + QUOTENAME(@SourceDb) + '.' + QUOTENAME(@SchemaName) + '.' + QUOTENAME(@TableName)
	END

	SET @Id = (
			SELECT MIN(Id_Table)
			FROM #Table
			WHERE Id_Table > @Id
			)
END

How to identify and remove unused tables in SQL Server with T-Sql

In a perfect world an organisation should never get itself into the situation where tables need to be identified as functionally obsolete and removed in bulk. Schema changes should be stepped through the cycle of development, test, staging and live with developers cleaning as they go, but we don’t live in a perfect world as you may have noticed.

What do I mean by functionally obsolete? This means the tables are no longer being interacted with by user generated objects like stored procedures, functions or views. There may also be tables that could be classified as business redundant. That is they are being referenced by user generated objects frequently but they no longer have a use to the business, i.e. one job may have been replaced by another without the former being disabled or dropped. Business redundant objects are more difficult to determine and finding them may require input from multiple stakeholders.

A good approach for removing objects is to rename the objects first. This makes it easier to put the environment back the way it was if there are any problems encountered. After a set period of time if there is no impact on the overall environment script out the object then drop it. (Obviously do this in a test environment first if possible)

The script below is mostly a light weight SELECT statement that can be run on any environment. It does not execute any of the code it generates. It uses the sys.dm_db_index_usage_stats dynamic management view to determine when the tables were last interacted with. Interaction being defined as the following actions being applied against the object, update, seek, scan, lookup.

Caveat: Entries in this view reset to NULL after a Server reboot. Also the DMV has been known to be a bit unreliable with earlier versions of SQL Server with cases of the view being reset when a full index rebuild is carried out against a table. So don’t execute any code without first reviewing it.

Tables with NULL values for the fields below should be tables that have not been referenced at all or at least since the last time the server was rebooted.

LastUserUpdate
LastUserSeek
LastUserScan
LastUserLookup

Removing these objects is the low hanging fruit of cleaning up an environment. The script also provides stats on how many times these interactions happened and the size of the object. All these stats together should help you determine if an object is functionally redundant or business redundant and can be removed.

To aid further in the cleanup the script also creates the fields Action, Comments, Renamed, RenamedDate, RenameForDeletion, RestoreOriginalName, DropTable, DroppedDate.

The script output can then be copied and pasted into an Excel spread sheet and used to coordinate and track the cleanup progress.

SET NOCOUNT ON

DECLARE @Database TABLE ([DbName] [sysname])
DECLARE @DbName AS [sysname]
DECLARE @Sql AS [varchar] (max)

IF OBJECT_ID('tempdb..#TableStats', 'U') IS NOT NULL
	DROP TABLE #TableStats

IF OBJECT_ID('tempdb..#IndexStats', 'U') IS NOT NULL
	DROP TABLE #IndexStats

IF OBJECT_ID('tempdb..#TableUsageStats', 'U') IS NOT NULL
	DROP TABLE #TableUsageStats

IF OBJECT_ID('tempdb..#TableSizeStats', 'U') IS NOT NULL
	DROP TABLE #TableSizeStats

CREATE TABLE #TableStats (
	[DbName] [sysname]
	,[SchemaName] [sysname]
	,[ObjectId] [bigint]
	,[TableName] [sysname]
	,[ModifiedDate] [datetime]
	);

CREATE TABLE #IndexStats (
	[DbName] [sysname]
	,[ObjectId] [bigint]
	,[HasIndex] [bit]
	);

CREATE TABLE #TableSizeStats (
	[DbName] [varchar](255) NULL
	,[SchemaName] [varchar](255) NULL
	,[ObjectId] [bigint]
	,[TableName] [varchar](255) NULL
	,[RowCount] [bigint] NULL
	,[AvailableSpacePercentage] [numeric](6, 2) NULL
	,[UnusedSpaceGb] [numeric](10, 3) NULL
	,[UsedSpaceGb] [numeric](10, 3) NULL
	,[TotalSpaceGb] [numeric](10, 3) NULL
	,[UnusedSpaceMb] [numeric](13, 3) NULL
	,[UsedSpaceMb] [numeric](13, 3) NULL
	,[TotalSpaceMb] [numeric](13, 3) NULL
	,[UnusedSpaceKb] [bigint] NULL
	,[UsedSpaceKb] [bigint] NULL
	,[TotalSpaceKb] [bigint] NULL
	)

CREATE TABLE #TableUsageStats (
	[DbName] [sysname]
	,[ObjectId] [bigint]
	,[TableName] [nvarchar](128) NULL
	,[LastUserUpdate] [datetime] NULL
	,[LastUserSeek] [datetime] NULL
	,[LastUserScan] [datetime] NULL
	,[LastUserLookup] [datetime] NULL
	,[UserUpdateCount] [bigint] NOT NULL
	,[UserSeekCount] [bigint] NOT NULL
	,[UserScanCount] [bigint] NOT NULL
	,[UserLookupCount] [bigint] NOT NULL
	);

SET @DbName = '';

INSERT INTO @Database (DbName)
SELECT NAME
FROM sys.databases 
WHERE NAME NOT IN (
		'tempdb'
		,'master'
		,'mode'
		,'model'
		)
AND state_desc = 'ONLINE'
ORDER BY NAME ASC;

WHILE @DbName IS NOT NULL
BEGIN
	SET @DbName = (
			SELECT MIN(DbName)
			FROM @Database
			WHERE DbName > @DbName
			);
	SET @Sql = '
INSERT INTO #TableStats (
	DbName
	,schemaName
	,ObjectId
	,TableName
	,ModifiedDate
	)
SELECT DbName
	,SchemaName
	,ObjectId
	,TableName
	,ModifiedDate
FROM (
SELECT DISTINCT ' + '''' + @DbName + '''' + ' AS DbName
	,s.NAME AS SchemaName
	,t.object_id AS ObjectId
	,t.NAME AS TableName
	,t.modify_date AS ModifiedDate
FROM ' + QUOTENAME(@DbName) + '.sys.tables AS t
INNER JOIN ' + QUOTENAME(@DbName) + '.sys.schemas AS s ON t.schema_id = s.schema_id
LEFT JOIN ' + QUOTENAME(@DbName) + '.sys.extended_properties AS ep ON ep.major_id = t.[object_id] /*Exclude System Tables*/
WHERE t.NAME IS NOT NULL
	AND s.NAME IS NOT NULL
	AND (ep.[name] IS NULL OR ep.[name] <> ''microsoft_database_tools_support'')
	) AS rd
WHERE rd.SchemaName IS NOT NULL
ORDER BY DbName ASC
	,TableName ASC;
'

	EXEC (@Sql)

	SELECT @Sql = '
INSERT INTO #IndexStats (
	DbName
	,ObjectId
	,HasIndex
	)
SELECT ' + '''' + @DbName + '''' + ' AS DbName
	,OBJECT_ID AS ObjectId
	,IndexCheck AS HasIndex
FROM (
	SELECT DISTINCT OBJECT_ID
		,CASE 
			WHEN (
					[TYPE] > 0
					AND is_disabled = 0
					)
				THEN 1
			ELSE 0
			END AS IndexCheck
	FROM ' + QUOTENAME(@DbName) + '.sys.indexes
	) AS rd
WHERE rd.IndexCheck = 1
'

	EXEC (@Sql)

	SET @Sql = 
		'
INSERT INTO #TableSizeStats (
	[DbName]
	,[SchemaName]
	,[ObjectId]
	,[TableName]
	,[RowCount]
	,[AvailableSpacePercentage]
	,[UnusedSpaceGb]
	,[UsedSpaceGb]
	,[TotalSpaceGb]
	,[UnusedSpaceMb]
	,[UsedSpaceMb]
	,[TotalSpaceMb]
	,[UnusedSpaceKb]
	,[UsedSpaceKb]
	,[TotalSpaceKb]
	)
SELECT DISTINCT rd.[DbName]
	,rd.[SchemaName]
	,rd.[ObjectId]
	,rd.[TableName]
	,rd.[RowCount]
	,CASE 
		WHEN TotalSpaceKb > 0
			THEN ((UnusedSpaceKb / TotalSpaceKb) * 100)
		ELSE 0
		END AS AvailableSpacePercentage
	,CONVERT(NUMERIC(10, 3), (rd.[UnusedSpaceKb] / 1024.) / 1024.) AS UnusedSpaceGb
	,CONVERT(NUMERIC(10, 3), (rd.[UsedSpaceKb] / 1024.) / 1024.) AS UsedSpaceGb
	,CONVERT(NUMERIC(10, 3), (rd.[TotalSpaceKb] / 1024.) / 1024.) AS TotalSpaceGb
	,CONVERT(NUMERIC(13, 3), (rd.[UnusedSpaceKb] / 1024.)) AS UnusedSpaceMb
	,CONVERT(NUMERIC(13, 3), (rd.[UsedSpaceKb] / 1024.)) AS UsedSpaceMb
	,CONVERT(NUMERIC(13, 3), (rd.[TotalSpaceKb] / 1024.)) AS TotalSpaceMb
	,rd.[UnusedSpaceKb]
	,rd.[UsedSpaceKb]
	,rd.[TotalSpaceKb]
FROM (
	SELECT ' 
		+ '''' + @DbName + '''' + ' AS DbName
		,t.Object_id AS ObjectId
		,s.[name] AS [SchemaName]
		,t.[name] AS [TableName]
		,p.[rows] AS [RowCount]
		,SUM(a.[used_pages]) * 8 AS [UsedSpaceKb]
		,(SUM(a.[total_pages]) - SUM(a.[used_pages])) * 8 AS [UnusedSpaceKb]
		,SUM(a.[total_pages]) * 8 AS [TotalSpaceKb]
	FROM ' + QUOTENAME(@DbName) + '.sys.tables AS t
	INNER JOIN ' + QUOTENAME(@DbName) + '.sys.schemas AS s ON t.schema_id = s.schema_id
	INNER JOIN ' + QUOTENAME(@DbName) + '.sys.indexes AS i ON t.OBJECT_ID = i.object_id
	INNER JOIN ' + QUOTENAME(@DbName) + '.sys.partitions AS p ON i.object_id = p.OBJECT_ID
		AND i.[index_id] = p.[index_id]
	INNER JOIN ' + QUOTENAME(@DbName) + '.sys.allocation_units a ON p.[partition_id] = a.[container_id]
	WHERE t.[is_ms_shipped] = 0
		AND i.OBJECT_ID > 255
	GROUP BY s.[name]
		,t.[name]
		,t.[object_id]
		,p.[rows]
	) AS rd
ORDER BY DbName ASC
	,SchemaName ASC
	,TableName ASC;
'

	EXEC (@Sql)

	SET @Sql = '
INSERT INTO #TableUsageStats (
	[DbName]
	,[ObjectId]
	,[TableName]
	,[LastUserUpdate]
	,[LastUserSeek]
	,[LastUserScan]
	,[LastUserLookup]
	,[UserUpdateCount]
	,[UserSeekCount]
	,[UserScanCount]
	,[UserLookupCount]
	)
SELECT DbName
	,ObjectId
	,TableName
	,LastUserUpdate
	,LastUserSeek
	,LastUserScan
	,LastUserLookup
	,UserUpdateCount
	,UserSeekCount
	,UserScanCount
	,UserLookupCount
FROM (
	SELECT DISTINCT ROW_NUMBER() OVER (
			PARTITION BY ius.Object_Id ORDER BY last_user_update DESC
			) AS RN
		,' + '''' + @DbName + '''' + ' AS DbName
		,ius.OBJECT_ID AS ObjectId
		,o.NAME AS TableName
		,ius.last_user_update AS LastUserUpdate
		,ius.last_user_seek AS LastUserSeek
		,ius.last_user_scan AS LastUserScan
		,ius.last_user_lookup AS LastUserLookup
		,ius.user_updates AS UserUpdateCount
		,ius.user_seeks AS UserSeekCount
		,ius.user_scans AS UserScanCount
		,ius.user_lookups AS UserLookupCount
	FROM ' + QUOTENAME(@DbName) + 
		'.sys.dm_db_index_usage_stats AS ius
	INNER JOIN ' + QUOTENAME(@DbName) + '.sys.objects AS o ON ius.OBJECT_ID = o.OBJECT_ID
		AND o.NAME IS NOT NULL
	) AS rd
WHERE rd.RN = 1
ORDER BY rd.DbName ASC
	,rd.TableName ASC
	,rd.LastUserUpdate DESC
	,rd.LastUserSeek DESC
	,rd.LastUserScan DESC
	,rd.LastUserLookup DESC;
'

	EXEC (@Sql)
END;
GO

SELECT DISTINCT ROW_NUMBER() OVER (
		ORDER BY ts.[DbName] ASC
			,ts.[SchemaName] ASC
			,ts.[TableName] ASC
		) AS Row
	,ts.[DbName]
	,ts.[SchemaName]
	,ts.[TableName]
	,'' AS Action --Rename, keep etc.
	,'' AS Comments
	,'' AS Renamed --boolean flag
	,'' AS RenamedDate 
	,i.[HasIndex] --Tables without an index are heaps
	,tss.[TotalSpaceMb] AS TableSizeInMb
	,ts.[ModifiedDate]
	,tus.[LastUserUpdate]
	,tus.[LastUserSeek]
	,tus.[LastUserScan]
	,tus.[LastUserLookup]
	,tus.[UserUpdateCount]
	,tus.[UserSeekCount]
	,tus.[UserScanCount]
	,tus.[UserLookupCount]
	,tss.[AvailableSpacePercentage]
	,tss.[UnusedSpaceGb]
	,tss.[UsedSpaceGb]
	,tss.[TotalSpaceGb]
	,tss.[UnusedSpaceMb]
	,tss.[UsedSpaceMb]
	,tss.[TotalSpaceMb]
	,tss.[UnusedSpaceKb]
	,tss.[UsedSpaceKb]
	,tss.[TotalSpaceKb]
	,'USE ' + QUOTENAME(ts.[DbName]) + '; EXEC sp_rename ' + '''' + ts.[SchemaName] + '.' + ts.[TableName] + '''' + ', ' + '''' + '_DELETE_' + ts.[TableName] + '''' + ';' AS RenameForDeletion
	,'USE ' + QUOTENAME(ts.[DbName]) + '; EXEC sp_rename ' + '''' + ts.[SchemaName] + '.' + '_DELETE_' + ts.[TableName] + '''' + ', ' + '''' + ts.[TableName] + '''' + ';' AS RestoreOriginalName
	,'USE ' + QUOTENAME(ts.[DbName]) + '; DROP TABLE ' + QUOTENAME(ts.[SchemaName]) + '.' + '[' + '_DELETE_' + ts.[TableName] + ']' + ';' AS 'DropTable'
	,'' AS DroppedDate
FROM #TableStats AS ts
LEFT JOIN #TableSizeStats AS tss ON ts.ObjectId = tss.ObjectId
	AND tss.DbName = ts.DbName
LEFT JOIN #IndexStats AS i ON ts.ObjectId = i.ObjectId
	AND i.DbName = ts.DbName
LEFT JOIN #TableUsageStats AS tus ON ts.ObjectId = tus.ObjectId
	AND tus.DbName = ts.DbName
ORDER BY ts.[DbName] ASC
	,ts.[SchemaName] ASC
	,ts.[TableName] ASC

	-- REF: https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-db-index-usage-stats-transact-sql