Category Archives: Data

How to count nulls and hard-coded text that signifies null in a Pandas DataFrame

Use case: sometimes the files being processed represent nulls with hard-coded text such as “NULL”, meaning the field is not actually empty or null.

Below are some Python functions and a test demonstrating the functionality.

import pandas as pd

def getListOfMissingValues():
    """
    desc: List of common words used to represent null that are often found in files as text
    """
    lst = ['NaN', 'NAN', 'nan', 'null', 'NULL', 'nul', 'NUL', 'none', 'NONE', '', ' ', '\t']
    return lst

def getListOfFieldNames(df):
    """
    desc: List of the field (column) names of a DataFrame
    param p1: DataFrame name
    return: List of field names
    """
    return df.columns.values.tolist()
	
def advanceMissingValues(df):
    """
    desc: Count nulls and hardcoded text that represents nulls
    param p1: DataFrame name
    return: DataFrame of field names and count values
    """
    lstMissingVals = getListOfMissingValues()
    col_list = getListOfFieldNames(df)
    output = pd.DataFrame(col_list)
    output.rename(columns = {0:'FieldName'}, inplace = True)
    output['Count'] = ''
    
    #For each field name count nulls and other null type values
    for col in col_list:
        nullCnt = df[col].isnull().sum(axis=0)
        #For each missing value perform count on column
        missValCnt = 0
        for missVal in lstMissingVals:
            missValCnt = missValCnt + len(df[(df[col]==missVal)])
 
        cntTotal = nullCnt + missValCnt
        output.loc[output['FieldName'] == col, 'Count'] = cntTotal

    return output

#Test Setup
lst = ['NaN', 'NAN', 'nan', 'null', 'NULL', 'nul', 'NUL', 'none', 'NONE', '', ' ', '\t', None]
mdf = pd.DataFrame(lst)
mdf.rename(columns = {0:'NullTypes'}, inplace = True)
print(mdf)

#Run Test
chk = advanceMissingValues(mdf)
chk

Sample output:
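As an aside, the same per-column totals can be produced without the inner loop by combining DataFrame.isin with isnull. The sketch below is not part of the original example; it assumes the functions above are already defined, and the function name advanceMissingValuesVectorized is purely illustrative.

def advanceMissingValuesVectorized(df):
    """
    desc: Same output as advanceMissingValues but using vectorized pandas operations
    param p1: DataFrame name
    return: DataFrame of field names and count values
    """
    lstMissingVals = getListOfMissingValues()
    #Count hard-coded null text and genuine nulls per column in one pass
    counts = df.isin(lstMissingVals).sum() + df.isnull().sum()
    output = counts.reset_index()
    output.columns = ['FieldName', 'Count']
    return output

Running chk = advanceMissingValuesVectorized(mdf) against the test DataFrame above should return the same counts as advanceMissingValues.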

How to get a substring between two characters with T-SQL

This is a very common activity in the data world: there is some data in a text string that you need, and the rest of the string is just in your way. Some use cases might be that you have a reference in a filename you need to extract, you need a snippet of data to create a composite key, or there is an order number surrounded by other data that is not relevant to your needs.

The following is some simple T-SQL that will extract the data you want from a text string, provided the data has a specific delimiting character on each side of it.

/*Delimiter variables, first and second position*/
DECLARE @dfp AS CHAR(1);
DECLARE @dsp AS CHAR(1);
DECLARE @text VARCHAR(MAX);

SET @dfp = ';';
SET @dsp = '@';
SET @text = 'I want you to ;Extract this@ substring for me please.';
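/*
Start position: one character after the first delimiter.
Length: (CHARINDEX(@dsp, @text) - 2) - CHARINDEX(@dfp, @text) + LEN(@dsp),
which for single-character delimiters simplifies to
CHARINDEX(@dsp, @text) - CHARINDEX(@dfp, @text) - 1,
i.e. the number of characters between the two delimiters.
*/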

SELECT SUBSTRING(@text, (CHARINDEX(@dfp, @text) + 1), (CHARINDEX(@dsp, @text) - 2) - CHARINDEX(@dfp, @text) + Len(@dsp))


How to format SQL Server datetime as dd/mm/yyyy hh:mm:ss

If you are exporting the results of a SQL Server query to Excel, typically the recipient of the file wants the dates in the format “dd/mm/yyyy hh:mm:ss” or “dd/mm/yyyy”, not the usual database format yyyy-mm-dd.

The below query formats the datetime as desired. Note that the letters MM representing the month are capitalised. If they are not, the engine will interpret lowercase mm as minutes, so you will end up with days, minutes, years.

Also note that the letters HH representing the hours are capitalised. Capitalising the H outputs the time in 24-hour format; lowercase hh gives 12-hour format. It is highly recommended not to use the lowercase format.

SELECT FORMAT(GETDATE(), 'dd/MM/yyyy HH:mm:ss', 'en-us')

If you only want the date and not the time, just remove the relevant part of the format string, i.e. dd/MM/yyyy for the date only, or dd/MM/yyyy HH:mm for a datetime without seconds.


Do you need a Data Lake?

Summary

Among data specialists who do not work in the field of Big Data there can be confusion surrounding the term Data Lake. This is because there is an apparent overlap in role and function between Data Lakes and the more traditional Data Warehouses, which data professionals will be more familiar with. The confusion is not helped by the term Data Lake itself being overloaded, which will be discussed later in this article. Despite this overlap, however, Data Lakes occupy their own distinct role and perform functions Data Warehouses cannot.

Data Lakes have tremendous utility, but damagingly there is also a mass of literature pushing the concept as a cure-all that, coincidentally, will also require you to migrate your organization’s Business Intelligence center into the cloud. The following statements will hopefully dispel some of the associated hucksterism.

  • Data Lakes are not Data Warehouses 2.0, i.e. they are not the evolution of a Data Warehouse.
  • Data Lakes have not replaced Data Warehouses in performing the role of housing aggregated data.
  • Data Lakes will not free you from the burden of developing ETLs or establishing robust Data Architecture and strong Data Governance.

Introduction

It is important to first clarify that both Data Warehouses and Data Lakes are abstract concepts independent of any particular software or vendor. A Data Warehouse can be created in any database engine, such as SQL Server, PostgreSQL, Oracle or MySQL. Similarly, a Data Lake can be deployed on any suitably large data storage platform, whether an on-site data center or hosted in the cloud.

In basic terms, both Data Warehouses and Data Lakes can be thought of as the place where all data relevant to an organization’s goals is pulled together from various sources, both internal and external (increasingly external). They both exist to facilitate an all-encompassing view of an organization and how well it performs, or to provide a greater understanding of the organization’s environment, opportunities (e.g. customer preferences and habits) and threats. However, they differ in the data they are optimized to handle and are therefore better suited to different use cases.

What is a Data Warehouse?

A Data Warehouse is a method of storing and organising data that is optimized to support Business Intelligence (BI) activities such as analytics. To put it another way, a Data Warehouse exists solely, and is constructed, to answer big questions efficiently. For this reason it typically holds vast quantities of historical data. The data within a data warehouse is usually derived from a wide range of sources, such as application log files, but primarily from transactional applications (Oracle, 2019). However, in contrast to a transactional database where each transaction is a separate record, the relevant entries in a Data Warehouse are typically aggregated, although it can also hold transaction records for archival purposes.

Figure 1: Typical Data Warehouse architecture of an SME (Databricks, 2019)

Single transaction records on their own are typically not very insightful to an organization trying, for example, to identify consumer trends. Aggregating data based on facts and dimensions, e.g. the number of sales (fact) for a particular store (dimension), saves disk space and allows queries looking for that specific answer to be returned quickly. Data Warehouses mostly contain numeric data, which is easily manipulated. As an example, store sales might be the summation of thousands of rows of data into a single row.

Figure 2: Simplified example of a Data Warehouse internal structure (BIDataPro, 2018)
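To make the aggregation described above concrete, here is a minimal, hypothetical sketch in Python/pandas (not taken from the article, with made-up store names and amounts) showing transaction-level rows being collapsed into one aggregated row per store:

import pandas as pd

# Hypothetical transaction-level sales data, one row per sale
sales = pd.DataFrame({
    'store': ['Dublin', 'Dublin', 'Cork', 'Cork', 'Cork'],
    'amount': [10.0, 25.0, 5.0, 12.5, 7.5]
})

# Aggregate the fact (sales amount) by the dimension (store),
# collapsing many transaction rows into a single row per store
store_sales = sales.groupby('store', as_index=False)['amount'].sum()
print(store_sales)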

Data Warehouses also solve the problem of trying to derive information when there are too many sources, e.g. a multinational with thousands of store locations and subsidiaries, by creating a “single source of truth”. Effectively this means pulling all the information into one central location, transforming the data for uniformity and storing like-for-like data together. For example, this could mean gathering all sales data from multiple locations and converting the currency to dollars. Having all of the data in one place allows different sources, which serve different purposes, to be combined in a single query. For example, a report that links sales data and logistical data, coming from POS and SCM systems respectively, may not be possible with a single query if the systems are not linked, and if best practices regarding I.T. security are being followed they certainly should not be.

Data Warehouses are fed from source systems using an extract, transform and load (ETL) solution: data is extracted from a source system, transformed to meet the schema and business requirements of the Data Warehouse, and then loaded. This is a data delivery method independent of any particular software vendor; there are various software products for accomplishing ETLs, including the option to create a custom application. A variation of this process is extract, load and transform (ELT), in which the data is landed into tables raw and later transformed to meet the schema and business requirements of its intended final table. This method allows for greater auditability, which can aid regulatory compliance or post-mortems if the transformation process fails.
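As a rough illustration of the difference between the two approaches (not from the article; Python, pandas and SQLite are used purely as stand-ins for real ETL tooling, and the file and column names are hypothetical):

import sqlite3
import pandas as pd

conn = sqlite3.connect('warehouse_demo.db')

# Extract: read the raw data from a (hypothetical) source export
raw = pd.read_csv('sales_export.csv')

# ETL: transform first (e.g. standardise the currency), then load the final table
transformed = raw.assign(amount_usd=raw['amount'] * raw['fx_rate'])
transformed.to_sql('fact_sales', conn, if_exists='replace', index=False)

# ELT: land the raw extract as-is and transform it later inside the warehouse
raw.to_sql('staging_sales_raw', conn, if_exists='replace', index=False)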

Once set up the Data Warehouse can facilitate statistical analysis, reporting, data mining and more sophisticated analytical applications that generate actionable information by applying machine learning and artificial intelligence (AI) algorithms (Oracle, 2019).

A single source of truth is very important to an organization because it eliminates inconsistencies in reporting, establishes a single set of global metrics and allows everyone in the organization to “sing from the same hymn sheet”, all of which helps direct informed decisions.

So if Data Warehouses have proven such an excellent platform for generating information, why are alternatives needed? Well, by design only a subset of the attributes are examined, so only pre-determined questions can be answered (Dixon, 2010). The data is also aggregated, so visibility into the lowest levels is lost (Dixon, 2010). The final major factor is that some of the most vital sources of information are no longer simply numerical in nature or generated by an organization’s internal transactional systems. So what has changed?

 

The Digital Universe

The data landscape has changed drastically in just a few short years. Like the physical universe, the digital universe is large and growing fast. It was estimated that by 2020 there would be nearly as many digital bits as there are stars in the observable universe (Turner, 2014). That estimate is somewhere in the region of 44 zettabytes, or 44 trillion gigabytes (Turner, 2014). Even though this quantity of data is already beyond human comprehension the rate of growth is probably the more impressive feat. For context there is over 10 times more data now than there was in 2013 when the digital universe was an estimated 4.4 zettabytes (Turner, 2014). The data we create and copy annually is estimated to reach 175 zettabytes by 2025 (Coughlin, 2018).

Where is all this data coming from?

The short answer is: predominantly us, and the systems that service our needs. In the not too distant past the only entities with computers generating and storing data were businesses, governments and other institutions. Now everyone has a computer of some description, and with the advent of social media mass consumers became mass creators. When you stop to think of how many interactions a person has with electronic devices every day, directly or indirectly, you soon get a picture of how much data is actually being generated.

As an example of this endless generation of data the following are average social media usage stats over the course of one minute from 2018 (Marr, 2018):

  • Twitter users sent 473,400 tweets
  • Snapchat users shared 2 million photos
  • Instagram users posted 49,380 pictures
  • LinkedIn gained 120 new users

Other extraordinary data stats include (Marr, 2018):

  • Google processes more than 40,000 searches every second or 3.5 billion searches a day.
  • 1.5 billion people are active on Facebook every day. That’s one-fifth of the world’s population.
  • Two-thirds of the world’s population now owns a mobile phone.

Our way of life has become increasingly digitized, with no better example than the effective global lockdown during the 2020 pandemic. Hundreds of millions of employees around the world managed to continue working from home and did so effectively (Earley, 2020). This would have been unimaginable even in the late nineties. And yet, as digitized as our world has become, it is only the start. With emerging technologies such as self-driving cars, IoT smart devices and ever more sophisticated robots entering our homes, the 175 zettabytes of data by 2025 may be a conservative estimate.

With so much of the stuff around you would be forgiven for thinking all of this data is just a by-product, but it is anything but. The data generated is an incredibly valuable asset if it can be analyzed properly and transformed into business-relevant information.

What types of data are there?

The state of data within the digital universe can be summarized as structured, semi-structured and unstructured (Hammer, 2018).

The following is a non-exhaustive list of data types (Hammer, 2018):

  • CRM
  • POS
  • Financial
  • Loyalty card
  • Incident ticket
  • Email
  • PDF
  • Spreadsheet
  • Word processing
  • GPS
  • Log
  • Images
  • Social media
  • XML/JSON
  • Click stream
  • Forums
  • Blogs
  • Web content
  • RSS feed
  • Audio
  • Transcripts

Only the structured data types in the list above (e.g. CRM, POS and financial data) are suitable for aggregation (Hammer, 2018). The rest are typical of what now makes up a large proportion of the digital universe, and despite their value as data assets they are not suitable for storage or analysis within a Data Warehouse. This is because data needs to meet the predefined structure of a Data Warehouse in order to be accepted, and aggregating these raw unstructured files, e.g. video and audio files, is not possible. So how are these types of valuable data turned into actionable information?

What is a Data Lake?

Data Warehouses have been utilized by data specialists for decades, but the concept of a Data Lake is much more contemporary and much better suited to the storage, analysis and analytics of the semi-structured and unstructured data listed above. By design, storing these kinds of data in a Data Lake does not require the files to be transformed, as each file is kept in its raw state; files can simply be copied from one file structure to another. Data Lakes also allow for working off the files directly, which means the data can be used effectively immediately, i.e. as soon as it lands, rather than waiting weeks for the Data Warehouse developers to massage the data into a format the warehouse can accept, if that is even possible (Hammer, 2018). Working with this type of data has become synonymous with the field of Big Data, which is defined by high velocity, high volume and high variability. As such, the two methodologies are not necessarily in competition with each other; in fact, depending on their definition (Data Lake is somewhat of an overloaded term (Bethke, 2017)), they could be argued to solve different problems and can complement each other when deployed within the same architecture.

There is some contention as to the definition of a Data Lake. Some would argue that the original meaning implied the Lake was solely a raw data reservoir (Bethke, 2017). By this definition the Data Lake is not too dissimilar to a staging area or Operational Data Store (ODS) in a data warehouse, where raw copies of data from source systems are landed (Bethke, 2017). This would coincide with an ELT process as opposed to an ETL process; the transformation and integration of the data happens later downstream, during the population of the data warehouse (Bethke, 2017). This understanding of a Data Lake still persists today in the minds of many data specialists, as can be seen in the overly simplified illustration below.

Figure 3: Overly simplified illustration of a Data Lake architecture  (Hammer, 2018)

(Note: there is no indication of analysis being performed on the lake directly; the lake services the warehouse solely.)

However, this is an inaccurate understanding, as the person credited with coining the term, James Dixon, used the following analogy when he explained a Data Lake:

“If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.” (Dixon, 2010)

By stating “various users of the lake can come to examine, dive in, or take samples” Dixon is clearly implying that a feature of the Data Lake is that it is accessible prior to the data being transformed and made available in a Data Warehouse.

This is where Data Lakes and Data Warehouses take opposing strategies on applying structure to data, which is perhaps why they are often mistaken for competing alternatives. A Data Warehouse requires Schema on Write whereas a Data Lake uses Schema on Read.

With schema on write, all of the relevant data structure needs to be prepared in advance, which means all of the relevant business questions need to be thought of in advance. This rarely results in a situation where all the relevant stakeholders have their needs met, and if they do it will not be for very long. This scenario is workable for an organization looking to aggregate finance data it is very familiar with, but it is especially difficult when dealing with Big Data where the questions are unknown.

With schema on read, the schema is only applied when the data is read, allowing for a schema that adapts to the queries being issued. This means you are not stuck with a predetermined one-size-fits-all schema (Pasqua, 2014). It allows for the storage of unstructured data, and since it is not necessary to define the schema before storing the data, it is easier to bring in new data sources on the fly. The exploding growth of unstructured data and the overhead of ETL for storing data in an RDBMS are offered as leading reasons for the shift to schema on read (Henson, 2016).

When dealing with Big Data the problem of a predefined schema can be so burdensome that it can sink a data project or increase the time-to-value past the point of relevance (Pasqua, 2014). Using a schema on read approach on data as-is means getting value from it right away (Pasqua, 2014). The flexibility of Data Lakes in this regard allows them to surpass Data Warehouses in terms of scalability while making data accessible for analysis sooner.
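To make the contrast tangible, here is a minimal sketch assuming a hypothetical file of JSON click-stream events (illustrative Python only, not a reference to any particular Data Lake product). With schema on read nothing about the structure is declared up front; the fields are only chosen at query time.

import pandas as pd

# Schema on read: the semi-structured file sits untouched in the lake;
# structure is only imposed at the moment of querying
events = pd.read_json('clickstream_events.json', lines=True)

# The "schema" is simply whichever fields this particular query needs
page_views = events[['user_id', 'page', 'timestamp']]
views_per_page = page_views.groupby('page').size()
print(views_per_page)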

 

Data Lakes Scalability

By using schema on read, the constraint on scale is virtually removed. The threat of a bottleneck still exists, but now in the form of the physical hardware available. This is why online cloud offerings such as Amazon S3 and Azure Data Lake from Microsoft have become so popular. Of course, on-site data centers are also an option, with Hadoop being a very popular solution that combines a Data Lake structure with analytical capabilities. This level of scalability also safeguards against Data Silos. A Data Silo is an undesirable situation where only one group or a limited number of people in an organization have access to a source of data that has broader relevance to people across the organization (Plixer, 2018).

Data Lakes are intended, by design and philosophy, to be the antithesis of Data Silos: all of an organization’s data is stored together in one lake. However, centrally storing all data is not without significant security concerns, and losing sight of what customer data is on hand can run afoul of numerous legal requirements such as GDPR.

 

Data Lakes Analysis & Analytics

A defining feature of Big Data analytics is the concept of bringing the analytics to the data rather than the data to the analytics. Traditionally, analytics was carried out by feeding single flat files into an algorithm, with the time taken to prepare these files being significant. Accessing the raw files directly is potentially a failing, though, as it can break the principle of a single source of truth and therefore runs the risk of introducing inconsistencies between reports and other forms of analysis. As you can imagine this is complex and disciplined work, which is why Data Lakes, at this point in their maturity, are best suited to Data Scientists and advanced Data Analysts (Hammer, 2018). However, this goes against the Data Lake ethos of “data for all”, as it only allows the very skilled to have access. It recreates the problem Data Lakes were meant to solve by imposing restrictions, or “data for the select few”. With Data Scientists acting as the gatekeepers, an organization’s stakeholders can lose sight of the useful data available to them. Worse still, valuable data may come from external sources, with stakeholders having no visibility of it prior to it landing in the Data Lake. This may leave stakeholders with no option but to act on an analysis produced by a Data Scientist, with the accuracy of the analysis being a matter of faith because the stakeholder has no data to say otherwise. In comparison, the creation of a Data Warehouse is usually a collaboration between developers and stakeholders familiar with the internal source systems and data. Once a Data Warehouse is created, far less skilled (and cheaper) Data Analysts will have the ability to navigate the internal structure and compile valuable reports.

Despite the obvious concerns, the significance of scalability and direct raw-data analysis cannot be overlooked. The sooner an organization is informed, the sooner it can act; in real-world terms this could save millions of dollars, save thousands of jobs or stop the organization itself from going under. However, the benefits of scalability and earlier data access are not without risks, as poorly managed Data Lakes have the potential to turn into Data Swamps: dumping grounds for data. Though the data may be unstructured, the method in which it is stored must not be, or visibility of what is stored and where it is stored will be lost. Failure to catalogue the data, letting users know what is available while making the attributes of the data known, will overwhelm users and result in garbage results (Hammer, 2018). Successful implementation of a Data Lake is complex and requires ongoing commitment to maintain, but for a large organization that needs to make better use of the wider range of data available in the digital universe a Data Lake is a necessity.

 

Conclusion

A Data Lake is not a replacement for a Data Warehouse. Data Lakes are better equipped to solve the different problems associated with semi-structured and unstructured data. Their flexibility in this regard allows them to surpass Data Warehouses in terms of scalability while making data accessible for analysis sooner. However, Data Lakes are not without their drawbacks. They require highly skilled and expensive staff to develop and maintain. They run a greater risk of failing spectacularly by devolving into a Data Swamp, and could become a serious liability from a regulatory standpoint if that were to happen. Organisations can also be left at the mercy of Data Scientists, and how accurately they analyze data and produce reports, as stakeholders may not have the expertise to retrieve data from the Data Lake themselves.

Thankfully, Data Warehouses are still perfectly suited to dealing with numeric data, and organizations that still predominantly use their own internal transactional systems to create actionable information have no immediate need to adopt any alternatives.

 

References:

Bethke, U. (2017) ‘Are Data Lakes Fake News?’, Sonra, 8 August. Available at: http://www.kdnuggets.com/2017/09/data-lakes-fake-news.html (Accessed: 4 July 2020).

BIDataPro (2018) ‘What is Fact Table in Data Warehouse’, BIDataPro, 23 April. Available at: https://bidatapro.net/2018/04/23/what-is-fact-table-in-data-warehouse/ (Accessed: 4 July 2020).

Coughlin, T. (2018) 175 Zettabytes By 2025, Forbes. Available at: https://www.forbes.com/sites/tomcoughlin/2018/11/27/175-zettabytes-by-2025/ (Accessed: 4 July 2020).

Databricks (2019) ‘Unified Data Warehouse’, Databricks, 8 February. Available at: https://databricks.com/glossary/unified-data-warehouse (Accessed: 4 July 2020).

Dixon, J. (2010) ‘Pentaho, Hadoop, and Data Lakes’, James Dixon’s Blog, 14 October. Available at: https://jamesdixon.wordpress.com/2010/10/14/pentaho-hadoop-and-data-lakes/ (Accessed: 4 July 2020).

Earley, K. (2020) Google and Facebook extend work-from-home policies to 2021, Silicon Republic. Available at: https://www.siliconrepublic.com/companies/google-facebook-remote-work-until-2021 (Accessed: 5 July 2020).

Hammer, D. (2018) What is a data lake? – The Hammer | The Hammer. Available at: https://www.sqlhammer.com/what-is-a-data-lake/ (Accessed: 4 July 2020).

Henson, T. (2016) ‘Schema On Read vs. Schema On Write Explained’, Thomas Henson, 14 November. Available at: https://www.thomashenson.com/schema-read-vs-schema-write-explained/ (Accessed: 6 July 2020).

Marr, B. (2018) How Much Data Do We Create Every Day? The Mind-Blowing Stats Everyone Should Read, Forbes. Available at: https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/ (Accessed: 5 July 2020).

Oracle (2019) What Is a Data Warehouse | Oracle Ireland. Available at: https://www.oracle.com/ie/database/what-is-a-data-warehouse/ (Accessed: 5 July 2020).

Pasqua, J. (2014) Schema-on-Read vs Schema-on-Write, MarkLogic. Available at: https://www.marklogic.com/blog/schema-on-read-vs-schema-on-write/ (Accessed: 6 July 2020).

Plixer (2018) What is a Data Silo and Why is It Bad for Your Organization? Available at: https://www.plixer.com/blog/data-silo-what-is-it-why-is-it-bad/ (Accessed: 6 July 2020).

Turner, V. (2014) The Digital Universe of Opportunities. Available at: https://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm (Accessed: 5 July 2020).

How to create a C# console application that will solve crosswords

This tutorial will cover the following tasks in C#:

  • How to count the characters in a string
  • How to assign a file’s directory location to a variable
  • How to create a list variable
  • How to pull/read a CSV file column into a list variable
  • How to clean strings using Regex to remove non-alphanumeric characters as the strings are being read into a list
  • How to remove duplicate word entries from a list
  • How to order a list
  • How to write variables to the console, including a list’s elements

Assumptions:

You already know how to create projects in Visual Studio.

If you do not know how to do this, search online using the following term: "how to create C# console applications in visual studio".

Prerequisites:

First you will need to generate a CSV file with random words using this site:

https://onlinerandomtools.com/generate-random-csv

For the option “how many columns to generate” set the value to 1.

For testing purposes create 1000 rows.

Download the generated CSV file and save it as “words.csv”.

Summary of how the code works:

The code works by reducing the initial list (i.e. the supplied CSV file of random words) down to only words that match the number of characters of the user word, typically referred to as “string length”.

Once that subset of words has been created the code will then compare the user word’s letters against each letter, referencing the relative position, in each word in the subset.

Note: there is still significant room for optimization but the code is functional and works well as an accessible, human readable tutorial.

Use case example:

If the user enters the word “apple” the dictionary subset will be reduced down to 5 letter words only. These five letter words are then compared to the user word, each word and letter at a time. So if the first word in the list was “cabin” the comparison would jump to the next word in the list as the “a” in “apple” does not match the “c” in “cabin”. If the next word in the dictionary was “acorn” the first letters would match but the comparison would jump to the next word when the “c” and “p” did not match.

Instructions:

Create a C# console application called CrosswordSolver in Visual Studio.

Move the CSV file called “Words” into the bin directory of the project folder, i.e. CrosswordSolver\CrosswordSolver\bin

Open the project CrosswordSolver and paste the C# code below into the default window replacing the default cs page code.

The hardcoded example of a user word is:

string userWord = "a****";

The user can use * to represent characters unknown, for example ap*le.

Note: The CSV file you randomly generated may have no examples of 5-letter words beginning with the letter “a”, so experiment with other characters.

You can test the letter comparison functionality by uncommenting the two sections of code immediately following the comments “Test letter comparison”.

To test your CSV file has been read into memory you can uncomment the section of code immediately following “Test that dictionary has been read into memory”.

The C# code:

using System;
using System.IO;
using System.Linq;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace CrosswordSolver
{
    class Program
    {
        static void Main(string[] args)
        {
            int c = 0;
            //User input
            //NOTE: Use * to represent characters unknown 
            string userWord = "a****";
            int wordLength = userWord.Length;

            //Assign directory location of the csv file containing the collection of words to a variable
            string projectFolder = Directory.GetParent(Directory.GetCurrentDirectory()).Parent.FullName;
            string file = Path.Combine(projectFolder, "words.csv");

            //Display dictionary location in console
            Console.WriteLine("Dictionary location: " + file);

            var dictionary = new List<string>();
            using (var rd = new StreamReader(file))

            //Pull file column into dictionary list without cleaning
            //{
            //    while (!rd.EndOfStream)
            //    {
            //        var splits = rd.ReadLine().Split(',');
            //        dictionary.Add(splits[0]);
            //    }
            //}

            //Pull file column into dictionary list while cleaning
            {
                while (!rd.EndOfStream)
                {
                    var splits = rd.ReadLine().Split(',');
                    //string clean is done with Regex
                    dictionary.Add(Regex.Replace(splits[0], "[^A-Za-z0-9 ]", ""));
                }
            }

            //Test that dictionary has been read into memory
            //Console.WriteLine("The dictionary contains the following words:");
            //foreach (var element in dictionary)
            //Console.WriteLine(element);

            //Remove duplicate word entries
            //c = dictionary.Count;
            //Console.WriteLine("The dictionary contains " + c + " words");
            dictionary = dictionary.Distinct().ToList();
            //c = dictionary.Count;
            //Console.WriteLine("The dictionary contains " + c + " words");

            // Count the elements in the List and display test parameters
            c = dictionary.Count;
            Console.WriteLine("The dictionary contains " + c + " words");
            Console.WriteLine("User entered the string: " + userWord);
            Console.WriteLine(userWord + " has " + wordLength + " characters");
            userWord = userWord.ToLower();

            //Reduce the dataset size based on number of characters in string
            IEnumerable<string> query =
                dictionary.Where(word => word.Length == wordLength);

            var subSet = new List<string>();
            foreach (var word in query)
                subSet.Add(word);

            //Order List
            subSet = subSet.OrderBy(x => x).ToList();

            c = subSet.Count;
            if (c != 0)
            {
                Console.WriteLine("The dictionary contains " + c + " words that are " + wordLength + " characters in length");

                //Begin character and position match check
                var result = new List<string>();
                foreach (var word in subSet)

                {
                    for (int i = 0; i <= wordLength - 1; i++)
                    {

                        if ((word.ToLower()[i] == userWord[i]) | (userWord[i] == '*'))
                        {

                            //Test letter comparison (Letters match)
                            //Console.WriteLine(
                            //"Letter " + i + ", which is " + "\"" + word[i] + "\"" + ", of the word " + "\"" + word + "\"" +
                            //" matches letter " + i + ", which is " + "\"" + userWord[i] + "\"" + ", of the user input " + "\"" + userWord + "\""
                            //);

                            if (i == wordLength - 1)
                            { result.Add(word); }

                        }
                        else
                        {
                            //Test letter comparison (Letters do not match)
                            //Console.WriteLine(
                            //"Letter " + i + ", which is " + "\"" + word[i] + "\"" + ", of the word " + "\"" + word + "\"" +
                            //" does not match letter " + i + ", which is " + "\"" + userWord[i] + "\"" + ", of the user input " + "\"" + userWord + "\""
                            //);

                            break;
                        }
                    }
                }

                //Test words that do not match
                //foreach (var word in subSetToRemove)
                //Console.WriteLine(word);

                bool isEmpty = !result.Any();
                if (isEmpty)
                {
                    Console.WriteLine("No matches found");
                }
                else
                {
                    c = result.Count();
                    Console.WriteLine("Potential matches found: " + c);
                    foreach (var word in result)
                        Console.WriteLine(word);
                }
            }
            else
            {
                Console.WriteLine("No words of " + wordLength + " characters long found");
            }
            Console.ReadKey();
        }
    }
}

 

If you found this code useful be sure to like the post and comment. ☮

If you would like to know how to create a csv file with C# see this tutorial link.

If you would like to know how to create a console application in Visual Studio that won’t open a command window when it runs see this tutorial link.

 

How to handle a Database creation request

If you are working as a DBA you may find that developers ask you to create a database having given little thought to what the database will be used for, or the impact the database could have on the resources or security of the hosting environment.

If you find yourself in that situation I would suggest you walk the requester through the questionnaire from the previous article “How to determine where a new database should be deployed“.

Once you have completed that process I would then suggest that an official request to deploy a new database be made using the DATABASE REQUEST FORM provided here link. Even if you have a change request process I would still suggest you use this form, as a database-specific request form captures more relevant and vital information.

This is a fairly high level request form with most of the technical details still to be defined by the DBA but it provides documentation of the request and states the requester’s initial expectations and requirements.

Following the database deployment, if the actual footprint of the database does not match up with what was agreed, the form will confirm whether the requested resources were under-specced or misleading.

The form is outlined as below.

(Image: DATABASE REQUEST FORM)

Some important points the form clarifies:

There’s a difference between the requester and the owner. If the database runs into any problems you don’t want to be contacting the temporary intern that requested it instead of, say, the department head.

The application the database supports. Most of the time the database name will have some tie-in to the application name, but maybe it does not. For instance the database could be named something generic like Reporting, which could be the back end for practically anything.

The form asks the requester to prepare a profile for the database. I could have named this section “who is your daddy and what does he do?”. If the requester states they are looking for an OLAP reporting database but operationally it runs as an OLTP transactional database, that could make a big difference in terms of the resources provided for the database and the underlying hardware.

Possibly most importantly, the form helps to establish the likely impact of the new database with the Resource Impact Estimation section. For example, if a requester asks for 10 GB of space and states they expect space usage to increase by 5 GB a year, but the database consumes a terabyte of disk in the first few months, the form will clarify who got their numbers wrong.

The users and groups section will clarify who should have access to the database. Effectively everything related to data and data access should be okayed by a compliance officer to confirm it is GDPR compliant, and this form will assist the compliance officer in establishing that.

The Business Continuity & Upkeep section is really the domain of the DBA, but it helps to get requester input on these matters, for instance when establishing maintenance windows.

If you have any additional questions you feel should be on the form please feel free to contact me and I’ll add them.

How to assess a SQL Server instance for GDPR compliance by writing every table and column to Excel

This post uses the script I had written before here LINK, modified slightly to include a row count for each table. If you are a DBA you’re likely assisting compliance officers to find personally identifiable data within the databases at this time. This script will allow you to provide them with a record of every database, table and column on an entire instance. The compliance officer can then sieve through all the columns and highlight any that look like they contain personal data for further investigation.

Below is a SQL query that will return the following metadata about each table from each database located on a SQL Server instance:

  • ServerName
  • DatabaseName
  • SchemaName
  • TableName
  • CountOfRows
  • ColumnName
  • KeyType

The output is provided in a table format with these additional formatted lines of text which can be used as queries or as part of queries.

  • A Select table query
  • A count table rows query
  • A Select column query
  • Each column bracketed
  • Each table and column bracketed

The table returned by the query can be exported to Excel. Applying Excel’s filter option to the columns of the table makes finding and selecting specific tables and columns very easy.

This process can be repeated for every SQL Server instance used by the business to generate a single master Excel file of mapped servers, allowing the user to quickly find any table or column available to the organization.

Applying some colour coding like below adds to the ease of use.

(Image: Excel file with mapped database server structure)

How to use:

Simply open SQL Server Management Studio and, from Object Explorer, right-click on the server name and select New Query. This will open a query window set to the master database of the server. Copy and paste the SQL below into this window and execute it. When the query is finished you will have created the table above.

/*
SCRIPT UPDATED
20180316
*/

USE [master]
GO

/*DROP TEMP TABLES IF THEY EXIST*/
IF OBJECT_ID('tempdb..#DatabaseList') IS NOT NULL
	DROP TABLE #DatabaseList;

IF OBJECT_ID('tempdb..#TableStructure') IS NOT NULL
	DROP TABLE #TableStructure;

IF OBJECT_ID('tempdb..#ErrorTable') IS NOT NULL
	DROP TABLE #ErrorTable;

IF OBJECT_ID('tempdb..#MappedServer') IS NOT NULL
	DROP TABLE #MappedServer;
	
IF OBJECT_ID('tempdb..#TableCount') IS NOT NULL
	DROP TABLE #TableCount;
	
IF OBJECT_ID('tempdb..#Count') IS NOT NULL
	DROP TABLE #Count;

DECLARE @ServerName AS SYSNAME
DECLARE @Count INT

SET @ServerName = @@SERVERNAME

CREATE TABLE #DatabaseList (
	Id INT NOT NULL IDENTITY(1, 1) PRIMARY KEY
	,ServerName SYSNAME
	,DbName SYSNAME
	);

CREATE TABLE [#TableStructure] (
	[DbName] SYSNAME
	,[SchemaName] SYSNAME
	,[TableName] SYSNAME
	,[ColumnName] SYSNAME
	,[KeyType] CHAR(7)
	) ON [PRIMARY];
	
CREATE TABLE [#TableCount] (
	[Id_TableCount] INT NOT NULL IDENTITY(1, 1) PRIMARY KEY
	,[ServerName] SYSNAME
	,[DatabaseName] SYSNAME
	,[SchemaName] SYSNAME
	,[TableName] SYSNAME
	,[SQLCommand] VARCHAR(MAX)
	,[TableCount] INT
	);

CREATE TABLE #Count (ReturnedCount INT);

/*THE ERROR TABLE WILL STORE THE DYNAMIC SQL THAT DID NOT WORK*/
CREATE TABLE [#ErrorTable] ([SqlCommand] VARCHAR(MAX)) ON [PRIMARY];

/*
A LIST OF DISTINCT DATABASE NAMES IS CREATED
THESE TWO COLUMNS ARE STORED IN THE #DatabaseList TEMP TABLE
THIS TABLE IS USED IN A FOR LOOP TO GET EACH DATABASE NAME
*/
INSERT INTO #DatabaseList (
	ServerName
	,DbName
	)
SELECT @ServerName
	,NAME AS DbName
FROM master.dbo.sysdatabases WITH (NOLOCK)
WHERE NAME <> 'tempdb'
ORDER BY NAME ASC

/*VARIABLES ARE DECLARED FOR USE IN THE FOLLOWING FOR LOOP*/
DECLARE @sqlCommand AS VARCHAR(MAX)
DECLARE @DbName AS SYSNAME
DECLARE @i AS INT
DECLARE @z AS INT

SET @i = 1
SET @z = (
		SELECT COUNT(*) + 1
		FROM #DatabaseList
		)

/*WHILE 1 IS LESS THAN THE NUMBER OF DATABASE NAMES IN #DatabaseList*/
WHILE @i < @z
BEGIN
	/*GET NEW DATABASE NAME*/
	SET @DbName = (
			SELECT [DbName]
			FROM #DatabaseList
			WHERE Id = @i
			)
	/*CREATE DYNAMIC SQL TO GET EACH TABLE NAME AND COLUMN NAME FROM EACH DATABASE*/
	SET @sqlCommand = 'USE [' + @DbName + '];' + '

INSERT INTO [#TableStructure]
SELECT DISTINCT ' + '''' + @DbName + '''' + ' AS DbName
	,SCHEMA_NAME(SCHEMA_ID) AS SchemaName
	,T.NAME AS TableName	
	,C.NAME AS ColumnName
	,CASE 
		WHEN OBJECTPROPERTY(OBJECT_ID(iskcu.CONSTRAINT_NAME), ''IsPrimaryKey'') = 1 
			THEN ''Primary'' 
		WHEN OBJECTPROPERTY(OBJECT_ID(iskcu.CONSTRAINT_NAME), ''IsForeignKey'') = 1 
			THEN ''Foreign''
		ELSE NULL 
		END AS ''KeyType''
FROM SYS.TABLES AS t WITH (NOLOCK)
INNER JOIN SYS.COLUMNS C ON T.OBJECT_ID = C.OBJECT_ID
LEFT JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS iskcu WITH (NOLOCK) 
ON SCHEMA_NAME(SCHEMA_ID) = iskcu.TABLE_SCHEMA 
	AND T.NAME = iskcu.TABLE_NAME
	AND C.NAME = iskcu.COLUMN_NAME
ORDER BY SchemaName ASC
	,TableName ASC
	,ColumnName ASC;
';

	/*ERROR HANDLING*/
	BEGIN TRY
		EXEC (@sqlCommand)
	END TRY

	BEGIN CATCH
		INSERT INTO #ErrorTable
		SELECT (@sqlCommand)
	END CATCH

	SET @i = @i + 1
END

/*
JOIN THE TEMP TABLES TOGETHER TO CREATE A MAPPED STRUCTURE OF THE SERVER
ADDITIONAL FIELDS ARE ADDED TO MAKE SELECTING TABLES AND FIELDS EASIER
*/
SELECT DISTINCT @@SERVERNAME AS ServerName
	,DL.DbName
	,TS.SchemaName
	,TS.TableName
	,TS.ColumnName
	,TS.[KeyType]
	,',' + QUOTENAME(TS.ColumnName) AS BracketedColumn
	,',' + QUOTENAME(TS.TableName) + '.' + QUOTENAME(TS.ColumnName) AS BracketedTableAndColumn
	,'SELECT * FROM ' + QUOTENAME(DL.DbName) + '.' + QUOTENAME(TS.SchemaName) + '.' + QUOTENAME(TS.TableName) + ' --WHERE --GROUP BY --HAVING --ORDER BY' AS [SelectTable]
	,'SELECT ' + QUOTENAME(TS.TableName) + '.' + QUOTENAME(TS.ColumnName) + ' FROM ' + QUOTENAME(DL.DbName) + '.' + QUOTENAME(TS.SchemaName) + '.' + QUOTENAME(TS.TableName) + ' --WHERE --GROUP BY --HAVING --ORDER BY' AS [SelectColumn]
	,'SELECT COUNT(*) FROM ' + QUOTENAME(DL.DbName) + '.' + QUOTENAME(TS.SchemaName) + '.' + QUOTENAME(TS.TableName) + ' WITH (NOLOCK)' AS [PerformTableCount]
INTO #MappedServer
FROM [#DatabaseList] AS DL
INNER JOIN [#TableStructure] AS TS ON DL.DbName = TS.DbName
ORDER BY DL.DbName ASC
	,TS.SchemaName ASC
	,TS.TableName ASC
	,TS.ColumnName ASC

/*
HOUSE KEEPING
*/
IF OBJECT_ID('tempdb..#DatabaseList') IS NOT NULL
	DROP TABLE #DatabaseList;

IF OBJECT_ID('tempdb..#TableStructure') IS NOT NULL
	DROP TABLE #TableStructure;
	
INSERT INTO #TableCount (
	[ServerName]
	,[DatabaseName]
	,[SchemaName]
	,[TableName]
	,[SQLCommand]
	)
SELECT DISTINCT [ServerName]
	,[DbName]
	,[SchemaName]
	,[TableName]
	,[PerformTableCount]
FROM #MappedServer
ORDER BY [ServerName] ASC
	,[DbName] ASC
	,[SchemaName] ASC
	,[TableName] ASC

SET @i = 1
SET @z = (
		SELECT COUNT(*) + 1
		FROM #TableCount
		)
	
WHILE @i < @z
BEGIN
	SET @SQLCommand = (
			SELECT SQLCommand
			FROM #TableCount
			WHERE Id_TableCount = @i
			)

	--ERROR HANDLING
	BEGIN TRY
		INSERT INTO #Count
		EXEC (@SqlCommand)
	END TRY

	BEGIN CATCH
		INSERT INTO #ErrorTable
		SELECT (@sqlCommand)
	END CATCH

	SET @Count = (
			SELECT ReturnedCount
			FROM #Count
			)

	TRUNCATE TABLE #Count

	UPDATE #TableCount
	SET TableCount = @Count
	WHERE Id_TableCount = @i;

	SET @i = @i + 1
END

SELECT *
FROM #ErrorTable;

IF OBJECT_ID('tempdb..#ErrorTable') IS NOT NULL
	DROP TABLE #ErrorTable;

/*
THE DATA RETURNED CAN NOW BE EXPORTED TO EXCEL
USING A FILTERED SEARCH WILL NOW MAKE FINDING FIELDS A VERY EASY PROCESS
*/
SELECT DISTINCT ms.ServerName
	,ms.DbName
	,ms.SchemaName
	,ms.TableName
	,ts.TableCount AS CountOfRows
	,ms.ColumnName
	,ms.KeyType
	,ms.BracketedColumn
	,ms.BracketedTableAndColumn
	,ms.SelectColumn
	,ms.SelectTable
FROM #MappedServer AS ms
LEFT JOIN #TableCount AS ts ON ms.ServerName = ts.ServerName
AND ms.DbName = ts.DatabaseName
AND ms.SchemaName = ts.SchemaName
AND ms.TableName = ts.TableName
ORDER BY ms.DbName ASC
	,ms.SchemaName ASC
	,ms.TableName ASC
	,ms.ColumnName ASC;
	
IF OBJECT_ID('tempdb..#MappedServer') IS NOT NULL
	DROP TABLE #MappedServer;

IF OBJECT_ID('tempdb..#TableCount') IS NOT NULL
	DROP TABLE #TableCount;
	
IF OBJECT_ID('tempdb..#Count') IS NOT NULL
	DROP TABLE #Count;

 

How to determine where a new database should be deployed

Below I’ve listed 22 questions to consider when deciding where a new SQL Server database should be deployed given the four main options of:

  1. Add Database to existing Instance
  2. Create separate Instance for Database
  3. Create separate VM for Instance and Database
  4. New physical machine needed

There’s no hard and fast rule for determining where a database should go; however, there are often a lot of limitations dictating your choices. I’ve started the list with a big limitation: money.

If you want to use the list like a score card I’ve prepared one called the “New Database Hosting Environment Assessment Form”, which I know is not the snappiest of titles.

New Database Hosting Environment Assessment Form

Use: I’d recommend stepping through the questions with the requester while gathering feedback. You can then fill in the option scoring with weighted figures or simply use the boxes to record notes. The main deliverable of the form is you will have some documentation that demonstrates the location for the database was assessed and the logic and reasoning for the location chosen has been recorded.

Of course, consider adding additional questions relevant to your case to the form. If you think your additional questions would be relevant to everyone deploying a new database, feel free to forward them on to me and I’ll include them in the form.

The form can be downloaded here link.

Question 1: Can the business afford a new physical machine?

Considerations: If the financial resources are not available option four
can be eliminated.

Question 2: Are there OS or SQL Server Licensing restrictions?

Considerations: If there are no restrictions, a new VM with its own instance will likely offer the most scalable solution and produce the best performance. If restrictions exist, options are limited to 1 & 2.

Question 3: Is the Database the back end for a stand-alone application?

Considerations: The back end for a stand-alone application should be isolated as much as possible from other impacting factors. The higher the number of the option chosen the better.

Question 4: What is the primary purpose of the Database?

Considerations: What is the business use, and in what capacity will the database function, i.e. transactional Db, reporting Db, datastore?

Question 5: Do you want/need to be able to limit access to hardware resources?

Considerations: Access to resources, RAM etc. can only be limited at the instance level, so option 1 cannot be used.

Question 6: Are there any SQL Server Service Pack dependencies?

Considerations: SQL Server can have different SPs on different instances but cannot have different SPs for different databases within an instance.

Question 7: What is the current excess capacity of the hardware?

Considerations: Can the hardware handle the additional workload? If not, either reduce the resource consumption on the hardware, add RAM and/or cores, or choose option 4.

Question 8: Is there a VM capacity restraint on the hardware?

Considerations: Can the hardware actually host a new VM without a trade-off in resources or a decrease in performance? If constrained, option 3 cannot be used.

Question 9: What is the VM capacity?

Considerations: Is the OS already maxed out on resources? Can the VM handle the additional workload?

Question 10: Is there an expected increase in size and usage in the Database over time?

Considerations: If known and minimal, option 1 can be used. If unknown or unlimited, the higher the number of the option chosen the better.

Question 11: Is the resource usage of the new Database known?

Considerations: Benchmarking RAM, CPU and bandwidth usage should be carried out prior to installation.

Question 12: What are the disaster recovery requirements?

Considerations: Should the databases that share common dependencies be hosted on
the same server?

Question 13: What is the required operational up time? 24/7 etc.

Considerations: Does this operational run time tie in with the rest of the databases
on the instance, or the instances hosted on a VM or physical server?

Question 14: What are the Maintenance requirements?

Considerations: Will index rebuilds etc. take a long time and affect the overall schedule of the instance?

Question 15: What are the Backup requirements?

Considerations: Will the backup schedule of the other databases be impacted
by the addition?

Question 16: Is the Database functionally similar to, or supporting, databases currently on the instance?

Considerations: Does the new Database logically fit in with the database/s currently running on an instance?

Question 17: Have server security concerns been addressed?

Considerations: Will people with administrative access to the server have access to a Database and data they should not be able to see?

Question 18: Does hosting particular databases together compound the potential damage if an unauthorised person was to gain access to the server?

Considerations: Will an unauthorised person have access to data that represents a major security breach if the data is combined from different databases?

Question 19: Does a vendor have control over the instance or server?

Considerations: Will putting a Database on a particular instance leave you at the mercy of a vendor?

Question 20: Will stacking instances make tracking down performance issues across
the VM difficult?

Considerations: Will this create a server level noisy neighbour problem?

Question 21: Will packing databases too densely make tracking down performance issues across the Instance difficult?

Considerations: Will this create an instance level noisy neighbour problem?

Question 22: Will moving the Database to be hosted somewhere else in the
future be possible?

Considerations: Does this decision need to be gotten right the first time?

How to pass arguments from command line to a console application written in C#

This is a simple tutorial on passing arguments or parameter values from command line to a console application written in C#. Using the example below you should be able to edit and expand on the logic to fit your own needs.

First you’ll need to create a new Visual Studio C# console application, to do so follow these steps:

To create and run a console application

  1. Start Visual Studio.

  2. On the menu bar, choose File > New > Project.
  3. Expand Installed, expand Templates, expand Visual C#, and then choose Console Application.
  4. In the Name box, specify a name for your project, and then choose the OK button.
  5. If Program.cs isn’t open in the Code Editor, open the shortcut menu for Program.cs in Solution Explorer, and then choose View Code.
  6. Replace the contents of Program.cs with the following code.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace TestArgsInput
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length == 0)
            {
				// Display message to user to provide parameters.
                System.Console.WriteLine("Please enter parameter values.");
                Console.Read();
            }
            else
            {
                // Loop through array to list args parameters.
                for (int i = 0; i < args.Length; i++)
                {
                    Console.Write(args[i] + Environment.NewLine);
                    
                }
                // Keep the console window open after the program has run.
                Console.Read();
            }
        }
    }
}

 

The Main method is the entry point of a C# application. When the application is started, the Main method is the first method that is invoked.

The parameter of the Main method is a String array that represents the command-line arguments. Usually you determine whether arguments exist by testing the Length property as in the example above.

When run, the example above will list the parameters you have provided in the command window. The delimiter separating command-line arguments or parameter values is a single space. For example, the following would be interpreted as two arguments or parameter values:

“This is parameter 1” “This is parameter 2”

If the arguments were not enclosed in double quotes, each word would be considered a separate argument.

To pass arguments to the console application when testing the application logic the arguments can be written into the debug section of the project properties as shown below.

(Image: command line arguments entered in the debug section of the TestArgs project properties)

So if the app is run with the command-line arguments provided in the image above, the command window will list:

Parameter 1
Parameter 2

If you would like to know how to create a console application in Visual Studio that won’t open a command window when it runs see this tutorial link.

If you would like to know how to create a csv file with C# see this tutorial link.