
How to set up a free WordPress contact form without plugins

Summary

This solution uses Google Forms and a small amount of scripting. The end result is a form, neatly embedded in your website, that sends an email notification to any email address (or addresses) you choose whenever the form is filled in and submitted. The solution is very versatile and can be applied to numerous use cases: it works on any website, not just a WordPress site, and with any type of form. The JavaScript is very straightforward and can easily be customised to fit different needs.

Prerequisite

You will need a Google (Gmail) account in order to follow these instructions. This Gmail account will not be visible to the user who submits the form, but it is recommended that you do not use your personal Gmail account and instead create a new one.

Instructions

First, you will need to create a Google Form.

Sign in to the Google Account you intend to associate this form with.

Search for Google Forms in your search engine of choice.

Click on the default Contact form template and edit it as required, i.e. fields, theme, etc.

For the purposes of these instructions it is recommended that you create a contact form with the fields Name, Email, Website and Comment, as these are the fields the script will look for. Once you understand the whole process you can customise the solution to fit your specific needs.

Once done go to the form settings (Gear icon).

By default, the “Requires sign in” setting is set to “Limit to 1 response”.

Remove this requirement, or a nasty popup will appear on the contact page of your website requiring the site’s visitors to log into Google to fill out the contact form.

(Obviously you can leave this restriction in place if that is the behavior you desire.)

At this point you can test the form.

Testing the Form

Click the Send button; this will bring up the various share options.

In this instance choose the second option, i.e. not the email option but the URL (chain icon) option.

Copy the URL and paste it into a new browser window; this will show you the form from a user’s perspective.

Fill in the form with mock data and submit it.

Return to the form editing page and you will see “1 Response” above the form input boxes.

Clicking on this button brings up the responses and response statistics.

At this point your form should be ready to use.

Form Submission Email Notifications

On the form editing page, go to More options (the three vertical dots); among the options there will be “Script editor”.

Click on it and the Script Editor window will appear.

Go to File > New > Script file and enter the new file name as “FormContact”.

Delete all of the text in the editor window, i.e. “function myFunction() {}”.

Then copy and paste the text below into the editor window.

function onFormSubmit(e) {
  //If you run this script from the script editor it will throw an error as the code is not being passed values from an active form
  //To test this script you should have a Contact form prepared with the fields Name, Email, Website and Comment
  //You can then submit the Contact form after populating the fields
  //To run logged tests uncomment the code below that starts with "Logger.log", or simply submit forms and review the received emails
  //You can view the log by going to View > Stackdriver Logging > Apps Script Dashboard

//Email address that will receive the notification
  var emailTarget = "c.kent@dailyplanet.com";
//To send notifications to multiple email addresses uncomment the line below and delete the line above
//var emailTarget = "c.kent@dailyplanet.com, b.wayne@waynecorp.com";
  
//Capture the form input values as variables
  var frm = FormApp.getActiveForm().getItems();
  var nameGiven = e.response.getResponseForItem(frm[0]).getResponse();
  var emailAddress = e.response.getResponseForItem(frm[1]).getResponse();
  var websiteUrl = e.response.getResponseForItem(frm[2]).getResponse();
  var commentGiven = e.response.getResponseForItem(frm[3]).getResponse();
  
//Create the variable htmlPage that will store a basic HTML page including the style specifications for a simple HTML table
  var htmlPage = `
<!DOCTYPE html>
<html>
<head>
<style> table {
  font-family: arial, sans-serif;
  border: 1px solid black;
  border-collapse: collapse;
  width: 100%;
}
table td {
  border: 1px solid black;
  padding: 10px;
}
</style>
</head>
<body>
`

//Add an HTML table inside the htmlPage variable that will display the captured form values in the email
  htmlPage += '<div><table>' 
  +'<tr><td>Name</td><td>' + nameGiven + '</td></tr>' 
  +'<tr><td>Email</td><td>' + emailAddress + '</td></tr>' 
  +'<tr><td>Website</td><td>' + websiteUrl + '</td></tr>' 
  +'<tr><td>Comment</td><td>' + commentGiven + '</td></tr>' 
  + '</table></div></body></html>'

  //Logger.log("Name: " + nameGiven + "Email Address: " + emailAddress + "Website: " + websiteUrl + "Comment: " + commentGiven);
  
//Send the notification email via the Gmail account to the email address(es) provided as the first argument
   GmailApp.sendEmail(emailTarget, 'New Contact Form Submitted', '', {htmlBody: htmlPage});
}

That is all the JavaScript code needed to capture the form values and send them, wrapped in a simple HTML table, via email.

By default the code sends emails only to “c.kent@dailyplanet.com”.

You will need to update this email address (keep the quotes!) to the email address you want to receive form emails or else you will really annoy Superman.

Similarly, change “c.kent@dailyplanet.com, b.wayne@waynecorp.com” if you want multiple email addresses to receive form emails. Just separate the email addresses with commas and wrap the whole string in quotes.

All of the code is now in place but a trigger needs to be set up to run the JavaScript code.

Go to Edit and click on “Current Project’s triggers”.

Add a new Trigger.

For “Choose which function to run” choose “onFormSubmit”.

For “Select event source” choose “from Form”.

For “Select event type” choose “On Form Submit”.

For “Failure notification settings” choose whatever frequency suits your use case.

Now when the form is submitted it will call the onFormSubmit function, which will run the JavaScript code you entered.
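If you prefer to create the trigger from code rather than through the UI, a minimal sketch like the one below should achieve the same thing for a form-bound script. Run it once manually from the script editor; the function name createFormTrigger is just an illustrative choice.

function createFormTrigger() {
  //Bind the onFormSubmit function to this form's submit event
  ScriptApp.newTrigger('onFormSubmit')
    .forForm(FormApp.getActiveForm())
    .onFormSubmit()
    .create();
}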

Now that you know how to set up a contact form and can see how the values are captured in the JavaScript code, you should have a good understanding of how to edit both the form and the code to fit your specific needs.

Keep in mind that the form values are captured in their order of appearance in the form.
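If your form has a different number of fields, or you would rather not rely on hard-coded item indexes, a more generic sketch is to loop over the submitted responses using the standard getItemResponses() method:

  var itemResponses = e.response.getItemResponses();
  var tableRows = '';
  for (var i = 0; i < itemResponses.length; i++) {
    //Use each form field's title as the label and the submitted value as the cell content
    tableRows += '<tr><td>' + itemResponses[i].getItem().getTitle() + '</td><td>' + itemResponses[i].getResponse() + '</td></tr>';
  }

You could then drop tableRows into the htmlPage variable in place of the four hard-coded rows.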

Adding the Contact Form to the WordPress site

This next part covers specific instructions for adding the form to a WordPress site, but if you have any web development experience you will see how easily this process can be incorporated into any HTML page.

On the Google form editing page click the Send button.

For the send via options choose the third option, embed HTML, symbolised by angle brackets < >.

Copy the code that appears below the send via options.

(If you just need the form for a generic website, take that code, embed it into your page and you are done. If you need the form for a WordPress site, keep reading.)

Log into your WordPress site.

Under My Site go to Pages and then select the page you want to use the form in or create a new page.

In the body section of the page click on the plus (+) block to add a new block.

Search for HTML and choose Custom HTML.

Paste the code from the Google Form into the block.

You are done.

Conclusion

Now when you publish the page you will have a new contact form that will email new form submissions to email addresses of your choosing and it didn’t cost you a dime.

If you liked this post please leave a like and share.


How to find a program’s directory in Raspbian OS

In most Linux distros programs are stored under the /usr directory; there is no “Program Files” directory as on Windows. Executables are typically stored in /usr/bin, with additional resources in /usr/share, libraries in /usr/lib, and so on.

There is also /usr/local, where software you compile yourself typically ends up, while /bin tends to hold command-line tools and /sbin holds utilities intended only for root.

The quickest way to find the directory a program actually resides in is through the terminal, using the “which” command.

Here are some examples:

which nano

which gpicview

which chromium-browser

The programs referenced are the preinstalled text editor, image viewer and web browser. All of these examples will return the /usr/bin/ directory.

Note that Chromium is referred to as “chromium-browser”: typing “which chromium” will return no result because that is not the program’s actual name. If you are unsure of a program’s name, run the program and then look for it in the task manager to confirm.


How to use an Android device as a keyboard and trackpad for a Raspberry Pi

Anyone who has set up a Raspberry Pi Zero W will know it is a bit limited in terms of I/O; such is the trade-off for such a small form factor. I recently went through a setup that was especially awkward because there was no WiFi available. I tried to use an Android hotspot, but unfortunately the Pi could not see the Android device at all. The only option I had was to tether the Pi to the Android device via USB. This worked: the Pi had access to the Android device’s mobile data. However, it seems the power drawn by the Android device left too little power for the wireless USB receiver for my keyboard and mouse combo. So I was left with the mutually exclusive options of either access to the internet or the ability to use a keyboard and mouse. Luckily there is always a plan C.

Prerequisites:

You will need a mouse that can connect to the Pi either by USB or Bluetooth. The OS used was Raspbian but this solution should work with other Distros.

Solution:

The Raspberry Pi Zero W also comes with Bluetooth built in so there was the option to make the Pi discoverable and connect a Bluetooth keyboard and mouse. I do not have a physical Bluetooth keyboard or mouse but thankfully there is an App for that, multiple ones actually.

The app I used was “Serverless Bluetooth Keyboard & Mouse for PC/Phone”, available on Google Play.

It is free (with ads) and very easy to set up. In terms of performance it provided me with a usable keyboard (like Gboard) with half of the device screen acting as a very responsive track pad. I certainly would not want to compose a thesis with this setup but for typing a few words and clicking a few links it is perfectly serviceable.

I did experience what may be a slight bug during setup, but I resolved the problem in a minute or two.

Problem and Fix:

First you will need to make the Pi discoverable via Bluetooth. This is the only time I needed to make use of a physical mouse. The option to turn on Bluetooth and make the device discoverable is at the top right of the Raspbian home screen.

When I tried to connect the Android and Pi together through the App it would not work. The Pi was not discoverable by the App despite the functionality to discover devices being built in to the App.

To connect the devices I first had to connect the Android device and Pi together via their respective operating systems. This threw an error on the Pi but the Android device was visible to it. I then removed the Android Bluetooth connection from the Pi and again tried connecting the Pi via the App. This worked.

If you found this post helpful please like/share/subscribe.


How to verify your WordPress.com site with Google via HTML tag

Before starting: Note that according to WordPress.com, “. . . verifying your site with these services (search engines) is not necessary in order for your site to be indexed by search engines.”

Prerequisites:

This guide assumes you already have your WordPress.com site set up and you already have an account with Google Analytics / Google Search Console.

Steps:

Log into your WordPress.com site.

Go to Marketing and change the displayed options to “Traffic”.

Under “Marketing and Integrations” scroll down to “Site verification services”.

There you will see an option to provide an HTML google-site-verification code.

To retrieve this code you need to log in to the site’s associated Google Search Console account.

Log in to Google Search Console. Under the heading “Google Search Console” you will see either a drop-down option to “Add Property” (i.e. a site you own) or the name of the site, or sites, you previously registered.

If you have not registered your domain before, submit the site address now under the domain option. If you have submitted your site before, click on your site name.

On the “Ownership verification” page you will see “Additional verification methods” at the bottom of the page.

Expand the HTML Tag option to reveal the HTML google-site-verification code.

Copy this code and return to the WordPress.com “Marketing and Integrations” page.

Paste the code into the HTML google-site-verification code section.

Save the settings in WordPress.com.

Return to the Google Search Console “Ownership verification” page and verify.

Your WordPress.com site has now been verified with Google.


How to add your WordPress.com sitemap to Google Search Console

Prerequisites:

This guide assumes you already have your WordPress.com site set up and your site is verified with Google Analytics / Google Search Console.

Steps:

By default WordPress.com prepares a sitemap for you.

To see it, simply copy and paste the mock URL below (Option 1) into your browser address bar and edit it to reference your site. If you own a custom domain, omit the reference to WordPress as demonstrated in Option 2.

(Option 1)

yoursite.wordpress.com/sitemap.xml

(Option 2)

yoursite.com/sitemap.xml

Once you have verified the sitemap URL is correct, add the sitemap to Google Search Console.

Do this by logging into Google Search Console and clicking on Sitemaps on the left hand side of the main window.

In the Sitemaps window there will be the option to paste the copied URL under “Add a new sitemap” and submit it.

Once the URL is submitted your sitemap will be saved under “Submitted sitemaps”.


How to fix Jetpack for WordPress.com not pushing posts to Facebook or Twitter

If you have set up the connections for Facebook, Twitter, etc. through Jetpack but your posts are not being pushed to those platforms, try the following.

Make sure you have given permission to editors and authors of your site to use the established Jetpack connections. To do that go back to the Jetpack connection settings.

In the “Publicize posts” section click the drop down arrow to the far right.

Click the check box allowing the social media platform to be used by more than just the administrators. (Obviously this will allow your authors to publish to the specified social media platform, so only do this if you trust your authors with this access.)

Once this is done your next published post should also be pushed across your connected social media platforms.

NOTE:

If the post was already published, “Updating” the post will not share it across the social media platforms. You will need to save the post as a “Draft” and “Publish” it again. This should then push the post to the social media platforms.


IT Project Management Failure: 3 Proposed Causes

Introduction

This article first highlights the misuse of the Project Management Triangle as a metric of success. Recognising that the terms “success” and “failure” can be subjective, the author instead proposes generalised, objective and unambiguous examples of failure as a starting reference point. With these examples of failure serving as a foundation, three general deficits in project management are proposed as potential root causes of IT project failure.

Project Management Triangle Misuse

The Project Management Triangle (also called the triple constraint, iron triangle and project triangle) consists of three points: cost, time and scope (or features). These points are argued to have proportionate relationships with each other. For example, a project can be completed faster by increasing budget and/or cutting scope. Similarly, increasing scope may require increasing the budget and/or schedule. Lowering the available budget will impact the schedule and/or scope. These trade-offs between cost, time and scope create constraints which are said to dictate the quality of the product. However, stakeholders often misconstrue staying within the constraints of the triangle while delivering a project as a measure of success instead of, as intended, a determinant of quality.

As a demonstration of the unsuitability of the triangle as a metric of success consider the following. Would a self-build home delivered over budget, behind schedule and outside the original specifications be considered a failure? No, not for those who took on such a daunting endeavour, and survived the process, having brought into existence the home of their dreams. This is an example of a project where Atkinson (1999) might suggest the criteria for success existed outside of cost, time and scope.

So, to define three significant causes of project failure it is first necessary to settle on unarguable features of project failure. It is important to note at this point that a project must have navigable obstacles and manageable risks. For instance, an IT project cannot be considered a failure if an unnavigable obstacle was introduced, for example new laws prohibiting online gambling that scuttle an online gambling platform in development. Similarly, an IT project cannot be considered a failure if unmanageable risks were encountered, such as the parent company collapsing due to financial irregularities not connected to the project.

With those points in mind the following statements are proposed as clear examples of project failure:

  1. The project exhausted necessary resources with no or unfinished deliverables.
  2. Delivery was too late and the deliverables are no longer needed or soon to be obsolete.
  3. Deliverables are not fit for purpose or of relative value.
  4. The costs exceeded the relative value generated by the deliverables.
  5. The project killed the parent organisation.

With examples of failure defined above the following section proposes management level causes of IT project failures.

IT Project Failure: Management Level Causes

Poor Project Visibility

There is a recognised need to have an information system in place to report on progress, cost, schedule, etc. (Larson and Gray, 2010). In the built environment progress can be apparent even to the eye of a layperson, but visibility of progress and resource consumption can be far more difficult for projects in other industries, some of which have intangible deliverables. In the IT industry, back-end infrastructure projects, for example, may have no visible deliverables and, with cloud-based deployments, no visible supporting hardware.

This is why project management styles like Scrum and visualisation tools like Kanban boards and burn-down charts have been adopted. Without these visualisation aids Project Managers could be blind to progress and resource consumption. Therefore a lack of visibility is proposed as a potential cause of, or contributor to, any of the failure examples defined above.

Inadequate Domain Knowledge

Domain knowledge is vital in steering stakeholder specifications, knowing what the relevant milestones are and establishing what is feasible given the budget, time and scope. The case is made by Larson and Gray (2010) that the key to managing scope creep, which can be beneficial, is change management. However, without adequate domain knowledge, how can the project manager know what the knock-on effects of a change will be, the value a change will deliver, or even whether a change is possible without putting the project at risk? It is also worth asking whether a lack of domain knowledge is often misread as poor leadership.

Lack of Accountability

Accountability is seen by Kerzner and Kerzner (2017) as the combination of authority and responsibility that rests at an individual level and is necessary for work to move forward. It is argued that if team members are not assigned tasks with consequences for underperformance or failure, the project has no drive for completion. This was particularly evident in the PPARS project (“PPARS- a comedy of errors,” n.d.). Due to questionable contract arrangements there were strong financial incentives not to finish the project, and without accountability driving the project forward the end result was a complete failure.

Conclusion

An IT Project Manager needs to utilise the project management triangle as intended, i.e. as a means to keep the desired level of quality of the deliverable in focus. If there are fluctuations in cost, time or scope the IT Project Manager needs to be cognizant of what the knock-on effects will be. In addition, an IT Project Manager needs to know who the right person to assign specific tasks to is. That person needs to have the proper motivation to get the work done, with the IT Project Manager having visibility of the work being done and the knowledge and experience to assess whether the work is being done properly. This is achieved through individual accountability, project visibility and domain knowledge. Without these three elements it is proposed a project has little chance of success.

References:

Atkinson, R., 1999. Project management: cost, time and quality, two best guesses and a phenomenon, its time to accept other success criteria. International Journal of Project Management 17, 337–342. https://doi.org/10.1016/S0263-7863(98)00069-6

Kerzner, H., Kerzner, H.R., 2017. Project Management: A Systems Approach to Planning, Scheduling, and Controlling. John Wiley & Sons.

Larson, E.W., Gray, C.F., 2010. Project Management: The Managerial Process. McGraw-Hill Irwin.

NoClip, 2017. FINAL FANTASY XIV Documentary Part #1 – “One Point O” – YouTube [WWW Document]. URL https://www.youtube.com/watch?v=Xs0yQKI7Yw4 (accessed 10.7.20).

Pinto, J.K., Mantel, S.J., 1990. The causes of project failure. IEEE Transactions on Engineering Management 37, 269–276. https://doi.org/10.1109/17.62322

PPARS- a comedy of errors [WWW Document], n.d. URL http://www.irishhealth.com/article.html?id=8661 (accessed 10.13.18).


Do PS4 controllers work with the PS3 console?

No!!!

No they do not!!!

If my car only drove in reverse and started 80% of the time would you consider that “working”?

Certainly not if I was trying to convince you to buy the car at full market price. You wouldn’t consider it working because it has partial functionality paired with unreliability. And that describes the PS4 controller connected to a PS3 console.

Unfortunately there’s a plethora of videos on YouTube hosted by snot-nosed teens demonstrating PS4 controllers “working” on the PS3 console. They’re not faking it; they really are controlling the games with the PS4 controller, but that’s not the whole story.

What they typically forget to mention is that the PS button doesn’t work. You’re probably thinking that just means you can’t wake the PlayStation with the controller, and that’s no big deal, right? No; oddly, that’s the only thing the PS button can do. This suggests the button can communicate with the console but, beyond that, possibly deliberately, it has no functionality. Without the PS button you can’t enter the controller settings, you can’t turn the console off, you can’t exit a game once you’ve entered it, and there’s no guarantee all the other controller buttons will work as expected once you’re in a game. Some games won’t let you in at all.

For example “The Orange Box”, Valve’s collection of Half-Life, Portal and Team Fortress 2 on a single disc, actually checks what controller you’re using when the game loads. It seems like a strange thing to check, and I don’t know why they care. Anyway, if the game doesn’t detect a standard PS3 controller you can’t progress. This is ironic considering one of the best ways of connecting a PS4 controller to a PC is by using Valve’s Steam platform; there’s no native way to do it via Windows. If you’re thinking “that’s fine, I’ll progress to the in-game levels with a PS3 controller and then reassign that controller as the second controller”, well, that doesn’t work. The PS4 controller won’t take over as the first controller and you can’t change it manually because . . . you guessed it, you can’t bring up the controller settings on the PS4 controller because the PS button doesn’t work.

So in summary if you want to use a PS4 controller you’ll likely need to have at least a partially functioning PS3 controller to use with it, but even then some games may not work at all. If you’ve no controller and need one for the PS3 (they don’t make them anymore) you’ll have to buy used or a high quality clone (and that’s a whole other mess).

Maybe someday Sony will be cool and release a software update for the PS3 console that will allow the PS4 controller to work with it. Comment below what you think the chances of that happening are.


Do you need a Data Lake?

Summary

Among data specialists who do not work in the field of Big Data there can be confusion surrounding the term Data Lake. This is because there is apparent overlap in role and function between Data Lakes and the more traditional Data Warehouses, with which data professionals will be more familiar. This confusion is not helped by the term Data Lake itself being overloaded, which will be discussed later in this article. However, despite this overlap, Data Lakes do occupy their own distinct role and perform functions Data Warehouses cannot.

Data Lakes have tremendous utility, but damagingly there is also a mass of literature surrounding Data Lakes pushing the concept as a cure-all that, coincidentally, will also require you to migrate your organization’s Business Intelligence center into the cloud. The following statements will hopefully dispel some of the associated hucksterism.

  • Data Lakes are not Data Warehouses 2.0, i.e. they are not the evolution of a Data Warehouse.
  • Data Lakes have not replaced Data Warehouses in performing the role of housing aggregated data.
  • Data Lakes will not free you from the burden of developing ETLs or establishing robust Data Architecture and strong Data Governance.

Introduction

It is important to first clarify that both Data Warehouses and Data Lakes are abstract concepts independent of any particular software or vendor. A Data Warehouse can be created in any database engine, such as SQL Server, PostgreSQL, Oracle or MySQL. Similarly, a Data Lake can be deployed on any suitably large data storage platform, i.e. an on-site data center or hosted in the cloud.

In basic terms, both Data Warehouses and Data Lakes can be thought of as the place where all data relevant to an organization’s goals is pulled together from various sources, both internal and external (increasingly external). They both exist to facilitate an all-encompassing view of an organization and how well it performs, or to provide a greater understanding of the organization’s environment, opportunities (e.g. customer preferences and habits) and threats. However, they differ in terms of the data they are optimized to handle and are therefore better suited to different use cases.

What is a Data Warehouse?

A Data Warehouse is a method for storing and organising data that is optimized to support Business Intelligence (BI) activities such as analytics. To put it another way, they exist solely to answer big questions efficiently and are constructed accordingly. For this reason they typically hold vast quantities of historical data. The data within a data warehouse is usually derived from a wide range of sources, such as application log files, but primarily transaction applications (Oracle, 2019). However, in contrast to a transactional database where each transaction is a separate record, the relevant entries in a Data Warehouse are typically aggregated, although a warehouse can also hold transaction records for archival purposes.

Figure 1: Typical Data Warehouse architecture of an SME (Databricks, 2019)

Single transaction records on their own are not typically very insightful to an organization trying to identify consumer trends, for example. Aggregating data based on facts and dimensions, e.g. the number of sales (fact) for a particular store (dimension), saves disk space and allows queries looking for that specific answer to be returned quickly. Data Warehouses mostly contain numeric data, which is easily manipulated. As an example, store sales might be the summation of thousands of rows of data into a single row.
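As a purely illustrative sketch (the store names and amounts below are made up), rolling raw transaction rows (facts) up by a dimension such as store might look like this in JavaScript:

//Raw transaction records (facts), each tagged with a store (dimension)
var transactions = [
  { store: "Dublin", amount: 20 },
  { store: "Dublin", amount: 5 },
  { store: "Cork", amount: 12 }
];

//Roll the individual transactions up into one total per store
var salesByStore = {};
transactions.forEach(function (t) {
  salesByStore[t.store] = (salesByStore[t.store] || 0) + t.amount;
});
//salesByStore is now { Dublin: 25, Cork: 12 }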

Figure 2: Simplified example of a Data Warehouse internal structure (BIDataPro, 2018)

Data Warehouses also solve the problem of trying to derive information when there are too many sources, e.g. a multinational with thousands of store locations and subsidiaries, by creating a “single source of truth”. Effectively this means pulling all the information to one central location, transforming the data for uniformity and storing like for like data together. For example this could mean gathering all sales data from multiple locations and converting the currency to dollars. All of the data in one place together allows for different sources, which serve different purposes, to be combined via a single query. For example a report that links sales data and logistical data, coming from POS and SCM systems respectively, may not be possible with a single query if the systems are not linked. If best practices regarding I.T. security are being followed they certainly should not be.

Data Warehouses are fed from source systems using an extract, transform and load (ETL) solution. This means data is extracted from a source system, transformed to meet the schema and business requirements of the Data Warehouse and then loaded. This delivery method is independent of any particular software vendor, and there are various software options for accomplishing ETL, including creating a custom application. A variation of this process is extract, load and transform (ELT), in which the data is landed into tables raw and later transformed to meet the schema and business requirements of its intended final table. This method allows for greater auditability, which can aid regulatory compliance or post-mortems if the transformation process fails.
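To make the ordering difference concrete, here is a minimal, purely illustrative JavaScript sketch; the in-memory source and warehouse objects are hypothetical stand-ins, not any particular tool:

//Hypothetical stand-ins for a source system and a toy transformation
var source = { extract: function () { return [{ amount: "12.50" }, { amount: "3.00" }]; } };

function transform(rows) {
  //Enforce the warehouse schema, e.g. amounts as numbers
  return rows.map(function (r) { return { amount: Number(r.amount) }; });
}

//ETL: transform the data before it is loaded into the warehouse
function etl(warehouse) {
  warehouse.rows = transform(source.extract());
}

//ELT: land the raw extract first (auditable), transform it later inside the warehouse
function elt(warehouse) {
  warehouse.staging = source.extract();
  warehouse.rows = transform(warehouse.staging);
}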

Once set up the Data Warehouse can facilitate statistical analysis, reporting, data mining and more sophisticated analytical applications that generate actionable information by applying machine learning and artificial intelligence (AI) algorithms (Oracle, 2019).

For an organization, a single source of truth is very important: it eliminates inconsistencies in reporting, establishes a single set of global metrics and allows everyone in the organization to “sing from the same hymn sheet”, all of which helps direct informed decisions.

So if Data Warehouses have proven such an excellent platform for generating information, why are alternatives needed? Well, by design only a subset of the attributes are examined, so only pre-determined questions can be answered (Dixon, 2010). Also, the data is aggregated, so visibility into the lowest levels is lost (Dixon, 2010). The final major factor is that some of the most vital sources of information are no longer simply numerical in nature and generated by an organization’s internal transactional systems. So what has changed?

 

The Digital Universe

The data landscape has changed drastically in just a few short years. Like the physical universe, the digital universe is large and growing fast. It was estimated that by 2020 there would be nearly as many digital bits as there are stars in the observable universe (Turner, 2014). That estimate is somewhere in the region of 44 zettabytes, or 44 trillion gigabytes (Turner, 2014). Even though this quantity of data is already beyond human comprehension the rate of growth is probably the more impressive feat. For context there is over 10 times more data now than there was in 2013 when the digital universe was an estimated 4.4 zettabytes (Turner, 2014). The data we create and copy annually is estimated to reach 175 zettabytes by 2025 (Coughlin, 2018).

Where is all this data coming from?

The short answer is predominantly us and the systems that service our needs. In the not too distant past the only entities to have computers generating and storing data were businesses, governments and other institutions. Now everyone has a computer of some description, and with the advent of social media mass consumers became mass creators. When you stop to think of how many interactions a person has with electronic devices every day, directly or indirectly, you soon get a picture of how much data is actually being generated.

As an example of this endless generation of data the following are average social media usage stats over the course of one minute from 2018 (Marr, 2018):

  • Twitter users sent 473,400 tweets
  • Snapchat users shared 2 million photos
  • Instagram users posted 49,380 pictures
  • LinkedIn gained 120 new users

Other extraordinary data stats include (Marr, 2018):

  • Google processes more than 40,000 searches every second or 3.5 billion searches a day.
  • 1.5 billion people are active on Facebook every day. That’s one-fifth of the world’s population.
  • Two-thirds of the world’s population now owns a mobile phone.

Our way of life has become increasingly digitized, with no better example than the effective global lockdown during the 2020 pandemic. Hundreds of millions of employees around the world managed to continue working from home and did so effectively (Earley, 2020). This would have been unimaginable even in the late nineties. And yet, as digitized as our world has become, it is only the start. With emerging technologies such as self-driving cars, IoT smart devices and increasingly sophisticated robots entering our homes, the 175 zettabytes of data by 2025 may be a conservative estimate.

With so much of the stuff around, you would be forgiven for thinking all of this data is just a by-product, but it is anything but. The data generated is an incredibly valuable asset if it can be analyzed properly and transformed into business-relevant information.

What types of data are there?

The state of data within the digital universe can be summarized as structured, semi-structured and unstructured (Hammer, 2018).

The following is a non-exhaustive list of data types (Hammer, 2018):

  • CRM
  • POS
  • Financial
  • Loyalty card
  • Incident ticket
  • Email
  • PDF
  • Spreadsheet
  • Word processing
  • GPS
  • Log
  • Images
  • Social media
  • XML/JSON
  • Click stream
  • Forums
  • Blogs
  • Web content
  • RSS feed
  • Audio
  • Transcripts

Only the structured data types in the list above (such as CRM, POS and financial data) are suitable for aggregation (Hammer, 2018). The rest of the data types are typical of what now makes up a large proportion of the digital universe, and despite their value as data assets they are not suitable for storage or analysis within a Data Warehouse. This is because data needs to meet the predefined structure of a Data Warehouse in order to be accepted, and aggregating these raw unstructured files, e.g. video and audio files, is not possible. So how are these types of valuable data turned into actionable information?

What is a Data Lake?

Data Warehouses have been utilized by data specialists for decades, but the concept of a Data Lake is much more contemporary and much better suited to the storage, analysis and analytics of the semi-structured and unstructured data listed above. By design, storing these kinds of data in a Data Lake does not require files to be transformed, as each file is kept in its raw state. Files can simply be copied from one file structure to another. Data Lakes also allow for working off the files directly, which means the data can be used effectively immediately, i.e. as soon as it lands, rather than waiting weeks for the Data Warehouse developers to massage the data into a format the data warehouse can accept, if that is even possible (Hammer, 2018). Working with this type of data has become synonymous with the field of Big Data, which is defined by high velocity, high volume and high variability. As such, the two methodologies of Data Warehouses and Data Lakes are not necessarily in competition with each other; in fact, depending on their definition (Data Lake is somewhat of an overloaded term (Bethke, 2017)), they could be argued to solve different problems and can complement each other when deployed within the same architecture.

There is some contention as to the definition of a Data Lake. Some would argue that the original meaning implied the Lake was solely a raw data reservoir (Bethke, 2017). By this definition the Data Lake is not too dissimilar to a staging area or Operational Data Store (ODS) in a data warehouse, where raw copies of data from source systems are landed (Bethke, 2017). This would coincide with an ELT process as opposed to an ETL process. The transformation and integration of the data happens later downstream, during the populating of the data warehouse (Bethke, 2017). This understanding of a Data Lake still persists today in the minds of many data specialists, as can be seen below in the overly simplified illustration.

Figure 3: Overly simplified illustration of a Data Lake architecture  (Hammer, 2018)

(Note: there is no indication of analysis being performed on the lake directly; the lake services the warehouse solely.)

However, this is an inaccurate understanding, as the person credited with coining the term, James Dixon, used the following analogy when he explained a Data Lake:

“If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.” (Dixon, 2010)

By stating “various users of the lake can come to examine, dive in, or take samples” Dixon is clearly implying that a feature of the Data Lake is that it is accessible prior to the data being transformed and made available in a Data Warehouse.

This is where Data Lakes and Data Warehouses take opposing strategies on applying structure to data, which is perhaps why they are often mistaken as competing alternatives to each other. A Data Warehouse requires schema on write whereas a Data Lake uses schema on read.

With schema on write, all of the relevant data structure needs to be prepared in advance, which means all of the relevant business questions need to be thought of in advance. This rarely results in a situation where all the relevant stakeholders have their needs met, and if they do it will not be for very long. This scenario is workable for an organization looking to aggregate finance data it is very familiar with, but it is especially difficult when dealing with Big Data where the questions are unknown.

With schema on read, the schema is only applied when the data is read, allowing for a schema that is adaptable to the queries being issued. This means you are not stuck with a predetermined one-size-fits-all schema (Pasqua, 2014). It allows for the storage of unstructured data, and since it is not necessary to define the schema before storing the data it is easier to bring in new data sources on the fly. The exploding growth of unstructured data and the overhead of ETL for storing data in an RDBMS are offered as leading reasons for the shift to schema on read (Henson, 2016).
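As a small illustrative sketch (the records and field names below are invented), schema on read means the raw records are stored untouched and a structure is only imposed at query time:

//Raw, heterogeneous records exactly as they landed in the lake
var rawRecords = [
  '{"user": "a", "clicks": 3}',
  '{"user": "b", "clicks": "7", "country": "IE"}'
];

//The schema is applied only when the data is read, so it can change from query to query
var parsed = rawRecords.map(function (r) {
  var obj = JSON.parse(r);
  return { user: String(obj.user), clicks: Number(obj.clicks) };
});
//parsed is now [{ user: "a", clicks: 3 }, { user: "b", clicks: 7 }]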

When dealing with Big Data the problem of a predefined schema can be so burdensome that it can sink a data project or increase the time-to-value past the point of relevance (Pasqua, 2014). Using a schema on read approach on data as-is means getting value from it right away (Pasqua, 2014). The flexibility of Data Lakes in this regard allows them to surpass Data Warehouses in terms of scalability while making data accessible for analysis sooner.

 

Data Lakes Scalability

By using schema on read the constraint on scale is virtually removed. The threat of a bottleneck still exists, but now in the form of physical constraints on the hardware available. This is why online cloud offerings such as Amazon S3 and Azure Data Lake from Microsoft have become so popular. Of course, on-site data centers are also an option, with Hadoop being a very popular solution that combines a Data Lake structure with analytical capabilities. This level of scalability also safeguards against Data Silos. A Data Silo is an undesirable situation where only one group or a limited number of people in an organization have access to a source of data that has broader relevance to people across the organization (Plixer, 2018).

Data Lakes are intended, by design and philosophy, to be the antithesis of Data Silos, with all of an organization’s data stored together in one lake. However, centrally storing all data is not without significant security concerns, and losing sight of what customer data is on hand can run afoul of numerous legal requirements such as GDPR.

 

Data Lakes Analysis & Analytics

A defining feature of Big Data analytics is the concept of bringing the analytics to the data rather than the data to the analytics. Traditionally, analytics was carried out by feeding single flat files into an algorithm, with the time taken to prepare these files being significant. Accessing the raw files directly is potentially a failing, however, as it can break the principle of a single source of truth and therefore runs the risk of introducing inconsistencies between reports and other forms of analysis. As you can imagine, this is complex and disciplined work, which is why Data Lakes, at this point in their maturity, are best suited to Data Scientists and advanced Data Analysts (Hammer, 2018). However, this goes against the Data Lake ethos of “data for all”, as it only allows the very skilled to have access. This recreates the problem Data Lakes were meant to solve by imposing restrictions, or “data for the select few”. With Data Scientists acting as the gatekeepers, an organization’s stakeholders can lose sight of the useful data available to them. Worse still, valuable data may come from external sources, with stakeholders having no visibility of it prior to it landing in the Data Lake. This may leave stakeholders with no option but to take action based on an analysis produced by a Data Scientist, with the accuracy of the analysis being a matter of faith because the stakeholder has no data to say otherwise. In comparison, the creation of a Data Warehouse is usually a collaboration between developers and stakeholders familiar with internal source systems and data. Once a Data Warehouse is created, far less skilled (and cheaper) Data Analysts will have the ability to navigate the internal structure and compile valuable reports.

Despite the obvious concerns, the significance of scalability and direct raw data analysis cannot be overlooked. The sooner an organization is informed, the sooner it can act. In real-world terms this could save millions of dollars, save thousands of jobs or stop the organization itself from going under. However, the benefits of scalability and earlier data access are not without risks, as poorly managed Data Lakes have the potential to turn into Data Swamps: Data Lakes that have become a dumping ground for data. Though the data may be unstructured, the method in which it is stored must not be, or visibility of what is stored and where it is stored will be lost. Failure to catalogue the data (that is, to let users know what is available and to make its attributes known) will overwhelm users and result in garbage results (Hammer, 2018). Successful implementation of a Data Lake is complex and requires ongoing commitment to maintain, but for a large organization that needs to make better use of the wider range of data available in the digital universe a Data Lake is a necessity.

 

Conclusion

A Data Lake is not a replacement for a Data Warehouse. Data Lakes are better equipped to solve the different problems associated with semi-structured and unstructured data. Their flexibility in this regard allows them to surpass Data Warehouses in terms of scalability while making data accessible for analysis sooner. However, Data Lakes are not without their drawbacks. They require highly skilled and expensive staff to develop and maintain. They run a greater risk of failing spectacularly by devolving into a Data Swamp, and could become a serious liability from a regulatory standpoint if that were to happen. Organisations can also be left at the mercy of Data Scientists in terms of how accurately they analyze data and produce correct reports, as stakeholders may not have the expertise to retrieve data from the Data Lake themselves.

Thankfully, Data Warehouses are still perfectly suited to dealing with numeric data, and organizations that still predominantly use their own internal transactional systems to create actionable information have no immediate need to utilize any alternatives.

 

References:

Bethke, U. (2017) ‘Are Data Lakes Fake News?’, Sonra, 8 August. Available at: http://www.kdnuggets.com/2017/09/data-lakes-fake-news.html (Accessed: 4 July 2020).

BIDataPro (2018) ‘What is Fact Table in Data Warehouse’, BIDataPro, 23 April. Available at: https://bidatapro.net/2018/04/23/what-is-fact-table-in-data-warehouse/ (Accessed: 4 July 2020).

Coughlin, T. (2018) 175 Zettabytes By 2025, Forbes. Available at: https://www.forbes.com/sites/tomcoughlin/2018/11/27/175-zettabytes-by-2025/ (Accessed: 4 July 2020).

Databricks (2019) ‘Unified Data Warehouse’, Databricks, 8 February. Available at: https://databricks.com/glossary/unified-data-warehouse (Accessed: 4 July 2020).

Dixon, J. (2010) ‘Pentaho, Hadoop, and Data Lakes’, James Dixon’s Blog, 14 October. Available at: https://jamesdixon.wordpress.com/2010/10/14/pentaho-hadoop-and-data-lakes/ (Accessed: 4 July 2020).

Earley, K. (2020) Google and Facebook extend work-from-home policies to 2021, Silicon Republic. Available at: https://www.siliconrepublic.com/companies/google-facebook-remote-work-until-2021 (Accessed: 5 July 2020).

Hammer, D. (2018) What is a data lake? – The Hammer | The Hammer. Available at: https://www.sqlhammer.com/what-is-a-data-lake/ (Accessed: 4 July 2020).

Henson, T. (2016) ‘Schema On Read vs. Schema On Write Explained’, Thomas Henson, 14 November. Available at: https://www.thomashenson.com/schema-read-vs-schema-write-explained/ (Accessed: 6 July 2020).

Marr, B. (2018) How Much Data Do We Create Every Day? The Mind-Blowing Stats Everyone Should Read, Forbes. Available at: https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/ (Accessed: 5 July 2020).

Oracle (2019) What Is a Data Warehouse | Oracle Ireland. Available at: https://www.oracle.com/ie/database/what-is-a-data-warehouse/ (Accessed: 5 July 2020).

Pasqua, J. (2014) Schema-on-Read vs Schema-on-Write, MarkLogic. Available at: https://www.marklogic.com/blog/schema-on-read-vs-schema-on-write/ (Accessed: 6 July 2020).

Plixer (2018) What is a Data Silo and Why is It Bad for Your Organization? Available at: https://www.plixer.com/blog/data-silo-what-is-it-why-is-it-bad/ (Accessed: 6 July 2020).

Turner, V. (2014) The Digital Universe of Opportunities. Available at: https://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm (Accessed: 5 July 2020).

How to fix “your model is not manifold” error in Cura

You just pulled a file from Thingiverse and now when you try to slice it in Cura it says “your model is not manifold”. If you just want to make the error go away, you can skip to the section “The Fix” if you don’t want to be “filled in” on why it’s happening in the first place. Little bit of 3D printing humour there for ya.

So what does the error mean?

For a model to be manifold, you can think of it as having logically enclosed space in a manner that could exist in real life, with an outer geometry that can actually be 3D printed.

So what does non-manifold mean?

There are a few reasons why a model might not be manifold, and here they are:

  • Self-intersecting
  • Separate Object
  • Hole
  • Inner Faces
  • Overlapping Geometry

Self-intersecting: This is a bit of a weird analogy, but imagine punching through yourself. You can’t do that in real life without making a big hole, but in a virtual model of yourself you can have your fist and forearm pass through another body part of your choosing and that’s perfectly fine. In real life, however, two objects with mass cannot occupy the same space at the same time, so this cannot be printed.

Separate Object: Imagine a model of a figurine wearing sunglasses. If the sunglasses and the figurine were two separate objects and you shrank the figurine by rescaling it to 90% of its original size, the sunglasses might be left floating in midair. That’s fine for a virtual 3D model, but in real life gravity might have something to say about that.

Hole: Pretty self-explanatory: there’s a hole in the model, and not like a window, just a void that makes the model impossible to print successfully.

Inner Faces: Imagine trying to print a model within a model. The slicer reads the model and gets confused because there should only be one outer surface, not two.

Overlapping Geometry: Imagine you have created a 3D model of a house and you’ve accidentally copied the roof and then pasted it back on top of the model over the original roof. The model now has two roofs occupying the same space which cannot be printed.

The Fix:

Ideally you should open the file in some 3D modelling software and fix it manually, but if you’re just pulling files from Thingiverse that’s a bit unrealistic. Luckily the following site allows you to upload files and will try to fix them automatically.

https://3d-print.jomatik.de/en/index.php

If the process successfully fixes the file it will give you the option to download it, along with a brief summary of the changes made, highlighting big changes in red.

It’s a great solution, especially for low-risk models, but the onus will always be on you to manually inspect the model to see if the problems have in fact been resolved. Also, if you’re working on a super secret product design for a company it’s probably best not to upload the model to be fixed online, but for files you’ve pulled from Thingiverse, sure, why not? They’re already publicly available anyway.