
How to use an Android device as a keyboard and trackpad for a Raspberry Pi

Anyone who has set up a Raspberry Pi Zero W will know it is a bit limited in terms of I/O; that is the trade-off for such a small form factor. I recently went through a setup that was especially awkward because there was no WiFi available. I tried to use an Android hotspot but unfortunately the Pi could not see the Android device at all. The only option I had was to tether the Pi to the Android device via USB. This worked, and the Pi had access to the Android device's mobile data, but it seems the power drawn by the Android device left too little power for the wireless USB receiver of the keyboard and mouse combo. So I was left with mutually exclusive options: either access to the internet or the ability to use a keyboard and mouse. Luckily there is always a plan C.

Prerequisites:

You will need a mouse that can connect to the Pi either by USB or Bluetooth. The OS used was Raspbian but this solution should work with other distros.

Solution:

The Raspberry Pi Zero W also comes with Bluetooth built in, so there was the option to make the Pi discoverable and connect a Bluetooth keyboard and mouse. I do not have a physical Bluetooth keyboard or mouse but thankfully there is an app for that; multiple apps, actually.

The app I used was “Serverless Bluetooth Keyboard & Mouse for PC/Phone”, available on Google Play.

It is free (with ads) and very easy to set up. In terms of performance it provided me with a usable keyboard (like Gboard) with half of the device screen acting as a very responsive trackpad. I certainly would not want to compose a thesis with this setup but for typing a few words and clicking a few links it is perfectly serviceable.

I did experience what may be a slight bug during setup, but I resolved the problem in a minute or two.

Problem and Fix:

Firstly you will need to make the Pi discoverable via Bluetooth. This is the only time I needed to make use of a physical mouse. The option to turn on Bluetooth and make the device discoverable is at the top right of the Raspbian desktop.

When I tried to connect the Android device and the Pi through the app it would not work. The Pi was not discoverable by the app, despite device discovery being built into the app.

To connect the devices I first had to pair the Android device and the Pi via their respective operating systems. This threw an error on the Pi, but the Android device was now visible to it. I then removed the Android Bluetooth connection from the Pi and tried connecting through the app again. This worked.
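
If you prefer the terminal, the same steps can usually be done with bluetoothctl, which ships with Raspbian. The session below is a rough sketch rather than a verified transcript, and the MAC address is a placeholder for your Android device:

    sudo bluetoothctl
    power on
    agent on
    discoverable on
    pairable on
    devices
    remove XX:XX:XX:XX:XX:XX

“devices” lists the MAC addresses the Pi already knows about, and “remove” followed by your Android device's address drops the stale pairing so the app can connect fresh.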

If you found this post helpful please like/share/subscribe.


How to verify your WordPress.com site with Google via HTML tag

Before starting: Note that according to WordPress.com, “. . . verifying your site with these services (search engines) is not necessary in order for your site to be indexed by search engines.”

Prerequisites:

This guide assumes you already have your WordPress.com site set up and you already have an account with Google Analytics / Google Search Console.

Steps:

Log into your WordPress.com site.

Go to Marketing and change the displayed options to “Traffic”.

Under “Marketing and Integrations” scroll down to “Site verification services”.

There you will see an option to provide an HTML google-site-verification code.

To retrieve this code you need to log in to the site's associated Google Search Console account.

Log in to Google Search Console. Under the heading “Google Search Console” you will see either a drop-down option to “Add Property” (i.e. a site you own) or the name of the site, or sites, you previously registered.

If you have not registered your domain before, submit the site address now under the Domain option. If you have submitted your site before, click on your site name.

On the “Ownership verification” page you will see “Additional verification methods” at the bottom of the page.

Expand the HTML Tag option to reveal the HTML google-site-verification code.
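
For reference, the tag Google reveals looks roughly like the one below; the content value here is just a placeholder and your real tag will contain a long unique string.

    <meta name="google-site-verification" content="your-unique-verification-string" />

Paste whichever part the WordPress.com field asks for, either the whole tag or just the string inside content.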

Copy this code and return to the WordPress.com “Marketing and Integrations” page.

Paste the code into the HTML google-site-verification code section.

Save the settings in WordPress.com.

Return to the Google Search Console “Ownership verification” page and verify.

Your WordPress.com site has now been verified with Google.


How to add your WordPress.com sitemap to Google Search Console

Prerequisites:

This guide assumes you already have your WordPress.com site set up and your site is verified with Google Analytics / Google Search Console.

Steps:

By default WordPress.com prepares a sitemap for you.

To see it, simply copy and paste the example URL below (Option 1) into your browser address bar and edit it to reference your site. If you own a custom domain, omit the reference to WordPress as demonstrated in (Option 2).

(Option 1)

yoursite.wordpress.com/sitemap.xml

(Option 2)

yoursite.com/sitemap.xml
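
If the URL is correct your browser will display an XML document. A heavily trimmed example of the general sitemap format is shown below (the URL and date are placeholders); note that WordPress.com may serve a sitemap index pointing to sub-sitemaps rather than a flat list of pages, but either form can be submitted.

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://yoursite.com/an-example-post/</loc>
        <lastmod>2020-01-01T00:00:00Z</lastmod>
      </url>
    </urlset>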

Once you have verified the sitemap URL is correct, add the sitemap to Google Search Console.

Do this by logging into Google Search Console and clicking on Sitemaps on the left-hand side of the main window.

In the Sitemaps window there will be an option to paste the copied URL under “Add a new sitemap” and submit it.

Once the URL is submitted your sitemap will be saved under “Submitted sitemaps”.


How to fix Jetpack for WordPress.com not pushing posts to Facebook or Twitter

If you have set up the connections for Facebook, Twitter, etc. through Jetpack but your posts are not being pushed to those platforms try the following.

Make sure you have given permission to editors and authors of your site to use the established Jetpack connections. To do that go back to the Jetpack connection settings.

In the “Publicize posts” section click the drop down arrow to the far right.

Click the check box allowing the social media platform to be used by more than just the administrators. (Obviously this will allow your authors to publish to the specified social media platform, so only do this if you trust your authors with this access.)

Once this is done your next published post should also be pushed across your connected social media platforms.

NOTE:

If the post was already published, clicking “Update” will not share it across the social media platforms. You will need to save the post as a “Draft” and “Publish” it again. This should then push the post to the social media platforms.


IT Project Management Failure: 3 Proposed Causes

Introduction

This article first highlights the misuse of the Project Management Triangle as a metric of success. Recognising that the terms “success” and “failure” can be subjective, the author instead proposes generalised, objective and unambiguous examples of failure as a starting reference point. With these examples of failure serving as a foundation, three general deficits in project management are proposed as potential root causes of IT project failure.

Project Management Triangle Misuse

The Project Management Triangle (also called the triple constraint, iron triangle and project triangle) consists of three points: cost, time and scope (or features). These points are argued to have proportionate relationships with each other. For example, a project can be completed faster by increasing budget and/or cutting scope. Similarly, increasing scope may require increasing the budget and/or schedule. Lowering the budget available will impact the schedule and/or scope. These trade-offs between cost, time and scope create constraints which are said to dictate the quality of the product. However, stakeholders often misconstrue staying within the constraints of the triangle while delivering a project as a measure of success instead of, as intended, a determinant of quality.

As a demonstration of the unsuitability of the triangle as a metric of success consider the following. Would a self-build home delivered over budget, behind schedule and outside the original specifications be considered a failure? No, not for those who took on such a daunting endeavour, and survived the process, having brought into existence the home of their dreams. This is an example of a project where Atkinson (1999) might suggest the criteria for success existed outside of cost, time and scope.

So to define three significant causes of project failure it is first necessary to settle on unarguable features of project failure. It is important to note at this point that a project must have navigable obstacles and manageable risks. For instance, an IT project cannot be considered a failure if an unnavigable obstacle was introduced, for example new laws prohibiting online gambling that scuttle an online gambling platform in development. Similarly, an IT project cannot be considered a failure if unmanageable risks were encountered, such as the parent company collapsing due to financial irregularities not connected to the project.

With those points in mind the following statements are proposed as clear examples of project failure:

  1. The project exhausted necessary resources with no or unfinished deliverables.
  2. Delivery was too late and the deliverables are no longer needed or soon to be obsolete.
  3. Deliverables are not fit for purpose or of relative value.
  4. The costs exceeded the relative value generated by the deliverables.
  5. The project killed the parent organisation.

With examples of failure defined above the following section proposes management level causes of IT project failures.

IT Project Failure: Management Level Causes

Poor Project Visibility

There is a recognised need to have an information system in place to report on progress, cost, schedule etc. (Larson and Gray, 2010). In the built environment progress can be apparent even to the eye of a layperson, but visibility of progress and consumption of resources can be far more difficult for projects in other industries, some of which have intangible deliverables. In the IT industry, back-end infrastructure projects, for example, may have no visible deliverables and, with cloud-based deployments, no visible supporting hardware.

This is why project management styles like Scrum and visualisation tools like Kanban boards and burn-down charts have been adopted. Without these visualisation aids Project Managers could be blind to progress and resource consumption. Therefore a lack of visibility is proposed as a potential cause of, or contributor to, any of the failure examples defined above.

Inadequate Domain Knowledge

Domain knowledge is vital in steering stakeholder specifications, knowing what the relevant milestones are and establishing what is feasible given the budget, time and scope. The case is made by Larson and Gray (2010) that the key to managing scope creep, which can be beneficial, is change management. However, without adequate domain knowledge how can the project manager know what the knock-on effects of a change will be, the value derived from a change, or even whether a change is possible without putting the project at risk? It is also worth asking whether a lack of domain knowledge is often misread as poor leadership.

 Lack of Accountability

Accountability is seen by Kerzner and Kerzner (2017) as the combination of authority and responsibility that rests at an individual level and is necessary for work to move forward. It is argued that if team members are not assigned tasks with consequences for underperformance or failure, the project has no drive for completion. This was particularly evident in the PPARS project (“PPARS- a comedy of errors,” n.d.). Due to questionable contract arrangements there were strong financial incentives not to finish the project, and without accountability driving the project forward the end result was a complete failure.

Conclusion

An IT Project Manager needs to utilise the project management triangle as intended, i.e. as a means to keep the desired level of quality of the deliverable in focus. If there are fluctuations in cost, time or scope the IT Project Manager needs to be cognizant of what the knock-on effects will be. In addition, an IT Project Manager needs to know who the right person is to assign a specific task to. That person needs to have the proper motivation to get the work done, with the IT Project Manager having visibility of the work being done and the knowledge and experience to assess whether the work is being done properly. This is achieved through individual accountability, project visibility and domain knowledge. Without these three elements it is proposed a project has little chance of success.

References:

Atkinson, R., 1999. Project management: cost, time and quality, two best guesses and a phenomenon, its time to accept other success criteria. International Journal of Project Management 17, 337–342. https://doi.org/10.1016/S0263-7863(98)00069-6

Kerzner, H., Kerzner, H.R., 2017. Project Management: A Systems Approach to Planning, Scheduling, and Controlling. John Wiley & Sons.

Larson, E.W., Gray, C.F., 2010. Project Management: The Managerial Process. McGraw-Hill Irwin.

NoClip, 2017. FINAL FANTASY XIV Documentary Part #1 – “One Point O” – YouTube [WWW Document]. URL https://www.youtube.com/watch?v=Xs0yQKI7Yw4 (accessed 10.7.20).

Pinto, J.K., Mantel, S.J., 1990. The causes of project failure. IEEE Transactions on Engineering Management 37, 269–276. https://doi.org/10.1109/17.62322

PPARS- a comedy of errors [WWW Document], n.d. URL http://www.irishhealth.com/article.html?id=8661 (accessed 10.13.18).


Do PS4 controllers work with the PS3 console?

No!!!

No they do not!!!

If my car only drove in reverse and started 80% of the time would you consider that “working”?

Certainly not if I was trying to convince you to buy the car at full market price. You wouldn’t consider it working because it has partial functionality paired with unreliability. And that describes the PS4 controller connected to a PS3 console.

Unfortunately there’s a plethora of videos on YouTube hosted by snot-nosed teens demonstrating PS4 controllers “working” on the PS3 console. They’re not faking it; they really are controlling the games with the PS4 controller, but that’s not the whole story.

What they typically forget to mention is that the PS button doesn’t work. You’re probably thinking that just means you can’t wake the PlayStation with the controller, and that’s no big deal, right? No, oddly that’s the only thing the PS button can do. Which suggests the button can communicate with the console but after that, possibly deliberately, it has no functionality. Without the PS button you can’t enter the controller settings, you can’t turn the console off, you can’t exit a game once you’ve entered it, and there’s no guarantee all the other controller buttons will work as expected once you’re in a game. Some games won’t let you in at all.

For example, “The Orange Box”, Valve’s collection of Half-Life, Portal and Team Fortress 2 on a single disc, actually checks what controller you’re using when the game loads. Seems like a strange thing to check; I don’t know why they care. Anyway, if the game doesn’t detect a standard PS3 controller you can’t progress. This is ironic considering one of the best ways of connecting a PS4 controller to a PC is by using the Steam platform by Valve; there’s no native way to do it via Windows. If you’re thinking, “that’s fine, I’ll progress to the in-game levels with a PS3 controller and then reassign that controller as the second controller”, well, that doesn’t work. The PS4 controller won’t take over as the first controller and you can’t change it manually because . . . you guessed it, you can’t bring up the controller settings on the PS4 controller because the PS button doesn’t work.

So in summary if you want to use a PS4 controller you’ll likely need to have at least a partially functioning PS3 controller to use with it, but even then some games may not work at all. If you’ve no controller and need one for the PS3 (they don’t make them anymore) you’ll have to buy used or a high quality clone (and that’s a whole other mess).

Maybe someday Sony will be cool and release a software update for the PS3 console that will allow the PS4 controller to work with it. Comment below what you think the chances of that happening are.


Do you need a Data Lake?

Summary

Among data specialists who do not work in the field of Big Data there can be confusion surrounding the term Data Lake. This is because there is apparent overlap in terms of role and function between Data Lakes and the more traditional Data Warehouses, which data professionals will be more familiar with. This confusion is not helped by the term Data Lake itself being overloaded, which will be discussed later in this article. However, despite this overlap, Data Lakes do occupy their own distinct role and perform functions Data Warehouses cannot.

Data Lakes have tremendous utility but, damagingly, there is also a mass of literature surrounding Data Lakes pushing the concept as a cure-all that, coincidentally, will also require you to migrate your organization's Business Intelligence center into the cloud. The following statements will hopefully dispel some of the associated hucksterism.

  • Data Lakes are not Data Warehouses 2.0, i.e. they are not the evolution of a Data Warehouse.
  • Data Lakes have not replaced Data Warehouses in performing the role of housing aggregated data.
  • Data Lakes will not free you from the burden of developing ETLs or establishing robust Data Architecture and strong Data Governance.

Introduction

It is important to first clarify that both Data Warehouses and Data Lakes are abstract concepts independent of any particular software or vendor. A Data Warehouse can be created in any database engine such as SQL Server, PostgreSQL, Oracle or MySQL. Similarly a Data Lake can be deployed across any suitably large data storage platform, i.e. an on-site data center or hosted in the cloud.

In basic terms both Data Warehouses and Data Lakes can be thought of as the place where all data relevant to an organization's goals is pulled together from various sources, both internal and external (increasingly external). They both exist to facilitate an all-encompassing view of an organization and how well it performs, or to provide a greater understanding of the organization's environment, opportunities (e.g. customer preferences and habits) and threats. However, they differ in terms of the data they are optimized to handle and are therefore better suited to different use cases.

What is a Data Warehouse?

A Data Warehouse is a method for storing and organising data that is optimized to support Business Intelligence (BI) activities such as analytics. To put it another way, they exist solely to answer big questions efficiently and are constructed accordingly. For this reason they typically hold vast quantities of historical data. The data within a data warehouse is usually derived from a wide range of sources such as application log files, but primarily from transaction applications (Oracle, 2019). However, in contrast to a transactional database where each transaction is a separate record, the relevant entries in a Data Warehouse are typically aggregated, although they can also hold transaction records for archival purposes.

Figure 1: Typical Data Warehouse architecture of an SME (Databricks, 2019)

Single transaction records on their own are not typically very insightful to an organization trying to identify consumer trends, for example. Aggregating data based on facts and dimensions, e.g. the number of sales (fact) for a particular store (dimension), saves disk space and allows queries looking for that specific answer to be returned quickly. Data Warehouses mostly contain numeric data, which is easily manipulated. As an example, store sales might be the summation of thousands of rows of data into a single row.

Figure 2: Simplified example of a Data Warehouse internal structure (BIDataPro, 2018)
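
As an illustrative sketch of that kind of aggregation (the column names and figures below are made up, not taken from any particular warehouse), the following Python snippet rolls transaction-level rows up to one summary row per store:

    import pandas as pd

    # Raw, transaction-level data: one row per sale (hypothetical columns).
    transactions = pd.DataFrame({
        "store_id":    ["S01", "S01", "S02", "S02", "S02"],
        "sale_amount": [19.99, 5.50, 100.00, 42.00, 7.25],
    })

    # Aggregate the fact (sales) by the dimension (store): many rows collapse
    # into one summary row per store.
    store_sales = (
        transactions
        .groupby("store_id")["sale_amount"]
        .agg(number_of_sales="count", total_sales="sum")
        .reset_index()
    )

    print(store_sales)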

Data Warehouses also solve the problem of trying to derive information when there are too many sources, e.g. a multinational with thousands of store locations and subsidiaries, by creating a “single source of truth”. Effectively this means pulling all the information into one central location, transforming the data for uniformity and storing like-for-like data together. For example, this could mean gathering all sales data from multiple locations and converting the currency to dollars. Having all of the data in one place allows different sources, which serve different purposes, to be combined via a single query. For example, a report that links sales data and logistical data, coming from POS and SCM systems respectively, may not be possible with a single query if the systems are not linked. If best practices regarding IT security are being followed, they certainly should not be.

Data Warehouses are fed from source systems using an extract, transform and load (ETL) solution. This means data is extracted from a source system, transformed to meet the schema and business requirements of the Data Warehouse and then loaded. This is a data delivery method independent of any particular software vendor; there are various software options for accomplishing ETLs, including creating a custom application. A variation of this process is extract, load and transform (ELT), in which the data is landed into tables raw and later transformed to meet the schema and business requirements of its intended final table. This method allows for greater auditability, which could aid in regulatory compliance or post-mortems if the transformation process fails.
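
As a minimal sketch of the difference (the table names, exchange rate and in-memory database below are invented for illustration), an ETL flow transforms before loading, while an ELT flow lands the raw rows first and transforms inside the warehouse later:

    import sqlite3

    # Hypothetical rows extracted from a regional POS system (amounts in euro).
    extracted = [("S01", "2020-07-01", 100.0), ("S02", "2020-07-01", 250.0)]

    warehouse = sqlite3.connect(":memory:")

    # ETL: transform (convert to dollars) before loading into the final table.
    warehouse.execute("CREATE TABLE sales (store_id TEXT, sale_date TEXT, amount_usd REAL)")
    transformed = [(store, day, round(amount * 1.12, 2)) for store, day, amount in extracted]
    warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", transformed)

    # ELT: land the raw rows untouched, then transform later inside the warehouse.
    warehouse.execute("CREATE TABLE sales_raw (store_id TEXT, sale_date TEXT, amount_eur REAL)")
    warehouse.executemany("INSERT INTO sales_raw VALUES (?, ?, ?)", extracted)
    warehouse.execute(
        "INSERT INTO sales SELECT store_id, sale_date, ROUND(amount_eur * 1.12, 2) FROM sales_raw"
    )

The untouched sales_raw table is what gives ELT its audit trail: if the currency conversion turns out to be wrong, the original rows are still there to rerun against.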

Once set up the Data Warehouse can facilitate statistical analysis, reporting, data mining and more sophisticated analytical applications that generate actionable information by applying machine learning and artificial intelligence (AI) algorithms (Oracle, 2019).

For an organization, a single source of truth is very important: it eliminates inconsistencies in reporting, establishes a single set of global metrics and allows everyone in the organization to “sing from the same hymn sheet”, all of which helps direct informed decisions.

So if Data Warehouses have proven such an excellent platform for generating information, why are alternatives needed? Well, by design only a subset of the attributes are examined, so only pre-determined questions can be answered (Dixon, 2010). Also, the data is aggregated, so visibility into the lowest levels is lost (Dixon, 2010). The final major factor is that some of the most vital sources of information are no longer simply numerical in nature and generated by an organization's internal transactional systems. So what has changed?

 

The Digital Universe

The data landscape has changed drastically in just a few short years. Like the physical universe, the digital universe is large and growing fast. It was estimated that by 2020 there would be nearly as many digital bits as there are stars in the observable universe (Turner, 2014). That estimate is somewhere in the region of 44 zettabytes, or 44 trillion gigabytes (Turner, 2014). Even though this quantity of data is already beyond human comprehension the rate of growth is probably the more impressive feat. For context there is over 10 times more data now than there was in 2013 when the digital universe was an estimated 4.4 zettabytes (Turner, 2014). The data we create and copy annually is estimated to reach 175 zettabytes by 2025 (Coughlin, 2018).

Where is all this data coming from?

The short answer is predominantly us and the systems that service our needs. In the not too distant past the only entities to have computers generating and storing data were businesses, governments and other institutions. Now everyone has a computer of some description and, with the advent of social media, mass consumers became mass creators. When you stop to think of how many interactions a person has with electronic devices every day, directly or indirectly, you soon get a picture of how much data is actually being generated.

As an example of this endless generation of data the following are average social media usage stats over the course of one minute from 2018 (Marr, 2018):

  • Twitter users sent 473,400 tweets
  • Snapchat users shared 2 million photos
  • Instagram users posted 49,380 pictures
  • LinkedIn gained 120 new users

Other extraordinary data stats include (Marr, 2018):

  • Google processes more than 40,000 searches every second or 3.5 billion searches a day.
  • 1.5 billion people are active on Facebook every day. That’s one-fifth of the world’s population.
  • Two-thirds of the world’s population now owns a mobile phone.

Our way of life has become increasingly digitized, with no better example than the effective global lockdown during the 2020 pandemic. Hundreds of millions of employees from around the world managed to continue working from home and did so effectively (Earley, 2020). This would have been unimaginable even in the late nineties. And yet, as digitized as our world has become, it is only the start. With emerging technologies such as self-driving cars, IoT smart devices and ever more sophisticated robots entering our homes, the 175 zettabytes of data by 2025 may be a conservative estimate.

With so much of the stuff around you would be forgiven for thinking all of this data is just a by-product, but it is anything but. The data generated is an incredibly valuable asset if it can be analyzed properly and transformed into business-relevant information.

What types of data are there?

The state of data within the digital universe can be summarized as structured, semi-structured and unstructured (Hammer, 2018).

The following is a non-exhaustive list of data types (Hammer, 2018):

  • CRM
  • POS
  • Financial
  • Loyalty card
  • Incident ticket
  • Email
  • PDF
  • Spreadsheet
  • Word processing
  • GPS
  • Log
  • Images
  • Social media
  • XML/JSON
  • Click stream
  • Forums
  • Blogs
  • Web content
  • RSS feed
  • Audio
  • Transcripts

Only a few of the data types above, the structured ones, are suitable for aggregation (Hammer, 2018). The rest are typical of what now makes up a large proportion of the digital universe, and despite their value as data assets they are not suitable for storage or analysis within a Data Warehouse. This is because data needs to meet the predefined structure of a Data Warehouse in order for it to be accepted, and aggregating these raw unstructured files, e.g. video and audio files, is not possible. So how are these types of valuable data turned into actionable information?

What is a Data Lake?

Data Warehouses have been utilized by data specialists for decades, but the concept of Data Lakes is much more contemporary and much better suited to the storage, analysis and analytics of the semi-structured and unstructured data listed above. By design, storing these kinds of data in a Data Lake does not require the files to be transformed; each file is kept in its raw state and can simply be copied from one file structure to another. Data Lakes also allow for working off the files directly, which means the data can be used effectively immediately, i.e. as soon as it lands, rather than waiting weeks for the Data Warehouse developers to massage the data into a format the data warehouse can accept, if that is even possible (Hammer, 2018). Working with this type of data has become synonymous with the field of Big Data, which is defined by high velocity, high volume and high variability. As such, the two methodologies of Data Warehouses and Data Lakes are not necessarily in competition with each other; in fact, depending on their definition (Data Lake is somewhat of an overloaded term (Bethke, 2017)), they could be argued to solve different problems and can complement each other when deployed within the same architecture.
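
To make the “copied in as-is” point concrete, here is a minimal sketch of landing a raw file in a cloud-hosted lake, assuming an Amazon S3 bucket and the boto3 library; the bucket name, file name and key layout are invented for illustration.

    import boto3
    from datetime import date

    s3 = boto3.client("s3")

    # Land the raw file untouched: no schema, no transformation, just a copy
    # into a date-partitioned folder so it can be found (and catalogued) later.
    today = date.today()
    s3.upload_file(
        Filename="call_recording_0042.mp3",      # raw, unstructured source file
        Bucket="example-company-data-lake",      # hypothetical bucket
        Key=f"raw/call_center/audio/{today:%Y/%m/%d}/call_recording_0042.mp3",
    )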

There is some contention as to the definition of a Data Lake. Some would argue that the original meaning implied the Lake was solely a raw data reservoir (Bethke, 2017). By this definition the Data Lake is not too dissimilar to a staging area or Operational Data Store (ODS) in a data warehouse, where raw copies of data from source systems are landed (Bethke, 2017). This would coincide with an ELT process as opposed to an ETL process. The transformation and integration of the data happens later downstream during the populating of the data warehouse (Bethke, 2017). This understanding of a Data Lake still persists today in the minds of many data specialists, as can be seen below in the overly simplified illustration.

Figure 3: Overly simplified illustration of a Data Lake architecture  (Hammer, 2018)

(Note: no indication of analysis being performed on the lake directly, the lake services the warehouse solely)

However it is an inaccurate understanding as the person who is credited with coining the term, James Dixon, used the following analogy when he explained a Data Lake:

“If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.” (Dixon, 2010)

By stating “various users of the lake can come to examine, dive in, or take samples” Dixon is clearly implying that a feature of the Data Lake is that it is accessible prior to the data being transformed and made available in a Data Warehouse.

This is where Data Lakes and Data Warehouses take an opposing strategy on applying structure to data which is perhaps why they are often mistaken as alternative competing concepts to each other. A Data Warehouse requires Schema on Write whereas a Data Lake uses Schema on Read.

With schema on write, all of the relevant data structure needs to be prepared in advance, which means all of the relevant business questions need to be thought of in advance. This rarely results in a situation where all the relevant stakeholders have their needs met, and if they do it will not be for very long. This scenario is workable for an organization looking to aggregate finance data they are very familiar with, but it is especially difficult when dealing with Big Data where the questions are unknown.

With schema on read, the schema is only applied when the data is read, allowing for a schema that is adaptable to the queries being issued. This means you are not stuck with a predetermined one-size-fits-all schema (Pasqua, 2014). This allows for the storage of unstructured data, and since it is not necessary to define the schema before storing the data it is easier to bring in new data sources on the fly. The exploding growth of unstructured data and the overhead of ETL for storing data in an RDBMS are offered as leading reasons for the shift to schema on read (Henson, 2016).
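
A toy way to see the difference (the events and field names are made up): with schema on write a record has to fit a fixed set of columns before it can be stored at all, whereas with schema on read the raw JSON is stored untouched and each query pulls out only the fields it cares about.

    import json

    # Raw, semi-structured events: the second record has fields the first does not.
    raw_events = [
        '{"user": "a1", "action": "click", "page": "/home"}',
        '{"user": "b2", "action": "purchase", "amount": 42.0, "currency": "USD"}',
    ]

    # Schema on write: anything that does not fit the predefined columns is
    # dropped (or rejected) before storage.
    SCHEMA = ("user", "action", "page")
    written_rows = [
        tuple(json.loads(event).get(column) for column in SCHEMA) for event in raw_events
    ]

    # Schema on read: store the raw strings as-is and apply whatever "schema" a
    # given question needs at query time, e.g. total purchase revenue.
    revenue = sum(
        record.get("amount", 0.0)
        for record in map(json.loads, raw_events)
        if record.get("action") == "purchase"
    )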

When dealing with Big Data the problem of a predefined schema can be so burdensome that it can sink a data project or increase the time-to-value past the point of relevance (Pasqua, 2014). Using a schema on read approach on data as-is means getting value from it right away (Pasqua, 2014). The flexibility of Data Lakes in this regard allows them to surpass Data Warehouses in terms of scalability while making data accessible for analysis sooner.

 

Data Lakes Scalability

By using schema on read, the constraint on scale is virtually removed. The threat of a bottleneck still exists, but now in the form of physical constraints on the hardware available. This is why online cloud offerings such as Amazon S3 and Azure Data Lake from Microsoft have become so popular. Of course, on-site data centers are also an option, with Hadoop being a very popular solution which combines a Data Lake structure with analytical capabilities. This level of scalability also safeguards against Data Silos. A Data Silo is an undesirable situation where only one group or a limited number of people in an organization have access to a source of data that has a broader relevance to people across the organization (Plixer, 2018).

Data Lakes are intended, by design and philosophy, to be an antithesis to Data Silos: all of an organization's data is stored together in one lake. However, centrally storing all data is not without significant security concerns, and losing sight of what customer data is on hand can run afoul of numerous legal requirements such as GDPR.

 

Data Lakes Analysis & Analytics

A defining feature of Big Data analytics is the concept of bringing the analytics to the data rather than the data to the analytics. Traditionally, analytics was carried out by feeding single flat files into an algorithm, with the time taken to prepare these files being significant. Accessing the raw files directly is potentially a failing, however, as it can break the principle of a single source of truth and therefore runs the risk of introducing inconsistencies between reports and other forms of analysis. As you can imagine this is complex and disciplined work, which is why Data Lakes, at this point in their maturity, are best suited to Data Scientists and advanced Data Analysts (Hammer, 2018). However, this goes against the Data Lake ethos of “data for all”, as it only allows the very skilled to have access. This recreates the problem Data Lakes were meant to solve by imposing restrictions, or “data for the select few”. With Data Scientists acting as the gatekeepers, an organization's stakeholders can lose sight of the useful data available to them. Worse still, valuable data may come from external sources, with stakeholders having no visibility of it prior to it landing in the Data Lake. This may leave stakeholders with no option but to take action based on an analysis produced by a Data Scientist, with the accuracy of the analysis being a matter of faith because the stakeholder has no data to say otherwise. In comparison, the creation of a Data Warehouse is usually a collaboration between stakeholders, familiar with internal source systems and data, and developers. Once a Data Warehouse is created, far less skilled (and cheaper) Data Analysts will have the ability to navigate the internal structure and compile valuable reports.

Despite the obvious concerns, the significance of scalability and direct raw data analysis cannot be overlooked. The sooner an organization is informed, the sooner it can act. In real world terms this could save millions of dollars, save thousands of jobs or stop the organization itself from going under. However, the benefits of scalability and earlier data access are not without risks, as poorly managed Data Lakes have the potential to turn into Data Swamps. Data Swamps are poorly managed Data Lakes that become a dumping ground for data. Though the data may be unstructured, the method in which it is stored must not be, or visibility of what is stored and where it is stored will be lost. Failure to catalogue the data, letting users know what is available while making the attributes of the data known, will overwhelm users and result in garbage results (Hammer, 2018). Successful implementation of a Data Lake is complex and requires ongoing commitment to maintain, but for a large organization that needs to make better use of the wider range of data available in the digital universe a Data Lake is a necessity.

 

Conclusion

A Data Lake is not a replacement for a Data Warehouse. Data Lakes are better equipped to solve the different problems associated with dealing with semi-structured and unstructured data. Their flexibility in this regard allows them to surpass Data Warehouses in terms of scalability while making data accessible for analysis sooner. However, Data Lakes are not without their drawbacks. They require highly skilled and expensive staff to develop and maintain. They run a greater risk of failing spectacularly by devolving into a Data Swamp, and could become a serious liability from a regulatory standpoint if that were to happen. Organisations can also be left at the mercy of Data Scientists, and how accurate they are in analyzing data and producing correct reports, as stakeholders may not have the expertise to retrieve data from the Data Lake themselves.

Thankfully, Data Warehouses are still perfectly suited to dealing with numeric data, and organizations that still predominantly use their own internal transactional systems to create actionable information have no immediate need to utilize any alternatives.

 

References:

Bethke, U. (2017) ‘Are Data Lakes Fake News?’, Sonra, 8 August. Available at: http://www.kdnuggets.com/2017/09/data-lakes-fake-news.html (Accessed: 4 July 2020).

BIDataPro (2018) ‘What is Fact Table in Data Warehouse’, BIDataPro, 23 April. Available at: https://bidatapro.net/2018/04/23/what-is-fact-table-in-data-warehouse/ (Accessed: 4 July 2020).

Coughlin, T. (2018) 175 Zettabytes By 2025, Forbes. Available at: https://www.forbes.com/sites/tomcoughlin/2018/11/27/175-zettabytes-by-2025/ (Accessed: 4 July 2020).

Databricks (2019) ‘Unified Data Warehouse’, Databricks, 8 February. Available at: https://databricks.com/glossary/unified-data-warehouse (Accessed: 4 July 2020).

Dixon, J. (2010) ‘Pentaho, Hadoop, and Data Lakes’, James Dixon’s Blog, 14 October. Available at: https://jamesdixon.wordpress.com/2010/10/14/pentaho-hadoop-and-data-lakes/ (Accessed: 4 July 2020).

Earley, K. (2020) Google and Facebook extend work-from-home policies to 2021, Silicon Republic. Available at: https://www.siliconrepublic.com/companies/google-facebook-remote-work-until-2021 (Accessed: 5 July 2020).

Hammer, D. (2018) What is a data lake? – The Hammer | The Hammer. Available at: https://www.sqlhammer.com/what-is-a-data-lake/ (Accessed: 4 July 2020).

Henson, T. (2016) ‘Schema On Read vs. Schema On Write Explained’, Thomas Henson, 14 November. Available at: https://www.thomashenson.com/schema-read-vs-schema-write-explained/ (Accessed: 6 July 2020).

Marr, B. (2018) How Much Data Do We Create Every Day? The Mind-Blowing Stats Everyone Should Read, Forbes. Available at: https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/ (Accessed: 5 July 2020).

Oracle (2019) What Is a Data Warehouse | Oracle Ireland. Available at: https://www.oracle.com/ie/database/what-is-a-data-warehouse/ (Accessed: 5 July 2020).

Pasqua, J. (2014) Schema-on-Read vs Schema-on-Write, MarkLogic. Available at: https://www.marklogic.com/blog/schema-on-read-vs-schema-on-write/ (Accessed: 6 July 2020).

Plixer (2018) What is a Data Silo and Why is It Bad for Your Organization? Available at: https://www.plixer.com/blog/data-silo-what-is-it-why-is-it-bad/ (Accessed: 6 July 2020).

Turner, V. (2014) The Digital Universe of Opportunities. Available at: https://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm (Accessed: 5 July 2020).

How to fix “your model is not manifold” error in Cura

You just pulled a file from Thingiverse and now when you try to slice it in Cura it says “your model is not manifold”.  If you just want to make the error go away you can skip to the paragraph “The Fix”  if you don’t want to be “filled in” on why it’s happening in the first place. Little bit of 3D printing humour there for ya.

So what does the error mean?

For a model to be manifold you can think of it as having logically enclosed space in a manner that can exist in real life with an outer geometry that can actually be 3D printed.

So what does non-manifold mean?

There are a few reasons why a model might not be manifold, and here they are:

  • Self-intersecting
  • Separate Object
  • Hole
  • Inner Faces
  • Overlapping Geometry

Self-intersecting: This is a bit of a weird analogy but imagine punching through yourself. You can’t do that in real life without making a big hole, but in a virtual model of yourself you can have your fist and forearm pass through another body part of your choosing and that’s perfectly fine. In real life, though, you can’t have two objects with mass occupy the same space at the same time, so this cannot be printed.

Separate Object: Imagine a model of a figurine wearing sun glasses. If the sun glasses and the figurine were two separate objects and you shrank the figurine by rescaling it to 90% its original size the sun glasses might be left floating in midair. That’s fine for a virtual 3D model but in real life gravity might have something to say about that.

Hole: Pretty self-explanatory, there’s a hole in the model; not like a window, just a void that makes the model impossible to print successfully.

Inner Faces: Imagine trying to print a model within a model. The slicer reads the code and gets confused because there should only be one outer surface area not two.

Overlapping Geometry: Imagine you have created a 3D model of a house and you’ve accidentally copied the roof and then pasted it back on top of the model over the original roof. The model now has two roofs occupying the same space which cannot be printed.

 

The Fix:

Ideally you should open the file with some 3D modelling software and fix it manually but if you’re just pulling files from Thingiverse that’s a bit unrealistic. Luckily the following site allows you to upload files and it will try to fix them automatically.

https://3d-print.jomatik.de/en/index.php

If the process successfully fixes the file it will give you the option to download the file with a brief summary of what changes it made highlighting big changes in red.

It’s a great solution, especially for low-risk models, but the onus will always be on you to manually inspect the model to see if the problems have in fact been resolved. Also, if you’re working on a super secret product design for a company it’s probably best not to upload the model to be fixed online, but for files you’ve pulled from Thingiverse, sure, why not, they’re already publicly available anyway.

The New Fix:

Unfortunately, the website above no longer provides the functionality to automatically make objects manifold.

As an alternative solution, download and install Slic3r.

Start Slic3r, go to file and then “Repair STL file . . .” and load in the file you want to fix.

You will then be able to open the file with Cura and hopefully it should be fixed (note: the solution is a bit hit and miss).

Alternatively, download Meshmixer, open the problem file, go to Edit and then Make Solid. This is not a guaranteed fix either but it may fix some minor gaps and errors.
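
If you are comfortable with a little Python there is a third option: the trimesh library can report whether a mesh is watertight and attempt some basic automated repairs. This is a rough sketch (the file names are placeholders) and, like the tools above, it is not guaranteed to fix every model.

    import trimesh

    # Load the problem model and check whether it is watertight.
    mesh = trimesh.load("problem_model.stl")
    print("Watertight before repair:", mesh.is_watertight)

    # Attempt some basic automated repairs: fill small holes and fix face normals.
    trimesh.repair.fill_holes(mesh)
    trimesh.repair.fix_normals(mesh)

    print("Watertight after repair:", mesh.is_watertight)
    mesh.export("repaired_model.stl")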

How to use a generic PC controller with GTA 5

So you plugged in the cheap PC controller you bought off eBay or Amazon (say one that is coincidentally shaped like an Xbox controller) and found it doesn’t work with GTA 5?

This is probably because the controller is using the DirectInput standard as opposed to the newer XInput standard. You can read more about these standards in Microsoft’s documentation.

GTA 5 (or GTA V if you’re feeling fancy) was not optimised to use the DirectInput standard unfortunately. However, if you’re playing the game on PC you should be using a keyboard and mouse like a grown-up anyway; it’s way better for shooters. Ah, but GTA isn’t all about the shooting, I hear you say, and you’re right. I’ll admit I switch to a controller for flying vehicles because they are horrible to pilot with direction keys. Analogue sticks are much better suited to aircraft.

So the workaround for being able to use the cheap generic controller is quite simple but requires using an “Xbox 360 Controller Emulator”.

Download the file x360ce_x64 from the following site by clicking on the “download for 64 bit games” button at the top of the screen.

https://www.x360ce.com/

(Be sure to test the downloaded file with whatever antivirus software you have installed)

Once you are satisfied the software is safe extract the file to the root directory of where you installed GTA 5.

(If you do not know where that folder is, try searching GTAV in your Windows search bar. Note that a folder called GTA V is often created in the Documents folder but this is not the correct directory. The correct directory will have application files with the GTAV logo in it. The game may be in a Rockstar folder or perhaps a Steam or Epic folder; it all depends on who you bought it from.)

Right-click on x360ce_x64, run the file as administrator and you should be given a button option to “Create”.

Click this button and an Xbox controller calibration window will open.

At this point if your controller has a large circular button at its center press it and make sure it lights up otherwise it may not send the right signals when the controller is being mapped.

Click “Auto” and then “Save”.

I found the A, B, X and Y face buttons were not mapped correctly and needed to be mapped manually.

To do this, beside each face button on the emulator interface there is a drop down menu. Click on it for each button and choose the option to record. The interface will highlight which button to press on your controller to map it correctly. Once the buttons are mapped correctly click “Save” again.

After following these steps you should now be able to play GTA V with the controller.

How to secure Creality Ender 3 Pro TL-Smoothers once installed

This post will outline how to secure TL-Smoothers within the Creality Ender 3 Pro circuit housing without blocking vents.

Prerequisites:

You will need insulation tape and double-sided tape to follow these instructions.

What are TL-Smoothers?

A TL-Smoother, as displayed below, is an add-on circuit module for 3D printer stepper motor drivers. They seek to lower vibration, lower noise and provide a smoother result by cleaning up electrical signals. Their use is intended to compensate for the less-than-premium motor control circuits used in more budget-oriented 3D printers.

TL-Smoother

(Note: there is an ongoing debate as to whether TL-Smoothers have any benefit on print quality for the Ender 3 Pro. This post will not cover this topic or the installation process as a whole. This post simply puts forth advice regarding good placement and how to secure the circuits.)

Steps:

The four pin sockets of the TL-Smoothers protrude on the back side of the circuit.

Clip the ends off these protrusions using the electrical wire cable cutters provided with the printer, leaving the back of each circuit flat.

Cover the four pins on either end of each circuit with insulation tape; this will prevent shorts.

Cut a small square of double-sided tape and secure it to the back of each circuit as shown below.

(Image: double-sided tape applied to the back of a circuit)

Secure the circuits as demonstrated below using the double-sided tape. These locations will ensure everything will fit inside the housing while leaving enough space for the fan. The green square in the picture marks where the fan will be situated when the housing is closed back up.

Smoother Placement

Happy Printing. ☮