Spotless Data

What would your home look like if you let dirt and mess accumulate for years? It would become a health hazard, and it would be impossible to find what you need when you need it most. Eventually, the problem simply couldn’t be overlooked. This is the situation many plant managers face after accumulating huge quantities of manufacturing data over the years.

By implementing a data-driven company culture, manufacturers can substantially improve almost any aspect of production. Big data can be used, among other things, to maximise energy efficiency, improve the business’s predictive maintenance strategy, and prevent downtime caused by equipment failure. To do this, manufacturers need accurate and reliable data.
 
But when data is collected and accumulated for several years, its quality can start to decline. Dirty or rogue data is data affected by issues such as duplicates, inaccuracies, inconsistencies, and out-of-date information. When plants reach this point, it’s time for a good clean-up. 
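As a rough illustration of the four issues named above, the following Python sketch flags each one in a handful of made-up maintenance records. The field names, the plausible temperature range and the three-year age limit are assumptions chosen for the example, not values from any particular plant or tool.

```python
from datetime import date

# Hypothetical maintenance records; field names are illustrative only.
records = [
    {"asset_id": "PUMP-01", "temp_c": 71.5, "updated": date(2023, 5, 2)},
    {"asset_id": "PUMP-01", "temp_c": 71.5, "updated": date(2023, 5, 2)},   # duplicate
    {"asset_id": "pump-02", "temp_c": 68.0, "updated": date(2023, 4, 18)},  # inconsistent casing
    {"asset_id": "FAN-07", "temp_c": -999.0, "updated": date(2023, 5, 1)},  # implausible reading
    {"asset_id": "FAN-09", "temp_c": 40.2, "updated": date(2015, 1, 9)},    # out of date
]

def classify(records, today, max_age_days=365 * 3):
    """Return a dict mapping each issue name to the indices of affected records."""
    issues = {"duplicate": [], "inconsistent": [], "inaccurate": [], "out_of_date": []}
    seen = set()
    for i, rec in enumerate(records):
        key = (rec["asset_id"], rec["temp_c"], rec["updated"])
        if key in seen:
            issues["duplicate"].append(i)
        seen.add(key)
        # Assumed convention: asset IDs are always upper case.
        if rec["asset_id"] != rec["asset_id"].upper():
            issues["inconsistent"].append(i)
        # Assumed plausible range for this sensor type.
        if not -50.0 <= rec["temp_c"] <= 200.0:
            issues["inaccurate"].append(i)
        if (today - rec["updated"]).days > max_age_days:
            issues["out_of_date"].append(i)
    return issues

issues = classify(records, today=date(2024, 1, 1))
for name, indices in issues.items():
    print(name, indices)
```

In practice the rules would come from the people who know the equipment; the point is only that each category of dirty data can be expressed as a simple, testable check.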

Not The Exception

Dirty data is the norm, not the exception. As companies evolve, the amount of data they collect grows in quantity and complexity. High employee turnover, the use of different enterprise resource planning (ERP) solutions across several departments, and a lack of standard guidelines for data entry complicate the situation. For these reasons, achieving perfect data is almost impossible, especially in large organisations.

Data cleansing, or cleaning, is the process of detecting and correcting or eliminating incomplete, inaccurate, out-of-date or irrelevant data. It differs from data validation in that the latter is automatically performed by the system at the time of data entry, while data cleaning is done later on batches of data that have become unreliable.

Many data cleansing tools are available, such as Trifacta, Openprise, WinPure and OpenRefine. It’s also possible to use libraries like pandas for Python, or dplyr for R. The variety of solutions on the market means that manufacturers might want to consult a data analyst to choose the best one for their business case.

How Dirty, Exactly?

Regardless of the solution employed and the type of data being cleansed, the first step is assessing the quality of the existing data. In this phase, a data analyst will assess the company’s needs and establish specific KPIs for clean data. Legacy data is then audited using statistical and database methods to reveal anomalies and inconsistencies.  
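To make the idea of a clean-data KPI and a statistical audit more concrete, here is a minimal Python sketch. It assumes one possible KPI (the share of records with all required fields filled in) and one common outlier test (the modified z-score based on the median absolute deviation, with the conventional 3.5 cut-off). Both are illustrative choices, and an analyst may well prefer other measures.

```python
from statistics import median

def completeness(records, required_fields):
    """Share of records in which every required field is present and non-empty."""
    if not records:
        return 0.0
    complete = sum(
        all(rec.get(f) not in (None, "") for f in required_fields)
        for rec in records
    )
    return complete / len(records)

def mad_outliers(values, threshold=3.5):
    """Indices of values whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread, so no value can be singled out this way
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]

# Hypothetical legacy records and sensor readings.
legacy = [
    {"part_number": "AB123", "supplier": "Acme"},
    {"part_number": "AB124", "supplier": ""},
    {"part_number": "AB125", "supplier": "Acme"},
    {"part_number": "AB126", "supplier": "Bolt Ltd"},
]
readings = [20.1, 19.8, 20.4, 20.0, 95.0, 19.9]

kpi = completeness(legacy, ["part_number", "supplier"])
suspect = mad_outliers(readings)
print(f"completeness: {kpi:.0%}, suspect reading indices: {suspect}")
```

A median-based test is used here because a single extreme reading distorts the mean and standard deviation, which can hide the very anomaly the audit is looking for.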

This can be done using commercial software that allows the user to specify various constraints. The existing data will be uploaded and tested against these constraints, and data that doesn’t pass the test should be cleansed.  
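The same constraint-testing idea can be sketched in a few lines of Python. The constraints below (a part number must be present, quantity must be a non-negative integer, the unit must belong to a known list) are hypothetical examples, and commercial tools will offer their own ways of declaring such rules.

```python
# Each constraint is a (name, predicate) pair; names and limits are hypothetical.
CONSTRAINTS = [
    ("part_number present", lambda r: bool(r.get("part_number"))),
    ("quantity is a non-negative integer",
     lambda r: isinstance(r.get("quantity"), int) and r["quantity"] >= 0),
    ("unit is known", lambda r: r.get("unit") in {"pcs", "kg", "m"}),
]

def audit(records, constraints):
    """Split records into those passing every constraint and those failing any."""
    passed, failed = [], []
    for rec in records:
        violations = [name for name, check in constraints if not check(rec)]
        if violations:
            failed.append((rec, violations))
        else:
            passed.append(rec)
    return passed, failed

stock = [
    {"part_number": "AB123", "quantity": 40, "unit": "pcs"},
    {"part_number": "", "quantity": 12, "unit": "kg"},
    {"part_number": "CD456", "quantity": -3, "unit": "boxes"},
]

passed, failed = audit(stock, CONSTRAINTS)
for rec, violations in failed:
    print(rec, "->", violations)
```

Recording which constraint each record violates, rather than just a pass or fail flag, makes it easier to decide later whether a record can be repaired or has to be discarded.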

During this phase, manufacturers should establish which input fields must be standardised across the company. Standardisation rules can help businesses prevent the build-up of dirty data in that they minimise inconsistencies and facilitate the uploading of clean data into a common ERP.
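Standardisation rules of this kind might look like the following Python sketch, which normalises three typical input fields. The unit vocabulary, the accepted date formats and the part-number convention are all assumptions made for illustration. Each company would define its own.

```python
import re
from datetime import datetime

# Assumed company vocabulary and accepted legacy date formats.
UNIT_ALIASES = {
    "pcs": "pcs", "pc": "pcs", "pieces": "pcs",
    "kg": "kg", "kgs": "kg", "kilograms": "kg",
}
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")

def standardise_part_number(raw):
    """Upper-case the value and strip everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())

def standardise_unit(raw):
    """Map a free-text unit onto the agreed vocabulary, or None if unknown."""
    return UNIT_ALIASES.get(raw.strip().lower())

def standardise_date(raw):
    """Return an ISO 8601 date string, or None if no accepted format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

print(standardise_part_number(" ab-123/x "))
print(standardise_unit(" Pieces "))
print(standardise_date("03/02/2021"))
```

Returning None for values that cannot be mapped, instead of guessing, keeps ambiguous entries visible so they can be corrected by someone who knows the data.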

Keep It Clean

After the audit, the cleaning process can begin. Data will pass through a series of automated software programmes that discard what is not compliant with the specified KPIs. The result is then tested for correctness, and incomplete data will be amended manually, if possible. A final quality control phase will ensure that the output data is clean enough to be seamlessly uploaded into the chosen ERP.
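The stages described above can be sketched as a small Python pipeline: non-compliant records are discarded, incomplete ones are set aside for manual amendment, and a final gate decides whether the batch is fit for upload. The compliance rule, the required fields and the 50% threshold are hypothetical placeholders, not recommendations.

```python
REQUIRED = ("part_number", "quantity")  # assumed mandatory fields

def is_compliant(rec):
    """Hypothetical rule: quantity, when present, must not be negative."""
    qty = rec.get("quantity")
    return qty is None or qty >= 0

def is_complete(rec):
    """True when every required field has a value."""
    return all(rec.get(f) is not None for f in REQUIRED)

def clean(records):
    """Sort records into (clean, needs_review, discarded) lists."""
    clean_rows, needs_review, discarded = [], [], []
    for rec in records:
        if not is_compliant(rec):
            discarded.append(rec)       # automated discard
        elif not is_complete(rec):
            needs_review.append(rec)    # to be amended manually, if possible
        else:
            clean_rows.append(rec)
    return clean_rows, needs_review, discarded

def passes_quality_gate(clean_rows, total, min_clean_share=0.5):
    """Final check before upload: every row is valid and enough rows survived."""
    if total == 0:
        return False
    all_valid = all(is_compliant(r) and is_complete(r) for r in clean_rows)
    return all_valid and len(clean_rows) / total >= min_clean_share

batch = [
    {"part_number": "AB123", "quantity": 40},
    {"part_number": "CD456", "quantity": -3},
    {"part_number": "EF789", "quantity": None},
    {"part_number": "GH012", "quantity": 7},
]

clean_rows, needs_review, discarded = clean(batch)
ready = passes_quality_gate(clean_rows, total=len(batch))
print(len(clean_rows), len(needs_review), len(discarded), ready)
```

Keeping the review queue separate from the discard pile means records that are merely incomplete are not lost while they wait for someone to fill in the gaps.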

However, just as with cleaning our homes, a big clean-up every now and then is not enough. The best approach is to implement a culture of continuous data improvement, distributing tasks among all members of the team. Developing practices that support ongoing data hygiene is the key to success.

About the Author: Neil Ballinger is head of EMEA at automation parts supplier EU Automation. For more information on how to use big data to optimise your business, visit www.euautomation.com

Image: Unsplash

