There is a significant gap emerging between expected and actual benefits of Manufacturing 4.0, and the cause is hesitation to fully embrace a data-driven analytics approach within the ranks. But there is no need to hesitate. It’s time to dive right in. By Danny Smith

Inundated with marketing hyperbole on the value of analytics? Wondering if there is more hype than reality? There is certainly a lot of attention these days on the value of a data-driven approach, with some very visible successes in monetizing data by the likes of Google and Facebook that have probably gotten the attention of your board. Along with all of the buzz come high expectations. But despite board-level inquiries and C-level initiatives, there hasn’t been widespread adoption of a data-driven analytics approach within the manufacturing community.

Most manufacturers are just not fully committed to data-driven analytics the way they are to Lean, Six Sigma, and other improvement methodologies. Their organizations are filled with quantitatively trained people such as engineers and information and operations technologists. They have embraced the concepts of Manufacturing 4.0 and have started to invest heavily in Digital Twins and other Industrial Internet of Things (IIoT) technologies. But not data-driven analytics.

Understanding why is critical to accelerating the journey to fully benefiting from M4.0 efforts. The issue may come down to comfort levels in the ranks. Fortunately, there are practices that can alleviate the nervousness and kickstart organizations along the path to analytics maturity.

Manufacturing, You Have a Problem 

Data science as a formal discipline has been around for many years. But the technology stack — sensors, data, bandwidth, storage, compute, memory, algorithms — all coming together to give unique in-situ insights into manufacturing processes is relatively new, and manufacturing as an industry is late to the table. Engineering culture will want to take on the job. But be careful! Data science is NOT engineering and is NOT computer science. It is its own separate discipline. Don’t be fooled by surface similarities; treating them as interchangeable will lead to problems. You may hear these symptomatic comments:

We may be too early. Being data-driven requires data. Some manufacturers are early on the journey to cyber-physical systems and may feel they lack the data necessary to derive valuable insights. Challenge this assumption and investigate. Many find that they have lots of data, but that it’s just not easy to access, which is a different problem and one that is more easily overcome. Resist the urge of IT staff to build structured data warehouses. Purposely exploring data from original sources as part of the preparation for deeper analysis is important, and aggregated, pre-structured data tends not to be granular enough for insights.

We lack the skills/understanding. Recent research from analytics expert J.G. Harris (The Team Solution to the Data Scientist Shortage) took a deeper look at the skills a data scientist needs and found at least five separate skillsets: systems architect, quantitative analyst, business analyst, visualization engineer, and software engineer. The quantitative analyst has a deep background in statistics and other advanced modeling techniques. This skillset is the one most lacking in manufacturers. Engineers usually take only one or two statistics classes at best.

We can’t agree philosophically on the approach. Engineers are trained to focus on the why and seek to understand the world via first principles – whether of physics or chemistry. Organizations have invested in Lean/Six Sigma techniques and probably have many certified master black belts. Traditional improvement methodologies stress root cause analysis and seek to fully understand the process, the physics/chemistry, and the cause/effect (e.g. Kaizen “5 Whys”). But many manufacturing processes are so complex they surpass the ability of a human to understand all of the interactions. It may be a unique combination of many variables (each of which is in control univariately) that is causing the issue. Additionally, in-situ data is typically so voluminous that a human may miss critical data items due to noise.

Approaches that stress human knowledge/expertise and understood logic leave open the potential that critical relationships and interactions are missed. Example: after three rounds of Kaizen events to improve end-of-line first-pass yield, where master black belts pored over production process and quality data with a fine-tooth comb, a major white-goods manufacturer was ready to move on to easier projects. After switching to a data-driven approach, it discovered that one in-situ data stream explained first-pass fail results better than any other variable. This data stream related not to process or quality but to the equipment on the line. All of the experts missed the potential for this causal relationship, but the data-driven approach spotted it immediately. It is critical to note: data-driven and traditional process quality techniques are NOT incompatible but are very complementary. Leveraging both requires an open mind from the humans involved, as well as the skills to do so.

Other philosophical differences may trip you up as well. Many machine learning techniques are executed in code (programming), leading some to assume that existing computer science resources can take on the job. They can execute the coding but are probably missing the foundation in statistics required to make sure the technique selected is valid and that the data conforms to the algorithm’s assumptions.


Getting Comfortable with the Analytic Lifecycle 

You can’t go from data to insights directly. There is a sequence to data-driven analysis that is helpful to walk through with your engineers and master black belts as well as IT and line of business colleagues:

Prioritize. Start with a valuable problem. Again, common sense, but most need a reminder. If it’s not keeping you up at night, then analyzing it is not worth it.

Explore. Identify all available data sources. Leverage team expertise, but don’t discard data without evaluating it analytically first. Explore each variable’s behavior individually and in relation to the others. These first steps are commonly called descriptive analytics. More sophisticated techniques identify possible relationships between causes and effects. Determining these by statistical techniques, rather than by experience and/or first principles of physics, is a fundamental difference between data-driven and traditional techniques. The algorithms can spot unsuspected relationships in data much better than humans can, but only if you let them.
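
As a concrete starting point, here is a minimal exploration sketch in Python using pandas. The file name and column names are hypothetical placeholders for exported in-situ data.

```python
import pandas as pd

df = pd.read_csv("line_sensors.csv")          # hypothetical export of in-situ data

# Explore each variable's behavior individually.
print(df.describe())                          # count, mean, std, quartiles per column
print(df.isna().mean())                       # share of missing values per column

# Explore variables in relation to each other: a correlation matrix is a
# first pass at letting the data surface unsuspected relationships.
corr = df.corr(numeric_only=True)
print(corr["first_pass_fail"].sort_values())  # hypothetical outcome column
```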

Prepare. Algorithms do the heavy lifting, but the data needs to be in the right form first. Typically, 80% of analytics work is data preparation. Exploration is a large part, but, frequently, the underlying assumptions of specific modeling techniques require transforming the data into other forms. In-situ data typically has a non-normal distribution (binomial is common), but many data mining techniques require data to be normally distributed. Transforming the in-situ data with a log function is one example of a technique, generally called feature engineering, that allows its use. Additionally, most analytic techniques require all of the data to be in the same addressable space. This has significant implications for hardware and how you store data. The data usually needs to be in memory and de-normalized. The biggest implication: you’ll need to further process the data for analysis. Analytics-savvy organizations are moving to analytics-friendly data stores (e.g. Hadoop), commonly known as data lakes, and leveraging cloud compute environments in some cases. Check with your IT group for skills and readiness in these areas – data preparation, storage, and computing for analytics are different than traditional approaches.
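
As one example of the feature engineering described above, here is a minimal sketch that log-transforms a skewed in-situ variable toward normality. The column name and simulated data are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Simulated skewed in-situ variable standing in for real line data.
df = pd.DataFrame({"cycle_time": np.random.default_rng(0).lognormal(1.0, 0.8, 1000)})
print("skew before:", stats.skew(df["cycle_time"]))

# log1p handles zeros safely; the transformed column better satisfies the
# normality assumptions some data mining techniques require.
df["cycle_time_log"] = np.log1p(df["cycle_time"])
print("skew after:", stats.skew(df["cycle_time_log"]))
```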

Model. There is a lot of buzz around modeling techniques. This is where you need to invest in talent. Make sure it is data science talent versus computer science talent. There are many different modeling techniques, and the ones you hear about today are improvements on techniques dating back decades. But all require a good grasp of the assumptions they are built on, and the formal data scientist will keep you from using a technique that will mislead you.

Machine learning (ML) and artificial intelligence (AI) are two buzzwords worth mentioning. There are several types of ML techniques in common use in manufacturing:

  • Supervised Learning. This is teaching by example, and it requires a known answer (labeled or classified output variables). A typical example in manufacturing is explaining/predicting an end-of-line quality test result (you would label the test data as pass or fail). There are many supervised learning techniques, including decision trees (and derivatives like random forest and gradient boosting), regressions, support vector machines, and neural networks. The downside of supervised learning is the requirement for an end-result classification to exist, and these techniques need lots of data to learn from. Semi-supervised techniques can operate with a mix of known and unknown results in training data.
  • Unsupervised Learning. Classified data is not necessary; these techniques draw inferences and conclusions based solely on analyzing the input data. They are very useful in manufacturing because you can use them to monitor for abnormal vs. normal conditions without needing a bad/good classification associated with the in-situ data. Anomaly detection on rotating equipment (drives, pumps, and motors) uses unsupervised techniques. There are many unsupervised techniques, including various clustering techniques, nearest-neighbor mappings, affinity analysis, and singular value decomposition. A minimal sketch of both learning styles follows this list.
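
To make the distinction concrete, here is that sketch in Python, using scikit-learn on synthetic data. The feature values, labels, and model settings are hypothetical stand-ins, not data from the article.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, IsolationForest
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))                    # stand-in for in-situ measurements
y = (X[:, 0] + 0.5 * X[:, 3] > 1.2).astype(int)   # stand-in for end-of-line pass/fail

# Supervised learning: teach by example using the labeled outcomes.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("hold-out accuracy:", clf.score(X_test, y_test))

# Unsupervised learning: no labels needed; flag abnormal vs. normal
# conditions, as in anomaly detection on rotating equipment.
iso = IsolationForest(random_state=0).fit(X)
flags = iso.predict(X)                            # -1 = anomaly, 1 = normal
print("anomalies flagged:", int((flags == -1).sum()))
```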

Only certain classes of problems lend themselves to ML techniques:

  • Associations that humans can easily intuit but that cannot easily be codified or defined by simple rules;
  • Combinatorial problems with defined, discrete outputs and diverse input conditions;
  • When data is problematic for traditional analytic techniques.

There is a downside to ML models – they are hard to interpret and work best for applications where accuracy is more important than interpretation. This is a problem in manufacturing environments, where humans must accept the results of the models; engineering culture wants to ask why the model flagged a potential failure. There is active research into using other analytic techniques to interpret ML models for human understanding.
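
One common interpretation technique is permutation importance, which measures how much a model's accuracy drops when each input is shuffled. Here is a minimal sketch, reusing the fitted `clf` and hold-out data from the sketch above; it is one illustrative approach, not the only one.

```python
from sklearn.inspection import permutation_importance

# Shuffle each feature in turn and measure the drop in hold-out accuracy;
# large drops indicate the features the model leans on most.
result = permutation_importance(clf, X_test, y_test, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: importance {result.importances_mean[i]:.3f}")
```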

Deep Learning (DL) is a special category of machine learning, and it is the subject of many research efforts. It is the technique that Google uses to identify if a picture shows a cat or not. DL makes use of extremely sophisticated neural networks. Models generated are significantly more complex (deep refers to the models having more hidden layers) than traditional neural networks. DL techniques are very useful for building predictive models using images, and a typical use in manufacturing is to identify minute flaws and misalignments on the line by capturing images rather than relying on human inspection.
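
For illustration, here is a minimal sketch of a small convolutional network for binary flaw detection from inspection images, using the Keras API. The image size, architecture, and random placeholder data are assumptions for the example, not a production design.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# A small convolutional network; "deep" refers to the stacked hidden layers.
model = keras.Sequential([
    layers.Input(shape=(64, 64, 1)),          # grayscale inspection image (placeholder size)
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),    # probability the board is flawed
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Random placeholder data just to show the training call; real use requires
# many labeled images, as the warnings below note.
X = np.random.rand(32, 64, 64, 1)
y = np.random.randint(0, 2, size=32)
model.fit(X, y, epochs=1, verbose=0)
```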

A couple of warnings: deep learning models tend to ingest vastly larger amounts of data than their predecessors so you need the data to effectively use them. And they are extremely hard to interpret and/or explain – worse than other machine learning models — which causes humans to mistrust the outputs.

Artificial Intelligence (AI) is another term thrown about and used to excite and threaten. We’ll skip artificial general intelligence (AGI) – this is where the computer becomes smarter than we are, and we all work for it – and focus more on artificial narrow intelligence (ANI), where the computer will take over human tasks such as driving or reading a report for insights.

ANI, when you remove the marketing jargon, is the science of training systems to emulate human tasks through learning and automation. It leverages machine learning, and especially deep learning techniques; often encompasses imaging and computer vision; and can also include capabilities to let the computer interact with a human in a more natural way, including through natural language processing, interaction, understanding, and generation. AI combines modeling with deployment and decisioning.

Deploy. The modeling phase of the lifecycle gets most of the attention, but a big oversight for many is how to deploy the model into production. Deployment could be as simple as a report showing an insight that a human can look at to determine an action. More typically, as new in-situ data is generated in real time, that data is “scored” against a predictive model to drive an alert. The predictive model is built on centralized servers (an IT function), but many manufacturers find they must recode the scoring mechanism to run on their operations technology (OT) platform close to the in-situ data (for low latency).
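
Here is a minimal deployment sketch: load a serialized model and score in-situ records as they arrive, raising an alert on a predicted failure. The model file, record source, and alert threshold are hypothetical, and the model is assumed to be a scikit-learn-style classifier.

```python
import joblib
import numpy as np

model = joblib.load("solder_paste_model.joblib")   # hypothetical serialized model

def score_record(record: np.ndarray) -> None:
    """Score one in-situ measurement vector against the predictive model."""
    fail_probability = model.predict_proba(record.reshape(1, -1))[0, 1]
    if fail_probability > 0.9:                     # threshold chosen for illustration
        print(f"ALERT: predicted failure (p={fail_probability:.2f})")

# In production this loop would be fed by the OT platform's event stream.
for record in np.random.normal(size=(5, 5)):
    score_record(record)
```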

Asking the question “how will the insights generated by the model be deployed into production?” early is key. Another important aspect of deployment is monitoring the models for degradation. Most novices don’t recognize that a model doesn’t stay accurate indefinitely; as the data changes, models degrade and need to be retrained. This is a continuous process and is the data science version of condition-based maintenance. Don’t put it off – the credibility of the models depends on vigilance and retraining as appropriate. Automating this process is the key to scale.
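
Monitoring for degradation can start simply. Here is a minimal sketch that tracks the rolling accuracy of deployed predictions against the true outcomes that arrive later and flags the model for retraining when accuracy slips; the window size and accuracy floor are illustrative assumptions.

```python
from collections import deque

window = deque(maxlen=500)     # most recent prediction outcomes
ACCURACY_FLOOR = 0.95          # hypothetical acceptable accuracy floor

def record_outcome(predicted: int, actual: int) -> None:
    """Compare a deployed prediction with the eventual true outcome."""
    window.append(predicted == actual)
    accuracy = sum(window) / len(window)
    if len(window) == window.maxlen and accuracy < ACCURACY_FLOOR:
        print(f"rolling accuracy {accuracy:.3f} below floor; schedule retraining")
```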

Decide. To turn the analytically derived, data-driven insights into results/benefits, you need to act. Ensuring action can be a challenge if left to human intervention. Analytically advanced organizations automate the decision process for immediate action. This requires standard operating procedures that are well designed and robust, as well as machine-to-machine interfaces to execute the procedures. It may not be appropriate to fully automate every decision, but robust processes should be adopted whether manual or automated.
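
As a sketch of what automating the decision step can look like, the snippet below maps a model alert to a standard operating procedure, with escalation to a human when no automated procedure applies. The alert types, procedure names, and dispatch mechanism are hypothetical.

```python
# Map alert types to standard operating procedures; unknown alerts escalate
# to a human rather than being automated.
SOP_ACTIONS = {
    "solder_paste_misalignment": "wash_board",
    "reflow_temperature_drift": "adjust_oven_profile",
}

def decide(alert_type: str) -> str:
    """Turn an analytic alert into an executed (or escalated) procedure."""
    action = SOP_ACTIONS.get(alert_type, "notify_quality_engineer")
    print(f"alert '{alert_type}' -> executing SOP '{action}'")
    return action

decide("solder_paste_misalignment")   # automated machine-to-machine action
decide("unexplained_vibration")       # falls back to human review
```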

In many ways, AI is the culmination of the Analytic Lifecycle: a data-driven analytic insight that has automated action and can be considered a sophisticated and fully executed analytic solution to a hard problem. You can see the allure of AI. Now you understand the details underneath.


A High-Tech Case Study

Here is an example of how the analytic lifecycle played out at a high-tech manufacturer. The company, a contract manufacturer, makes electronic circuit boards. In a low-margin business, the company requires a high end-of-line first-pass yield to perform financially and satisfy customers. Lines use surface mount technology to solder chips and other components to the surface of a printed circuit board. An early and critical step in the process is the application of solder paste via a template (like screen printing a t-shirt). If the template doesn’t align perfectly to the board, then the later steps where expensive components are soldered to the board are wasted, and a lot of re-work and/or scrap costs are incurred. Existing alerting had too many false positives. The company wanted to see if it could predict a failure at end of line directly after the solder paste application. Here are the key actions the firm took:

Prioritize: High impact – low first-pass yield was making the line unprofitable. Fix it or abandon the business.

Explore: Optical inspection data available in two forms – raw images and processed, structured data derived from the images. Solder paste inspection data can be sent off the line in real time. End-of-line test pass/fail with defect codes classified by test engineers from many months of production. Environmental sensor data (temperature and humidity), PCB configuration and supplier, and manufacturing execution system data all available by board.

Prepare: Image data was processed by the inspection machine into several discrete variables: offset from ideal x/y, height, area, and volume per solder point (20,000+ per board). Further transformation was required to format the data into an analytic-ready form (similar to using a pivot table in a spreadsheet). Transformation of the raw images directly was also performed. Additionally, training data and hold-out data (to test accuracy) were separated. This is a critical step to avoid an overfit model – one that predicts the training data well but isn’t as accurate on new data.
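
Here is a minimal sketch of the two preparation steps described above: pivoting per-solder-point measurements into one analytic-ready row per board, then separating training and hold-out data. The column names and values are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Long-form inspection data: one row per solder point per board.
long_df = pd.DataFrame({
    "board_id": [1, 1, 2, 2],
    "point_id": ["p1", "p2", "p1", "p2"],
    "volume":   [0.98, 1.02, 0.71, 1.30],
})

# Pivot to one row per board, similar to a spreadsheet pivot table.
wide_df = long_df.pivot(index="board_id", columns="point_id", values="volume")

labels = pd.Series([0, 1], index=wide_df.index)   # hypothetical end-of-line pass/fail

# Separating hold-out data guards against an overfit model.
X_train, X_hold, y_train, y_hold = train_test_split(
    wide_df, labels, test_size=0.5, random_state=0
)
```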

Model: A champion modeling process, which involves trying several modeling techniques and picking the most accurate, was performed using regressions, decision trees, gradient boosting, random forests, and shallow neural networks. Random forest had the highest accuracy with the lowest false positives, an important criterion for the plant manager, who didn’t need any more useless alerts. Initial model accuracy was 90%, with successive iterations bringing accuracy up to 98.5%, at which point the plant manager agreed to put it into production. The final production model was a deep learning model using the raw images directly. This was where the most resistance to trusting the data-driven approach arose, but low false positives satisfied the plant manager, and he overcame his initial discomfort.
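
A champion modeling process can be as simple as the loop sketched below: fit each candidate technique and keep the one that scores best on hold-out data. This sketch reuses the synthetic X_train/X_test split from the earlier supervised-learning example; in the case study, the false-positive rate would be weighed alongside accuracy.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier

# Candidate techniques mirroring those the manufacturer tried.
candidates = {
    "regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "shallow neural net": MLPClassifier(max_iter=1000, random_state=0),
}

champion, best_score = None, 0.0
for name, candidate in candidates.items():
    score = candidate.fit(X_train, y_train).score(X_test, y_test)
    print(f"{name}: {score:.3f}")
    if score > best_score:
        champion, best_score = name, score

print("champion:", champion)
```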

Deploy: The model score code was embedded in an analytically capable event stream processor that scored real-time data coming off the optical inspection step on an edge device (an IoT gateway) next to the line. Alerts were sent to line engineers and operators.

Decide: Standard operating procedures were created to instruct the operators to “wash the board” – pull the board after the solder paste application, wash off the paste, and try again. Sequential failures were sent to quality inspectors.

This manufacturer doesn’t share the financial impact, but the line is now profitable, and it continues with that business.

Creating an Analytic Culture 

Here is how to increase your team’s comfort level with data-driven analytics:

Challenge the team to get out of their comfort zone. Engineering cultures will feel awkward at first integrating a data-driven approach, but engineers respond well to results and learn fast. One successful project will invigorate the entire organization. Think beyond first principles and follow the data. You might be amazed at what you find.

Start with a valuable problem. It’s worth repeating and is a rule to live by. Go find a sponsor! With an issue! Risk-taking business leaders with P&L responsibility are best. They will tell their peers of successes and lead the rest of the organization by example.

Treat your data as a highly valuable asset, especially in-situ data. Collect it, store it in accessible and analytic-ready stores, and leave it unstructured. If you think you have a great data source in your data warehouse, be prepared for some work. Chances are it’s been structured, aggregated, and cleansed to the point of limited value. Investigate source systems like historians, line-side equipment, and SCADA/DCS systems. Usually you have more data than you think; it just might be hard to get to. A common data integration tool is the USB drive. Assume you will have to contextualize the in-situ data with other data sources like MES and ERP systems.

Invest in your people. Staff data-driven teams appropriately. A mix of domain knowledge from line engineers, master black belt process experts, data processors from IT, a data-driven quantitative analyst/statistician/data scientist, and line-of-business representatives makes a good core team to get things done. Remember, data science is the skill you probably need to invest in the most, but it’s not the answer by itself. Consider a partner strategy to bridge the gap while you build up data science expertise.

Establish a process. Because you want your data-driven teams to crank out work on a repeatable, reliable process, think of building an analytics assembly line. Follow the process and invest in an integrated analytics platform. Don’t build every analysis by hand – you don’t build your products that way, so why build your analytics that way?

Finally, always think about deployment from the start. The best insights provided by your analytic assembly line won’t provide any value unless they are deployed in the real world. Continue to drive for automation. Insights deployed manually by humans are a great start, but automated real-time execution of analytics-driven standard operating procedures gets to the value faster and more consistently than any other mechanism.

If manufacturers can acknowledge their discomfort in adopting data-driven analytic approaches, get beyond the resistance, and set up their teams for success, their investments in M4.0 infrastructure will yield real results – and then some. It’s time to dive in.