Improve your process, improve your performance


No Data, No Problem: My Lean Six Sigma Data Collection Secrets

Posted by Kyle Toppazzini on Wed, Jul 11, 2012

I can’t tell you how many times I have heard this: “WE DON’T HAVE THE DATA FOR THIS, I guess we will need to make an educated guess”. What would happen if your doctor told you that he didn’t know what was wrong with you, so he was just going to guess? How would you react to the doctor?

In the Lean Six Sigma engagements I work on, when I hear “We don’t have the data for this”, my response is “Let’s CREATE the data”. Without fail, I get the deer-in-the-headlights stare for a second or two until I explain what I am proposing. This post describes how to deal with situations where there is no data.

Below are just some of the tools that you can use.


Manual Data Collection Forms 

Once you have created a value stream map or just a regular process map, you should have identified the primary metrics to capture. Most metrics deal with time, quality, and cost across the value stream. Metrics such as queue time, rework, defects, and process time are often not captured in a system. You have a few ways of collecting the data. One is to ask the employees performing the work for their best estimate; however, that data is likely to be biased. You can also do what I do, which is to undertake a data collection exercise with employees for 10 to 15 days.

I work with employees to create simple data collection forms that they will use to keep track of when an activity starts, when it ends, the type of activity, when the work was received, and any defects and their types. Two elements are critically important when you design the data collection forms: 1) make the form as simple and as quick as possible for employees to complete, and 2) ensure that enough detail is captured so that you can separate the different types of activities. Where possible, we ask the employee to fill in an electronic PDF or web form; however, we have to balance that against the time it takes to fill in the form.
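To make the form fields concrete, here is a minimal Python sketch of what one form entry might look like once it is keyed in. The class name, field names, and sample values are hypothetical and only illustrate the idea; the actual forms vary by engagement. Queue time is taken here as the gap between receiving the work and starting it, and process time as the gap between start and end.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ActivityRecord:
    """One entry from a (hypothetical) data collection form."""
    employee_id: str
    activity_type: str
    received_at: datetime              # when the work arrived
    started_at: datetime               # when the employee began the activity
    ended_at: datetime                 # when the employee finished the activity
    defect_type: Optional[str] = None  # None means no defect was recorded

    def queue_minutes(self) -> float:
        """Minutes the work waited before anyone started on it."""
        return (self.started_at - self.received_at).total_seconds() / 60

    def process_minutes(self) -> float:
        """Minutes spent actually working on the activity."""
        return (self.ended_at - self.started_at).total_seconds() / 60


# Illustrative entry; the values are made up.
record = ActivityRecord(
    employee_id="E01",
    activity_type="file review",
    received_at=datetime(2012, 7, 9, 9, 0),
    started_at=datetime(2012, 7, 9, 9, 45),
    ended_at=datetime(2012, 7, 9, 10, 15),
    defect_type="missing signature",
)
print(record.queue_minutes(), record.process_minutes())
```

Keeping the activity type and defect type as separate fields is what later lets you split the observations by type of work.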

I also select a stratified random sample of the employees from whom we will collect the data. The stratification includes factors such as experience, employee level, type of work that the employees perform (some employees may work only on dedicated files), language (if relevant), age, and shift (if applicable). I also randomize the days on which the data will be collected to ensure that variations in demand and work type are well represented.
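The sampling step can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the roster, the two stratification factors, the number of employees per stratum, and the function names are all hypothetical, and a real engagement would use more factors than shown here.

```python
import random
from collections import defaultdict


def stratified_sample(employees, strata_keys, per_stratum, seed=0):
    """Randomly pick up to `per_stratum` employees from each stratum."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for emp in employees:
        # A stratum is one combination of the chosen factors.
        groups[tuple(emp[k] for k in strata_keys)].append(emp)
    sample = []
    for key in sorted(groups):
        members = groups[key]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


def pick_collection_days(candidate_days, n_days, seed=0):
    """Randomly choose which working days the forms will be filled in."""
    rng = random.Random(seed)
    return sorted(rng.sample(candidate_days, n_days))


# Hypothetical roster: 12 employees across experience x shift.
combos = [(e, s) for e in ("junior", "senior") for s in ("day", "night")] * 3
employees = [
    {"id": f"E{i:02d}", "experience": e, "shift": s}
    for i, (e, s) in enumerate(combos)
]

sample = stratified_sample(employees, ["experience", "shift"], per_stratum=2)
days = pick_collection_days(list(range(1, 23)), n_days=12)  # 22 working days
print([emp["id"] for emp in sample], days)
```

Sampling within each stratum, rather than across the whole roster, guarantees that every combination of factors shows up in the data even when some groups are small.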

Typically, within 10-15 days over a 1-month period, I am able to capture more than 5,000 data observations with consistent data across employees, divisions, and regions. As a verification check, I often input the collected data into a process simulation model and compare the model outputs to actual output numbers. I am pretty confident in the data collected if the variation between the model and actual outputs is between 1% and 4%.
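The verification check itself is simple arithmetic. The sketch below assumes that "variation" means the absolute percentage difference between the model output and the actual output, and it treats 4% as the upper limit; both are readings of the rule of thumb above rather than a formal definition. The function names and the sample numbers are hypothetical.

```python
def percent_variation(model_output, actual_output):
    """Absolute difference between model and actual, as a percentage of actual."""
    if actual_output == 0:
        raise ValueError("actual_output must be non-zero")
    return abs(model_output - actual_output) / abs(actual_output) * 100


def within_tolerance(model_output, actual_output, max_pct=4.0):
    """True when the model is close enough to the actual numbers (assumed 4% cap)."""
    return percent_variation(model_output, actual_output) <= max_pct


# Made-up numbers: the model predicts 1,030 files a month, the actual count was 1,000.
print(percent_variation(1030, 1000), within_tolerance(1030, 1000))
```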


Reverse Engineering or Backwards Induction

Reverse engineering or backward induction tends to take a few iterations and requires experience to obtain a good approximation of the numbers.  Here is how my reverse engineering method works.

  1. I conduct a working session with employees in which we go through the amount of time each of them takes to do various activities, the defect rates, process wait times, etc. I ask employees to provide estimated times, numbers of defects, etc. for an average day, a low-workload day, and the highest-workload days.
  2. I enter that data (usually as a triangular distribution, i.e. minimum, most likely value, maximum) into a value stream simulation model and estimate some key performance metrics, output, and production numbers. If you know the actual distribution type, such as an exponential or lognormal distribution (which are the most common), then you can enter the estimates required for those distributions instead, although obtaining exponential and lognormal parameters from employees might be difficult. The model numbers are then compared to the actual numbers for various days and months throughout the year.
  3. We then assess the data to determine which estimates are likely off. This is where experience is required. I look at the model analytics, e.g. whether resources are idle, unproductive, or at capacity, and where bottlenecks occur and work accumulates, to determine the most likely spots where the estimates may be wrong.
  4. I verify the model analysis with the employees, revise the estimates, and repeat steps 2-4.


These steps are repeated until the model outputs are close to the actual output values.
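The loop in steps 2 to 4 can be sketched as a toy Python example. It is not a value stream simulation model: it assumes a single employee, a single activity, a 420-minute working day, and process times drawn from a triangular distribution, all of which are illustrative assumptions. The proportional rescaling of the estimates is only a stand-in for the conversation with employees in step 4, where specific estimates are revised based on judgment. The function names and numbers are hypothetical.

```python
import random


def simulate_daily_output(low, mode, high, minutes_per_day=420, n_days=1000, seed=0):
    """Average items finished per day when each item's process time (minutes)
    is drawn from a triangular(low, mode, high) distribution."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_days):
        elapsed, done = 0.0, 0
        while True:
            t = rng.triangular(low, high, mode)  # note the argument order
            if elapsed + t > minutes_per_day:
                break  # the next item would not finish today
            elapsed += t
            done += 1
        total += done
    return total / n_days


def calibrate(low, mode, high, actual_daily_output, tolerance_pct=4.0, max_rounds=50):
    """Repeat simulate -> compare -> revise until model and actual are close."""
    for round_no in range(max_rounds):
        modelled = simulate_daily_output(low, mode, high)
        gap_pct = (modelled - actual_daily_output) / actual_daily_output * 100
        if abs(gap_pct) <= tolerance_pct:
            return (low, mode, high), modelled, round_no
        # Placeholder for revising estimates with employees: if the model
        # produces too much, the time estimates were too low, and vice versa.
        scale = modelled / actual_daily_output
        low, mode, high = low * scale, mode * scale, high * scale
    raise RuntimeError("estimates did not converge")


# Made-up starting estimates (minutes per item) and actual output (items per day).
estimates, modelled, rounds = calibrate(10, 15, 30, actual_daily_output=18)
print(estimates, modelled, rounds)
```

In practice the revision step is where the experience comes in: rather than scaling every estimate, you would look at idle resources and bottlenecks in the model to decide which specific estimate to revisit.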


Concluding Thoughts

Keep in mind that these are just two of my favourite data collection methods. There are many other data collection techniques available, such as surveys, focus groups, and file reviews.

The next time someone says “We have no data”, try saying “Let’s create it” and use the methods I have proposed.






Topics: sigma, lean, Performance Measurement, Lean Six Sigma, Assessment, Data Collection