Dealing with Nonnormal Data
 This topic has 17 replies, 5 voices, and was last updated 3 years, 7 months ago by Dave Franco.


April 7, 2017 at 11:41 pm #55680
balasubramanian (Guest, @b1a5l9a2)
Hi,
I have observed some data from one of my company's processes. The normality test gives a p-value of 0.002, so the data is nonnormal. How should I deal with this? I tried a Box-Cox transformation, but the transformed data was also not normal. When I ran the data through the software StatAssist, it showed that the data approximately follows a Burr distribution. The data is the hardness of a powder metallurgy part, where there is excessive within-part variation on account of pores in the part, and this is natural. Can anyone please suggest how to handle this?
Regards,
Bala

April 8, 2017 at 5:30 am #201141
Robert Butler (Participant, @rbutler)
You will have to tell us what it is that you are trying to do with your data before anyone can offer much in the way of advice.
Control charting? Nonnormality is not an issue.
Process capability? There are methods to handle this situation (see Chapter 8 of Bothe's Measuring Process Capability).
t-test or ANOVA? No problem; the t-test is robust to nonnormal data, as is ANOVA.
Regression? No issues here; approximate normality is an issue with the residuals only.
Concern that the process is out of control? Maybe, maybe not. Hardness data will be nonnormal even when everything is in control because it comes from a bounded process. Besides, there is no connection between data being normally distributed and whether or not the process is in control.
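To make that last point concrete, here is a minimal Python sketch (scipy assumed; the data is simulated skewed data standing in for hardness values, not the original poster's numbers). A bounded, perfectly in-control process can still fail a normality test, and a Box-Cox transform may or may not fix that.

```python
import numpy as np
from scipy import stats

# Simulated "hardness" values: lognormal, so bounded below and skewed --
# exactly the kind of data that fails a normality test while in control.
rng = np.random.default_rng(42)
hardness = rng.lognormal(mean=6.4, sigma=0.5, size=300)

# Shapiro-Wilk: a small p-value says "not normal," not "out of control."
stat_raw, p_raw = stats.shapiro(hardness)

# Box-Cox requires strictly positive data; here it happens to normalize it,
# though (as in the original post) it is not guaranteed to.
transformed, lam = stats.boxcox(hardness)
stat_t, p_t = stats.shapiro(transformed)

print(f"raw p = {p_raw:.3g}, Box-Cox lambda = {lam:.3f}, transformed p = {p_t:.3g}")
```

The raw data fails the test decisively even though nothing in the simulated "process" changed; that is the bounded-process point above in miniature.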
April 8, 2017 at 6:03 am #201142
MBBinWI (Participant, @MBBinWI)
Can you post a picture of the histogram for the data?
April 8, 2017 at 10:07 am #201143
b1a5l9a2 (Participant, @b1a5l9a2)
Hi,
I have attached the histogram. I am trying to run a process capability study on this process.
Can you please advise me on the methodology for this?

April 8, 2017 at 11:45 am #201144
Robert Butler (Participant, @rbutler)
What kind of a sample size are we looking at? If that is a small sample, then the gap between what looks to be about 625 and 650 probably isn't much to worry about, and the data looks normal enough to just press on with a capability calculation.
On the other hand, if that histogram represents hundreds of samples, then there are some interesting possibilities. You could have a bimodal process with some curious process behavior around 625.
April 8, 2017 at 6:07 pm #201145
Chris Seider (Participant, @cseider)
Nice gathering of data…the first step to success. :)
April 9, 2017 at 7:03 am #201146
MBBinWI (Participant, @MBBinWI)
Concur with @rbutler.
April 9, 2017 at 7:06 am #201147
MBBinWI (Participant, @MBBinWI)
Also, if you are using Minitab, when you do the normality evaluation, do you get a plot like the one below? If so, can you post that as well?
April 9, 2017 at 7:58 am #201148
b1a5l9a2 (Participant, @b1a5l9a2)
Hi,
Please find the attached normal probability plot. I have taken almost 100 samples: one sample randomly selected in each shift on each day for one month.

April 9, 2017 at 1:37 pm #201149
MBBinWI (Participant, @MBBinWI)
@b1a5l9a2 – OK, we're getting closer. Can you observe, or ask the workers, whether adjustments are being made to bring the value back to nominal? It looks to me like the lower side is happening randomly, but when the values get to the upper side, an adjustment is made to bring the value back to target.
If that is the case, then a fundamental premise of evaluating normality is being violated – outside adjustment of the data.
Looking at your probability plot: we use something called the "fat pencil test." Back when these graphs were drawn by hand, one would lay the pencil over the data; if the pencil covered the data points, you could be fairly confident of normality. Now, with statistical tests able to calculate probabilities, we tend to rely on them. However, those tests are susceptible to individual points, which can sway a result that visual examination would call "close enough."
As @rbutler states, the question as to normality depends on the use of the data. Many statistical tests are robust to nonnormality, particularly when the data is similar to what you have presented.
If I were mentoring you as one of my belts, I would have you check on the adjustment. If that’s happening, then I would go on and accept normality based on the histogram and prob plot. If not, then I would check on the sensitivity of the stat test that I’m looking to apply and see if it is robust to nonnormality, and if so, then proceed. If it is sensitive to normality, then I would take some more data to ensure I have a full and complete picture. Even at 100 data points, you may have only captured one side of the distribution and over more time/data it may fill out.
Hope this helps.
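For anyone who wants a numeric version of the fat pencil test outside Minitab, here is a sketch in Python (scipy assumed; the data is simulated, not the poster's). scipy's probplot returns the correlation coefficient r between the ordered data and the fitted straight line; r very close to 1 is the numeric analogue of "the pencil covers the points."

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
sample = rng.normal(loc=640, scale=12, size=200)  # stand-in hardness data

# probplot returns the plotting positions plus a least-squares line fit:
# slope ~ standard deviation, intercept ~ mean, r ~ straightness of the plot.
(osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm")
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, r = {r:.4f}")
```

An r noticeably below 1, or a few points far off the line, is the code-level signal to go back and look at the plot itself rather than trusting a single p-value.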
April 10, 2017 at 9:14 am #201158
Chris Seider (Participant, @cseider)
Don't forget: are the data points giving enough precision? Use an MSA to check.
Also, use your process map and gather data on the X's in the process to see if there's an explanation/confirmation of the second grouping of data on the right.
April 10, 2017 at 2:21 pm #201168
When looking at nonnormal data, rule out a few things before trying to calculate the process capability (these are good general guidelines which are easily forgotten if the data just so happens to be normally distributed):
1. Are you looking at different levels of an X being captured? Run a dot plot, an SPC chart, and a few other graphs, and research the process. With 100 samples and the data looking bimodal, you'll want to rule this out first. I almost got "burned" by this once: I just wanted the probability of exceeding the specification limit and was so eager for "just the answer" that I initially missed that there were two separate behaviors in my data – normal conditions and special events going on at the company.
2. Perhaps the process is unstable? Stability is a prerequisite for fitting most distributions. An SPC chart can help you determine whether you are seeing random noise or potentially special-cause variation (remember, the data needs to be in time sequence; if it isn't, you can only use Test 1).
3. The data is not truly continuous. Histograms are a poor tool for detecting this particular measurement issue; run a dot plot instead. If the data stacks into neat bins, it may be some form of attribute data.
4. Perhaps the data is just naturally nonnormally distributed? Check the other three conditions first to be safe. If you still feel it is naturally nonnormal data, there is a wide variety of distributions besides the normal, as well as transformations, that may be suitable models.
As a last resort, I've seen people convert the data to pass/fail and treat it as a binomial process capability. This is not my favorite approach, and there are arguments for and against it. But I'd rule out the other conditions and truly understand why the data is nonnormally distributed before considering this strategy.
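Since the original question was about capability, here is a sketch of the percentile approach for nonnormal data (the idea behind Bothe's method: replace the mean ± 3σ span with the median and the 0.135th/99.865th percentiles). Empirical percentiles are used here for simplicity; with only ~100 points, percentiles from a fitted distribution such as the Burr are more trustworthy in the tails. Python with numpy/scipy assumed; the spec limits are made up for illustration.

```python
import numpy as np
from scipy import stats

def percentile_ppk(data, lsl, usl):
    """Equivalent Ppk using the 0.135th / 50th / 99.865th percentiles."""
    p_low, med, p_high = np.percentile(data, [0.135, 50.0, 99.865])
    ppu = (usl - med) / (p_high - med)  # upper capability
    ppl = (med - lsl) / (med - p_low)   # lower capability
    return min(ppu, ppl)

# Sanity check on a deterministic normal grid: for normal data this
# approximates the classic Ppk = (USL - mean) / (3 * sigma) = 2.0.
grid = stats.norm.ppf(np.linspace(0.001, 0.999, 999), loc=650, scale=10)
print(f"equivalent Ppk = {percentile_ppk(grid, lsl=590, usl=710):.2f}")
```

For nonnormal data you would feed in the raw measurements (or percentiles computed from the fitted Burr distribution) instead of the normal grid; the index then reflects the distribution's actual tails rather than assumed normal ones.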
May 12, 2017 at 4:50 pm #201362
If you're using Minitab, I'd recommend trying the following tool.
Stat > Quality Tools > Individual Distribution Identification
This tool will check your data against 14 distributions and 2 transformations. It might provide some insights.
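Outside Minitab, a rough equivalent can be put together in a few lines of Python (a sketch only, with a deliberately short candidate list rather than Minitab's full set of 14; scipy assumed and the data simulated): fit each candidate distribution by maximum likelihood and rank by a goodness-of-fit statistic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
data = rng.lognormal(mean=0.0, sigma=1.0, size=500)  # skewed test data

# Fit each candidate, then rank by the Kolmogorov-Smirnov statistic
# (smaller = closer fit to the data).
results = {}
for dist in (stats.norm, stats.lognorm, stats.gamma, stats.weibull_min):
    params = dist.fit(data)
    results[dist.name] = stats.kstest(data, dist.name, args=params).statistic

for name, ks in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:12s} KS = {ks:.4f}")
```

As with Minitab's tool, the ranking is a screening aid, not a proof: a distribution that fits slightly better numerically may still be a worse choice if it makes no physical sense for the process.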
May 16, 2017 at 6:08 am #201390
Chris Seider (Participant, @cseider)
It's a great tool! I still remember when it showed up, and I wondered how people ever lived without it.
May 16, 2017 at 7:47 pm #201411
MBBinWI (Participant, @MBBinWI)
You can also use the distribution-fitting tool in Crystal Ball.
May 29, 2017 at 7:02 pm #201514
Sometimes this tool is not sufficient and no distribution or transformation has a p-value > alpha. In that case you need to discretize the data and calculate by hand; luckily, you can find Excel tables for this on the internet.
If more than one distribution or transformation has a p-value > alpha, use the one with the smallest AD (Anderson-Darling) statistic.

May 29, 2017 at 7:05 pm #201515
But this tool is only used for capability analysis of nonnormal data.
February 25, 2018 at 11:22 am #202305
Please check the plot below and let us know what you're trying to achieve and what data you have.