Lies, damned lies and statistics.

How do we create honest statistics? How can we detect that statistics are subjective (or even downright dishonest)? For example: leaving out specific datasets (probably because they score badly) is a bad sign.

There's a whole academic field of making statistics and charts (and I already forgot most of that course). What theories, principles or guidelines should an OR researcher use in his statistics and charts?

Which chart types do you use when? For example: where and when would you apply candlestick charts?

asked 07 Jul '12, 07:55

Geoffrey De ... ♦

edited 07 Jul '12, 07:55


My answer's going to lean more into Industrial Engineering (statistical quality control, engineering statistics, etc), but I'll try and keep the concepts as general as possible:

Describing the system (Descriptive Statistics)

(Usually applied when you don't really have a good understanding of the system, but have information/data gathered about it)

1) This is usually where common sense comes in: ask probing questions about how the information/data was collected, to assess whether the data-gathering process might introduce bias (or make the data downright useless) for describing certain features of the system.

Without going very deeply into mathematical statistics, I think a solid grasp of statistics 101 (population, sample, sampling method, etc) will go a long way in helping you pose precise questions: What is the population we're sampling from? What kind of sampling method is being used? How was the data gathered?

2) Have a good visualisation of the dataset, one that is not only clear in describing what it was originally meant to describe, but also helpful in suggesting correlations/questions between the various parameters that might not have been obvious otherwise.

See FlowingData (by Nathan Yau) for some inspiration. Two examples that I happen to recall: (i) Chart Chooser, and (ii) Network Diagrams

Analyzing the system (Engineering Statistics)

(analysis of variance, design of experiments, etc)

1) Analysis of Variance (ANOVA) - a typical course in introductory statistics might cover up to statistical inference for one or two samples (hypothesis testing with known/unknown variances, z/t-tests, p-value, z-value, confidence intervals, alpha/beta (type 1/2 errors), etc). ANOVA goes a step further: it decomposes the total variation in the data into sums of squares (as in linear regression), to indicate how much of the variance each component of the decomposition accounts for.
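As a rough illustration of that decomposition, here is a minimal sketch of a one-way ANOVA using only the Python standard library. The function name `one_way_anova` and the run times are made up for the example; a real analysis would also look up the p-value of the F statistic and check the model assumptions.

```python
from statistics import mean


def one_way_anova(groups):
    """Split total variation into between-group and within-group sums of squares."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)

    # Variation explained by differences between the group means.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation left over inside each group (the "error" term).
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Total variation around the grand mean; equals ss_between + ss_within.
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_total,
        "f_statistic": f_statistic,
    }


# Hypothetical run times (seconds) for three solver configurations.
run_times = [[10, 12, 11], [14, 15, 16], [9, 8, 10]]
anova = one_way_anova(run_times)
print(anova)
```

A large F statistic suggests that the differences between the group means are large compared to the noise within each group.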

2) Experimental Design - commonly used in Industrial Engineering as an active statistical method, in which you perform a series of tests on the process or system, making changes in the inputs and observing the corresponding changes in the outputs. It is especially important when you have multiple controllable inputs and want to know which of the input variables are more influential than others. If you're interested in this area, you should read up on factorial designs/experiments, how to randomise your experimental runs, how to perform residual analysis, etc.
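To make the factorial idea concrete, here is a small sketch that enumerates a full factorial design and randomises the run order. The factor names and levels (`presolve`, `cuts`, `threads`) are purely hypothetical, chosen to resemble an OR benchmarking setup; the helper names are mine as well.

```python
import itertools
import random


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*level_lists)]


def randomised_runs(factors, replicates=1, seed=None):
    """Replicate the full factorial design and shuffle the run order."""
    runs = [dict(run) for _ in range(replicates) for run in full_factorial(factors)]
    # Shuffling guards against drift (e.g. machine load) lining up with a factor.
    random.Random(seed).shuffle(runs)
    return runs


# Hypothetical two-level factors for a solver benchmark.
factors = {
    "presolve": ["off", "on"],
    "cuts": ["off", "on"],
    "threads": [1, 4],
}
design = full_factorial(factors)
plan = randomised_runs(factors, replicates=2, seed=42)
for run_number, run in enumerate(plan, start=1):
    print(run_number, run)
```

Fixing the seed keeps the run order reproducible, which also helps when others want to check the results.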

Monitoring the system (Time Series Analysis)

Typically employed as a passive statistical method, to monitor a process that is typically stable/'in-control', for indications of a change in the process (eg. shift in the process mean/variance, etc).

I'll just mention the Shewhart Control Chart here (as a class of charts that are simple to understand, and widely used in industry): It is commonly used as an online monitoring technique, in which you systematically sample parameters/products from a process, and use them as information for monitoring whether there's been a change in the process (mean/variance) and as estimates of the process itself (usually for process capability analysis - whether the existing process is 'good' enough to produce a sufficiently high ratio of non-defective goods that conform to a particular standard).
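A minimal sketch of one such chart, the X-bar chart, assuming subgroups of equal size and using the tabulated A2 constants with the average subgroup range to set the limits. The measurements below are invented for the example, and the function names are my own.

```python
from statistics import mean

# Standard X-bar chart constants A2, indexed by subgroup size n.
A2 = {2: 1.880, 3: 1.023, 4: 0.729, 5: 0.577}


def xbar_limits(baseline):
    """Estimate (LCL, centre line, UCL) from in-control baseline subgroups."""
    n = len(baseline[0])
    centre = mean([mean(s) for s in baseline])
    mean_range = mean([max(s) - min(s) for s in baseline])
    margin = A2[n] * mean_range
    return centre - margin, centre, centre + margin


def out_of_control(subgroups, lcl, ucl):
    """Return the indices of subgroups whose mean falls outside the limits."""
    return [i for i, s in enumerate(subgroups) if not lcl <= mean(s) <= ucl]


# Hypothetical baseline measurements taken while the process was stable.
baseline = [
    [10.1, 9.9, 10.0, 10.2, 9.8],
    [10.0, 10.1, 9.9, 10.0, 10.0],
    [9.8, 10.2, 10.1, 9.9, 10.0],
    [10.0, 10.0, 10.1, 9.9, 10.0],
]
lcl, centre, ucl = xbar_limits(baseline)

# Hypothetical new subgroups sampled during monitoring.
new_subgroups = [
    [10.0, 10.1, 9.9, 10.0, 10.1],
    [10.3, 10.4, 10.2, 10.3, 10.3],
]
flags = out_of_control(new_subgroups, lcl, ucl)
print(lcl, centre, ucl, flags)
```

In practice you would use many more baseline subgroups and usually pair this with a chart for the ranges, so that changes in the variance are monitored too.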

The reason I mention them is that they've been very effective in practice, mostly because they're easy to (i) visualize/interpret, and (ii) implement.


See also: Regression Analysis for Computational Results (fished from ORX's archives)

answered 07 Jul '12, 23:30

yeesian

edited 07 Jul '12, 23:35

Great answer!

(08 Jul '12, 05:31) Geoffrey De ... ♦

I think one of the most honest charts in optimization is the performance profile. Even people who don't like it should be able to read it, since it's used quite a lot.

A great resource on "how to lie with charts", and in general on how to design charts to maximize data content (and minimize false impressions), is the classic Tufte book: The Visual Display of Quantitative Information. I'd say everybody in OR should have at least browsed through it, especially since it's a quick read and very entertaining. Unfortunately, too many have not read it or ignore what they learned.
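For readers who haven't seen one: a performance profile plots, for each solver, the fraction of problems it solves within a factor tau of the best solver on that problem. Below is a minimal sketch under the usual definition, assuming every problem is solved by at least one solver and failures are recorded as infinity. The solver names and timings are made up.

```python
import math


def performance_profile(times, taus):
    """For each solver, the fraction of problems solved within tau x the best time."""
    solvers = list(times)
    n_problems = len(times[solvers[0]])
    # Best (smallest) time achieved by any solver on each problem.
    best = [min(times[s][p] for s in solvers) for p in range(n_problems)]
    profile = {}
    for s in solvers:
        # Performance ratio of this solver on each problem; inf marks a failure.
        ratios = [times[s][p] / best[p] for p in range(n_problems)]
        profile[s] = [sum(r <= tau for r in ratios) / n_problems for tau in taus]
    return profile


# Hypothetical run times in seconds; math.inf means the solver failed.
times = {
    "solver_a": [1.0, 4.0, 2.0, math.inf],
    "solver_b": [2.0, 2.0, 2.0, 8.0],
}
taus = [1, 2, 4]
profile = performance_profile(times, taus)
for solver, fractions in profile.items():
    print(solver, fractions)
```

The value at tau = 1 shows how often a solver is the fastest (ties included), and the value for large tau shows how many problems it solves at all, which is why the chart is hard to bend in favour of one solver.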

answered 31 Jul '12, 13:39

Philipp Chri...

First of all, it is a great question. While writing my thesis, I wondered several times how to present the results of the study. This is probably not the answer you are looking for, but I am using Google Chart Tools to create flexible and nice-looking charts. For results with three variables I use a bubble chart; other than that, bar and/or line charts save my day most of the time.

I think the results of academic studies should be available online to inspect and evaluate. For example, for one of the studies I mention in my literature review, I couldn't get hold of the data set, so I asked the author. He said the results were gone because he had changed his computer long ago.

By the way, I really don't like pie charts, since they seem quite useless.

answered 07 Jul '12, 10:55

Sertalp Bilal

There are two meta-rules that help to assess the credibility of statistics:

  • They should include a link to information on how to reproduce them, or details on the methodology used if that's not practical (due to sensitive data etc).
  • Their medium should allow for non-censored public comments.

If these meta-rules are applied and the statistics are flawed, then it's more likely that a public comment will call that out (especially in a competitive environment).

answered 31 Jul '12, 03:37

Geoffrey De ... ♦

edited 31 Jul '12, 03:52


OR-Exchange! Your site for questions, answers, and announcements about operations research.