WOLFRAM|DEMONSTRATIONS PROJECT

Calculating Sample Size

[Interactive panel. Controls: % confidence level; % confidence interval (e) = 0.26; % accuracy = 0.795; data size (population) = 486000000. Output: calculated sample size = 10.]

A common rule of thumb holds that surveying 10% of a population is enough to estimate the results for the whole population. But with a huge dataset, such as 1 billion records, 10% of the population is still large; instead, you can look for the optimal (minimum) amount of data to survey.

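This standard equation defines the appropriate sample size (SS) of people to use for a survey:

$$SS = \frac{Z^2 \, P(1-P)}{e^2}$$

Here Z is the z-score corresponding to the confidence level, P is the accuracy and e is the confidence interval.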

This equation is commonly applied to the populations of big data projects to determine how large a sample of the data should be analyzed.
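As a minimal sketch of how this equation can be evaluated, the Python function below assumes the standard form SS = Z² P(1-P) / e², where Z is the two-sided z-score for the confidence level. The function name `sample_size`, the argument names, the use of fractions rather than percentages, and rounding up to a whole record are all assumptions made for illustration; they are not taken from the Demonstration's own code.

```python
import math
from statistics import NormalDist


def sample_size(confidence_level: float, e: float, p: float) -> int:
    """Sample size from SS = Z^2 * P * (1 - P) / e^2, rounded up.

    confidence_level: fraction in (0, 1), e.g. 0.95 (assumed convention)
    e: confidence interval (error tolerance) as a fraction, e.g. 0.05
    p: accuracy as a fraction in [0, 1], e.g. 0.5
    """
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must be strictly between 0 and 1")
    if e <= 0.0:
        raise ValueError("e must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")

    # Two-sided z-score: a 0.95 confidence level gives roughly 1.96.
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)

    # Round up so the sample is never smaller than the formula requires.
    return math.ceil(z * z * p * (1.0 - p) / (e * e))


if __name__ == "__main__":
    print(sample_size(0.95, 0.05, 0.5))
    print(sample_size(0.95, 0.26, 0.795))
```

With e = 0.26 and an accuracy of 0.795, the values shown in the panel, an assumed 95% confidence level gives a sample size of 10, which agrees with the calculated sample size displayed there. The 95% figure is a guess, since the panel's confidence level is not preserved in the text.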

The parameters to define the sample size are:

Confidence level: the precision required for the survey

Confidence interval: the error tolerance for the survey, -0.04≤e≤0.4

Accuracy: the data quality or trustworthiness of the information in the data

Data size: the total population (or number of records in the database)
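The text does not show how the data size enters the calculation. One common choice, assumed here and not confirmed by the source, is the finite population correction SS' = SS / (1 + (SS - 1) / N), where N is the data size. The sketch below combines all four parameters under that assumption; the function name `corrected_sample_size` and its argument names are hypothetical.

```python
import math
from statistics import NormalDist


def corrected_sample_size(confidence_level: float, e: float, p: float,
                          population: int) -> int:
    """Sample size with a finite population correction (an assumed step).

    confidence_level, e and p are fractions (e.g. 0.95, 0.05, 0.5);
    population is the data size N (number of records).
    """
    if population < 1:
        raise ValueError("population must be at least 1")

    # Two-sided z-score for the confidence level (0.95 -> about 1.96).
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2.0)

    # Uncorrected sample size: SS = Z^2 * P * (1 - P) / e^2.
    n0 = z * z * p * (1.0 - p) / (e * e)

    # Finite population correction: shrinks the sample for small N,
    # and leaves it essentially unchanged for very large N.
    n = n0 / (1.0 + (n0 - 1.0) / population)
    return math.ceil(n)


if __name__ == "__main__":
    print(corrected_sample_size(0.95, 0.05, 0.5, 1_000))
    print(corrected_sample_size(0.95, 0.26, 0.795, 486_000_000))
```

For a very large data size the correction is negligible, which is consistent with the idea that the required sample grows with the desired precision and not with the size of the dataset.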
