Statistical design Tuomas Selander 28.8.2014
Introduction: Biostatistician. Work area: KYS-erva (KYS, Jyväskylä, Joensuu, Mikkeli, Savonlinna). Work tasks: selection of and guidance on statistical methods, data analysis, and statistical design (sample size calculations and randomization). 3.9.2014
Introduction: Sample size calculation (why is it important, examples, and how to do it). Example from a register study.
Sample size calculations: An appropriate sample size makes it possible to demonstrate the hypothesized effect if it exists. Journals and funders will ask you to justify why your sample is the size it is. Costs and effort: a sample that is too small or too large is wasteful, so a balance has to be found. Calculators are easy to find on Google: search for "sample size calculator".
Sample size calculations: Requirements. These must be set: the p-value threshold (significance level) = 0.05, the probability of concluding that the treatment has an effect when it has none; and power = 0.80, the probability of detecting an effect when the treatment truly has one. These definitions are informal. These must be assumed: the expected outcomes, most commonly proportions (for example, how many patients are healed) or means (for example, how much BMI decreases). The researcher must know roughly what will happen, either from the literature or from a pilot study.
Sample size calculations: Steps in action. 1. State the assumptions. 2. Set the hypothesis. 3. Select the statistical test or method. 4. Set the p-value threshold and power. 5. Solve for N.
Sample size calculations: Example. Two medicines: the assumption is that medicine 1 heals 85% of patients and medicine 2 heals 75%. Set the p-value threshold to 0.05 and power to 0.80. The hypothesis is two-sided: the proportions are not equal. Under these assumptions, 250 patients are needed per medicine group. If the hypothesis is one-sided, i.e. medicine 1 is better than medicine 2, the required sample size is 197 per group.
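The two figures in this example can be reproduced with the classical normal-approximation formula for comparing two proportions, which uses the pooled proportion under the null hypothesis. This is only a sketch under that assumption: the slides do not say which calculator or formula was used, and the function name `n_per_group_proportions` is made up for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80, two_sided=True):
    """Per-group sample size for comparing two proportions.

    Normal approximation: pooled proportion under the null hypothesis,
    separate proportions under the alternative. Hypothetical helper.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_power = z(power)
    p_bar = (p1 + p2) / 2                          # pooled proportion
    null_sd = sqrt(2 * p_bar * (1 - p_bar))        # spread under H0
    alt_sd = sqrt(p1 * (1 - p1) + p2 * (1 - p2))   # spread under H1
    n = (z_alpha * null_sd + z_power * alt_sd) ** 2 / (p1 - p2) ** 2
    return ceil(n)                                 # round up to whole patients


two_sided_n = n_per_group_proportions(0.85, 0.75)
one_sided_n = n_per_group_proportions(0.85, 0.75, two_sided=False)
print(two_sided_n, one_sided_n)  # 250 197
```

Other variants of the formula (for example, one without pooling) give slightly different numbers, so it is worth noting which one a given calculator implements.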
Sample size calculations: Example. A researcher wants to show that first-born babies are lighter than second-born babies in 2015. The birth register for 1987-2011 gives a mean weight of 3413 g for first children and 3598 g for second children. The standard deviation is 581 g in both groups. Set the p-value threshold to 0.05 and power to 0.80. Under these assumptions the sample size is 158 per birth group, 316 in total.
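A comparable sketch for two means, assuming a two-sided test and the plain normal approximation. The slides do not state the method behind the figure of 158, and `n_per_group_means` is a hypothetical helper. With the register values this approximation gives about 155 per group, in the same range as the slide's 158 but slightly lower, a difference that presumably comes from the calculator's own method.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_means(mean1, mean2, sd, alpha=0.05, power=0.80, two_sided=True):
    """Per-group sample size for comparing two means with a common SD.

    Plain normal approximation; hypothetical helper, not from the slides.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_power = z(power)
    effect = abs(mean1 - mean2) / sd   # standardized difference
    return ceil(2 * (z_alpha + z_power) ** 2 / effect ** 2)


n_birth = n_per_group_means(3413, 3598, 581)
print(n_birth, 2 * n_birth)  # 155 310
```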
Example from a register study: Delivery. The aim is to study the association between episiotomy and OASIS. OASIS (obstetric anal sphincter injuries) is damage to the anal sphincter during childbirth. Risk factors are the baby's size, the mother's age, the length of the active second stage, and the mode of delivery. An episiotomy is a surgical incision of the perineum made by the midwife. The procedure might increase or decrease the risk of OASIS; there is no consensus on it. The data come from the birth register: all first births 2004-2011, N = 130981.
Example from a register study
Original data: logistic regression coefficients

Parameter                    Coefficient   95% CI lower   95% CI upper
Age <=19                     1.00
Age 20-29                    1.94          1.46           2.59
Age 30-39                    2.44          1.82           3.27
Age >=40                     2.22          1.38           3.58
Weight <=2999                1.00
Weight 3000-3499             2.04          1.66           2.52
Weight 3500-3999             3.04          2.48           3.73
Weight >=4000                4.81          3.88           5.98
Mode of delivery: vaginal    1.00
Mode of delivery: breech     0.85          0.42           1.70
Mode of delivery: forceps    4.85          2.68           8.77
Mode of delivery: vacuum     1.68          1.50           1.88
Duration <=30                1.00
Duration 31-69               1.30          1.17           1.45
Duration 70-169              1.34          1.18           1.52
Duration >=170               1.39          1.08           1.78
Episiotomy                   0.88          0.80           0.97
Example from a register study: The distributions of the background variables differ between the episiotomy groups, so the groups are balanced by matching. The purpose is to find, for each episiotomy+ woman, a matched pair from the episiotomy- group. Matching is exact: the pair must be identical in the woman's age, the baby's weight, the mode of delivery, and the duration of the second stage. This reduces the sample size, because not every episiotomy+ woman has a pair. Matching reduces the bias that arises from background variables: episiotomy is typically performed on older women whose babies are heavier, whose second stage lasts longer, and whose mode of delivery differs. The interactions between the variables are complex.
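Exact one-to-one matching of the kind described above could be sketched as follows. The toy records, the field names, and the `exact_match` helper are invented for illustration; the slides do not show how the matching was implemented, and the real analysis may have used a different pairing rule.

```python
from collections import defaultdict

# Hypothetical field names for the four matching variables.
MATCH_KEYS = ("age_group", "weight_group", "mode", "duration_group")


def exact_match(records, keys=MATCH_KEYS):
    """Pair each episiotomy+ record with one unused episiotomy- record
    that has identical values on every matching key."""
    controls = defaultdict(list)
    for rec in records:
        if not rec["episiotomy"]:
            controls[tuple(rec[k] for k in keys)].append(rec)

    pairs = []
    for rec in records:
        if rec["episiotomy"]:
            pool = controls[tuple(rec[k] for k in keys)]
            if pool:  # episiotomy+ women without a pair are dropped
                pairs.append((rec, pool.pop()))
    return pairs


def make(id_, episiotomy, age, weight, mode, duration):
    """Build one toy record (made-up data)."""
    return {"id": id_, "episiotomy": episiotomy, "age_group": age,
            "weight_group": weight, "mode": mode, "duration_group": duration}


records = [
    make(1, True, "20-29", "3000-3499", "vaginal", "31-69"),
    make(2, False, "20-29", "3000-3499", "vaginal", "31-69"),
    make(3, True, "30-39", "3500-3999", "vacuum", "70-169"),
    make(4, False, "<=19", "<=2999", "vaginal", "<=30"),
]
pairs = exact_match(records)
matched_ids = [(case["id"], control["id"]) for case, control in pairs]
print(matched_ids)  # [(1, 2)]
```

Record 3 has no identical episiotomy- counterpart and drops out, which is the loss of sample size the slide refers to.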
Example from a register study
Logistic regression coefficients: original data and matched data

                             Original data                  Matched data
Parameter                    Coef.   CI lower   CI upper    Coef.   CI lower   CI upper
Age <=19                     1.00                           1.00
Age 20-29                    1.94    1.46       2.59        1.79    1.27       2.52
Age 30-39                    2.44    1.82       3.27        2.42    1.73       3.39
Age >=40                     2.22    1.38       3.58        3.04    1.78       4.26
Weight <=2999                1.00                           1.00
Weight 3000-3499             2.04    1.66       2.52        2.11    1.62       2.74
Weight 3500-3999             3.04    2.48       3.73        3.11    2.41       4.02
Weight >=4000                4.81    3.88       5.98        5.03    3.84       6.60
Mode of delivery: vaginal    1.00                           1.00
Mode of delivery: breech     0.85    0.42       1.70        0.85    0.21       3.43
Mode of delivery: forceps    4.85    2.68       8.77        10.64   4.11       27.54
Mode of delivery: vacuum     1.68    1.50       1.88        1.23    1.03       1.47
Duration <=30                1.00                           1.00
Duration 31-69               1.30    1.17       1.45        1.39    1.22       1.58
Duration 70-169              1.34    1.18       1.52        1.59    1.36       1.85
Duration >=170               1.39    1.08       1.78        1.79    1.31       2.45
Episiotomy                   0.88    0.80       0.97        0.77    0.69       0.86

CI = 95% confidence interval.
Example from a register study: Reviewer at BMJ: the data are not validated; the statistical analysis is hard to read. Reviewer at PLOS ONE: the study is based on a highly reliable medical register; the statistical analysis is easy to read.
Sample size calculations: Conclusion. It is important to do sample size calculations.
Contact: tuomas.selander@kuh.fi, 044-7179583