Overview
One of Miller’s strengths is its compact notation: for example, given input of the form
$ head -n 5 ../data/medium
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
you can sum x, over all records or grouped by field b, with the stats1 verb:
$ mlr --oxtab stats1 -a sum -f x ../data/medium
x_sum 4986.019682
$ mlr --opprint stats1 -a sum -f x -g b ../data/medium
b x_sum
pan 965.763670
wye 1023.548470
zee 979.742016
eks 1016.772857
hat 1000.192668
or get the same results with put -q, accumulating into out-of-stream variables (oosvars, the @-prefixed variables that persist across records) and emitting them in an end block:
$ mlr --oxtab put -q '
@x_sum += $x;
end {
emit @x_sum
}
' data/medium
x_sum 4986.019682
$ mlr --opprint put -q '
@x_sum[$b] += $x;
end {
emit @x_sum, "b"
}
' data/medium
b x_sum
pan 965.763670
wye 1023.548470
zee 979.742016
eks 1016.772857
hat 1000.192668
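For readers more at home in a general-purpose language, the Python sketch below mirrors what the two put -q programs do: a scalar accumulator (like @x_sum) and a map-valued accumulator keyed by field b (like @x_sum[$b]). Both are updated once per record and printed only after the input is exhausted. This is an illustration, not Miller's implementation. It assumes the (b, x) values of the five records shown by head -n 5 above rather than all of data/medium.

```python
from collections import defaultdict

# (b, x) pairs taken from the five records shown by `head -n 5` above.
RECORDS = [
    ("pan", 0.3467901443380824),
    ("pan", 0.7586799647899636),
    ("wye", 0.20460330576630303),
    ("wye", 0.38139939387114097),
    ("pan", 0.5732889198020006),
]

x_sum = 0.0                      # plays the role of @x_sum
x_sum_by_b = defaultdict(float)  # plays the role of @x_sum[$b]; keys keep first-seen order
for b, x in RECORDS:             # per-record main block: update state, emit nothing (like put -q)
    x_sum += x
    x_sum_by_b[b] += x

# end block: emit the accumulated values once, after the last record.
print(f"x_sum {x_sum:.6f}")
for b, s in x_sum_by_b.items():
    print(b, f"{s:.6f}")
```

The `:.6f` formatting only imitates the six-decimal output shown above. Like the DSL map, the dict reports groups in the order their keys were first seen.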
Mean without/with oosvars
$ mlr --opprint stats1 -a mean -f x data/medium
x_mean
0.498602
$ mlr --opprint put -q '
@x_sum += $x;
@x_count += 1;
end {
@x_mean = @x_sum / @x_count;
emit @x_mean
}
' data/medium
x_mean
0.498602
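The mean program keeps just two scalars, a running sum and a count, and divides only once, in the end block. A minimal Python equivalent follows. It assumes the x values from the five sample records at the top of this page, not all of data/medium.

```python
# x values from the five sample records (an assumption; not the full file).
xs = [0.3467901443380824, 0.7586799647899636, 0.20460330576630303,
      0.38139939387114097, 0.5732889198020006]

x_sum = 0.0    # like @x_sum
x_count = 0    # like @x_count
for x in xs:   # single pass: only the two accumulators are updated
    x_sum += x
    x_count += 1

# Like the end block: @x_mean = @x_sum / @x_count
x_mean = x_sum / x_count
print(f"x_mean {x_mean:.6f}")
```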
Keyed mean without/with oosvars
$ mlr --opprint stats1 -a mean -f x -g a,b data/medium
a b x_mean
pan pan 0.513314
eks pan 0.485076
wye wye 0.491501
eks wye 0.483895
wye pan 0.499612
zee pan 0.519830
eks zee 0.495463
zee wye 0.514267
hat wye 0.493813
pan wye 0.502362
zee eks 0.488393
hat zee 0.509999
hat eks 0.485879
wye hat 0.497730
pan eks 0.503672
eks eks 0.522799
hat hat 0.479931
hat pan 0.464336
zee zee 0.512756
pan hat 0.492141
pan zee 0.496604
zee hat 0.467726
wye zee 0.505907
eks hat 0.500679
wye eks 0.530604
$ mlr --opprint put -q '
@x_sum[$a][$b] += $x;
@x_count[$a][$b] += 1;
end{
for ((a, b), v in @x_sum) {
@x_mean[a][b] = @x_sum[a][b] / @x_count[a][b];
}
emit @x_mean, "a", "b"
}
' data/medium
a b x_mean
pan pan 0.513314
pan wye 0.502362
pan eks 0.503672
pan hat 0.492141
pan zee 0.496604
eks pan 0.485076
eks wye 0.483895
eks zee 0.495463
eks eks 0.522799
eks hat 0.500679
wye wye 0.491501
wye pan 0.499612
wye hat 0.497730
wye zee 0.505907
wye eks 0.530604
zee pan 0.519830
zee wye 0.514267
zee eks 0.488393
zee zee 0.512756
zee hat 0.467726
hat wye 0.493813
hat zee 0.509999
hat eks 0.485879
hat hat 0.479931
hat pan 0.464336
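The keyed version indexes the same two accumulators by the pair (a, b), and the for ((a, b), v in @x_sum) loop walks that two-level map. This also explains the different row order. stats1 lists each (a, b) pair in order of first appearance, while the nested map groups all b values under their a key. The Python sketch below uses nested dicts in place of Miller's nested maps, again with only the five sample records as assumed input.

```python
from collections import defaultdict

# (a, b, x) from the five sample records (an assumption; not the full file).
RECORDS = [
    ("pan", "pan", 0.3467901443380824),
    ("eks", "pan", 0.7586799647899636),
    ("wye", "wye", 0.20460330576630303),
    ("eks", "wye", 0.38139939387114097),
    ("wye", "pan", 0.5732889198020006),
]

# Two-level maps, like @x_sum[$a][$b] and @x_count[$a][$b].
x_sum = defaultdict(lambda: defaultdict(float))
x_count = defaultdict(lambda: defaultdict(int))
for a, b, x in RECORDS:
    x_sum[a][b] += x
    x_count[a][b] += 1

# Like the end block's loop over both key levels.
x_mean = {a: {b: x_sum[a][b] / x_count[a][b] for b in inner}
          for a, inner in x_sum.items()}

# Nested iteration mirrors `emit @x_mean, "a", "b"`: one row per (a, b).
for a, inner in x_mean.items():
    for b, m in inner.items():
        print(a, b, f"{m:.6f}")
```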
Variance and standard deviation without/with oosvars
$ mlr --oxtab stats1 -a count,sum,mean,var,stddev -f x data/medium
x_count 10000
x_sum 4986.019682
x_mean 0.498602
x_var 0.084270
x_stddev 0.290293
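Both the stats1 output above and the variance.mlr program that follows report the sample variance, with denominator n - 1. The DSL program never stores the x values. It relies on the identity sum((x - mean)^2) = sum(x^2) - mean * (2 * sum(x) - n * mean), so it only needs n, sum(x), and sum(x^2). The Python check below compares that one-pass formula with a two-pass reference from the standard statistics module. It is a sketch that uses the five sample x values as assumed input.

```python
import math
import statistics

# x values from the five sample records (an assumption; not the full file).
xs = [0.3467901443380824, 0.7586799647899636, 0.20460330576630303,
      0.38139939387114097, 0.5732889198020006]

# One pass: keep only n, sum(x) and sum(x**2), as variance.mlr does.
n = 0
sumx = 0.0
sumx2 = 0.0
for x in xs:
    n += 1
    sumx += x
    sumx2 += x ** 2

mean = sumx / n
var = (sumx2 - mean * (2 * sumx - n * mean)) / (n - 1)
stddev = math.sqrt(var)

# Two-pass reference: statistics.variance also divides by n - 1.
reference = statistics.variance(xs)
print(f"var {var:.6f} (two-pass {reference:.6f}), stddev {stddev:.6f}")
```

The running-sums formula can lose precision when the values are large relative to their spread. Welford's online algorithm is the usual numerically stable alternative, though it is not what variance.mlr uses.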
$ cat variance.mlr
@n += 1;
@sumx += $x;
@sumx2 += $x**2;
end {
@mean = @sumx / @n;
@var = (@sumx2 - @mean * (2 * @sumx - @n * @mean)) / (@n - 1);
@stddev = sqrt(@var);
emitf @n, @sumx, @sumx2, @mean, @var, @stddev
}
$ mlr --oxtab put -q -f variance.mlr data/medium
n 10000
sumx 4986.019682
sumx2 3328.652400
mean 0.498602
var 0.084270
stddev 0.290293
Min/max without/with oosvars
$ mlr --oxtab stats1 -a min,max -f x data/medium
x_min 0.000045
x_max 0.999953
$ mlr --oxtab put -q '@x_min = min(@x_min, $x); @x_max = max(@x_max, $x); end{emitf @x_min, @x_max}' data/medium
x_min 0.000045
x_max 0.999953
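The put version needs no begin block. On the first record @x_min and @x_max are absent, and, as the matching output shows, min and max then return the present argument, $x. The Python sketch below imitates this with None standing in for "absent". The helpers min_absent and max_absent are hypothetical names made up for this illustration, and the input is the five sample x values.

```python
# None plays the role of Miller's "absent" value in this sketch.
def min_absent(acc, x):
    """Hypothetical helper: return x if acc is absent (None), else the smaller value."""
    return x if acc is None else min(acc, x)

def max_absent(acc, x):
    """Hypothetical helper: return x if acc is absent (None), else the larger value."""
    return x if acc is None else max(acc, x)

# x values from the five sample records (an assumption; not the full file).
xs = [0.3467901443380824, 0.7586799647899636, 0.20460330576630303,
      0.38139939387114097, 0.5732889198020006]

x_min = None  # like @x_min before any record has been seen
x_max = None
for x in xs:
    x_min = min_absent(x_min, x)
    x_max = max_absent(x_max, x)

print(f"x_min {x_min:.6f}")
print(f"x_max {x_max:.6f}")
```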
Keyed min/max without/with oosvars
$ mlr --opprint stats1 -a min,max -f x -g a data/medium
a x_min x_max
pan 0.000204 0.999403
eks 0.000692 0.998811
wye 0.000187 0.999823
zee 0.000549 0.999490
hat 0.000045 0.999953
$ mlr --opprint --from data/medium put -q '
@min[$a] = min(@min[$a], $x);
@max[$a] = max(@max[$a], $x);
end{
emit (@min, @max), "a";
}
'
a min max
pan 0.000204 0.999403
eks 0.000692 0.998811
wye 0.000187 0.999823
zee 0.000549 0.999490
hat 0.000045 0.999953
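Keyed by a, the maps @min and @max end up with the same keys. The parenthesized ("lashed") emit (@min, @max), "a" writes them side by side, one row per key. A rough Python analogue uses two dicts, where dict.get returning None stands in for an absent @min[$a]. The input is the (a, x) pairs of the five sample records.

```python
# (a, x) from the five sample records (an assumption; not the full file).
RECORDS = [
    ("pan", 0.3467901443380824),
    ("eks", 0.7586799647899636),
    ("wye", 0.20460330576630303),
    ("eks", 0.38139939387114097),
    ("wye", 0.5732889198020006),
]

mins = {}  # like @min, keyed by a
maxs = {}  # like @max, keyed by a
for a, x in RECORDS:
    prev = mins.get(a)  # None when the key is new, like an absent @min[$a]
    mins[a] = x if prev is None else min(prev, x)
    prev = maxs.get(a)
    maxs[a] = x if prev is None else max(prev, x)

# Side-by-side output, one row per key, like the lashed emit.
print("a min max")
for a in mins:
    print(a, f"{mins[a]:.6f}", f"{maxs[a]:.6f}")
```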
Delta without/with oosvars
$ mlr --opprint step -a delta -f x data/small
a b i x y x_delta
pan pan 1 0.3467901443380824 0.7268028627434533 0
eks pan 2 0.7586799647899636 0.5221511083334797 0.411890
wye wye 3 0.20460330576630303 0.33831852551664776 -0.554077
eks wye 4 0.38139939387114097 0.13418874328430463 0.176796
wye pan 5 0.5732889198020006 0.8636244699032729 0.191890
$ mlr --opprint put '$x_delta = is_present(@last) ? $x - @last : 0; @last = $x' data/small
a b i x y x_delta
pan pan 1 0.3467901443380824 0.7268028627434533 0
eks pan 2 0.7586799647899636 0.5221511083334797 0.411890
wye wye 3 0.20460330576630303 0.33831852551664776 -0.554077
eks wye 4 0.38139939387114097 0.13418874328430463 0.176796
wye pan 5 0.5732889198020006 0.8636244699032729 0.191890
Keyed delta without/with oosvars
$ mlr --opprint step -a delta -f x -g a data/small
a b i x y x_delta
pan pan 1 0.3467901443380824 0.7268028627434533 0
eks pan 2 0.7586799647899636 0.5221511083334797 0
wye wye 3 0.20460330576630303 0.33831852551664776 0
eks wye 4 0.38139939387114097 0.13418874328430463 -0.377281
wye pan 5 0.5732889198020006 0.8636244699032729 0.368686
$ mlr --opprint put '$x_delta = is_present(@last[$a]) ? $x - @last[$a] : 0; @last[$a] = $x' data/small
a b i x y x_delta
pan pan 1 0.3467901443380824 0.7268028627434533 0
eks pan 2 0.7586799647899636 0.5221511083334797 0
wye wye 3 0.20460330576630303 0.33831852551664776 0
eks wye 4 0.38139939387114097 0.13418874328430463 -0.377281
wye pan 5 0.5732889198020006 0.8636244699032729 0.368686
Exponentially weighted moving averages without/with oosvars
$ mlr --opprint step -a ewma -d 0.1 -f x data/small
a b i x y x_ewma_0.1
pan pan 1 0.3467901443380824 0.7268028627434533 0.346790
eks pan 2 0.7586799647899636 0.5221511083334797 0.387979
wye wye 3 0.20460330576630303 0.33831852551664776 0.369642
eks wye 4 0.38139939387114097 0.13418874328430463 0.370817
wye pan 5 0.5732889198020006 0.8636244699032729 0.391064
$ mlr --opprint put '
begin{ @a=0.1 };
$e = NR==1 ? $x : @a * $x + (1 - @a) * @e;
@e=$e
' data/small
a b i x y e
pan pan 1 0.3467901443380824 0.7268028627434533 0.346790
eks pan 2 0.7586799647899636 0.5221511083334797 0.387979
wye wye 3 0.20460330576630303 0.33831852551664776 0.369642
eks wye 4 0.38139939387114097 0.13418874328430463 0.370817
wye pan 5 0.5732889198020006 0.8636244699032729 0.391064
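Unlike the earlier put -q programs, this one emits every record and adds a field to each. The state carried between records is only the previous average, @e, which is updated by e = a * x + (1 - a) * e_previous and seeded with the first x (NR == 1). A Python sketch of the same recurrence follows, using the x values from data/small shown above and the smoothing factor 0.1. The name ALPHA is a hypothetical stand-in for @a.

```python
ALPHA = 0.1  # hypothetical name for @a, set once like the begin block

# x values from data/small, as listed in the output above.
xs = [0.3467901443380824, 0.7586799647899636, 0.20460330576630303,
      0.38139939387114097, 0.5732889198020006]

ewma = []
e = None  # like @e, which is absent until the first record sets it
for nr, x in enumerate(xs, start=1):  # nr plays the role of NR
    e = x if nr == 1 else ALPHA * x + (1 - ALPHA) * e
    ewma.append(e)  # like assigning $e on each record

for x, e_val in zip(xs, ewma):
    print(x, f"{e_val:.6f}")
```

With 0.1 as the weight on the newest value, each average moves only a tenth of the way toward the new x, which is why the e column changes much more slowly than x.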