Aggregating with data.table in R
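The exercise consists of aggregating a numeric vector of values by combinations of factors, using data.table in R. Take the following data table as an example:

    require(data.table)
    require(plyr)
    dtb <- data.table(cbind(expand.grid(month = rep(month.abb[1:3], each = 3),
                                        fac = letters[1:3]),
                            value = rnorm(27)))

Note that each unique combination of 'month' and 'fac' appears three times. So when I average the values over both factors, I should expect a data frame with 9 unique rows:

    (agg1 <- ddply(dtb, c("month", "fac"), function(dfr) mean(dfr$value)))
      month fac          V1
    1   Jan   a -0.36030953
    2   Jan   b -0.58444588
    3   Jan   c -0.15472876
    4   Feb   a -0.05674483
    5   Feb   b  0.26415972
    6   Feb   c -1.62346772
    7   Mar   a  0.24560510
    8   Mar   b  0.82548140
    9   Mar   c  0.18721114

However, when aggregating with data.table, I keep getting the result repeated on every redundant row of each factor combination:

    (agg2 <- dtb[, value := mean(value), by = list(month, fac)])
        month fac       value
     1:   Jan   a -0.36030953
     2:   Jan   a -0.36030953
     3:   Jan   a -0.36030953
     4:   Feb   a -0.05674483
     5:   Feb   a -0.05674483
     6:   Feb   a -0.05674483
     7:   Mar   a  0.24560510
     8:   Mar   a  0.24560510
     9:   Mar   a  0.24560510
    10:   Jan   b -0.58444588
    11:   Jan   b -0.58444588
    12:   Jan   b -0.58444588
    13:   Feb   b  0.26415972
    14:   Feb   b  0.26415972
    15:   Feb   b  0.26415972
    16:   Mar   b  0.82548140
    17:   Mar   b  0.82548140
    18:   Mar   b  0.82548140
    19:   Jan   c -0.15472876
    20:   Jan   c -0.15472876
    21:   Jan   c -0.15472876
    22:   Feb   c -1.62346772
    23:   Feb   c -1.62346772
    24:   Feb   c -1.62346772
    25:   Mar   c  0.18721114
    26:   Mar   c  0.18721114
    27:   Mar   c  0.18721114
        month fac       value

Is there an elegant way to collapse these results to one row per unique factor combination with data.table?

The issue (and the reasoning behind it) has to do with the assignment of the aggregated values, not just with their calculation. This is easier to see if you look at a data.table that has more columns than just the ones used in the calculation:

    # Therefore, let's add a new column
    # (LETTERS has only 26 elements, so row 27 gets NA)
    dtb[, newCol := LETTERS[seq(length(value))]]

Note that if we only want to output the calculated values, the expression on the RHS of `:=` is fine on its own:

    # This gives the expected results
    dtb[, mean(value), by = list(month, fac)]

    # This, on the other hand, assigns the respective values to *each* row
    dtb[, value := mean(value), by = list(month, fac)]

In other words, without `:=` the data is reduced to one value per group. With `:=`, that value is instead written back into every row of its group (modifying `dtb` by reference), and the whole table is returned. Copying this data.table to `agg2` therefore still carries all the rows.

So, if you want to copy only the unique rows of the original table into a new table, you can

a. wrap the original table inside `unique()` before assigning it
b. assign the table returned by the first expression above, i.e. the one that does not use `:=` to assign the RHS output (which is what @Arun suggested)

An example of (a) would be:

    agg2 <- unique(dtb[, value := mean(value), by = list(month, fac)])

The following example may help illustrate.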
(You need to copy and paste this, since the output has been omitted.)

    # SAMPLE DATA, as above
    library(data.table)
    dtb.bak <- data.table(expand.grid(month = rep(month.abb[1:3], each = 3),
                                      fac = letters[1:3]),
                          value = rnorm(27))

    #------------#
    #  METHOD 1  #
    #------------#
    dtb <- copy(dtb.bak)  # restore from the sample data (:= modifies by reference)
    dtb[, value := mean(value), by = list(month, fac)]
    dtb
    # this is what you would like to assign
    unique(dtb)

    #------------#
    #  METHOD 2  #
    #------------#
    dtb <- copy(dtb.bak)  # restore from the sample data
    # this is what you would like to assign
    # the next two lines are the same; the only difference is the column name
    dtb[, mean(value), by = list(month, fac)]
    dtb[, list("mean" = mean(value)), by = list(month, fac)]  # quote marks added for clarity
    # dtb is unchanged
    dtb

    # NOW COMPARE THE SAME TWO METHODS, BUT WITH AN ADDITIONAL COLUMN
    dtb.bak[, newCol := rep(c("A", "B", "A"), length(value) / 3)]
    dtb1 <- copy(dtb.bak)  # restore from the sample data
    dtb2 <- copy(dtb.bak)  # restore from the sample data

    # METHOD 1
    dtb1[, value := mean(value), by = list(month, fac)]
    dtb1
    unique(dtb1)

    # METHOD 2
    dtb2[, list("mean" = mean(value)), by = list(month, fac)]  # quote marks added for clarity
    # dtb2 is unchanged
    dtb2

    # METHOD 2, WITH ADDED COLUMNS IN list() in `j`
    dtb2[, list("mean" = mean(value), newCol), by = list(month, fac)]  # quote marks added for clarity
    # notice this has more rows than unique(dtb1)
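As a rough cross-check of the comments in that example, the sketch below rebuilds the same sample data and uses `stopifnot()` to assert how many rows each method produces. It relies only on calls that already appear above (`data.table()`, `copy()`, `unique()`, `:=`, `by = list(...)`); the object names `m1`, `m2`, `dtb3` and `m4` are hypothetical, introduced just for this check. Because `rnorm()` is unseeded, the values differ on every run, but the row counts do not.

```r
library(data.table)

# Same sample data as above: every (month, fac) pair occurs three times.
dtb.bak <- data.table(expand.grid(month = rep(month.abb[1:3], each = 3),
                                  fac = letters[1:3]),
                      value = rnorm(27))

# Method 1: := writes each group mean into every row of its group (by
# reference), so dtb1 keeps all 27 rows until unique() drops the repeats.
dtb1 <- copy(dtb.bak)
dtb1[, value := mean(value), by = list(month, fac)]
m1 <- unique(dtb1)

# Method 2: without :=, j returns one row per group and dtb.bak is untouched.
m2 <- dtb.bak[, list(mean = mean(value)), by = list(month, fac)]

stopifnot(nrow(dtb1) == 27, nrow(m1) == 9, nrow(m2) == 9, nrow(dtb.bak) == 27)
# Both results list the groups in order of first appearance, so the means line up
# (all.equal() rather than identical(), to tolerate last-digit rounding differences).
stopifnot(isTRUE(all.equal(m1$value, m2$mean)))

# Add a column that takes the values "A", "B", "A" inside every group.
dtb.bak[, newCol := rep(c("A", "B", "A"), length(value) / 3)]

# Method 1 again: unique() now keeps two distinct rows per group.
dtb3 <- copy(dtb.bak)
dtb3[, value := mean(value), by = list(month, fac)]
stopifnot(nrow(unique(dtb3)) == 18)

# Method 2 with newCol listed in j: the mean is recycled over all three rows.
m4 <- dtb.bak[, list(mean = mean(value), newCol), by = list(month, fac)]
stopifnot(nrow(m4) == 27)

# Method 2 without newCol in j still collapses to one row per group.
stopifnot(nrow(dtb.bak[, list(mean = mean(value)), by = list(month, fac)]) == 9)
```

With the extra column, `unique()` keeps 18 rows because `newCol` takes two distinct values inside each group, and listing `newCol` in `j` gives 27 rows because the single group mean is recycled to match the three `newCol` values. Only the form without `:=` and without the extra column in `j` still yields exactly one row per (month, fac) combination.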