sql-server – 运行总计数?
|
正如标题所示,我需要一些帮助来获得T-SQL中的运行总计.问题是我需要做的总和是计数的总和: sum(count (distinct (customers))) 如果我单独运行计数,结果将是: Day | CountCustomers ---------------------- 5/1 | 1 5/2 | 0 5/3 | 5 我需要输出总和为: Day | RunningTotalCustomers ---------------------- 5/1 | 1 5/2 | 1 5/3 | 6 我在使用coalesce方法之前已经完成了运行总计,但从未计算过.我现在不知道如何做到这一点. 解决方法您可以通过以下几种方法进行比较.首先让我们设置一个包含一些虚拟数据的表.我用sys.all_columns中的一堆随机数据来填充它.嗯,这是随机的 – 我确保日期是连续的(这对于其中一个答案来说真的很重要).CREATE TABLE dbo.Hits(Day SMALLDATETIME,CustomerID INT); CREATE CLUSTERED INDEX x ON dbo.Hits([Day]); INSERT dbo.Hits SELECT TOP (5000) DATEADD(DAY,r,'20120501'),COALESCE(ASCII(SUBSTRING(name,s,1)),86) FROM (SELECT name,r = ROW_NUMBER() OVER (ORDER BY name)/10,s = CONVERT(INT,RIGHT(CONVERT(VARCHAR(20),[object_id]),1)) FROM sys.all_columns) AS x; SELECT Earliest_Day = MIN([Day]),Latest_Day = MAX([Day]),Unique_Days = DATEDIFF(DAY,MIN([Day]),MAX([Day])) + 1,Total_Rows = COUNT(*) FROM dbo.Hits; 结果: Earliest_Day Latest_Day Unique_Days Total_Days ------------------- ------------------- ----------- ---------- 2012-05-01 00:00:00 2013-09-13 00:00:00 501 5000 数据看起来像这样(5000行) – 但在您的系统上看起来会略有不同,具体取决于版本和构建#: Day CustomerID ------------------- --- 2012-05-01 00:00:00 95 2012-05-01 00:00:00 97 2012-05-01 00:00:00 97 2012-05-01 00:00:00 117 2012-05-01 00:00:00 100 ... 2012-05-02 00:00:00 110 2012-05-02 00:00:00 110 2012-05-02 00:00:00 95 ... 并且运行总计结果应该如下所示(501行): Day c rt ------------------- -- -- 2012-05-01 00:00:00 6 6 2012-05-02 00:00:00 5 11 2012-05-03 00:00:00 4 15 2012-05-04 00:00:00 7 22 2012-05-05 00:00:00 6 28 ... 所以我要比较的方法是: >“自我加入” – 基于集合的纯粹主义方法 自联接 这是人们在警告你远离游标时告诉你这样做的方式,因为“基于集合总是更快”.在最近的一些实验中,我发现光标超越了这个解决方案. ;WITH g AS
(
SELECT [Day],c = COUNT(DISTINCT CustomerID)
FROM dbo.Hits
GROUP BY [Day]
)
SELECT g.[Day],g.c,rt = SUM(g2.c)
FROM g INNER JOIN g AS g2
ON g.[Day] >= g2.[Day]
GROUP BY g.[Day],g.c
ORDER BY g.[Day];
带日期的递归cte 提醒 – 这依赖于连续日期(无间隙),最多10000级递归,并且您知道您感兴趣的范围的开始日期(设置锚点).当然,您可以使用子查询动态设置锚点,但我想保持简单. ;WITH g AS
(
SELECT [Day],c = COUNT(DISTINCT CustomerID)
FROM dbo.Hits
GROUP BY [Day]
),x AS
(
SELECT [Day],c,rt = c
FROM g
WHERE [Day] = '20120501'
UNION ALL
SELECT g.[Day],x.rt + g.c
FROM x INNER JOIN g
ON g.[Day] = DATEADD(DAY,1,x.[Day])
)
SELECT [Day],rt
FROM x
ORDER BY [Day]
OPTION (MAXRECURSION 10000);
与row_number的递归cte Row_number计算在这里略贵.同样,这支持最大递归级别10000,但您不需要分配锚点. ;WITH g AS
(
SELECT [Day],rn = ROW_NUMBER() OVER (ORDER BY DAY),rn,rt = c
FROM g
WHERE rn = 1
UNION ALL
SELECT g.[Day],g.rn,x.rt + g.c
FROM x INNER JOIN g
ON g.rn = x.rn + 1
)
SELECT [Day],rt
FROM x
ORDER BY [Day]
OPTION (MAXRECURSION 10000);
带临时表的递归cte 正如建议的那样,从Mikael的答案中窃取,将其包含在测试中. CREATE TABLE #Hits
(
rn INT PRIMARY KEY,c INT,[Day] SMALLDATETIME
);
INSERT INTO #Hits (rn,Day)
SELECT ROW_NUMBER() OVER (ORDER BY DAY),COUNT(DISTINCT CustomerID),[Day]
FROM dbo.Hits
GROUP BY [Day];
WITH x AS
(
SELECT [Day],rt = c
FROM #Hits as c
WHERE rn = 1
UNION ALL
SELECT g.[Day],x.rt + g.c
FROM x INNER JOIN #Hits as g
ON g.rn = x.rn + 1
)
SELECT [Day],rt
FROM x
ORDER BY [Day]
OPTION (MAXRECURSION 10000);
DROP TABLE #Hits;
古怪的更新 我再一次只是为了完整而包括这个;我个人不会依赖这个解决方案,因为正如我在另一个答案中提到的,这种方法根本不能保证工作,并且可能在SQL Server的未来版本中完全破坏. (我正在尽力强迫SQL Server遵守我想要的顺序,使用提示进行索引选择.) CREATE TABLE #x([Day] SMALLDATETIME,rt INT);
CREATE UNIQUE CLUSTERED INDEX x ON #x([Day]);
INSERT #x([Day],c)
SELECT [Day],c = COUNT(DISTINCT CustomerID)
FROM dbo.Hits
GROUP BY [Day]
ORDER BY [Day];
DECLARE @rt1 INT;
SET @rt1 = 0;
UPDATE #x
SET @rt1 = rt = @rt1 + c
FROM #x WITH (INDEX = x);
SELECT [Day],rt FROM #x ORDER BY [Day];
DROP TABLE #x;
光标 “小心,这里有游标!游标是邪恶的!你应该不惜一切代价避免使用游标!”不,那不是我说话,这只是我听到的很多东西.与流行的观点相反,在某些情况下游标是合适的. CREATE TABLE #x2([Day] SMALLDATETIME,rt INT);
CREATE UNIQUE CLUSTERED INDEX x ON #x2([Day]);
INSERT #x2([Day],COUNT(DISTINCT CustomerID)
FROM dbo.Hits
GROUP BY [Day]
ORDER BY [Day];
DECLARE @rt2 INT,@d SMALLDATETIME,@c INT;
SET @rt2 = 0;
DECLARE c CURSOR LOCAL STATIC READ_ONLY FORWARD_ONLY
FOR SELECT [Day],c FROM #x2 ORDER BY [Day];
OPEN c;
FETCH NEXT FROM c INTO @d,@c;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @rt2 = @rt2 + @c;
UPDATE #x2 SET rt = @rt2 WHERE [Day] = @d;
FETCH NEXT FROM c INTO @d,@c;
END
SELECT [Day],rt FROM #x2 ORDER BY [Day];
DROP TABLE #x2;
SQL Server 2012 如果您使用的是最新版本的SQL Server,对窗口功能的增强使我们可以轻松计算运行总计而无需自加入的指数成本(SUM在一次通过中计算),CTE的复杂性(包括要求)对于表现更好的CTE的连续行,不支持的古怪更新和禁止的游标.只是要警惕使用RANGE和ROWS之间的区别,或者根本不指定 – 只有ROWS避免了磁盘上的磁盘,否则会严重影响性能. ;WITH g AS ( SELECT [Day],rt = SUM(c) OVER (ORDER BY [Day] ROWS UNBOUNDED PRECEDING) FROM g ORDER BY g.[Day]; 性能比较 我采用了每种方法并使用以下方法将其包装成批处理: SELECT SYSUTCDATETIME(); GO DBCC DROPCLEANBUFFERS;DBCC FREEPROCCACHE; -- query here GO 10 SELECT SYSUTCDATETIME(); 以下是总持续时间的结果,以毫秒为单位(请记住,这也包括每次DBCC命令): method run 1 run 2 ----------------------------- -------- -------- self-join 1296 ms 1357 ms -- "supported" non-SQL 2012 winner recursive cte with dates 1655 ms 1516 ms recursive cte with row_number 19747 ms 19630 ms recursive cte with #temp table 1624 ms 1329 ms quirky update 880 ms 1030 ms -- non-SQL 2012 winner cursor 1962 ms 1850 ms SQL Server 2012 847 ms 917 ms -- winner if SQL 2012 available 我没有DBCC命令就重新做了: method run 1 run 2 ----------------------------- -------- -------- self-join 1272 ms 1309 ms -- "supported" non-SQL 2012 winner recursive cte with dates 1247 ms 1593 ms recursive cte with row_number 18646 ms 18803 ms recursive cte with #temp table 1340 ms 1564 ms quirky update 1024 ms 1116 ms -- non-SQL 2012 winner cursor 1969 ms 1835 ms SQL Server 2012 600 ms 569 ms -- winner if SQL 2012 available 删除DBCC和循环,只测量一个原始迭代: method run 1 run 2 ----------------------------- -------- -------- self-join 313 ms 242 ms recursive cte with dates 217 ms 217 ms recursive cte with row_number 2114 ms 1976 ms recursive cte with #temp table 83 ms 116 ms -- "supported" non-SQL 2012 winner quirky update 86 ms 85 ms -- non-SQL 2012 winner cursor 1060 ms 983 ms SQL Server 2012 68 ms 40 ms -- winner if SQL 2012 available (编辑:安卓应用网) 【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容! |
