sql-server – 通过在T-SQL中设置种子,从均匀分布生成随机值
|
我想从T-SQL中给定数据表的每行的均值分布生成一个随机值,其中mean = 0,标准devation = 1.另外,我想设置一个种子,以确保分析的重现性.以下是不起作用的想法: >使用具有声明数字的函数RAND()不符合此目标:为数据集的每一行生成相同的随机值. SELECT ABS(CAST(CAST(NEWID()AS VARBINARY)AS INT))AS [RandomNumber] 由于不可重复,因此也不能解决问题. 编辑: 表现很重要,因为我的桌子有数亿条记录. 解决方法这里的主要问题IMHO是怎么看到“可重复性”?或者不同的问题:什么驱动着随机性?只要数据不变,我可以设想一个解决方案,每个运行的每个记录都附加相同的随机数.但是,如果数据发生变化,预计会发生什么?为了乐趣,我对一个(不是非常有代表性的)100万行的测试表进行了以下测试: -- seed
SELECT Rand(0)
-- will show the same random number for EVERY record
SELECT Number,blah = Convert(varchar(100),NewID()),random = Rand()
INTO #test
FROM master.dbo.fn_int_list(1,1000000)
CREATE UNIQUE CLUSTERED INDEX uq0_test ON #test (Number)
SET NOCOUNT ON
GO
DECLARE @start_time datetime = CURRENT_TIMESTAMP,@c_number int
-- update each record (one by one) and set the random number based on 'the next Rand()' value
-- => the order of the records drives the distribution of the Rand() value !
-- seed
SELECT @c_number = Rand(0)
-- update 1 by 1
DECLARE cursor_no_transaction CURSOR LOCAL STATIC
FOR SELECT Number
FROM #test
ORDER BY Number
OPEN cursor_no_transaction
FETCH NEXT FROM cursor_no_transaction INTO @c_number
WHILE @@FETCH_STATUS = 0
BEGIN
UPDATE #test
SET random = Rand()
WHERE Number = @c_number
FETCH NEXT FROM cursor_no_transaction INTO @c_number
END
CLOSE cursor_no_transaction
DEALLOCATE cursor_no_transaction
PRINT 'Time needed (no transaction) : ' + Convert(nvarchar(100),DateDiff(ms,@start_time,CURRENT_TIMESTAMP)) + ' ms.'
SELECT _avg = AVG(random),_stdev = STDEV(random) FROM #test
GO
DECLARE @start_time datetime = CURRENT_TIMESTAMP,@c_number int
BEGIN TRANSACTION
-- update each record (one by one) and set the random number based on 'the next Rand()' value
-- => the order of the records drives the distribution of the Rand() value !
-- seed
SELECT @c_number = Rand(0)
-- update 1 by 1 but all of it inside 1 single transaction
DECLARE cursor_single_transaction CURSOR LOCAL STATIC
FOR SELECT Number
FROM #test
ORDER BY Number
OPEN cursor_single_transaction
FETCH NEXT FROM cursor_single_transaction INTO @c_number
WHILE @@FETCH_STATUS = 0
BEGIN
UPDATE #test
SET random = Rand()
WHERE Number = @c_number
FETCH NEXT FROM cursor_single_transaction INTO @c_number
END
CLOSE cursor_single_transaction
DEALLOCATE cursor_single_transaction
COMMIT TRANSACTION
PRINT 'Time needed (single transaction) : ' + Convert(nvarchar(100),_stdev = STDEV(random) FROM #test
GO
DECLARE @start_time datetime = CURRENT_TIMESTAMP
-- update each record (single operation),use the Number column to reseed the Rand() function for every record
UPDATE #test
SET random = Rand(Number)
PRINT 'Time needed Rand(Number) : ' + Convert(nvarchar(100),use 'a bunch of fields' to reseed the Rand() function for every record
UPDATE #test
SET random = Rand(BINARY_CHECKSUM(Number,blah))
PRINT 'Time needed Rand(BINARY_CHECKSUM(Number,blah)) : ' + Convert(nvarchar(100),_stdev = STDEV(random) FROM #test
结果或多或少是如预期的: Time needed (no transaction) : 24570 ms. _avg _stdev ---------------------- ---------------------- 0.499630943538644 0.288686960086461 Time needed (single transaction) : 14813 ms. _avg _stdev ---------------------- ---------------------- 0.499630943538646 0.288686960086461 Time needed Rand(Number) : 1203 ms. _avg _stdev ---------------------- ---------------------- 0.499407423620328 0.291093824839539 Time needed Rand(BINARY_CHECKSUM(Number,blah)) : 1250 ms. _avg _stdev ---------------------- ---------------------- 0.499715398881586 0.288579510523627 所有这些都是“可重复的”,问题是“可重复”是指你想要的意思.我已经坚持使用AVG()和STDEV()来获取一个粗略的分布概念,我会留给你看看他们是否真的适合账单(如果不是,如何改进它) 1百万行的1.2秒对于100万行IMHO来说听起来不太好.也就是说,如果您的表格包含额外的列,它将占用更多的空间,因此需要更多的时间! 希望这能让你开始… (编辑:安卓应用网) 【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容! |
