当前位置:喜价首页攻略日用百货Tick Time精华值得入手吗?Tick Time精华好用吗

Tick Time精华值得入手吗?Tick Time精华好用吗

数据清洗和理解是数据科学中的基础工作,重要而繁琐,经常占据了80%的工作量,在量化交易中也不例外。本文介绍如何利用Clickhouse数据仓库进行高效的数据清洗工作,以Tick行情处理为例,重点谈谈分析函数的使用。一、分析函数概况1.1 与


数据清洗和理解是数据科学中的基础工作,重要而繁琐,经常占据了80%的工作量,在量化交易中也不例外。本文介绍如何利用Clickhouse数据仓库进行高效的数据清洗工作,以Tick行情处理为例,重点谈谈分析函数的使用。

一、分析函数概况

1.1 与聚合函数的区别

说到分析函数,就不得不提到聚合函数。聚合函数是汇总计算返回一个计算结果,而分析函数会根据表的行数,每行返回一个计算结果。

  • 聚合函数特点:GROUP BY 分组,统计组内的max、min、std等统计值,赋予到聚合组中
  • 分析函数特点:PARTITION BY 分组,统计组内的max、min、std等统计值,赋予到当前行中

很明显,分析函数有“序”的概念,通过分组的“序”获取上下文指定范围,进而获取聚合值。

1.2 窗口函数的基本用法

语法格式:

<窗口函数> OVER ([PARTITION BY <列清单>] ORDER BY <排序用列清单>)

1.2.1 计算排名

计算一分钟K线各个股票的成交额从大到小排名:

select datetime, order_book_id, RANK() OVER (PARTITION BY datetime ORDER BY total_turnover desc) as deal_rankfrom rqdata.stock_kline_1m where date='2023-01-11' and datetime<='2023-01-11 09:32:00' and order_book_id in ('601318.XSHG', '600536.XSHG', '601012.XSHG')

其中desc表示逆序排列(从大到小)

1.2.2 获取相邻行

计算一分钟K线每分钟相比上一分钟成交额的占比:

select datetime, order_book_id, total_turnover / anyLast(total_turnover) OVER (PARTITION BY order_book_id ORDER BY datetime asc ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) as deal_ratefrom rqdata.stock_kline_1m where date='2023-01-11' and datetime<='2023-01-11 09:32:00' and order_book_id in ('601318.XSHG', '600536.XSHG', '601012.XSHG')

其中anyLast表示取分组中的最后一个不为Null的值,asc表示顺序排列(从小到大),ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING 表示限制数数据在当前行的前面。

1.2.3 计算窗口报告值

计算一分钟K线未来10分钟的*高收益:

select datetime, order_book_id, round(100*(max(high) OVER (PARTITION BY order_book_id ORDER BY datetime asc ROWS BETWEEN 1 FOLLOWING AND 10 FOLLOWING) / close - 1), 3) as min10_high_ratefrom rqdata.stock_kline_1m where date='2023-01-11' and datetime<='2023-01-11 09:40:00' and order_book_id in ('601318.XSHG', '600536.XSHG', '601012.XSHG')

其中max(high)取分组中的最大值,ROWS BETWEEN 1 FOLLOWING AND 10 FOLLOWING 表示在接下来的10分钟里。

二、数据清洗案例

2.1 关于Tick行情

Tick行情在国内市场通常称为切片数据,把距离上一个时间点的市场状态以及快照行情提供给用户,沪深股票交易所的切片间隔是3s,期货市场则更短,最快可以达到0.5s。Tick数据不同于逐笔委托和逐笔成交数据,最快为3s,如果一段时间没有成交,这个值就更久远。

由此造成一个问题,某个1分钟时间段里,可能无法直接由Tick行情得到K线数据,因此,如果某分钟没有记录,我们需要在其后面补充一条记录,保持K线的完整性。

2.2 选出缺失项

首先我们定义表a为所有股票代码,表b为所有分钟日期,两表做笛卡尔乘积可以得到完整的数据主键,这里通过cross join来实现。再通过左反连接left anti join排除掉已有的数据主键,这样就选出了需要补充的项:

select a.stock_code as stock_code, toDate('2016-03-29') as trade_date, b.trade_time + toIntervalSecond(59) as trade_time, null as open, null as last, null as high, null as low, null as prev_close, null as volume, null as total_turnoverfrom ( select distinct stock_code from rqdata.sse50_tick where trade_date='2016-03-29' ) a cross join ( with toHour(trade_time) * 60 + toMinute(trade_time) as trade_minute select distinct toStartOfMinute(trade_time) as trade_time from rqdata.sse50_tick where trade_date='2016-03-29' and trade_minute != 11 * 60 + 30 and trade_minute != 9 * 60 + 29 and trade_minute != 12 * 60 + 59 and trade_minute>= 9*60 + 25 ) bleft anti join ( select distinct stock_code, toStartOfMinute(trade_time) as trade_time from rqdata.sse50_tick where trade_date='2016-03-29' ) c on a.stock_code = c.stock_code and b.trade_time = c.trade_time

可以发现,部分股票没有15:00的收盘价

Tick Time精华值得入手吗?Tick Time精华好用吗

2.3 利用分析函数填充缺失项

select stock_code, trade_date, trade_time, anyLast(open) over (PARTITION BY stock_code ORDER BY trade_time ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as open, ...from ...

2.4 与原始数据并联后的完整语句

select * from ( select stock_code, trade_date, trade_time, anyLast(open) over (PARTITION BY stock_code ORDER BY trade_time ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as open, anyLast(last) over (PARTITION BY stock_code ORDER BY trade_time ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as last, anyLast(high) over (PARTITION BY stock_code ORDER BY trade_time ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as high, anyLast(low) over (PARTITION BY stock_code ORDER BY trade_time ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as low, anyLast(prev_close) over (PARTITION BY stock_code ORDER BY trade_time ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as prev_close, anyLast(volume) over (PARTITION BY stock_code ORDER BY trade_time ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as volume, anyLast(total_turnover) over (PARTITION BY stock_code ORDER BY trade_time ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as total_turnover from ( select a.stock_code as stock_code, toDate('2016-03-29') as trade_date, b.trade_time + toIntervalSecond(59) as trade_time, null as open, null as last, null as high, null as low, null as prev_close, null as volume, null as total_turnover from ( select distinct stock_code from rqdata.sse50_tick where trade_date='2016-03-29' ) a cross join ( with toHour(trade_time) * 60 + toMinute(trade_time) as trade_minute select distinct toStartOfMinute(trade_time) as trade_time from rqdata.sse50_tick where trade_date='2016-03-29' and trade_minute != 11 * 60 + 30 and trade_minute != 9 * 60 + 29 and trade_minute != 12 * 60 + 59 and trade_minute>= 9*60 + 25 ) b left anti join ( select distinct stock_code, toStartOfMinute(trade_time) as trade_time from rqdata.sse50_tick where trade_date='2016-03-29' ) c on a.stock_code = c.stock_code and b.trade_time = c.trade_time union all select stock_code, trade_date, trade_time, open, last, high, low, prev_close, volume, total_turnover from rqdata.sse50_tick where trade_date='2016-03-29' )) where stock_code='601669.XSHG' and trade_time>'2016-03-29 14:59:00'

可以看到,最后一行进行了填充,当时间精确到分钟后,便得到收盘价。

Tick Time精华值得入手吗?Tick Time精华好用吗

结论

分析函数,也叫OLAP函数(Online Anallytical Processing,联机分析处理),可以对数据库数据进行实时分析处理。比起Python中的数据处理包Pandas,可以利用数据库软件自身的多线程机制进行加速,高效地完成数据分析工作。


以上就是小编给大家带来的关于'Tick Time精华值得入手吗?Tick Time精华好用吗'的探讨分享,希望大家通过阅读小编的文章之后能够有所收获!如果大家觉得小编的文章不错的话,可以多多分享给有需要的人。

全部评论(0)