How to Build Your Own Quantitative Analysis Database in Python

A step-by-step guide to building your own quantitative analysis database

Introduction:

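Data is the foundation of quantitative financial analysis: historical stock trading data, fundamentals of listed companies, macroeconomic and industry data, and so on. As the volume of information keeps growing, knowing how to acquire, query, and process data matters more and more. Anyone who tinkers with quantitative trading can hardly get by without knowing databases. Commonly used open-source (free) databases include MySQL, PostgreSQL, MongoDB, and SQLite (bundled with Python). All of them ranked in the top ten of the DB-Engines ranking in 2018-2019 (see the figure below), which shows how widely used and popular they are. Each has its own characteristics and suitable use cases, and plenty of material online covers which one to learn and how. This article gives a brief introduction to working with PostgreSQL from Python (other databases are similar), using psycopg2 and sqlalchemy to move data between PostgreSQL and pandas DataFrames, and builds a quantitative analysis database step by step.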

[Figure: DB-Engines database ranking, 2018-2019]

Installing and using PostgreSQL

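Install PostgreSQL. Download the version that matches your machine from the official website and install it. Apart from setting a password (this article uses "123456"), every option can be left at its default. If you get stuck, see the CSDN article on detailed PostgreSQL installation steps (Windows). After installation, the install directory also contains pgAdmin4, the bundled graphical database tool. The latest version is a web application, somewhat like Jupyter Notebook for Python, and can be used to browse and operate PostgreSQL databases.

Install the psycopg2 and sqlalchemy libraries for Python. psycopg2 is Python's interface for connecting to PostgreSQL. sqlalchemy is more general and can connect to several databases (MySQL, SQLite, PostgreSQL), and it is especially convenient for pandas DataFrames. Plenty of introductions to both libraries exist online, so they are not covered in detail here. Install them from the command line with pip install xxx.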
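The connection to PostgreSQL is described by a URL of the form postgresql+psycopg2://user:password@host:port/database. A minimal sketch of assembling that URL follows. The helper name build_pg_url and the use of a PGPASSWORD environment variable are assumptions of this sketch and are not part of the original article; the fallback value is the article's demo password.

```python
import os
from urllib.parse import quote_plus


def build_pg_url(user, password, host="localhost", port=5432, database="postgres"):
    """Return a SQLAlchemy-style URL for the psycopg2 driver (hypothetical helper)."""
    # quote_plus escapes characters such as '@' or '/' that would break the URL
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Assumption: the password may be supplied through an environment variable;
# otherwise fall back to the demo password used in this article.
password = os.environ.get("PGPASSWORD", "123456")
url = build_pg_url("postgres", password)
print(url)
```

The resulting string can be passed to sqlalchemy's create_engine. Escaping the password matters once it contains characters that have a meaning inside a URL.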

Example application

First, use tushare to download price data for more than 3,000 stocks, then store it in a local PostgreSQL database through psycopg2 and sqlalchemy so that it can be queried and processed later.

# Import the libraries used later for analysis and visualization
import tushare as ts
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
# Render Chinese characters and minus signs correctly in plots
mpl.rcParams['font.sans-serif'] = ['SimHei']
mpl.rcParams['axes.unicode_minus'] = False
# Set your tushare token
token = 'enter your token here'
pro = ts.pro_api(token)

Data download function. The default date range can be changed at any time.

# If this raises an error, upgrade tushare to the latest version
def get_data(code, start='20190101', end='20190425'):
    df = ts.pro_bar(ts_code=code, adj='qfq', start_date=start, end_date=end)
    return df

Stock code function. It returns the codes of the stocks listed as of the latest trading day.

# Get the codes of all currently listed stocks
def get_code():
    codes = pro.stock_basic(list_status='L').ts_code.values
    return codes
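The start and end arguments above are plain 'YYYYMMDD' strings. A small sketch of producing such a pair from a date and a look-back length follows, so the defaults need not be edited by hand. The helper name date_range_str is hypothetical and not part of tushare.

```python
from datetime import date, timedelta


def date_range_str(end=None, days=120):
    """Return (start, end) as 'YYYYMMDD' strings spanning `days` calendar days."""
    end = end or date.today()
    start = end - timedelta(days=days)
    return start.strftime("%Y%m%d"), end.strftime("%Y%m%d")


# Reproduces the article's default window: 114 days back from 2019-04-25
start, end = date_range_str(date(2019, 4, 25), days=114)
print(start, end)  # 20190101 20190425
```

Called without arguments, the helper ends the window at today's date, which suits a periodic refresh.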

Inserting into PostgreSQL. The function wraps the write in try...except...pass so that one bad batch of data does not crash the whole run.

from sqlalchemy import create_engine
import psycopg2  # driver named in the postgresql+psycopg2 URL below
engine = create_engine('postgresql+psycopg2://postgres:123456@localhost:5432/postgres')
def insert_sql(data, db_name, if_exists='append'):
    # Catch errors so that one failed write does not stop the whole run
    try:
        data.to_sql(db_name, engine, index=False, if_exists=if_exists)
        # print('written to ' + db_name)
    except Exception:
        pass
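Silently passing on every error keeps the loop alive but also hides which writes failed. The sketch below shows one possible variation that logs the failure and reports success as a boolean. It uses an in-memory SQLite connection as a stand-in for the PostgreSQL engine so that it runs without a server; the function name insert_frame and the sample row are illustrative only.

```python
import logging
import sqlite3

import pandas as pd

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("quant_db")


def insert_frame(data, table, conn, if_exists="append"):
    """Write a DataFrame to `table`; return True on success, False on failure."""
    try:
        data.to_sql(table, conn, index=False, if_exists=if_exists)
        return True
    except Exception as exc:  # keep the batch running, but record what went wrong
        log.warning("write to %s failed: %s", table, exc)
        return False


conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL engine
good = pd.DataFrame(
    {"ts_code": ["000001.SZ"], "trade_date": ["20190425"], "close": [13.5]}
)
ok_good = insert_frame(good, "stock_data", conn)
ok_bad = insert_frame(None, "stock_data", conn)  # e.g. a download that returned nothing
rows = pd.read_sql("select * from stock_data", conn)
print(ok_good, ok_bad, len(rows))  # True False 1
```

Collecting the codes for which the function returned False would make it possible to retry only those downloads.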

Because the price history is large and slow to download, first fetch the daily trading data for 20190101 through 20190425 and keep updating it afterwards.

# Download data for 20190101-20190425 and insert it into the stock_data table
# This step is slow: roughly 25-35 minutes
for code in get_code():
    data = get_data(code)
    insert_sql(data, 'stock_data')
# Read the whole table
df = pd.read_sql('stock_data', engine)
print(len(df))
# Output: 270998
# Select the rows for ts_code = 000001.SZ
df = pd.read_sql("select * from stock_data where ts_code='000001.SZ'", engine)
print(len(df))
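The query above embeds the stock code directly in the SQL string. When the code comes from a variable, binding it as a parameter is generally safer. Here is a minimal sketch using the standard-library sqlite3 module with made-up rows. SQLite uses ? as its placeholder, whereas psycopg2 uses %s, and the helper name rows_for is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table stock_data (ts_code text, trade_date text, close real)")
conn.executemany(
    "insert into stock_data values (?, ?, ?)",
    [
        ("000001.SZ", "20190424", 13.9),
        ("000001.SZ", "20190425", 13.5),
        ("600000.SH", "20190425", 11.2),
    ],
)


def rows_for(conn, ts_code):
    """Return (trade_date, close) rows for one stock code."""
    # The driver binds the value, so quotes inside ts_code cannot alter the SQL
    cur = conn.execute(
        "select trade_date, close from stock_data where ts_code = ? order by trade_date",
        (ts_code,),
    )
    return cur.fetchall()


print(rows_for(conn, "000001.SZ"))     # [('20190424', 13.9), ('20190425', 13.5)]
print(rows_for(conn, "x' or '1'='1"))  # []
```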

Next, build an update function that can download and insert data for other periods. For January 1, 2018 through April 25, 2019 alone, the table already reaches about 1.08 million rows.

# Update the table, or download data for another period
def update_sql(start, end, db_name):
    for code in get_code():
        data = get_data(code, start, end)
        insert_sql(data, db_name)
    print(f'Data for {start}:{end} updated successfully')
# Download data for 20180101-20181231
# Run this only once and comment it out afterwards (a second run would append the same rows again)
# The download is slow: roughly 20-35 minutes
start = '20180101'
end = '20181231'
db_name = 'stock_data'
# Download the data and store it in the database
update_sql(start, end, db_name)
# Read the data back with pandas read_sql
df_all_data = pd.read_sql('stock_data', engine)
print(len(df_all_data))
# Output: 1087050
# Count the distinct stock codes and trade dates
print(len(df_all_data.ts_code.unique()))
print(len(df_all_data.trade_date.unique()))
# Output: 3604; 319
d = df_all_data.trade_date.unique()
print(d.max())
print(d.min())
# Output:
# 2019-04-25T00:00:00.000000000
# 2018-01-02T00:00:00.000000000
# Get the data for trade date 2019-04-25
pd.read_sql("select * from stock_data where trade_date='2019-04-25' ", engine).head()

[Output: first five rows of stock_data for trade date 2019-04-25]
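Because the update function appends rows, downloading an overlapping period twice would store the same (ts_code, trade_date) pair more than once. One way to guard against that is a composite primary key combined with an insert that skips existing keys. This is a sketch of the idea and not the article's method. It uses SQLite's INSERT OR IGNORE so that it runs standalone; PostgreSQL expresses the same thing with INSERT ... ON CONFLICT DO NOTHING. The function name and rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    create table stock_data (
        ts_code text,
        trade_date text,
        close real,
        primary key (ts_code, trade_date)
    )
    """
)


def insert_new_rows(conn, rows):
    """Insert rows, skipping (ts_code, trade_date) pairs already stored.

    Returns the number of rows actually added.
    """
    before = conn.execute("select count(*) from stock_data").fetchone()[0]
    conn.executemany("insert or ignore into stock_data values (?, ?, ?)", rows)
    conn.commit()
    after = conn.execute("select count(*) from stock_data").fetchone()[0]
    return after - before


batch = [("000001.SZ", "20190424", 13.9), ("000001.SZ", "20190425", 13.5)]
first = insert_new_rows(conn, batch)
# Re-sending the same batch plus one new day adds only the new day
second = insert_new_rows(conn, batch + [("000001.SZ", "20190426", 13.8)])
print(first, second)  # 2 1
```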

Build a query-and-visualization function:

def plot_data(condition, title):
    from pyecharts import Bar  # top-level import as in pyecharts 0.5.x
    from sqlalchemy import create_engine
    engine = create_engine('postgresql+psycopg2://postgres:123456@localhost:5432/postgres')
    # Note the space after "where"; the condition is appended as raw SQL
    data = pd.read_sql("select * from stock_data where " + condition, engine)
    # Number of matching stocks per trade date
    count_ = data.groupby('trade_date')['ts_code'].count()
    attr = count_.index
    v1 = count_.values
    bar = Bar(title, title_text_size=15)
    bar.add('', attr, v1, is_splitline_show=False, linewidth=2)
    return bar

Distribution over time of stocks priced below 2 yuan:

c1 = "close<2"
t1 = "Stocks priced below 2 yuan, by trade date"
plot_data(c1, t1)

[Figure: bar chart of the number of stocks priced below 2 yuan, by trade date]

Distribution over time of stocks with a daily gain above 9.5%:

c2 = "pct_chg>9.5"
t2 = "Stocks with a daily gain above 9.5%, by trade date"
plot_data(c2, t2)

[Figure: bar chart of the number of stocks with a daily gain above 9.5%, by trade date]

Distribution over time of stocks with a daily decline of more than 9.5%:

c3 = "pct_chg<-9.5"
t3 = "Stocks with a daily decline of more than 9.5%, by trade date"
plot_data(c3, t3)

[Figure: bar chart of the number of stocks with a daily decline of more than 9.5%, by trade date]
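The three queries above load every matching row into pandas and then count per trade date. The same counts can also be computed inside the database with GROUP BY, so that only one row per date is transferred. The sketch below illustrates this with an in-memory SQLite table and invented rows. The function count_by_date and its whitelist of columns and operators are assumptions of this example and not part of the article's code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table stock_data (ts_code text, trade_date text, close real, pct_chg real)"
)
conn.executemany(
    "insert into stock_data values (?, ?, ?, ?)",
    [
        ("A.SZ", "20190424", 1.8, -1.0),
        ("B.SZ", "20190424", 1.9, 10.0),
        ("C.SZ", "20190424", 5.0, 3.0),
        ("A.SZ", "20190425", 1.7, -5.6),
        ("C.SZ", "20190425", 5.5, 10.0),
    ],
)


def count_by_date(conn, column, op, threshold):
    """Count stocks per trade date that satisfy `column op threshold`."""
    # Column names and operators cannot be bound as parameters, so whitelist them
    if column not in {"close", "pct_chg"} or op not in {"<", ">"}:
        raise ValueError("unsupported filter")
    sql = (
        f"select trade_date, count(ts_code) from stock_data "
        f"where {column} {op} ? group by trade_date order by trade_date"
    )
    return conn.execute(sql, (threshold,)).fetchall()


print(count_by_date(conn, "close", "<", 2))      # [('20190424', 2), ('20190425', 1)]
print(count_by_date(conn, "pct_chg", ">", 9.5))  # [('20190424', 1), ('20190425', 1)]
```

The returned list of (trade_date, count) pairs maps directly onto the x and y values of a bar chart.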

Querying and extracting data from the database with a stock-selection strategy:

# Screen stock codes
# Get the codes of currently traded stocks that pass the filters
def get_new_code(date):
    # All currently listed stocks
    df0 = pro.stock_basic(exchange='', list_status='L')
    df1 = pro.daily_basic(trade_date=date)
    df = pd.merge(df0, df1, on='ts_code')
    # Exclude new and recently listed stocks (listed from 2017 onwards)
    df = df[df['list_date'].apply(int).values < 20170101]
    # Exclude *ST stocks (use ~ to negate a boolean mask)
    df = df[~df['name'].str.startswith('*ST')]
    # Exclude stocks with a negative trailing P/E
    df = df[df.pe_ttm > 0]
    # Exclude large-cap stocks
    df = df[df.circ_mv < 10**5]
    # Exclude stocks priced above 20 yuan
    # df = df[df.close < 20]
    codes = df.ts_code.values
    return codes
len(get_new_code('20190425'))
# Output: 46
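The screening step is a chain of boolean filters over a merged table. A toy version on invented data makes each condition easy to check by hand. The company names and figures below are made up, and the thresholds simply mirror the ones used above.

```python
import pandas as pd

basic = pd.DataFrame(
    {
        "ts_code": ["000001.SZ", "000002.SZ", "000003.SZ", "000004.SZ"],
        "name": ["Alpha", "*ST Beta", "Gamma", "Delta"],
        "list_date": ["20100105", "20090301", "20180620", "20120808"],
    }
)
daily = pd.DataFrame(
    {
        "ts_code": ["000001.SZ", "000002.SZ", "000003.SZ", "000004.SZ"],
        "pe_ttm": [12.0, 8.0, 30.0, -4.0],
        "circ_mv": [5e4, 3e4, 2e4, 6e4],
    }
)

df = basic.merge(daily, on="ts_code")
mask = (
    (df["list_date"].astype(int) < 20170101)  # listed before 2017
    & ~df["name"].str.startswith("*ST")       # not flagged *ST
    & (df["pe_ttm"] > 0)                      # positive trailing P/E
    & (df["circ_mv"] < 10**5)                 # below the market-value cap
)
selected = df.loc[mask, "ts_code"].tolist()
print(selected)  # ['000001.SZ']
```

Combining the conditions into one mask gives the same result as filtering step by step, and keeps each rule on its own line.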
import talib as ta
# 20-day moving average strategy
def find_stock(date):
    f_code = []
    for code in get_new_code(date):
        try:
            data = df_all_data.loc[df_all_data.ts_code == code].copy()
            data.index = pd.to_datetime(data.trade_date)
            data = data.sort_index()
            data['ma_20'] = ta.MA(data.close, timeperiod=20)
            # Keep the stock if its latest close is above the 20-day average
            if data.iloc[-1]['close'] > data.iloc[-1]['ma_20']:
                f_code.append(code)
        except Exception:
            pass
    return f_code
fs = find_stock('20190305')
print(f'Number of stocks selected: {len(fs)}')
if fs:
    df_find_stocks = pd.DataFrame(fs, columns=['ts_code'])
    # Store the selection; if the table exists, replace it, so each run refreshes it
    insert_sql(df_find_stocks, 'find_stocks', if_exists='replace')
    print('Selected stocks saved to the database')
# Output:
# Number of stocks selected: 9
# Selected stocks saved to the database
# Read the selected stock pool back from the database
codes = pd.read_sql('find_stocks', engine)
codes = codes['ts_code'].tolist()
# print(codes)
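The selection rule is simply "latest close above its 20-day simple moving average". For readers without talib installed, the check can be sketched with a pandas rolling mean on synthetic series. The function name above_ma is hypothetical, and returning False when there are fewer than `window` observations is a choice made for this sketch.

```python
import pandas as pd


def above_ma(close, window=20):
    """True if the latest close is above its `window`-period simple moving average."""
    ma = close.rolling(window).mean()
    last_ma = ma.iloc[-1]
    if pd.isna(last_ma):  # fewer than `window` observations
        return False
    return bool(close.iloc[-1] > last_ma)


rising = pd.Series(range(1, 31), dtype=float)    # last close 30, average 20.5
falling = pd.Series(range(30, 0, -1), dtype=float)  # last close 1, average 10.5
short = pd.Series([1.0, 2.0, 3.0])               # too short for a 20-day average
print(above_ma(rising), above_ma(falling), above_ma(short))  # True False False
```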

Further analysis of the selected stocks:

select_data = pd.DataFrame()
for code in codes:
    try:
        df_ = df_all_data[df_all_data.ts_code == code].copy()
        df_.index = pd.to_datetime(df_.trade_date)
        df_ = df_.sort_index()
        select_data[code] = df_.close
    except Exception:
        pass
# Forward-fill missing closes
select_data = select_data.ffill()
select_data.tail()
# Daily returns
ret = select_data.apply(lambda x: x / x.shift(1) - 1)
ret = ret.dropna()
ret.tail()
# Cumulative net value
prod_ret = ret.apply(lambda x: (1 + x).cumprod())
prod_ret.plot(figsize=(12, 5))
plt.xlabel('', fontsize=15)
plt.title('Cumulative net value of the stock pool', size=15)
ax = plt.gca()
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
plt.show()

[Figure: cumulative net value of the selected stock pool]
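The cumulative net value is the running product of (1 + daily return), which is the same as each day's price divided by the first day's price. A short check on invented prices for two placeholder tickers (AAA and BBB) illustrates the identity:

```python
import pandas as pd

prices = pd.DataFrame(
    {"AAA": [10.0, 11.0, 12.1], "BBB": [20.0, 19.0, 20.9]},
    index=pd.to_datetime(["2019-04-23", "2019-04-24", "2019-04-25"]),
)

ret = prices.pct_change().dropna()        # same as x / x.shift(1) - 1
nav = (1 + ret).cumprod()                 # cumulative net value from day two on
check = prices.iloc[1:] / prices.iloc[0]  # price relative to the first day

print(nav.round(4))
print(((nav - check).abs() < 1e-12).all().all())  # True
```

In this toy case AAA ends at a net value of 1.21 and BBB at 1.045, matching 12.1/10 and 20.9/20.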

# Fetch one stock's data from the database by its code
def get_data_from_sql(code):
    from sqlalchemy import create_engine
    engine = create_engine('postgresql+psycopg2://postgres:123456@localhost:5432/postgres')
    data = pd.read_sql(f"select * from stock_data where ts_code='{code}'", engine)
    data.index = pd.to_datetime(data.trade_date)
    data = data.sort_index()
    # Compute the 20-day moving average
    data['ma20'] = data.close.rolling(20).mean()
    return data

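Based on the 20-day moving average strategy, build a query-and-visualization function kline_plot(). The complete code will be shared on Zhishi Xingqiu (Knowledge Planet). It plots the daily candlesticks, the 20-day moving average, the volume, and the buy and sell signals of a selected stock. The candlestick charts of 002790.SZ and 300573.SZ are shown below as examples.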
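The code of kline_plot() is not included in the article, so the exact buy and sell rule is unknown. A common reading of a moving average strategy is to buy when the close crosses above the average and to sell when it crosses below. The dependency-free sketch below implements that assumed rule on a made-up price list. The function names and the short window of 3 are illustrative only.

```python
def sma(values, window):
    """Simple moving average; None until `window` values are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def crossover_signals(close, window=20):
    """Return (index, 'buy' or 'sell') pairs where close crosses its moving average.

    Assumed rule, not taken from the article: buy on an upward cross,
    sell on a downward cross.
    """
    ma = sma(close, window)
    signals = []
    for i in range(1, len(close)):
        if ma[i - 1] is None:
            continue  # not enough history on the previous day
        was_above = close[i - 1] > ma[i - 1]
        is_above = close[i] > ma[i]
        if is_above and not was_above:
            signals.append((i, "buy"))
        elif was_above and not is_above:
            signals.append((i, "sell"))
    return signals


close = [10, 10, 10, 12, 13, 9, 8, 11]
print(crossover_signals(close, window=3))  # [(3, 'buy'), (5, 'sell'), (7, 'buy')]
```

With real data, the indices would be mapped back to trade dates to place the markers on a chart.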

kline_plot('002790.SZ')

[Figure: candlestick chart of 002790.SZ with 20-day moving average, volume, and buy/sell signals]

kline_plot('300573.SZ')

[Figure: candlestick chart of 300573.SZ with 20-day moving average, volume, and buy/sell signals]

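There is much more to learn about database operations. This article is only a starting point: a brief introduction to exchanging data between PostgreSQL and pandas DataFrames from Python, and to building your own quantitative analysis database step by step. The data used here amounts to only about a million rows, so reading and writing CSV files (which can be opened in Excel) would also be fast and is more intuitive. But as the data keeps growing and you build out a complete quantitative analysis system, learning databases becomes especially important. Note that the stock-selection methods and stock codes mentioned here are examples only and do not constitute investment advice.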

Source: http://www.chinadenli.net/article2/iiigoc.html
