How To Predict Bitcoin Price Using Google Data

Warning: This Article Will Be Technical
The information in this article is intended only as scientific analysis and not as financial advice. I am not responsible for how you use it or for any resulting financial losses.

Human beings tend to lie or to hold a biased image of themselves. Consequently, when they talk about the value of something, their answers tend to be imprecise or completely wrong.

This comes down to 2 factors:
I) People don’t know what others will do –> An individual’s view of past, present and future collective behavior rests on too little information, and on too little cognitive capacity to process it clearly.
II) People don’t know what they themselves will do –> It is hard to rationalize and predict our own future behavior: on average, we don’t know how we make decisions or what influences us.

So, to understand what people want and what they will value (including in terms of market price), we have to observe what they do, not what they say.

I based my analysis on this assumption (we can understand people by analyzing what they do) and on James Surowiecki’s thesis in The Wisdom of Crowds (the average of a group’s estimates tends to be more precise than an individual’s estimate).

The main goal was to test whether Bitcoin’s market value could be expressed as a linear transformation of the size of its community, as I introduced in my previous article.

To estimate the size of the Bitcoin community (people interested in Bitcoin), I first tried to scrape Twitter using Selenium (a framework, usable from Python, for automating browser interactions) and AWS (to run it in the cloud, so I could use my PC to watch movies in the meantime).

After an entire night, I discovered that the program was still scraping the first day of data (too many tweets and too little computational power, because I was using the EC2 free tier). That made the approach unfeasible, considering I wanted to scrape data since 2008-01-01.

That will have to wait for next time, maybe with more computational power. In that case I could apply an NLP (Natural Language Processing) algorithm to extract sentiment and determine whether someone was for or against Bitcoin. This is not possible with the methodology I used afterwards, and in my view it is one of its biggest flaws.

So I decided to use the Cryptory module to access Google search data and Bitcoin prices.
I assumed Google searches could be used as a sample-based estimate of the Bitcoin community size, able to describe (in part) how collective interest in Bitcoin evolved over time.

Bitcoin Google Searches
Bitcoin Close Price

First, I analyzed the correlation between these 2 time series and obtained a value of 0.763431, which I considered strong enough to try building a one-variable linear model.
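
A correlation like this can be computed in one line with pandas. The sketch below is only an illustration: the numbers are toy values invented for the example (not the article’s data), and using the Pearson coefficient via Series.corr is an assumption about how the figure above was obtained.

```python
import pandas as pd

# Toy values invented for illustration; the real series come from Cryptory.
toy = pd.DataFrame({
    "Bitcoin Google Searches": [2.0, 3.0, 5.0, 4.0, 8.0, 9.0],
    "Bitcoin Price": [900.0, 1500.0, 2100.0, 2500.0, 3400.0, 4200.0],
})

# Series.corr computes the Pearson correlation coefficient by default.
correlation = toy["Bitcoin Google Searches"].corr(toy["Bitcoin Price"])
print(f"Pearson correlation: {correlation:.6f}")
```

A value close to 1 means the two series tend to move together linearly; it says nothing about which one drives the other.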

Next, I scaled the data to plot “Google Searches” against “Close Price”.

Scaled Values
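
Scaling puts the two series on a comparable axis, since search values and prices differ by orders of magnitude. Here is a minimal sketch using scikit-learn’s StandardScaler on invented toy values. The listing at the end of the article keeps a similar line commented out, so treat this as one plausible way to do the scaling rather than the exact step behind the plot.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy values invented for illustration; not the article's data.
toy = pd.DataFrame({
    "Bitcoin Google Searches": [2.0, 3.0, 5.0, 4.0, 8.0, 9.0],
    "Bitcoin Price": [900.0, 1500.0, 2100.0, 2500.0, 3400.0, 4200.0],
})

# StandardScaler subtracts each column's mean and divides by its
# (population) standard deviation, so both columns become unit-free.
scaler = StandardScaler()
toy_scaled = pd.DataFrame(
    scaler.fit_transform(toy), columns=toy.columns, index=toy.index
)
print(toy_scaled.round(3))
```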

And in absolute values, the graph becomes this one.

Absolute Values

Afterwards, I used Sklearn (scikit-learn, a popular machine learning library) to fit a linear regression model, and I plotted it.

Fitting the Linear Model with Sklearn
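
To show what the fit returns, here is a small sketch on synthetic data. The data points lie exactly on a line chosen for the example (intercept 100, slope 50), so the recovered parameters are easy to check. The variable names are illustrative, not taken from the article’s code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data on an exact line: price = 100 + 50 * searches.
searches = np.array([1.0, 2.0, 4.0, 7.0, 10.0])
prices = 100.0 + 50.0 * searches

# scikit-learn expects the feature matrix as shape (n_samples, n_features).
X = searches.reshape(-1, 1)
model = LinearRegression().fit(X, prices)

intercept = model.intercept_   # constant term of the line
beta = model.coef_[0]          # slope for the single feature
fitted = model.predict(X)      # values on the fitted line

print(f"Intercept = {intercept:.2f}, Beta = {beta:.2f}")
```

On real data the points will not sit on the line, and the fitted values are what gets drawn over the scatter plot.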

I suggest opening the images in a new window to get a better view of the plots.

The model parameters were: Intercept = [879.6594822833292] and Beta = [353.55511796].
So, to estimate the expected Bitcoin price according to this model, use the Cryptory module to obtain the latest Google data on Bitcoin, multiply that value by the Beta and add the Intercept.

I don’t recommend using the Google Trends website to obtain the X value, because I used daily frequency (and for a long time period the website only gives monthly data). Use this module directly to obtain more precise data.

I computed the mean of the last 7 days (for X, the Google searches value), obtaining 7.752316, and estimated that the current Bitcoin price, according to this model, should be $3620.53.
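
The arithmetic behind that estimate can be reproduced directly from the parameters reported above. The function name estimate_price is just a label chosen for this sketch.

```python
# Parameters reported in the article for the fitted model.
INTERCEPT = 879.6594822833292
BETA = 353.55511796


def estimate_price(search_value: float) -> float:
    """Expected price under the one-variable linear model."""
    return INTERCEPT + BETA * search_value


# Mean of the last 7 days of Google search values, as given in the article.
mean_last_7_days = 7.752316
estimate = estimate_price(mean_last_7_days)
print(f"{estimate:.2f}")  # 3620.53
```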

This model is not necessarily correct: it relies on past data, it doesn’t distinguish between supportive and hostile interest within the community, and the Bitcoin price may be determined by other variables too. Still, it could help in understanding the evolution of the crypto when no other information is available.
I use it to understand whether increases in the Bitcoin price are supported by true increases in its community and, according to this model, the current market price is not consistent with its community.

Thank You – I Hope My Article Is Useful – If Yes, Please Share It – To Contact Me Use The Contact Section!

This is the Code:

from cryptory import Cryptory
from pandas import json_normalize  # imported in the original listing; not used below
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

my_cryptory = Cryptory(from_date = "2013-04-28", to_date = "2019-07-27")

# get historical bitcoin prices from coinmarketcap
file = my_cryptory.extract_coinmarketcap("bitcoin")

# google trends- bitcoin search results
file2 = my_cryptory.get_google_trends(kw_list=["bitcoin"])

datetime = pd.to_datetime(file2["date"])

google_bitcoin = pd.DataFrame()
google_bitcoin["Bitcoin"] = file2["bitcoin"]
google_bitcoin = google_bitcoin.set_index(datetime)

datetime_m = pd.to_datetime(file["date"])
market_bitcoin = pd.DataFrame()
market_bitcoin["Bitcoin Close Price"] = file["close"]
market_bitcoin = market_bitcoin.set_index(datetime_m)

plt.ylabel('Google Searches - "Bitcoin"')
plt.title("Bitcoin Analysis - Stefano Ciccarelli - Matplotlib & Cryptory")

bitcoin = pd.DataFrame()
bitcoin["Bitcoin Google Searches"] = file2["bitcoin"]
bitcoin["Bitcoin Price"] = file["close"]

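scale = StandardScaler()
# Scaling is left disabled here: the regression below is fitted on the raw values.
#bitcoin_scaled = pd.DataFrame(scale.fit_transform(bitcoin))
# Reverse the row order of the data frame.
bitcoin_scaled = bitcoin.iloc[::-1]

# scikit-learn expects X as a 2-D array of shape (n_samples, 1).
sentiment = pd.DataFrame(bitcoin_scaled["Bitcoin Google Searches"])
X = sentiment.values.reshape(-1,1)
y = bitcoin_scaled["Bitcoin Price"]

# Fit the one-variable linear model and compute its fitted values.
linear = LinearRegression().fit(X, y)
y_pred = linear.predict(X)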
