Pandas-Keras — Deep Learning

Jiyun Park
7 min read · Jun 29, 2020


Practice | Smart Trading Agent

Smart Trading Agent

In this notebook, we predict cryptocurrency transactions based on five pieces of data:

  • Timestamp (year-month-day hour:min:sec)
  • Price
  • Mid Price
  • Book Feature
  • Side (Sell / Buy)

Introduction

In this tutorial, we will use the popular deep learning library Keras, along with the visualization libraries Matplotlib and Seaborn, to build a simple classification model. The libraries NumPy and pandas will help us along the way.

Visualization

We will now start thinking about which of these features to use in our model. First, let's make a plot of our data to see how it looks. To visualize our data, we will use Matplotlib and Seaborn.

Intuitively, it makes sense that the price of BTC (‘price’) would play a big role in whether a trade is a buy or a sell (‘side’). Let’s see if this hypothesis is correct:

Unfortunately, the prices are a bit hard to visualize, since there are so many different values.

There are some peaks around 0.99 and 1.015 on the price axis (in units of 10,000,000 won, i.e. about 9,900,000 and 10,150,000 won). At these prices, the trader bought or sold BTC most often.

Prices at both ends of the spectrum seem to stand out, but we need a closer look. We will ‘bin-ify’ the prices, grouping them into bins according to their value. Prices that are close together will then appear as one, which makes them easier to visualize.

To do this, we will use a NumPy-based function that rounds each price to the nearest multiple of a given factor.
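A minimal sketch of such a rounding function is shown below. It assumes prices are in won and uses a factor of 10,000 won purely as an example bin width; the article does not state the factor it used, and the name `bin_prices` is hypothetical.

```python
import numpy as np

def bin_prices(prices, factor):
    """Round each price to the nearest multiple of `factor` (hypothetical helper).

    Prices that fall close together end up in the same bin, which makes a
    chart of a very spread-out column easier to read.
    """
    prices = np.asarray(prices, dtype=float)
    return np.round(prices / factor) * factor

# Example with 10,000-won bins (the bin width is an assumption).
sample = [9_903_500, 9_897_000, 9_912_400, 10_148_000]
print(bin_prices(sample, 10_000))
# bins: 9,900,000, 9,900,000, 9,910,000 and 10,150,000 (as floats)
```

A larger factor gives fewer, wider bins; the right value depends on how smooth you want the resulting chart to be.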

There doesn’t seem to be much correlation between the binned price and the transaction rate.

How about counts?

Here it is clearer that the trader makes the most transactions at around 9,900,000 won. The number of transactions grows steadily up to around 9,900,000 won and decreases after that. It rises again at around 10,150,000 won, but the counts there are much smaller than before.
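Both the rate view and the count view can be sketched with a pandas `groupby` over the binned prices. This sketch assumes hypothetical column names `price` and `side` (with values `"buy"`/`"sell"`), uses a few made-up rows instead of the real data, and reads "transaction rate" as the share of buys in each bin. That reading is our interpretation; the article does not define the term.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the real trades; column names are assumptions.
trades = pd.DataFrame({
    "price": [9_898_000, 9_901_000, 9_903_000, 9_952_000, 10_149_000, 10_152_000],
    "side": ["buy", "sell", "sell", "buy", "sell", "buy"],
})

# Round prices to the nearest 10,000 won so nearby prices share one bin.
trades["price_bin"] = np.round(trades["price"] / 10_000) * 10_000

# Per bin: how many trades happened, and what fraction of them were buys.
summary = trades.groupby("price_bin").agg(
    count=("side", "size"),
    buy_rate=("side", lambda s: (s == "buy").mean()),
)
print(summary)

# In a notebook, summary["count"].plot(kind="bar") draws the counts per bin.
```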

Conclusion 1

The main trading price in May 2018 was about 9,900,000 won, and most trades were under 10,500,000 won.

Now let’s check the book feature:

The plot shows a correlation between transaction counts and the book feature.

The same holds for both sides, buy and sell.

Something does seem to be going on with ‘Book Feature’. Above zero, the total transaction count starts to decrease as the book feature grows. In particular, the count at 3 is more than twice the count at 8.

Conclusion 2

To sum up so far: the trader usually makes transactions at prices from 9,900,000 to 10,000,000 won, and the book feature of those transactions is most likely under 7,000,000 won.

Let’s now look at each transaction type separately. This time, we will focus more specifically on transactions with a price under 10,000,000 won and a book feature under 7,000,000 won.
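Narrowing the data and splitting it by transaction type comes down to boolean masks in pandas. The column names `price`, `book_feature` and `side` are hypothetical, and the rows are made up for illustration.

```python
import pandas as pd

# Made-up rows; column names are assumptions about the dataset.
trades = pd.DataFrame({
    "price": [9_905_000, 9_990_000, 10_120_000, 9_950_000, 9_930_000],
    "book_feature": [1_200_000, 2_100_000, 500_000, 8_000_000, 3_000_000],
    "side": ["sell", "buy", "sell", "buy", "sell"],
})

# Keep only trades inside the narrower window: price < 10M and book feature < 7M.
mask = (trades["price"] < 10_000_000) & (trades["book_feature"] < 7_000_000)
focus = trades[mask]

# One DataFrame per transaction type, so each can be plotted on its own.
sells = focus[focus["side"] == "sell"]
buys = focus[focus["side"] == "buy"]
print(len(sells), len(buys))
```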

Within this narrower range:

Sells peak at a book feature of around 1,000,000 won, whereas purchases peak at around 2,000,000 won. The second-highest peak is at around 3,000,000 won for both sells and purchases.

The trend line is decreasing on both graphs.
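The article does not say how its trend lines were drawn. One common option is a least-squares straight-line fit with NumPy's `np.polyfit`, sketched below; seaborn's `regplot` would draw a similar line directly on a plot. The per-bin counts are made up purely to show the call.

```python
import numpy as np

# Hypothetical sell counts per book-feature bin (bin centres in millions of won).
bins = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
sell_counts = np.array([120, 80, 95, 60, 40, 30])

# Fit counts ≈ slope * bin + intercept by least squares (degree-1 polynomial).
slope, intercept = np.polyfit(bins, sell_counts, deg=1)
trend = slope * bins + intercept

# A negative slope corresponds to a decreasing trend line.
print(f"slope = {slope:.1f} trades per 1,000,000 won")
```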

Finally, let’s consider the transaction time:

Contrary to our expectations, there doesn’t seem to be much correlation between transactions and transaction time.
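To look at time, the timestamp string first has to be parsed into a datetime. A minimal sketch, assuming a hypothetical `timestamp` column in the year-month-day hour:min:sec format listed at the top and grouping trades by hour of day (the article does not say which time granularity it plotted):

```python
import pandas as pd

# Made-up rows; "timestamp" and "side" are assumed column names.
trades = pd.DataFrame({
    "timestamp": ["2018-05-01 09:15:02", "2018-05-01 09:47:30", "2018-05-02 21:03:11"],
    "side": ["buy", "sell", "sell"],
})

# Parse the strings and pull out the hour of day.
trades["timestamp"] = pd.to_datetime(trades["timestamp"], format="%Y-%m-%d %H:%M:%S")
trades["hour"] = trades["timestamp"].dt.hour

# Number of trades per hour, with one column per side (0 where there were none).
per_hour = trades.groupby(["hour", "side"]).size().unstack(fill_value=0)
print(per_hour)
```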

Final conclusion

So far, we’ve looked at several factors behind the trader’s transaction patterns. Price came first, and it gave us our first conclusion about the main transaction price. Second was the book feature, which is the difference between price and mid-price; it was clear that the book feature of most of the trader’s transactions is under 7,000,000 won. Lastly, we looked at transaction time. It didn’t show a strong correlation with trades, even though time is an important factor in almost every case.

Unfortunately, we couldn’t find a big difference between sells and purchases using these five fields. We only found that there is a specific region where price and book feature intersect, and the trader buys or sells BTC within that region.

As a result, we couldn’t find very obvious or specific transaction patterns in the 2018–05-newtrade dataset. Instead of general statements, we found some trading patterns and a few trend lines:

  • Correlation between Transaction Counts and Book Feature: the total counts of sells and purchases decrease steadily once the book feature exceeds 1,000,000 won. Both show the same downward trend.
  • The trader is most likely to buy at a book feature of around 2,000,000 won. It’s the only book-feature range in which purchases outnumber sells; in the other ranges, the trader tends to sell BTC.

Plotting the data

First let’s make a plot of our data to see how it looks. In order to have a 2D plot, let’s ignore the timestamp and mid-price.
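A 2D scatter plot of price against book feature, coloured by side, could look like the sketch below. The column names and rows are hypothetical, and the `Agg` backend line only exists so the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows; timestamp and mid-price are left out for the 2D plot.
trades = pd.DataFrame({
    "price": [9_850_000, 9_900_000, 9_950_000, 10_050_000, 9_880_000],
    "book_feature": [-400_000, 1_500_000, 800_000, 5_000_000, 2_000_000],
    "side": ["buy", "sell", "sell", "buy", "sell"],
})

fig, ax = plt.subplots()
# One scatter call per side, so each side gets its own colour and legend entry.
for side, color in [("buy", "tab:blue"), ("sell", "tab:red")]:
    subset = trades[trades["side"] == side]
    ax.scatter(subset["price"], subset["book_feature"], s=10, color=color, label=side)
ax.set_xlabel("price (won)")
ax.set_ylabel("book feature (won)")
ax.legend()
# plt.show() displays the figure in an interactive session.
```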

Roughly, it looks like most trades happened at prices from 9,820,000 to 10,000,000 won with a book feature from -500,000 to 2,000,000 won, while trades at higher prices were rare. However, the data is not as nicely separable as we hoped. Maybe it would help to split the book feature into ranges? Let’s make 5 plots, each covering a book-feature range 4,000,000 won wide.
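The five range plots can be sketched with one subplot per 4,000,000-won range. The article does not give the range boundaries, so starting the first range at the minimum book feature is an assumption, as are the column names and the made-up rows.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

trades = pd.DataFrame({
    "price": [9_850_000, 9_900_000, 9_950_000, 10_050_000, 9_880_000, 9_990_000],
    "book_feature": [-400_000, 1_500_000, 4_500_000, 9_000_000, 13_000_000, 17_500_000],
    "side": ["buy", "sell", "sell", "buy", "sell", "buy"],
})

# Six edges define five consecutive ranges, each 4,000,000 won wide.
start = trades["book_feature"].min()  # assumed starting point
edges = start + 4_000_000 * np.arange(6)

fig, axes = plt.subplots(1, 5, figsize=(20, 4))
for i, ax in enumerate(axes):
    lo, hi = edges[i], edges[i + 1]
    # Each range includes its lower edge and excludes its upper edge.
    in_range = trades[(trades["book_feature"] >= lo) & (trades["book_feature"] < hi)]
    for side, color in [("buy", "tab:blue"), ("sell", "tab:red")]:
        subset = in_range[in_range["side"] == side]
        ax.scatter(subset["price"], subset["book_feature"], s=10, color=color, label=side)
    ax.set_title(f"range{i + 1}")
    ax.set_xlabel("price (won)")
axes[0].set_ylabel("book feature (won)")
axes[0].legend()
```

Because the upper edge is excluded, a value exactly equal to the last edge would fall outside all five plots; widen the last edge slightly if that matters for your data.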

Range 4 has the fewest transactions, followed by range 3; most transactions fall in range 1 and range 2. In all ranges, few transactions happen at prices from 1,000,000 won to 12,000,000 won. Let’s use the book feature as one of our inputs. In order to do this, we should one-hot encode it.

Before encoding, we replace book_feature with a simple integer (the number of its range) and remove the unused columns.
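One way to do this replacement is `pd.cut` with integer labels, followed by dropping the columns the model will not use. This is a sketch under the same assumptions as before: hypothetical column names, made-up rows, and ranges starting at the minimum book feature.

```python
import numpy as np
import pandas as pd

trades = pd.DataFrame({
    "timestamp": ["2018-05-01 09:15:02", "2018-05-01 10:02:45",
                  "2018-05-01 11:30:00", "2018-05-01 12:00:10"],
    "price": [9_850_000, 9_900_000, 9_950_000, 9_990_000],
    "mid_price": [9_851_000, 9_899_500, 9_951_000, 9_989_000],
    "book_feature": [-400_000, 1_500_000, 4_500_000, 17_500_000],
    "side": ["buy", "sell", "sell", "buy"],
})

start = trades["book_feature"].min()  # assumed starting point of range 1
edges = start + 4_000_000 * np.arange(6)

# right=False makes every range [lo, hi), matching the plots above;
# labels 1..5 replace the raw book feature with its range number.
trades["book_feature"] = pd.cut(
    trades["book_feature"], bins=edges, right=False, labels=[1, 2, 3, 4, 5]
).astype(int)

# Timestamp and mid-price are not used as model inputs.
data = trades.drop(columns=["timestamp", "mid_price"])
print(data)
```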

One-hot encoding the book feature

We’ll use the get_dummies function in pandas. Let's do one-hot encoding:
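A sketch of the `get_dummies` call, assuming the integer range column is named `book_feature` and that `range` is an acceptable prefix (both are our choices). Declaring the column as a categorical with all five categories makes pandas create a column for every range, even ones missing from a given slice of data.

```python
import pandas as pd

data = pd.DataFrame({
    "price": [9_900_000, 9_950_000, 10_010_000],
    "book_feature": [1, 2, 5],
    "side": ["sell", "buy", "sell"],
})

# Fix the set of categories so all five ranges get a column.
data["book_feature"] = pd.Categorical(data["book_feature"], categories=[1, 2, 3, 4, 5])

# Replace the range column with one 0/1 column per range.
one_hot = pd.get_dummies(data, columns=["book_feature"], prefix="range", dtype=int)
print(one_hot.columns.tolist())
# ['price', 'side', 'range_1', 'range_2', 'range_3', 'range_4', 'range_5']
```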

Scaling the data

The next step is to scale the data. The range for book-feature is 1.0–5.0, whereas the range for price is roughly 9,760,000–10,230,000, which is much larger. The two features are therefore on very different scales, which makes it hard for a neural network to learn from them. Let’s fit both features into the range 0–1 by dividing the book-feature by 5.0 and the prices by 10,230,000.
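The division described above is a one-liner per column in pandas. The sketch applies it to a small made-up frame that still has the integer `book_feature` column. As an aside, dividing by the maximum leaves all prices between roughly 0.95 and 1.0. Min-max scaling, shown for comparison, is a swapped-in alternative that spreads them over the whole 0-1 interval; it is not the article's method.

```python
import pandas as pd

data = pd.DataFrame({
    "price": [9_760_000, 9_900_000, 10_230_000],
    "book_feature": [1, 3, 5],
})

# Scaling as described: divide each feature by its maximum value.
scaled = data.copy()
scaled["price"] = data["price"] / 10_230_000
scaled["book_feature"] = data["book_feature"] / 5.0

# Alternative (not the article's choice): min-max scaling of the price.
p = data["price"]
minmax_price = (p - p.min()) / (p.max() - p.min())
print(scaled)
print(minmax_price)
```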

Splitting the data into Training and Testing

In order to test our algorithm, we’ll split the data into a Training and a Testing set. The size of the testing set will be 10% of the total data.
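A 90/10 split can be sketched with `DataFrame.sample`: draw 90% of the rows at random for training and keep the rest for testing. The frame below is a stand-in, and `random_state=42` is an arbitrary seed chosen so the split is reproducible.

```python
import pandas as pd

# Stand-in data: 20 rows with a feature and a binary label.
data = pd.DataFrame({"price": range(20), "side": [0, 1] * 10})

# Randomly pick 90% of the rows for training; the remaining 10% form the test set.
train_data = data.sample(frac=0.9, random_state=42)
test_data = data.drop(train_data.index)
print(len(train_data), len(test_data))
```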

Splitting the data into features and targets (labels)

Now, as a final step before the training, we’ll split the data into features (X) and targets (y).

Also, in Keras, we need to one-hot encode the output. We’ll do this with the to_categorical function.
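The feature/target split and the `to_categorical` call might look like the sketch below. The mapping sell → 0, buy → 1 is a hypothetical choice, as are the column names and rows; `to_categorical` is imported from `tensorflow.keras.utils`.

```python
import pandas as pd
from tensorflow.keras.utils import to_categorical

# Made-up, already-scaled training rows with one-hot range columns.
train_data = pd.DataFrame({
    "price": [0.968, 0.975, 0.990],
    "range_1": [1, 0, 0],
    "range_2": [0, 1, 0],
    "side": ["sell", "buy", "sell"],
})

# Hypothetical label mapping: sell -> 0, buy -> 1.
labels = train_data["side"].map({"sell": 0, "buy": 1})

# Features (X): every column except the target. Targets (y): one-hot labels.
X_train = train_data.drop(columns=["side"]).to_numpy(dtype="float32")
y_train = to_categorical(labels.to_numpy(), num_classes=2)
print(X_train.shape, y_train.shape)  # (3, 3) (3, 2)
```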

Defining the model architecture

Here’s where we use Keras to build our neural network.
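The article does not show its architecture here, so the following is only a plausible sketch: a small fully connected network with a two-unit softmax output (one unit per side). The layer sizes, dropout rate, optimizer, and input width of 6 (scaled price plus five one-hot range columns) are all assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

n_features = 6  # assumption: scaled price + five one-hot book-feature ranges

model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.2),                    # randomly zeroes 20% of units while training
    layers.Dense(32, activation="relu"),
    layers.Dense(2, activation="softmax"),  # probabilities for sell and buy
])

# Categorical cross-entropy matches the one-hot targets from to_categorical.
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()
```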

Training the model
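Training in Keras is a call to `model.fit`. The sketch below is self-contained, so it uses random stand-in data and a small model like the one above. The epoch count, batch size, and 10% validation split are arbitrary assumptions, not the article's settings.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

rng = np.random.default_rng(0)
X_train = rng.random((200, 6)).astype("float32")  # stand-in features
y_train = keras.utils.to_categorical(rng.integers(0, 2, size=200), num_classes=2)

model = keras.Sequential([
    keras.Input(shape=(6,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(2, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# Hold out 10% of the training rows to watch validation loss during training.
history = model.fit(X_train, y_train, epochs=20, batch_size=32,
                    validation_split=0.1, verbose=0)
print(history.history["loss"][-1])
```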

Scoring the model
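Scoring means evaluating the trained model on the held-out test set with `model.evaluate`, which returns the loss followed by each compiled metric. This sketch again uses random stand-in data and a tiny model so it runs on its own; on random labels, the accuracy will hover around chance.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

rng = np.random.default_rng(1)
X_train = rng.random((180, 6)).astype("float32")
y_train = keras.utils.to_categorical(rng.integers(0, 2, size=180), num_classes=2)
X_test = rng.random((20, 6)).astype("float32")
y_test = keras.utils.to_categorical(rng.integers(0, 2, size=20), num_classes=2)

model = keras.Sequential([
    keras.Input(shape=(6,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(2, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=5, verbose=0)

# With metrics=["accuracy"], evaluate returns [loss, accuracy].
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Testing accuracy: {accuracy:.3f}")
```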

👩🏻‍💻See code? Go to:

https://github.com/jyuunnii/Smart-Trading-Agent-Keras
