JPMorgan's new guide to machine learning in algorithmic trading


If you're interested in the application of machine learning and artificial intelligence (AI) in the field of banking and finance, you will probably know all about last year's excellent guide to big data and artificial intelligence from J.P. Morgan. You will also therefore be interested to know that the bank has just released a new report on the problems of 'applying data driven learning' to algorithmic trading. 

Last year's giant report was compiled by Marko Kolanovic, the 'half man half God' head of JPM's macro quant research team, with assistance from Rajesh Krishnamachari, a quant strategist who quit for Bank of America Merrill Lynch in April. This month's smaller report is authored by five different JPM employees - Vacslav Glukhov (Head of EMEA E-Trading Quantitative Research), Vangelis Bacoyannis (a VP in eTrading Quantitative Research), Tom Jin (a quant analyst), Jonathan Kochems (a quant researcher), and Doo Re Song (also a quant researcher).

The new report was presented at the NIPS conference in May 2018, but has only just been made public.

For those who want to know how 'data driven learning' interacts with algorithmic trading, here's what the report says.

Algorithms now control key trading decisions, within a few parameters set by clients

Algorithms in finance control "micro-level" trading decisions for equities and electronic futures contracts: "They define where to trade, at what price, and what quantity."

However, algos aren't free to do as they please. JPM notes that clients, "typically transmit specific instructions with constraints and preferences to the execution broker." For example, clients might want to preserve currency neutrality in their portfolio transitions, so that the amount sold is roughly equal to the amount bought. They might also specify that the executed basket of securities is exposed in a controlled way to certain sectors, countries or industries.

When clients are placing a single order, they might want to control how the execution of the order affects the market price (control market impact), or to control how the order is exposed to market volatility (control risk), or to specify an urgency level which will balance market impact and risk.

The data contained in a trading order book is crazily complex

Writing an electronic trading algorithm is a crazily complicated undertaking.

For example, the JPM analysts point out that a game of chess is about 40 steps long and a game of Go about 200. By contrast, even a medium-frequency electronic trading algorithm that reconsiders its options every second takes 3,600 steps per hour.

Nor is this the only issue. When you're mapping the data in Chess and Go, it's a question of considering how to move one piece among all the eligible pieces and how they might move in response. However, an electronic trading action consists of multiple moves. It's, "a collection of child orders," say the JPM analysts.

What's a child order? JPM points out that a single action might be, "submitting a passive buy order and an aggressive buy order. The passive child order will rest in the order book at the price specified and thus provide liquidity to other market participants. Providing liquidity might eventually be rewarded at the time of trade by locally capturing the spread: trading at a better price vs someone who makes the same trade by taking liquidity. The aggressive child order, on the other hand, can be sent out to capture an opportunity as anticipating a price move. Both form one action. The resulting action space is massively large and increases exponentially with the number of combinations of characteristics we want to use at a moment in time."
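To make the combinatorics concrete, here is a hypothetical sketch (the class names, price offsets and sizes are illustrative assumptions, not anything from the paper) of how one "action" can bundle a passive and an aggressive child order, and how quickly the action space blows up:

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical sketch: one "action" bundles several child orders.
@dataclass(frozen=True)
class ChildOrder:
    side: str          # "buy" or "sell"
    style: str         # "passive" rests in the book; "aggressive" takes liquidity
    price_offset: int  # ticks away from the best quote
    quantity: int

# A single action: a passive buy resting one tick below the touch,
# plus an aggressive buy taking liquidity at the touch.
action = (
    ChildOrder("buy", "passive", price_offset=-1, quantity=300),
    ChildOrder("buy", "aggressive", price_offset=0, quantity=100),
)

# The action space grows combinatorially: even a toy grid of
# 2 styles x 5 price offsets x 4 sizes gives 40 child-order types,
# and two-child actions already give 40 * 40 combinations.
styles, offsets, sizes = ["passive", "aggressive"], range(-2, 3), [100, 200, 300, 400]
child_types = list(product(styles, offsets, sizes))
print(len(child_types))       # 40
print(len(child_types) ** 2)  # 1600
```

Real engines choose among far more characteristics per child order, which is why the paper describes the space as "massively large".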

Right.

Trading algorithms written by humans tend to become huge and unwieldy

When humans write electronic trading algorithms, things quickly become complicated.

The JPM analysts note that, in the past, electronic trading algos were, "a blend of scientific, quantitative models which expressed quantitative views of how the world works." They contained, "rules and heuristics which expressed practical experience, observations and preferences of human traders and users of algorithms."

Trying to encapsulate all of this is hard. Most human-compiled algos are, "tens of thousands lines of hand-written, hard to maintain and modify code." When clients object and markets change, JPM says human algos suffer from “feature creep.” Over time, they come to, "accumulate many layers of logic, parameters, and tweaks to handle special cases."

Regulation makes human algos more complex again

Trading algos also have to contend with regulations like MiFID II and its concept of, “best execution.”

They must therefore be written to take account of, "changing market conditions and market structure, regulatory constraints, and clients’ multiple objectives and preferences."

If the writing of algos can be automated in a way that takes account of these constraints, life will be simpler.

There are three cultural approaches to the use of data when writing trading algorithms

JPM says there are three cultural approaches to using data when you're writing a trading algorithm: the data modelling culture; the machine learning culture; and the algorithmic decision making culture.

The data modelling culture is based on a presumption that financial markets are like a black box with a simple model inside. All you need to do is to build a quantitative model that approximates the black box. Given the complexity of behaviour in the financial markets, this can be too simple.

The machine learning culture tries to use more complex and sometimes opaque functions to model observations. It doesn't claim that these functions reveal the nature of the underlying processes.

The algorithmic decision making culture is about making decisions rather than building models. Instead of trying to map how the world works, this culture tries to train electronic agents (ie. an algorithm) to distinguish between good decisions and bad decisions. The problem then becomes trying to understand why the algorithm made the decisions it did, and injecting rules, values and constraints to ensure the decisions are acceptable.
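The decision-making culture can be illustrated with a toy example (entirely assumed; the states, actions and payoffs below are invented for illustration, not taken from the paper): an agent learns which of two order styles earns the higher reward in each market state purely from observed rewards, without any model of how markets work.

```python
import random

# Toy sketch of the "algorithmic decision making" culture: learn good vs bad
# decisions from rewards alone, with no model of the underlying market.
random.seed(0)

states = ["calm", "volatile"]
actions = ["passive", "aggressive"]
q = {(s, a): 0.0 for s in states for a in actions}

def reward(state, action):
    # Assumed payoffs: passive orders do better in calm markets,
    # aggressive orders in volatile ones, plus noise.
    base = {("calm", "passive"): 1.0, ("calm", "aggressive"): 0.2,
            ("volatile", "passive"): 0.1, ("volatile", "aggressive"): 0.8}
    return base[(state, action)] + random.gauss(0, 0.1)

alpha, eps = 0.1, 0.1  # learning rate, exploration rate
for _ in range(5000):
    s = random.choice(states)
    # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
    a = random.choice(actions) if random.random() < eps else max(actions, key=lambda x: q[(s, x)])
    q[(s, a)] += alpha * (reward(s, a) - q[(s, a)])

# After training, the agent prefers the "good" decision in each state.
print(max(actions, key=lambda a: q[("calm", a)]))      # passive
print(max(actions, key=lambda a: q[("volatile", a)]))  # aggressive
```

The hard part the paper highlights comes afterwards: explaining *why* the learned values came out the way they did, and constraining the behaviour that results.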

The algorithm has to find a balance between the optimal rate of execution and the optimal execution schedule for the desired trades

Once you've got your algorithm, it needs to make a trade-off. It can either execute a trade quickly, at the risk of affecting market prices. Or it can execute a trade slowly, at the risk that prices will move against the order ('up for a buy order, down for a sell order').
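The trade-off can be sketched numerically with a stylized cost model (the functional forms and coefficients below are assumptions for illustration, loosely in the spirit of standard impact/risk models, not the paper's own): impact cost falls as the order is spread over more time, while risk cost rises.

```python
# Stylized sketch of the execution trade-off: trading faster raises
# market impact, trading slower raises exposure to adverse price moves.
quantity = 100_000        # shares to buy
daily_volume = 2_000_000  # average daily volume
sigma = 0.02              # daily volatility (2%)

def impact_cost(hours):
    # Assumed square-root-style impact: shrinks as participation rate falls.
    participation = quantity / (daily_volume * hours / 6.5)
    return 0.01 * participation ** 0.5

def risk_cost(hours):
    # Assumed risk penalty: grows with the execution horizon.
    return 0.012 * (hours / 6.5) ** 0.5

horizons = [0.5, 1, 2, 4, 6.5]
costs = {h: impact_cost(h) + risk_cost(h) for h in horizons}
best = min(costs, key=costs.get)
print(best)  # 1: in this toy model a one-hour horizon balances the two costs
```

Rushing (0.5 hours) is dominated by impact; waiting all day (6.5 hours) is dominated by risk; the minimum sits in between.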

It's not always clear what constitutes a successful trade 

The definition of success in algo trading is not simple. It might be about balancing this trade-off between executing a trade quickly (efficiency) and executing a trade in such a way that prices are unchanged (optimality) - it depends on client priorities.

For example, the algo's objective might be to blend with the rest of the market. This means balancing the market impact from trading too quickly and moving the price, or trading slowly and seeing prices move against the trade. The algo writer needs to find a way of representing information and actions that will fit with models and machine learning methods. The market state has to be summarised despite its, "huge, variable and frequently changing dimension and order state, both parent order and child orders outstanding for model inputs."

It doesn't help that many opportunities are, "short lived and exist possibly on a microsecond scale only." Moreover, JPM says it won't always be apparent whether a trade is good or bad until after the trade has been executed or avoided: "Local optimality does not necessarily translate into a global optimality: what could be considered as a bad trade now could turn out to be an excellent trade by the end of the day".

J.P. Morgan has been using reinforcement learning algorithms to place trades, even though this can cause problems

J.P. Morgan is all for the kinds of "reinforcement learning" (RL) algorithms which use dynamic programming and penalize the algorithm for making a wrong decision whilst rewarding it for making a good one.

"We are now running the second generation of our RL-based limit order placement engine," say JPM's traders, adding that they have been training the algo within a "bounded action space" using, "local short term objectives which differ in their rewards, step and time horizon characteristics." However, training your algo can be complicated. If you try to 'parallelize' an algo's training by executing the algorithm on multiple processing devices at once, you can get the wrong result because of the feedback loop between the algorithm and the environment. But if you don't do this and try "gradient-based training", you will end up with a huge amount of irrelevant experiences, and good behaviours can be forgotten.

JPM has tried to avoid this by, "applying hyper-parameter optimization techniques." This means they have fewer sampled episodes per trial and stop uninteresting paths early. Hyper-parameter optimization techniques have enabled the bank to train its algo by running training sessions in parallel.
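A minimal sketch of that idea, under stated assumptions (the learning-rate search space, the stand-in reward function, and the stopping threshold are all invented for illustration): run many cheap trials with few episodes each, and abandon trials whose early results look uninteresting.

```python
import random

# Illustrative sketch of hyper-parameter search with early stopping:
# few episodes per trial, and uninteresting trials are abandoned early,
# so many trials can run cheaply (and in parallel in a real system).
random.seed(1)

def train_episode(lr):
    # Stand-in for one short RL training episode: reward peaks near lr = 0.1.
    return 1.0 - abs(lr - 0.1) * 8 + random.gauss(0, 0.05)

def run_trial(lr, max_episodes=20, check_at=5, threshold=0.2):
    rewards = []
    for i in range(max_episodes):
        rewards.append(train_episode(lr))
        # Early stopping: abandon trials that look uninteresting.
        if i + 1 == check_at and sum(rewards) / len(rewards) < threshold:
            return sum(rewards) / len(rewards), True  # stopped early
    return sum(rewards) / len(rewards), False

trials = [round(random.uniform(0.0, 0.3), 3) for _ in range(10)]
results = {lr: run_trial(lr) for lr in trials}
best_lr = max(results, key=lambda lr: results[lr][0])
stopped = sum(1 for _, early in results.values() if early)
print(best_lr, stopped)
```

Trials far from the good region fail the early check after five episodes, freeing the budget for promising ones.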

JPM says the main focus of research has become "policy learning algorithms," which maximize aggregated rewards matching a specified business objective within certain parameters. It also notes that "hierarchical reinforcement learning" can be used in regions where trading algorithms have to, "produce predictable, controllable, and explainable behaviours."

Under a hierarchical approach, the algorithm's decision is separated into groups with different sampling frequencies and different levels of granularity. This allows some of the algo's modules to be separated out, and makes it easier to see what it's up to.
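A toy sketch of what that separation might look like (the two-level structure, policies and fill model below are assumptions for illustration, not the paper's design): a slow, high-level policy sets a coarse, explainable goal once a minute, and a fast, low-level policy picks child orders every second to pursue it.

```python
# Toy sketch of hierarchical decision-making: decisions are split into
# groups with different sampling frequencies and granularity.

def high_level_policy(remaining, seconds_left):
    # Coarse, explainable decision at low frequency: how urgent are we?
    pace = remaining / max(seconds_left, 1)
    return "urgent" if pace > 2 else "patient"

def low_level_policy(goal):
    # Fine-grained decision at high frequency, conditioned on the goal.
    return {"urgent": ("aggressive", 200), "patient": ("passive", 50)}[goal]

remaining, seconds_left, fills = 1000, 600, []
for t in range(600):
    if t % 60 == 0:  # high-level decision once per minute
        goal = high_level_policy(remaining, seconds_left - t)
    style, qty = low_level_policy(goal)  # low-level decision every second
    filled = min(qty if style == "aggressive" else qty // 2, remaining)
    remaining -= filled
    fills.append((t, goal, style, filled))
    if remaining == 0:
        break

print(fills[0], remaining)  # (0, 'patient', 'passive', 25) 0
```

Because the high-level goal changes slowly and is human-readable, the algo's behaviour is easier to audit than a single monolithic policy.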

J.P. Morgan developed a reinforcement learning algorithm with a "character" to deal with long tails

JPMorgan notes that most reinforcement learning is about the algorithm learning actions that lead to better outcomes on average. However, in finance it can be a mistake to focus too heavily on average outcomes - the long tails matter too. For this reason, the bank's quants have been building algos which, "value multidimensional and uncertain outcomes."

To achieve this, the bank has been ranking uncertain outcomes (the long tail) by the expected utility of their future distribution. This is known as Certainty Equivalent Reinforcement Learning (CERL).

Under CERL, JPM notes that the algorithm effectively acquires a character based on its risk preferences. "If the client is risk-averse, the increased uncertainty of outcomes lowers the certainty equivalent reward of an action." The discount factor γ then emerges naturally: the distribution of outcomes broadens with risk the further the algo looks into the future.
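The certainty-equivalent idea can be sketched with a standard exponential utility (an assumed utility choice for illustration; the paper does not prescribe this particular function): CE = -(1/λ) · log E[exp(-λR)], so for a risk-averse agent (λ > 0), a more uncertain reward distribution earns a lower certainty-equivalent reward even at the same mean.

```python
import math
import random

# Sketch of a certainty-equivalent reward under exponential utility:
# CE = -(1/lam) * log E[exp(-lam * R)]. Greater uncertainty lowers CE.
random.seed(0)

def certainty_equivalent(rewards, lam):
    return -math.log(sum(math.exp(-lam * r) for r in rewards) / len(rewards)) / lam

# Two actions with the same average reward but different uncertainty.
safe = [random.gauss(1.0, 0.1) for _ in range(100_000)]
risky = [random.gauss(1.0, 1.0) for _ in range(100_000)]

lam = 1.0  # risk aversion
print(certainty_equivalent(safe, lam))   # ≈ 0.995: small risk penalty
print(certainty_equivalent(risky, lam))  # ≈ 0.5: mean minus lam * variance / 2
```

For Gaussian rewards this reduces to μ - λσ²/2, which is why the risky action's certainty equivalent is pulled well below its mean; raising λ (a more risk-averse "character") pulls it down further.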

There are a few useful open source reinforcement learning frameworks

If you want to build your own trading algorithm, JPM's researchers recommend a few places to start.

They note a few helpful early-stage open source reinforcement learning frameworks, including OpenAI Baselines, Google's Dopamine, DeepMind's TRFL, and Ray RLlib.
