Application of Machine Learning to Financial Trading
January 2, 2015
Some slides borrowed from: Andrew Moore's lectures, Yaser Abu-Mostafa's lectures
About Us
Our Goal: To use advanced mathematical and statistical concepts to create situational trading algorithms generating uncorrelated alpha.
Our Background: A mathematician with some market experience started AlgoAnalytics in October 2009.
Global Equivalent: Systematic (non-discretionary) Managed Futures Advisors
CEO: Aniruddha Pant, PhD (Berkeley, USA)
- Financial Engineering, Quantitative Trading, Derivative Trading, Hedging, Analytics/Machine Learning, Control Theory
CFO: Girish Patil, BE, PGDBA
- Fundamental equity research covering Indian, US and Middle East markets
- Experience in technical trading of markets
+6 Quantitative Analysts
Contact: Aniruddha Pant, +91-9822873624, apant@algoanalytics.com, www.algoanalytics.com
Outline
What is machine learning
- Binary classification
What are we trying to classify
- Why is this problem unique
Machine Learning Techniques
- Different Techniques
- Support Vector Machines (SVM)
- Ensemble Learning
- Unsupervised Learning
- Overfitting
Approach
Newer techniques
- MKL
- Deep Learning
Money Management
What we do?
AlgoAnalytics Portfolio
DEFINING & UNDERSTANDING THE PROBLEM
What is machine learning?
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
Tom M. Mitchell
Daily Returns of NIFTY Index since January 2003
- Daily Return Average: 0.081%, Standard Deviation: 0.016 (1.6%)
- Ratio of Std/Mean = 19.27
- Kurtosis: 13
- 2% of the moves bigger than 3-sigma
- 3 moves bigger than 6-sigma in about 2800 days
- 6-sigma moves about 350 times more likely than Gaussian
- Non-stationary distribution
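The statistics on this slide can be computed from any series of daily returns. The following is a minimal sketch using only the Python standard library; the function names `return_summary` and `gaussian_two_sided_tail` are illustrative, not from the original slides. The slide does not say whether its kurtosis of 13 is raw or excess, so the sketch assumes the raw fourth standardised moment (3 for a Gaussian).

```python
import math
import statistics


def return_summary(returns):
    """Summarise daily returns given as fractions (0.01 means 1%)."""
    n = len(returns)
    mean = statistics.fmean(returns)
    std = statistics.pstdev(returns)  # population standard deviation
    # Raw kurtosis: fourth standardised moment (assumption: not excess kurtosis).
    kurtosis = sum((r - mean) ** 4 for r in returns) / (n * std ** 4)
    beyond_3 = sum(abs(r - mean) > 3 * std for r in returns)
    beyond_6 = sum(abs(r - mean) > 6 * std for r in returns)
    return {
        "mean": mean,
        "std": std,
        "std_over_mean": std / mean if mean else math.inf,
        "kurtosis": kurtosis,
        "frac_beyond_3_sigma": beyond_3 / n,
        "count_beyond_6_sigma": beyond_6,
    }


def gaussian_two_sided_tail(k):
    """P(|Z| > k) for a standard normal variable Z."""
    return math.erfc(k / math.sqrt(2))
```

Dividing the observed frequency of 6-sigma days (`count_beyond_6_sigma` over the number of days) by `gaussian_two_sided_tail(6)` gives the kind of "more likely than Gaussian" comparison the slide refers to.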
Autocorrelation of Daily Returns
- Mean absolute daily move: 1.1%
- 52% accuracy leads to losses/break-even
- 56% accuracy leads to phenomenal profit
- A 4% improvement over break-even accuracy leads to 8.8% profit every 100 days, which is huge!
- Working very close to randomness
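The 8.8% figure can be reproduced with simple arithmetic under assumptions the slide does not spell out: one trade per day, every correct call earns the mean absolute move of 1.1%, and every wrong call loses the same amount. Four extra correct calls per 100 days then also remove four wrong calls, for a swing of 8 x 1.1% = 8.8%. The function name and parameters below are illustrative only.

```python
def edge_per_period(accuracy, breakeven_accuracy=0.52,
                    mean_abs_move=0.011, days=100):
    """Expected profit over `days` from beating the break-even hit rate.

    Assumes one trade per day, +mean_abs_move on a correct call and
    -mean_abs_move on a wrong call (a simplification, not from the slides).
    """
    extra_wins = (accuracy - breakeven_accuracy) * days
    # Each extra win replaces a loss, so it is worth two moves.
    return 2 * extra_wins * mean_abs_move


profit = edge_per_period(0.56)  # roughly 0.088, i.e. 8.8% per 100 days
```

The same function shows how thin the margin is: at 53% accuracy the expected profit under these assumptions is only about 2.2% per 100 days.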
Random Trading Systems: Pitfalls of working with close-to-random systems
- Daily signals generated randomly 100 times
- Only constraint: number of positive moves same as in the original dataset
- Best random system accuracy: 53.1%
- Worst random system accuracy: 47.3%
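One way to run the experiment described on this slide is to shuffle the actual up/down sequence, which keeps the number of positive moves fixed while randomising their placement. This is a sketch under that assumption; the slides do not say how the random signals were generated, and `random_system_accuracies` is a hypothetical name.

```python
import random


def random_system_accuracies(actual_up, n_systems=100, seed=0):
    """Accuracy of random daily signals with the same number of 'up' calls.

    actual_up: list of booleans, True where the market closed up that day.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    accuracies = []
    for _ in range(n_systems):
        signals = list(actual_up)
        rng.shuffle(signals)  # same count of up-calls, random placement
        hits = sum(s == a for s, a in zip(signals, actual_up))
        accuracies.append(hits / len(actual_up))
    return accuracies
```

Taking `max(...)` and `min(...)` of the returned list gives the best and worst random systems. The spread between them is the noise band that any real system has to clear before its accuracy means anything.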
MACHINE LEARNING TECHNIQUES
Supervised vs. Unsupervised Learning
Supervised Learning
- Goal: to learn a classification/regression model
- TASK: well defined (the target function)
- EXPERIENCE: training data, with labels provided by a teacher
- PERFORMANCE: error/accuracy on the task
- Primarily, supervised learning is used in the case of financial data
Unsupervised Learning
- Goal: to find structure in the data
- TASK: vaguely defined
- No TEACHER
- No PERFORMANCE (but there are some evaluation metrics)
Supervised Learning Techniques
Decision Trees
- Flow-chart-like structure
- Valuable with narrow datasets (few features)
- Map observations about an item to conclusions about the item's target value
Random Forests
- Extension of single classification trees
- Many classification trees grown into a FOREST
- High accuracy and efficient on large databases
Artificial Neural Networks
- Analogous to biological neural networks
- Used to find complex data patterns
- Interconnected artificial neurons used for computation
Logistic Regression
- Probabilistic statistical classification model
- Binary predictor
Supervised Learning Techniques
SVM
- Used for classification and regression analysis
- Constructs a maximum-margin hyperplane in a high-dimensional space
- Most widely used and popular method
Multiple Kernel Learning
- Extension of the kernel trick used to handle non-linear classification
- Combines information from multiple sources
Deep Learning
- Attempts to model high-level abstractions in data
- Model architecture composed of multiple non-linear transformations
- Uses many layers of non-linear processing units for feature extraction and transformation
Bayesian Networks
- Probabilistic model
- Based on Bayes' rule
- Assumption that input attributes are independent
WHAT WE DO: FINANCIAL MARKETS
Try to predict many things which look like this
- Daily Return Average: 0.081%, Standard Deviation: 0.016 (1.6%)
- Ratio of Std/Mean = 19.27
- Kurtosis: 13
- 2% of the moves bigger than 3-sigma
- 3 moves bigger than 6-sigma in about 2800 days
- 6-sigma moves about 350 times more likely than Gaussian
- Non-stationary distribution
AA Portfolio
Intra-Day Low Frequency
- Daily predictions using machine learning techniques
- Predictions based on economic factors affecting the underlying security
Market Neutral
- Pair Trading: long-short pairs of Nifty stocks and indices
- Market neutrality achieved by making the pair beta neutral
- Based on the idea of statistical arbitrage
Multi-Day Directional Strategy
- Momentum Strategy: identifying momentum in stocks/indices
- Mean-Reversion Strategy: assumes that each security returns to its historical mean
Options
- Alpha comes from underlying direction
- Butterfly spread: long ITM strike, short 2 ATM strike, long OTM strike
- No naked short options
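The butterfly structure named on this slide can be illustrated with its payoff at expiry. The sketch below assumes call options, equally spaced strikes, and ignores the premium paid; none of these details are given on the slide, and the function names and the strike values in the usage lines are hypothetical.

```python
def call_payoff(spot, strike):
    """Payoff of one long call at expiry."""
    return max(spot - strike, 0.0)


def butterfly_payoff(spot, itm_strike, atm_strike, otm_strike):
    """Expiry payoff: long 1 ITM call, short 2 ATM calls, long 1 OTM call."""
    return (call_payoff(spot, itm_strike)
            - 2 * call_payoff(spot, atm_strike)
            + call_payoff(spot, otm_strike))


# Hypothetical strikes 7900 / 8000 / 8100, for illustration only.
at_centre = butterfly_payoff(8000, 7900, 8000, 8100)   # peak payoff
far_above = butterfly_payoff(8300, 7900, 8000, 8100)   # wings cancel out
```

With equally spaced strikes the payoff never goes below zero, because the two short calls are always covered by the two long calls. That is consistent with the slide's "no naked short options".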
Portfolio Performance
[Figure: Equity Curve, AA Portfolio vs. Niftybees, Jan-10 to Jan-14]
AA Equity Backtesting Performance:
Backtesting Period: 4th Jan 2010 to 31st Oct 2014

Portfolio   Annualized Returns   Drawdown   Max DD Period (Months)   Leverage Factor   Max Loss in Rs. L   Sharpe   Calmar ratio
AA Equity   15.86%               4.80%      4                        1                 48                  3.75     3.30
NiftyBees   10.63%               27.50%     38                       -                 -                   0.62     0.39

*Nifty BeES, an ETF tracking the S&P CNX Nifty index, is used as the benchmark.
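The Calmar ratios quoted on this slide are consistent with the common definition, annualized return divided by maximum drawdown. The following sketch shows that calculation together with a simple maximum-drawdown routine for an equity curve. The function names are illustrative, and the slides do not describe how the drawdown was actually measured.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def calmar_ratio(annualized_return, max_dd):
    """Annualized return divided by maximum drawdown (both as fractions)."""
    return annualized_return / max_dd


aa_calmar = round(calmar_ratio(0.1586, 0.0480), 2)     # AA Equity figures from the slide
nifty_calmar = round(calmar_ratio(0.1063, 0.2750), 2)  # NiftyBees figures from the slide
```

Both results round to the values reported on the slide (3.30 and 0.39).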
WHAT WE DO: OTHER DOMAINS
Some of our previous work and future possibilities
BFSI
- Trading Strategy and Analysis
- Bank Credit Classification
- Portfolio Analysis
- Financial Market Forecasting
- Predict customer interest in Caravan Insurance Policy
- Predictive Customer Relationship Analytics (CRA)
- Risk management and prediction
- Future Work
  - Detect money laundering
  - Customer segmentation and branding
Healthcare
- Recognizing potential Pulmonary Embolism candidates from CAT scan data
- Hepatitis B and Hepatitis C patients using non-biopsy test data
- Cancer cell classification
- Future Work
  - Patient care aid
  - Predict premature birth based on peptide biomarkers
  - Risk of death in surgery
  - Hospital admission: predict readmission for same illness
Some of our previous work and future possibilities (Contd.)
Human Resources Management
- Manpower Asset Allocation
- Recruitment Model
- Talent Forecasting
- Worker's Compensation Policy
- Future Work
  - Turnover modeling for businesses
  - Targeted retention
Telecom
- Accurately predict as many current 3G customers
- Identify 2G customers likely to convert to 3G customers
- Future Work
  - Forecast traffic patterns and peak period routing
  - Identify at-risk customers; convert them to loyal customers
Other
- Electricity Load Forecasting
- Airline Passenger Forecasting
- Sentiment Analysis using Twitter data
- Cross-selling: predicting potential customers
- Future Work
  - Predicting player performance in sports
  - Efficient building design
  - Power grid management
Work in progress
MRI Analytics
- Efficient evidence-based healthcare system
- Image Processing + Machine Learning + Radiologist = decision support systems
Recommender Systems
- Recommend items sold online to potential customers
- Machine learning: predicting that an item is worth recommending
Automated detection of diabetic retinopathy and macular edema
- Efficient evidence-based healthcare system
- Image Processing + Machine Learning + eye specialist
Predictive Maintenance in Refrigeration Systems
- Fault detection in refrigeration systems
- Energy optimization