
Overview Day One: Introduction & background; Goal & objectives; System model; Reinforcement learning at the service provider; Energy consumption-based approximated virtual experience for accelerating learning.

Overview Day Two: Reinforcement learning at customers; Post-decision state learning; Numerical results.

Overview: Introduction & background; Goal & objectives; System model; Reinforcement learning at the service provider; Energy consumption-based approximated virtual experience for accelerating learning.

State of our planet: air pollution caused by fossil fuels. (Image source: http://theearthproject.com/)

(Image sources: http://www.openculture.com/, http://blog.livedoor.jp)

Motivation: why microgrid? With distributed generation and storage, electric power can be provided even when the main grid is down. (Diagram: substation, transmission, distribution, loads, generator. Source: http://www.engineering.com)

Microgrid https://www.youtube.com/watch?v=ehkdyqnu-ac

PV and Load Daily Profile

(Diagram: grid; Center for Control System Security; 480 V microgrid; other remote DER sites; various loads; distributed energy resources.)

Demand Response https://www.sce.com/

Overview: Introduction & background; Goal & objectives; System model; Reinforcement learning at the service provider; Energy consumption-based approximated virtual experience for accelerating learning.

Abstract. Paper goal: dynamic pricing and energy scheduling in a microgrid. Customers consume electricity; utilities are the electricity generators; the service provider (SP) buys electricity from the utilities and sells it to the customers. Method: a reinforcement learning implementation that allows customers and the service provider to learn strategically without prior information about the system. (Figure: service provider, power generation unit, residential customers. Source: energy.gov)

Overview: Introduction & background; Goal & objectives; System model; Reinforcement learning at the service provider; Energy consumption-based approximated virtual experience for accelerating learning.

System time slot. To capture the variation of customers' load consumption and of the retail electricity price over a day, a set of periods H = {0, 1, 2, ..., H-1} is introduced. Each time slot t maps to a period h_t in H via h_t = mod(t, H); for example, with H = 24 hourly periods, time slots 3, 27, 51, ... all map to period 3. At each time slot, the SP chooses a new retail price.

Service provider design. Utility energy cost function: c_t(·). The SP buys electricity from the utility at a cost given by c_t(·), chosen from a finite set C. The cost c_t is a function of the time slot t and the customers' aggregate energy consumption Σ_{i∈I} e_i^t. (Figure source: www.hydrogencarsnow.com)

SP and customers. Retail price: a_t(·). (Diagram: the service provider announces the retail price a_t(·); customers 1-4 respond with energy consumptions e_1^t(d_1), e_2^t(d_2), e_3^t(d_3), e_4^t(d_4).)

In the microgrid system, the service provider determines the retail price function; a_t can, for example, be a second-order (quadratic) function of a customer's consumption. The SP's actions, i.e., the retail price options, are limited to a finite set A with n members, A = {a_1, a_2, ..., a_n}. The SP charges each customer a_t(e_i^t), where e_i^t denotes customer i's energy consumption at time slot t.
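For illustration only, assuming a quadratic form a(e) = α·e² + β·e with made-up coefficients (the slides do not give the actual tariff parameters), the finite action set A could be represented in Python as follows:

# Illustrative sketch (not the paper's exact tariffs): a finite set A of
# quadratic retail price functions a_t(e) = alpha * e**2 + beta * e.
# The coefficients below are hypothetical placeholders.

def make_price_fn(alpha, beta):
    """Return a quadratic retail price function of a customer's consumption e."""
    return lambda e: alpha * e**2 + beta * e

# Finite action set A = {a_1, ..., a_n} that the SP chooses from each time slot.
A = [make_price_fn(alpha, beta) for alpha, beta in [(0.01, 0.10), (0.02, 0.12), (0.03, 0.15)]]

a_t = A[1]              # SP picks one price function for this time slot
payment_i = a_t(3.5)    # charge for a customer consuming e_i^t = 3.5 kWh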

Model of customer response (load model). Each customer i has a total load demand d_i^t, selected from a discrete and finite set D_i, and decides to consume e_i^t ≤ d_i^t, so the satisfied load is e_i^t and the unsatisfied load is d_i^t - e_i^t. The unsatisfied load causes a disutility u_i(d_i^t - e_i^t), which is reported to the SP. Customer i's cost combines 1) the electricity payment a_t(e_i^t) and 2) the disutility u_i(d_i^t - e_i^t). The customer-specific parameters μ_i and λ_i, both in [0, 1], are the disutility weight and the backlog rate, respectively: μ_i scales how strongly unsatisfied load is penalized, and the fraction λ_i(d_i^t - e_i^t) of the unsatisfied load is carried over (backlogged) to the next time slot.
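A minimal sketch of one customer's time-slot update under the model above, assuming a linear disutility u_i(x) = x and made-up parameter values (the paper's exact functions may differ):

# Illustrative customer step: cost = payment + weighted disutility,
# and a fraction lambda_i of the unsatisfied load is backlogged.

def customer_step(d_it, e_it, price_fn, mu_i=0.5, lambda_i=0.8):
    """Return (cost, backlog) for customer i in one time slot."""
    assert 0 <= e_it <= d_it
    unsatisfied = d_it - e_it
    disutility = unsatisfied                       # u_i(d - e), linear here for illustration
    cost = price_fn(e_it) + mu_i * disutility      # payment + weighted disutility
    backlog = lambda_i * unsatisfied               # load carried over to the next slot
    return cost, backlog

cost, backlog = customer_step(d_it=5.0, e_it=3.5, price_fn=lambda e: 0.12 * e)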

Electricity cost of the service provider. The SP buys electricity from the utility at a price c_t, chosen from a finite set C; the transition from c_t at time t to c_{t+1} is governed by a transition probability p_c(c_{t+1} | c_t, h_t). The SP's cost can then be written as cost_SP^t = c_t(Σ_{i∈I} e_i^t) - Σ_{i∈I} a_t(e_i^t), where the first term is the SP's electricity purchase cost and the second term is its revenue from selling energy to the customers.

Timeline of interaction among the microgrid components.

Overview: Introduction & background; Goal & objectives; System model; Reinforcement learning at the service provider; Energy consumption-based approximated virtual experience for accelerating learning.

Problem formulation. The authors first formulate the dynamic pricing problem in the framework of a Markov decision process (MDP). Then, using reinforcement learning, they develop an efficient and fast dynamic pricing algorithm that does not require information about the system dynamics and uncertainties.

MDP formulation. Customers are first considered deterministic and myopic: for now, each customer simply chooses the consumption with the lowest possible cost. The MDP is defined by 1) the set of the decision maker's actions, 2) the set of system states, 3) the system state transition probabilities, and 4) the system cost function.

The SP is the decision maker. I. The SP's action is choosing a retail price function from the set A. II. The microgrid state is a function of the customers' demand vector, the time period, and the electricity cost. III. The transition probability from the current state to the next state.

(Continued.) The system cost is defined as a weighted sum of the SP's cost and the customers' costs, cost^t = (1 - ρ)·cost_SP^t + ρ·Σ_{i∈I} cost_i^t, where choosing the weighting factor ρ ∈ [0, 1] gives priority either to the SP or to the customers (a larger ρ puts more weight on the customers' costs).

The objective is to find a stationary policy π that 1) maps states to actions, π : S → A, and 2) minimizes the expected discounted system cost E[ Σ_t γ^t · cost^t ].

The optimal stationary policy π* can be characterized through the optimal action-value function Q* : S × A → R, which satisfies the Bellman optimality equation Q*(s, a) = Σ_{s'} p(s' | s, a) [ cost(s, a, s') + γ V*(s') ], in which V*(s) = min_{a∈A} Q*(s, a) is the optimal state-value function. Since Q*(s, a) is the expected discounted system cost of taking action a in state s (and acting optimally afterwards), the optimal stationary policy is obtained as π*(s) = argmin_{a∈A} Q*(s, a).
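To make the learning step concrete, here is a minimal tabular Q-learning sketch for the SP side. The ε-greedy exploration rule, the constants, and the state/action encodings are assumptions for illustration, not taken from the slides:

import random
from collections import defaultdict

# Tabular Q-learning sketch for the SP. Costs are minimized, so the greedy
# action is the one with the smallest Q-value.
Q = defaultdict(float)          # Q[(state, action)] -> expected discounted cost
gamma, alpha, eps = 0.95, 0.1, 0.1
actions = range(3)              # indices into the retail price set A

def choose_action(state):
    if random.random() < eps:
        return random.choice(list(actions))        # explore
    return min(actions, key=lambda a: Q[(state, a)])   # exploit (minimum cost)

def update(state, action, cost, next_state):
    best_next = min(Q[(next_state, a)] for a in actions)
    target = cost + gamma * best_next
    Q[(state, action)] += alpha * (target - Q[(state, action)])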

Overview: Introduction & background; Goal & objectives; System model; Reinforcement learning at the service provider; Energy consumption-based approximated virtual experience for accelerating learning.

Energy consumption-based approximated state. Two drawbacks of the original model: 1) the large number of states, and 2) the SP cannot access the customers' states (their demands) due to privacy. To address these problems, the authors introduce a new approximated state in which the private customer demand information is replaced by the customers' total energy consumption.

Energy consumption-based approximated state (continued). Since the d_i^t are independent random variables, by the law of large numbers the average (1/I)·Σ_{i∈I} d_i^t converges to its expected value. Hence, in a practical microgrid system with a large number of customers, the aggregate energy consumption provides enough information for the service provider to infer the demand state d_t.
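As a small illustration of the idea, and assuming a simple uniform quantization that is not specified in the slides, the SP could key its Q-table on the quantized aggregate consumption together with the period and the cost level:

# Illustrative approximated state: instead of the (private, high-dimensional)
# demand vector, use the quantized aggregate consumption plus (h_t, c_t).
# The bin width is an assumption for illustration.

def approx_state(consumptions, h_t, c_t, bin_width=5.0):
    total = sum(consumptions)                 # customers' aggregate energy consumption
    e_bin = int(total // bin_width)           # quantize to keep the state space small
    return (e_bin, h_t, c_t)

state = approx_state([3.5, 2.0, 4.1], h_t=17, c_t=2)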

Overview: Introduction & background; Goal & objectives; System model; Reinforcement learning at the service provider; Energy consumption-based approximated virtual experience for accelerating learning.

Virtual experience: definition. An experience tuple collects the current state, the chosen action, the incurred cost, and the next state. The idea is to update multiple state-action pairs at each time slot: a set of experience tuples is considered equivalent if the tuples are statistically equivalent, i.e., if two conditions hold: they share the same transition probability and the same cost.

Virtual experience in the system. Assumption: the SP knows the electricity cost transition probability p_c(c_{t+1} | c_t, h_t). Using this known dynamic, a set of statistically equivalent experience tuples can be constructed from each actual experience, and all of them can be updated at once, as sketched below.
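As a rough illustration of the idea (not the paper's exact construction), the sketch below replays one observed experience for a set of statistically equivalent state-action pairs; equivalent_tuples is a hypothetical helper that would be built from the known cost transition p_c(c_{t+1} | c_t, h_t):

# Virtual-experience update sketch: one observed experience (s, a, cost, s_next)
# is replayed for every statistically equivalent tuple, so several Q-entries
# are refreshed per time slot instead of just one.

def virtual_experience_update(Q, experience, equivalent_tuples, alpha=0.1, gamma=0.95, actions=range(3)):
    for (s, a, cost, s_next) in equivalent_tuples(experience):
        best_next = min(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += alpha * (cost + gamma * best_next - Q[(s, a)])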

Recap: introduction to microgrids; SP and load models; MDP formulation for the SP to minimize the system cost; two methods for reducing the state space and accelerating Q-learning (the approximated state and virtual experience).

Overview Day Two: Reinforcement learning at customers; Post-decision state learning; Numerical results; Conclusion.

Customers' problem formulation (Q-learning at each customer): 1) the set of the decision maker's actions: a finite set of energy consumption functions; 2) the set of system states: the set of customer i's states; 3) the system cost function: customer i's own cost, as defined in the load model.

Overview Day Two: Reinforcement learning at customers; Post-decision state learning; Numerical results; Conclusion.

Post-decision state (PDS) learning: definition. By introducing the PDS, the transition probability function can be factored into known and unknown components, p(s' | s, a) = Σ_{s̃} p_known(s̃ | s, a) · p_unknown(s' | s̃), where the known component accounts for the transition from the current state s to the PDS s̃, and the unknown component accounts for the transition from the PDS to the next state s'.

Relationship between the PDS value function and conventional Q-learning. Schematically, the optimal PDS value function V*_PDS and the conventional action-value function are related through the known components of the dynamics: Q*(s, a) = (known cost of reaching the PDS) + Σ_{s̃} p_known(s̃ | s, a) · V*_PDS(s̃). Given the optimal PDS value function, the optimal policy can be computed as π*(s) = argmin_{a∈A} Q*(s, a).

PDS for learning. Proposition 1: the PDS value function and the conventional action-value function are equivalent (they induce the same optimal policy), so the PDS value function can be used to learn the optimal policy. While Q-learning uses a sample average of the action-value function to approximate Q*, PDS learning uses a sample average of the PDS value function to approximate V*_PDS.

Post Decision State Learning

Post-decision state learning: state definitions. Customer i has information about its own consumption and its own cost, and about its state transition probability.

State-value functions of customer i's state and of its PDS; the PDS-based optimal policy.

PDS Learning Algorithm
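The algorithm on this slide appears only as a figure; the following is a minimal sketch of a generic post-decision-state learning update in the spirit of refs. 2 and 3, where known_cost and known_pds are hypothetical helpers for the known part of the dynamics and all constants are illustrative:

from collections import defaultdict

# PDS-learning sketch: only the PDS value function is learned from samples,
# while the known cost and known state-to-PDS transition are used directly.
V_pds = defaultdict(float)      # PDS value function estimate
alpha, gamma = 0.1, 0.95

def pds_update(s, a, unknown_cost, s_next, actions, known_cost, known_pds):
    def q(state, action):
        # Action values reconstructed from the known components plus V_pds.
        return known_cost(state, action) + V_pds[known_pds(state, action)]
    best_next = min(q(s_next, b) for b in actions)
    pds = known_pds(s, a)
    # Sample-average style update of the PDS value from the observed (unknown) part.
    V_pds[pds] += alpha * (unknown_cost + gamma * best_next - V_pds[pds])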

Overview Day Two: Reinforcement learning at customers; Post-decision state learning; Numerical results; Conclusion.

Numerical results. Settings: H = 24 periods, 20 customers in total. (Figure: daily load profile.)

(Table: backlog rates and other simulation parameters.)

Performance comparison with myopic optimization. Settings: cost weighting factor ρ = 0.5; the myopic baseline corresponds to a Q-learning discount factor γ = 0. Observations: 1) the average system cost increases with λ under both pricing algorithms; 2) the performance gap between the two algorithms widens as λ increases. (Figure: performance comparison of the reinforcement learning algorithm and the myopic optimization algorithm while varying λ.)

Impact of the weighting factor ρ (with λ = 1). 1) As ρ increases, the customers' cost decreases and the service provider's cost increases. 2) As ρ increases, the service provider reduces the average retail price. (Figure: impact of the weighting factor ρ on the performance of the customers and the service provider.)

Virtual experience update (λ = 1, ρ = 0.5). The authors claim: "We can observe that our algorithm with virtual experience provides a significantly improved learning speed compared to that of the conventional Q-learning algorithm."

Customers with learning capability (λ = 1): lower average system cost, lower average customer cost, and acceptable performance even for ρ = 0.

PDS learning vs. conventional Q-learning (λ = 1, ρ = 0.5).

Overview Day Two: Reinforcement learning at customers; Post-decision state learning; Numerical results; Conclusion.

Conclusion. The paper formulates an MDP in which the service provider observes the system state transitions and decides the retail electricity price to minimize the total expected system cost, which includes the customers' disutility. Each customer decides its energy consumption based on the observed retail price, aiming to minimize its own expected cost. The Q-learning algorithm can be used to solve the Bellman optimality equation when no prior knowledge of the system transitions is available. The type of customers and their disutility functions can change the optimization results; industrial loads, for example, may have a high dissatisfaction disutility.

Conclusion (continued). A system with a high λ has a high system cost: a high λ means customers shift most of their unserved load to the next hour, and since they keep shifting it, their load profile stays almost the same after demand response. The effect of virtual experience depends on the number of different cost functions in the set C. Q-learning with a large λ shows a large system cost; however, when the loads have learning capability, a large λ has less impact on the system cost. The three presented methods, the energy-consumption-based approximated state, virtual experience, and post-decision state learning, were all effective in accelerating the Q-learning algorithm.

Customer learning capability significantly reduced both the system cost and the customers' costs.

Future work: studying the strategic behavior of rational agents and its impact on system performance; considering the impact of various types of energy sources in dynamic pricing.

References
1. Kim, Byung-Gook, et al. "Dynamic pricing and energy consumption scheduling with reinforcement learning." IEEE Transactions on Smart Grid (2016).
2. Mastronarde, Nicholas, and Mihaela van der Schaar. "Joint physical-layer and system-level power management for delay-sensitive wireless communications." IEEE Transactions on Mobile Computing 12.4 (2013): 694-709.
3. Mastronarde, Nicholas, and Mihaela van der Schaar. "Fast Reinforcement Learning for Energy-Efficient Wireless Communications." Technical report, http://arxiv.org/abs/1009.5773, 2012.