Modifying the Seed Matrix in the Iterative Proportional Fitting Method of Transit Survey Expansion [ITM # 92]

Paper Author(s): Sujith Rapolu (corresponding), AECOM; Ashutosh Kumar, AECOM; David Schmitt, AECOM

Paper Title & Number: Modifying the Seed Matrix in the Iterative Proportional Fitting Method of Transit Survey Expansion [ITM # 92]

Abstract

On-board transit survey data help in understanding the travel patterns of existing riders and hence play a critical role in the calibration and validation of trip-based and tour-based travel demand models. The origin-to-destination (O-D) flows obtained from the survey data can be used by transportation planners and operators for short-term operational planning and longer-term systems planning; the accuracy of these O-D flows therefore directly affects real-world transportation systems. The traditional practice of expanding on-board survey records to the total observed counts by direction, time of day and route is susceptible to response bias and may therefore not produce reasonable estimates of O-D travel flows on the system. Expanding a seed O-D matrix with the iterative proportional fitting (IPF) method, in which the seed matrix is scaled iteratively to match the row and column totals, is one way to address this loss of fidelity in O-D flows. However, the seed O-D matrix usually contains missing movements, which can skew the O-D flows in the final results. This research brief explores an innovative and practical approach to modifying the seed table in the IPF method of expansion to account for missing movements, based on auxiliary data sources. The authors applied this approach to a recent on-board survey conducted on the Tri-Rail commuter rail service in Southeast Florida. Results show that the common practice of adding a dummy value of 0.5 or 1.0 to all cells with a missing movement over-estimates very short trips by nearly 100%. To solve this problem, the authors used auxiliary data sources and knowledge of the Tri-Rail system to modify the seed matrix, which produced a much more accurate estimate of very short trips and O-D flows. Overall, the approach demonstrated in this research brief shows the importance of using auxiliary data to modify the seed O-D matrix when the IPF method is used in on-board survey expansion.

Statement of Financial Interest

The financial interests of the authors would not be affected by the acceptance of this research brief.

Statement of Innovation

This research brief explores an innovative and practical approach to modifying the seed table in the iterative proportional fitting (IPF) method of on-board survey expansion to account for missing movements, based on auxiliary data sources. It is important to address this issue because movements with zero values in the seed table are lost when the IPF method is applied without modifying it. This brief also shows that the common practice of adding a dummy value of 0.5 or 1.0 to all cells with a missing movement can over-estimate very short trips by nearly 100%. Hence, a cell with a missing movement should be modified only when that movement is known to occur in the real world. This produces much more accurate estimates of O-D flows, which lead to better calibration and validation of trip-based and tour-based models, and to better-suited transportation systems.

Modifying the Seed Matrix in the Iterative Proportional Fitting Method of Transit Survey Expansion

Prepared for the 2014 TRB Conference on Innovations in Travel Modeling, Baltimore, Maryland

Sujith Rapolu, Ashutosh Kumar, David Schmitt

Objective

The main objective of this research brief is to develop a practical approach to addressing missing movements in the seed matrix when the iterative proportional fitting (IPF) method is used to expand on-board surveys.

Purpose and Innovation

On-board transit survey data help in understanding the travel patterns of existing riders and hence play a critical role in the calibration and validation of trip-based and tour-based travel demand models. The origin-to-destination (O-D) flows obtained from the survey data can be used by transportation planners and operators for short-term operational planning and longer-term systems planning; the accuracy of these O-D flows therefore directly affects real-world transportation systems. The traditional practice is to expand on-board survey records to the total observed counts by direction, time of day and route. The main drawback of this expansion process is that it is susceptible to response bias and may therefore not produce reasonable estimates of O-D travel flows on the system. This practice may result in an inconsistent understanding of the travel markets and could distort any forecasts made by the travel demand model. Expanding a seed O-D matrix using the IPF method is one way to address this loss of fidelity in O-D flows from the conventional expansion process. However, the seed O-D matrix usually contains missing movements, which can skew the O-D flows in the final results. This drawback of the IPF method creates a need for a framework that addresses missing movements in the seed matrix in order to maintain accuracy.
This research brief explores an innovative and practical approach to modifying the seed table in the IPF method of expansion to account for missing movements, based on auxiliary data sources. It is important to address this issue because movements with zero values in the seed table are lost when the IPF method is applied without modifying it. This brief also shows that the common practice of adding a dummy value of 0.5 or 1.0 to all cells with a missing movement can over-estimate very short trips by nearly 100%. Hence, a cell with a missing movement should be modified only when that movement is known to occur in the real world. This produces much more accurate estimates of O-D flows, which lead to better calibration and validation of trip-based and tour-based models, and to better-suited transportation systems.

Methodology

In the IPF method, a seed flow table obtained from the on-board survey data is scaled iteratively to match the row and column totals. The seed flow table contains the number of survey records between each origin and destination. The control totals for the rows and columns can be obtained from available data sources such as Automatic Passenger Counters (APC) or manual counts. At a minimum, seed matrices must be developed by direction to properly account for origin-to-destination flows. Depending on the available counts and the survey response rate, additional seed matrices can be developed by time period or by groups of trains or buses.
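The balancing step described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration of classic biproportional (Fratar-style) fitting under stated assumptions, not the CUBE FRATAR program the authors used: the function name `ipf`, the tolerance, the iteration cap and the three-station example counts are all hypothetical, and the origin and destination totals are assumed to be balanced (total ONS equal to total OFFS for the direction and period). Because every update multiplies a cell by a factor, a cell that is zero in the seed stays zero after fitting. The expansion factor for the survey records in a cell is then the fitted flow divided by the number of records in that cell.

```python
import numpy as np

def ipf(seed, row_totals, col_totals, tol=1e-6, max_iter=1000):
    """Scale `seed` until its row sums match `row_totals` (ONS by origin
    station) and its column sums match `col_totals` (OFFS by destination
    station). Assumes both sets of totals sum to the same number."""
    m = np.asarray(seed, dtype=float).copy()
    rows = np.asarray(row_totals, dtype=float)
    cols = np.asarray(col_totals, dtype=float)
    for _ in range(max_iter):
        # Row step: rescale each origin row to its ONS total.
        # An all-zero row cannot be rescaled and simply stays zero.
        r = m.sum(axis=1)
        m *= np.divide(rows, r, out=np.zeros_like(rows), where=r > 0)[:, None]
        # Column step: rescale each destination column to its OFFS total.
        c = m.sum(axis=0)
        m *= np.divide(cols, c, out=np.zeros_like(cols), where=c > 0)[None, :]
        # Non-empty columns now match; stop once the rows match as well.
        if np.abs(m.sum(axis=1) - rows).max() < tol:
            break
    return m

# Hypothetical one-direction seed: survey records from origin (row) to
# destination (column) on a three-station line.
seed = np.array([[0, 4, 2],
                 [0, 0, 3],
                 [0, 0, 0]], dtype=float)
ons = [60, 40, 0]    # boardings by station (row control totals)
offs = [0, 30, 70]   # alightings by station (column control totals)

fitted = ipf(seed, ons, offs)

# Expansion factor per survey record: fitted trips / records in its cell.
factors = np.divide(fitted, seed, out=np.zeros_like(fitted), where=seed > 0)
print(fitted.round(2))
print(factors.round(2))
```

With these made-up totals the only feasible solution is 30 trips from station 0 to 1, 30 from 0 to 2 and 40 from 1 to 2, which the loop converges to; each record in the 0-to-1 cell then carries a factor of 7.5, and every cell that was empty in the seed remains empty.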

One issue with the IPF method is that any missing movement in the seed matrix results in zero trips for that movement even after expansion. This is a problem, particularly when the movement is known to occur in the real world. The simplest way to account for this is to insert a value of 0.5 or 1.0 for all missing movements in the seed table. The drawback of this approach is that the insertion is not based on any knowledge of the system: it assumes that all missing movements occur in the real world and that they are equal to each other. An enhanced method is to add a synthetic record only for those movements that are known to occur in real life. This process depends on the characteristics of the transit service on which the on-board survey is conducted and on the availability of auxiliary data. For example, fare-gate station-to-station movement data can be used to add these synthetic records. In this research brief, synthetic records were added based on past knowledge of the transit service. Once the seed matrices are developed by appropriately adding synthetic records, they are expanded to the control totals using the IPF method. The resulting matrices can then be used to develop expansion factors for the survey records. Any travel markets developed from the expanded survey will then have a better representation of O-D flows.

Case Study

A recent on-board survey conducted on the Tri-Rail commuter rail service in Southeast Florida is used to demonstrate the process of adding synthetic records. The Tri-Rail alignment generally runs in a north-south direction parallel to and west of I-95, often several miles west of the concentrated development of the region's major downtowns (West Palm Beach, Fort Lauderdale, and Miami).
The weekday headway is minutes in the peak period and 60 minutes in the off-peak period, with a minutes travel time between Mangonia Park (just north and west of West Palm Beach) and Miami International Airport (about 5 miles west of downtown Miami). Tri-Rail currently serves 17 stations (Miami International Airport is temporarily closed due to construction) in three counties. The final sample rate of the cleaned survey was 26% of daily boardings. The IPF method was applied by direction and by time of day, using CUBE's FRATAR program. First, a time period was defined for every time point in the scheduled timetable. The three time periods are the same as those used in the Southeast Regional Planning Model (SERPM 6.7): AM Peak (6:30 AM to 9:30 AM), PM Peak (3:30 PM to 6:30 PM) and Off Peak (9:30 AM to 3:30 PM and 6:30 PM to 6:30 AM). Next, each usable survey record was assigned a time period based on its origin station and train number, which provides an estimate of the surveyed rider's boarding time. Six station-to-station flow tables (one for each of the three time periods in each of the two directions) were then developed from the survey records. Station-level door counts were available for all trains operating on the survey date. These counts were aggregated by station, direction and time period, and the resulting ONS and OFFS by direction, period and station were used as the origin and destination control totals in the IPF method. The station-to-station flow tables from the survey records had some missing movements, so synthetic records were added to account for them. First, the seed matrix was modified by adding a value of 1.0 to all cells with missing movements. Applying the IPF method to this modified matrix resulted in an abnormally high number of 1-station trips.
This result deviated from past observed Tri-Rail trip patterns, which showed very few 1-station trips. To analyze the impact of the weight given to the synthetic records, the IPF method was applied again to the seed matrix, this time using a weight of 0.5 for all cells with missing movements. The number of 1-station trips dropped from 780 (using a weight of 1.0) to 659 (using a weight of 0.5), but this was still high compared with past Tri-Rail trip patterns. Hence, the following procedure was eventually adopted for adding these records. Past Tri-Rail surveys showed that 1-station trips accounted for only about 2-3% of all trips, so no synthetic record was added for missing movements between adjacent stations (1-station movements). For every other missing movement, a value of 1.0 was added to the station-to-station flow matrices. Applying the IPF method to this new seed matrix produced a number of 1-station trips consistent with the observed data: the total fell to about 379 trips, or about 2.6% of total Tri-Rail trips. It should be noted that the simplest way of modifying the seed matrix, discussed previously, over-estimated very short trips by nearly 100%.

Once the seed matrix has been modified by adding synthetic records, it is also important to assign trip characteristics to these records so that the total trips are maintained when the survey records are expanded. Each synthetic record added in this survey took the average trip characteristics (trip purpose, access mode and market segment) of all survey records with the same origin station or the same destination station as the synthetic record.

For comparison, the survey was also expanded using the traditional method of scaling the records to ONS by period and by direction. Table 1 compares the observed boarding and alighting activity by station with that obtained from the two expansion processes.
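The adopted seed rule and the characteristic assignment for synthetic records can be sketched as follows. This is an illustrative sketch under assumptions, not the authors' code: the function names, the station indexing (stations numbered in line order, one seed matrix per direction and period), the record fields `orig`, `dest` and `purpose`, and the reading of "average trip characteristics" as category shares among records that share the synthetic record's origin or destination are all hypothetical.

```python
from collections import Counter

import numpy as np

def adopted_seed(counts, direction=1, value=1.0):
    """Add a synthetic record of weight `value` to every empty cell that is a
    feasible movement in this direction, except 1-station movements.
    Assumes stations are indexed in line order; direction=1 means the
    destination index is higher than the origin index, -1 the reverse."""
    seed = np.asarray(counts, dtype=float).copy()
    n = seed.shape[0]
    synthetic = []
    for o in range(n):
        for d in range(n):
            stops = (d - o) * direction  # stations travelled; <= 0 is infeasible
            if stops >= 2 and seed[o, d] == 0:
                seed[o, d] = value
                synthetic.append((o, d))
    return seed, synthetic

def synthetic_profile(records, o, d, attr):
    """Category shares of `attr` among survey records that share the synthetic
    record's origin station or its destination station."""
    pool = Counter(r[attr] for r in records if r["orig"] == o or r["dest"] == d)
    total = sum(pool.values())
    return {k: v / total for k, v in pool.items()} if total else {}

# Hypothetical survey counts for one direction and period on a 5-station line.
counts = np.array([[0, 3, 0, 4, 0],
                   [0, 0, 2, 0, 5],
                   [0, 0, 0, 0, 1],
                   [0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0]])
seed, synthetic = adopted_seed(counts)
print(synthetic)  # [(0, 2), (0, 4), (1, 3)]; empty 1-station cells stay 0

# Hypothetical survey records carrying a trip-purpose attribute.
records = [
    {"orig": 0, "dest": 1, "purpose": "work"},
    {"orig": 0, "dest": 3, "purpose": "work"},
    {"orig": 1, "dest": 2, "purpose": "school"},
    {"orig": 2, "dest": 4, "purpose": "other"},
]
print(synthetic_profile(records, 0, 2, "purpose"))  # work 2/3, school 1/3
```

Carrying a share distribution rather than a single most-common category is one way to let a synthetic record's expanded trips be split across purposes, access modes or market segments without changing the expanded total.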
Table 1: Comparison of station activity from the two expansion processes with observed counts

Station name                  Observed     IPF   Traditional   Diff. IPF   Diff. Trad.   % IPF   % Trad.
Mangonia Park                    1,056   1,045         1,159        (11)           103     -1%       10%
West Palm Beach                  1,208   1,195         1,317        (13)           108     -1%        9%
Lake Worth                                                                       (100)      3%      -11%
Boynton Beach                                                                                         0%
Delray Beach                                                                      (18)      2%       -3%
Boca Raton                       1,488   1,464         1,834        (24)           346     -2%       24%
Deerfield Beach                                                                   (29)      1%       -4%
Pompano Beach                                                                    (115)      1%      -15%
Cypress Creek                    1,030                                            (77)      0%       -7%
Fort Lauderdale                                                                                       0%
Fort Lauderdale Airport                                                           (41)      0%       -6%
Sheridan Street                                                                                       5%
Hollywood                                                                                             0%
Golden Glades                                                                    (115)      1%      -20%
Opa-locka                                                            (0)            25      0%        8%
Metrorail Transfer Station       1,238   1,198         1,081        (39)         (156)     -3%      -13%
Hialeah Market/MIA                                                   (8)            42     -1%        6%
Total                           14,300  14,300        14,300         (0)           (0)      0%        0%

Observed = (ONS+OFFS)/2. Diff. = difference with observed; % = percent difference with observed; values in parentheses are negative.
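The difference columns in Table 1 are simple arithmetic on station activity. The sketch below assumes that the expanded columns use the same (ONS+OFFS)/2 definition as the observed column; the function names are hypothetical. Because it starts from the printed, already rounded values, recomputed entries for some rows may differ from the table by one trip or one percentage point.

```python
def station_activity(ons, offs):
    """Station activity as defined in Table 1: the average of ons and offs."""
    return (ons + offs) / 2

def compare(observed, expanded):
    """Difference of an expanded estimate from the observed activity, as a
    rounded trip count and a rounded percent of the observed value."""
    diff = expanded - observed
    return round(diff), round(100 * diff / observed)

# Mangonia Park row of Table 1, using the printed (already rounded) values.
observed = 1056
print(compare(observed, 1045))  # IPF expansion -> (-11, -1)
print(compare(observed, 1159))  # traditional expansion -> (103, 10)
```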

It can be seen that the traditional method of expansion does not properly represent boarding and alighting activity at the station level: station activity is either over-estimated or under-estimated, with differences as large as 24% at Boca Raton and -20% at Golden Glades. In contrast, the IPF method of expansion produces station activity that is very close to the observed counts. Table 2 shows the delta station-to-station flow table, in production-to-attraction format, between the traditional and IPF methods of expansion. As the table shows, the two expansions produce very different travel patterns; in particular, productions and attractions at Boca Raton are over-represented by 16% and 30%, respectively, in the traditional expansion. Given the results in Table 1, the travel patterns from the IPF expansion are more credible.

Table 2: Delta station-to-station flow table (Traditional expansion minus IPF expansion)*

Rows are production stations and columns are attraction stations, each with a total and a percent difference. Stations: 1-Mangonia Park, 2-West Palm Beach, 3-Lake Worth, 4-Boynton Beach, 5-Delray Beach, 6-Boca Raton, 7-Deerfield Beach, 8-Pompano Beach, 9-Cypress Creek, 10-Fort Lauderdale, 11-Fort Lauderdale Airport, 12-Sheridan Street, 13-Hollywood, 14-Golden Glades, 15-Opa-locka, 16-Metrorail, 17-Hialeah Market/MIA.

Percent difference in attraction totals, stations 1 to 17: 0%, 9%, -14%, -7%, -8%, 30%, -9%, -22%, -5%, 0%, -6%, -6%, -10%, -23%, 0%, -10%, 5%; overall total: 0%.

*Top and bottom 5 cells inside the table are highlighted. Also, top and bottom 2 row and column totals are highlighted.

Conclusions and directions for future research

Using the IPF method to expand transit on-board surveys accounts for O-D flow movements that are ignored by the common method of expanding to total boardings. This is very helpful in properly informing regional trip-based and tour-based travel demand models about the flow of trips on the transit system. However, the seed matrix in the IPF method usually has missing movements.
This research brief discusses a practical approach to adding synthetic records to account for missing movements in the seed matrix. An on-board survey on a commuter rail service in Southeast Florida was used to demonstrate that the easiest approach, adding a value of 1.0 or 0.5 to all missing cells, can over-estimate a crucial trip pattern (very short, 1-station trips) by nearly 100%. An enhanced method of adding synthetic records solved this issue. Further, the final expansion results showed better station-level boarding and alighting activity and flow movements with the IPF method than with the traditional method. Although this brief presents a method to account for missing movements in the seed table used in IPF, there is potential for further research. The case study in this brief gave an equal weight of 1.0 to all missing cells that were not 1-station movements, because no auxiliary data were available to quantify the number of trips on movements spanning more than one station. Further research could determine the weight of each synthetic record added to the seed matrix; ideally, these weights would be substantiated with observed data on trip movements in the transit system.