As Gorkem and Daniel mentioned, we could not make comparison runs in QA, so we developed our own pipeline for testing.
The testing workflow is basically this: pull the TMI information when it is published, and find the interesting TMIs we want to test on.
We keep only TMIs involving more than 20 flights.
In total we collected 256 SLAM runs of 40 TMIs from Jan 9th to March 3rd.
We then select the SLAM runs whose inputs align with our generated datasets,
and use the FLASH 2.0 pipeline to calculate the summary stats for both model solutions.
In this way, we are able to make an apples-to-apples comparison of the performance of the SLICK OR model and the FLASH 2.0 model.
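The selection steps above can be sketched roughly as follows. This is only a minimal sketch, not our actual pipeline code: the `Tmi` and `SlamRun` records, their field names, and the rule that "aligned" means the run's input flight set equals our generated dataset for that TMI are all assumptions made for illustration.

```python
from dataclasses import dataclass

MIN_FLIGHTS = 20  # from the talk: keep TMIs involving more than 20 flights


@dataclass(frozen=True)
class Tmi:
    # Hypothetical record for a published TMI
    tmi_id: str
    airport: str
    num_flights: int


@dataclass(frozen=True)
class SlamRun:
    # Hypothetical record for one SLAM run
    run_id: str
    tmi_id: str
    input_flights: frozenset  # flight ids the run received as input


def select_tmis(tmis):
    """Keep only TMIs that involve more than MIN_FLIGHTS flights."""
    return [t for t in tmis if t.num_flights > MIN_FLIGHTS]


def select_aligned_runs(runs, dataset_flights_by_tmi):
    """Keep runs whose input flights equal our generated dataset for the same TMI.

    Exact set equality is an assumed definition of "aligned".
    """
    return [
        r for r in runs
        if dataset_flights_by_tmi.get(r.tmi_id) == r.input_flights
    ]


# Made-up example data, not real TMIs or runs
tmis = [Tmi("T1", "SFO", 35), Tmi("T2", "SFO", 12)]
kept = select_tmis(tmis)

datasets = {"T1": frozenset({"F1", "F2"})}
runs = [
    SlamRun("R1", "T1", frozenset({"F1", "F2"})),
    SlamRun("R2", "T1", frozenset({"F1", "F3"})),
]
aligned = select_aligned_runs(runs, datasets)

print([t.tmi_id for t in kept], [r.run_id for r in aligned])
```

Only the runs that survive both filters would then be scored with the same summary-stat code for both model solutions, which is what makes the comparison like for like.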
FLASH 2.0 considers downline impacts. Swapping the TMI flights does not change the delay of the TMI flights themselves very much, so there is not much reduction in the average delay of the TMI flights. But we can observe a significant reduction in downline flight delays.
This plot shows the SFO TMIs we tested from January 19th to March 1st.
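A toy example may help show why this happens. It is a sketch under simplifying assumptions that are not part of the FLASH 2.0 model as presented: delay is taken as the assigned slot time minus the scheduled time, and the delay that propagates downline is whatever exceeds a hypothetical turn slack. All numbers are made up.

```python
def tmi_delays(schedule, slots):
    """Delay in minutes per flight: assigned slot time minus scheduled time."""
    return {f: slots[f] - schedule[f] for f in schedule}


def downline_delay(delay, slack):
    """Assumed propagation model: delay left after the turn slack absorbs some."""
    return max(0, delay - slack)


def totals(schedule, slots, slack):
    """Return (total TMI delay, total downline delay) for one slot assignment."""
    d = tmi_delays(schedule, slots)
    total_tmi = sum(d.values())
    total_downline = sum(downline_delay(d[f], slack[f]) for f in d)
    return total_tmi, total_downline


schedule = {"A": 0, "B": 10}  # scheduled times in minutes (made up)
slack = {"A": 60, "B": 5}     # hypothetical slack before each flight's next leg

original = {"A": 30, "B": 40}  # slot times before the swap
swapped = {"A": 40, "B": 30}   # same two slots, exchanged between A and B

print(totals(schedule, original, slack))  # (60, 25)
print(totals(schedule, swapped, slack))   # (60, 15)
```

The same slots are used in both cases, so the total TMI delay stays the same, while giving the earlier slot to the flight with less slack lowers the delay passed downline.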
As we can see, the light blue bars are the perc, the dark blue bars the
Also, the TMIs are sorted by intensity, which is evaluated based on the average EDCT delay minutes. We can see that, generally, in more severe TMIs the reduction in the average downline delay is more significant.
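The sorting and the reduction metric can be sketched like this. It is a minimal sketch: the field names are assumptions, and the numbers are made up for illustration and are not results from our test runs.

```python
def pct_reduction(baseline, candidate):
    """Percent reduction of candidate relative to baseline (0.0 if baseline is 0)."""
    if baseline == 0:
        return 0.0
    return 100.0 * (baseline - candidate) / baseline


# Made-up rows: one per TMI, with the average downline delay of each model
rows = [
    {"tmi": "X", "avg_edct_delay": 95.0, "slick_downline": 40.0, "flash_downline": 24.0},
    {"tmi": "Y", "avg_edct_delay": 30.0, "slick_downline": 10.0, "flash_downline": 9.0},
]

# Sort by intensity, that is by average EDCT delay minutes, mildest first
by_intensity = sorted(rows, key=lambda r: r["avg_edct_delay"])

report = [
    (r["tmi"], pct_reduction(r["slick_downline"], r["flash_downline"]))
    for r in by_intensity
]
print(report)  # [('Y', 10.0), ('X', 40.0)]
```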
Next, we want to investigate how FLASH 2.0 addresses the crew connection challenges we face in severe IRROPs. The figure here plots the number of infeasible crew connections for both TMI flights and downline flights in the SFO TMIs. The benefits of FLASH 2.0 in reducing infeasible crew connections emerge in severe TMIs, like the last three TMIs, whose EDCT delays are greater than one and a half hours.
And the plot on the next flights, which shows the average crew connection delay minutes, further validates this observation. The most significant reduction is observed in the most intense TMIs.
On the turn delay side, the average delay is significantly and consistently reduced. On average, a 24 reduction is observed.
Similarly, the number of late turn flights is consistently reduced.
Lastly, since FLASH 2.0 also takes customer itinerary info into account, we also investigated the performance improvement regarding customer misconnections. Overall, there is a 27% reduction in customer misconnections.
The TMIs involved in this plot are not limited to SFO TMIs.