Working with rail timetable data

(yes, a post not about smartcards!)

I’m writing this post in the hope that it’ll help people working with CIF (railway timetable) data, as a pointer to the nuances of the data and how to handle them. My findings are, for the most part, implemented in CIFReader, my open source application (licensed under the GPL), available on GitHub.

It’s written in C++ and needs the Boost and MySQL++ libraries to compile and run. Anyone who wants help is more than welcome to email me, with the proviso that I am normally very busy. If you have any comments on or additions to the post, please email me too.

Associations

Associations are, arguably, the hardest part of the entire schema to deal with programmatically. There are a few core aspects that need to be handled separately.

Basic handling

Associations refer to a “main” and an “associated” service. Having played with the data for a while, I have found it is generally safe to assume that the train marked as main really is the main service in play.

Example

AANW85711W859601112111212020000001VVSELGH     TP    ... P

Schedule W85711 is the 2054 London Waterloo to Poole and W85960 is the 2226 Eastleigh to Portsmouth Harbour. Intuitively, together, this is the 2054 London Waterloo to Poole & Portsmouth Harbour via Eastleigh.

This association is a WTT association (indicated by the P) running from 11th December 2011 to 2nd December 2012 on Sundays only (the 0000001, where the seven flags run Monday to Sunday). It divides (VV) at Eastleigh (ELGH) and the associated service starts on the same day (S) as the main service.
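
To make the field positions concrete, here is a minimal C++ sketch that slices the leading fields out of an association record. The offsets follow the fixed-width layout as I understand it from the CIF specification, so check them against your own copy. The `Association` struct and `parseAssociation` function are names invented for this post, not code lifted from CIFReader.

```cpp
#include <cassert>
#include <string>

// Hypothetical holder for the leading fields of a CIF "AA" record.
struct Association {
    char transactionType;   // N = new, D = delete, R = revise
    std::string mainUid;    // e.g. W85711
    std::string assocUid;   // e.g. W85960
    std::string startDate;  // yymmdd
    std::string endDate;    // yymmdd
    std::string daysRun;    // seven 0/1 flags, Monday first
    std::string category;   // JJ = join, VV = divide, NP = next
    char dateIndicator;     // S = same day, N = next day, P = previous day
    std::string location;   // TIPLOC with trailing spaces removed
};

// Slice the fixed-width fields out of the first 44 characters of an AA record.
// Offsets are zero-based and assume the usual 80-column record layout.
Association parseAssociation(const std::string& line) {
    assert(line.size() >= 44 && line.compare(0, 2, "AA") == 0);
    Association a;
    a.transactionType = line[2];
    a.mainUid = line.substr(3, 6);
    a.assocUid = line.substr(9, 6);
    a.startDate = line.substr(15, 6);
    a.endDate = line.substr(21, 6);
    a.daysRun = line.substr(27, 7);
    a.category = line.substr(34, 2);
    a.dateIndicator = line[36];
    a.location = line.substr(37, 7);
    // Drop the space padding on the right of the TIPLOC.
    a.location.erase(a.location.find_last_not_of(' ') + 1);
    return a;
}
```

Fed the Waterloo record above, this gives main UID W85711, associated UID W85960, category VV, date indicator S and location ELGH.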

Midnight handling

Intuitively, you would expect the date indicator on the association to describe the associated schedule’s start date relative to the date at the association point. This, however, isn’t the case. The date indicator says whether the associated schedule starts on the same day as the main schedule starts, the day after or the day before. For instance, on the northbound Highland Caledonian Sleeper, the Fort William and Aberdeen portions are marked as “next day” on the association.

Example

AANG60813G600791112251212020000001VVNEDINBUR  TP   ...  P

Schedule G60813 is the 2057 Euston to Inverness and G60079 is the 0440 Edinburgh to Aberdeen. Importantly, this association occurs at Edinburgh and is flagged as next day (N), because the Aberdeen portion starts the day after the main service starts from Euston. Note that while the association is easy to handle for the main service, it must be looked up carefully for the associated service, particularly with regard to STP overlays and cancellations.
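
One way to keep this straight is to treat the date indicator as a day offset from the main schedule's start date and apply it in both directions. The sketch below assumes dates are already held as plain day numbers (days since any fixed epoch); the function names are hypothetical and chosen for illustration only.

```cpp
#include <stdexcept>

// Day offset of the associated schedule's start relative to the main
// schedule's start, as given by the association's date indicator.
int dateIndicatorOffset(char indicator) {
    switch (indicator) {
        case 'S': return 0;   // same day
        case 'N': return 1;   // next day
        case 'P': return -1;  // previous day
        default: throw std::invalid_argument("unknown date indicator");
    }
}

// Working from the main service: on which day does the associated one start?
long associatedStartDay(long mainStart, char indicator) {
    return mainStart + dateIndicatorOffset(indicator);
}

// Working from the associated service: on which day did the main one start?
// This is the direction that is easy to get wrong when looking up
// overlays and cancellations for the associated schedule.
long mainStartDayFor(long associatedStart, char indicator) {
    return associatedStart - dateIndicatorOffset(indicator);
}
```

So when looking at G60079 on a given day, the association to check is the one valid for G60813 on the previous day.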

Cancelling/deleting

Cancelling and deletion are quite straightforward, much as for a normal schedule, but certain considerations need to be taken into account.

CIF does not include the TIPLOC when it issues a cancellation or a deletion for an association. This can be extremely awkward in cases where (for any reason) two trains associate twice. In most cases this is probably an error, but there will always be the odd occasion where it’s reality.

For this reason, a decision needs to be taken on whether to delete all references to an association or just remove the first match. In CIFReader, I choose to remove all matching associations, as it is likely that a future update will clarify matters with a new or revised record.
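
As a rough illustration of the "remove all matches" approach, the sketch below deletes every stored association that shares the delete record's key. It assumes the key is the pair of UIDs, the start date and the STP indicator, and it uses an in-memory vector in place of a database; `StoredAssociation` and `deleteAssociations` are made-up names, not CIFReader's.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stored form of an association.
struct StoredAssociation {
    std::string mainUid;
    std::string assocUid;
    std::string startDate;  // yymmdd
    char stpIndicator;
    std::string location;   // TIPLOC, which a delete record does not carry
};

// Remove every association matching the delete record's key, regardless of
// location, and return how many were removed.
std::size_t deleteAssociations(std::vector<StoredAssociation>& store,
                               const std::string& mainUid,
                               const std::string& assocUid,
                               const std::string& startDate,
                               char stpIndicator) {
    const auto matches = [&](const StoredAssociation& a) {
        return a.mainUid == mainUid && a.assocUid == assocUid &&
               a.startDate == startDate && a.stpIndicator == stpIndicator;
    };
    const auto firstRemoved = std::remove_if(store.begin(), store.end(), matches);
    const std::size_t removed =
        static_cast<std::size_t>(store.end() - firstRemoved);
    store.erase(firstRemoved, store.end());
    return removed;
}
```

A return value greater than one is a useful thing to log, since it flags exactly the double-association case described above.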

I don’t have a brilliant example of this as it’s hard to pinpoint (for obvious reasons), but the best example I can give is 5K61 Bedwyn to Bedwyn. Looking at this schedule, you can see that there are two associations to 1K61 in place (although one of them is incorrect).

Basic Schedules

I have spotted one minor issue with basic schedules when dealing with historical data. It is possible to be given three separate STP indicators for a schedule on the same date: a WTT, a CAN and then a VAR. In CIF, these are P, C and then O. The specification suggests that it is only possible to receive a P followed by a C or an O.

If you only receive full extracts or run updates and drop all historical records once their end date has passed, then this is not going to be an issue for you. CIFReader is built, largely, to allow a historical record to be created as well.

Example (the schedule has not been printed, as it’s not relevant – note that the records have been shortened with .. where it was whitespace to make it fit the page!)

BSNW154581112111212020000001 POO2D06..124936004 EMU357 100D..S..P
BSNW154581203111203110000001 POO2D06..124936004 EMU357 100D..S..O
... in a later file ...
BSNW154581202261203250000001        ..1                       ..C

Clearly, we can see three new schedule records here. I’ll explain each one in order:

  1. W15458 is a WTT schedule, running from 11th December 2011 to 2nd December 2012, on Sundays only.
  2. W15458 then receives an overlay on 11th March 2012.
  3. W15458 is then cancelled on Sundays from 26th February 2012 to 25th March 2012.
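
Working out which of these records touch a given date comes down to two checks: the date range and the days-run flags. Here is a minimal sketch of that test. It assumes all dates are yymmdd strings in the 2000s (the century handling is deliberately left out) and that the seven flags run Monday to Sunday; `runsOn` and `weekdayMondayFirst` are hypothetical helper names.

```cpp
#include <cassert>
#include <string>

// Day of week for a Gregorian date, 0 = Monday ... 6 = Sunday.
// Uses Sakamoto's method, shifted to line up with the days-run flags.
int weekdayMondayFirst(int y, int m, int d) {
    static const int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    if (m < 3) y -= 1;
    const int sundayFirst =
        (y + y / 4 - y / 100 + y / 400 + t[m - 1] + d) % 7;  // 0 = Sunday
    return (sundayFirst + 6) % 7;
}

// True if `date` lies within [from, to] and falls on a flagged weekday.
// Assumption: every yymmdd value is in the same century (20yy), so plain
// string comparison orders the dates correctly.
bool runsOn(const std::string& from, const std::string& to,
            const std::string& daysRun, const std::string& date) {
    assert(from.size() == 6 && to.size() == 6 && date.size() == 6);
    assert(daysRun.size() == 7);
    if (date < from || date > to) return false;
    const int y = 2000 + std::stoi(date.substr(0, 2));
    const int m = std::stoi(date.substr(2, 2));
    const int d = std::stoi(date.substr(4, 2));
    return daysRun[weekdayMondayFirst(y, m, d)] == '1';
}
```

Run against the three records above, 11th March 2012 (a Sunday) passes for the WTT schedule, the overlay and the cancellation alike, which is exactly the three-way overlap in question.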

Clearly, if you are trying to build a historical record, you will need to bear this in mind. My preference is to build my own CIF for a given window and load that.

In any case, knowing this, it may be wise to apply an order of priority to STP indicators when working with them historically. The ideal order would likely be NOCP; however, the choice is yours!
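
A priority order like this is easy to make configurable. The sketch below takes the records that apply to one schedule on one date (already filtered by date range and days run) and returns the one whose STP indicator comes earliest in the priority string. It reads NOCP as "N beats O beats C beats P", which is my interpretation of the order, and `ScheduleCandidate` and `pickSchedule` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record that applies to a schedule on the date of interest.
struct ScheduleCandidate {
    std::string uid;
    char stpIndicator;  // P, O, N or C
};

// Return the candidate whose STP indicator appears earliest in `priority`,
// or nullptr if no candidate carries an indicator listed there.
const ScheduleCandidate* pickSchedule(
        const std::vector<ScheduleCandidate>& candidates,
        const std::string& priority = "NOCP") {
    const ScheduleCandidate* best = nullptr;
    std::size_t bestRank = std::string::npos;
    for (const ScheduleCandidate& c : candidates) {
        const std::size_t rank = priority.find(c.stpIndicator);
        if (rank != std::string::npos && (best == nullptr || rank < bestRank)) {
            best = &c;
            bestRank = rank;
        }
    }
    return best;
}
```

For W15458 on 11th March 2012, where P, O and C records all apply, the default order returns the overlay; passing "CNOP" instead would return the cancellation.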