Tracking the trains, in realtime

Heatmap of UK

Realtime Trains generated heatmap of delays across the UK

Over the past few weeks, I’ve been working away on realtimetrains.co.uk, which is the evolution of my original train timetables websites.

Ever since RailMiles was first launched, I’ve been trying to find ways of making it easier to insert journeys. The main problem was, always, trying to find a reliable source of actual train times. I dabbled with the Live Departure Boards Web Service for a while, before it was locked down, but found it rather cumbersome to use.

I’m writing this post to mark the milestone of the release of Realtime Trains, and it details the functionality that the site offers.

Getting it all online…

Quite a few people have sent me emails with comments about the site – and I thank all those who have. If you have any comments, please email me or leave a comment on the bottom of the blog post.

It’s been a long process: about 4 months of development with the realtime data feeds, and work since September 2011 on the raw scheduling data. It’s time-consuming, but I’m happy with where it is. The old site, traintimes.im as it was then, ran on a single server and was able to handle the load demands.

Realtime Trains runs on Amazon Web Services’ Elastic Compute Cloud (EC2) in Ireland. Its core is a cluster of 3 instances which provide 2 database servers and 1 compute layer. The web instances are load balanced in order to try to provide an efficient service, and scale up and down as necessary. It’s a very expensive project to run for a student so I’m trying to find ways of monetising it. I’m going to be releasing an API to access the underlying dataset that I hold (there is more to it than what Realtime Trains shows) and so hopefully some headway can be made from that. It’ll be free up to a certain use limit.

TRUST

The main feed that most people will be making use of is the TRUST feed – which covers most of the railway in the UK. My main bugbear was the problem of intermediate stations that don’t report – but that is where the TD feed came in. I’ll talk about that a bit later.

What am I doing with TRUST?

I take in the full ‘firehose’, so to speak, and process it against the schedule database. There are a couple of different types of messages that come down the feed and are documented in the developer information. There’s a bit of a ‘blackout’ overnight with fewer messages than would be expected coming down, and there are some hacks to get around this as it affects the reporting of trains during the next day.

Doing things that TRUST can’t tell us

From the TRUST data, we can get data that TRUST itself can’t tell us. For instance, if a train calls at a station but happens to “depart” before it “arrives” – then the train failed to stop at that station. TRUST can’t tell us about intermediate station cancellations.
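A minimal sketch of that check, assuming each booked call carries an optional reported arrival and departure time. The function name and the way the reports are passed in are my own illustration, not the site’s actual code.

```python
from datetime import datetime
from typing import Optional


def failed_to_stop(arrival: Optional[datetime], departure: Optional[datetime]) -> bool:
    """Return True when a booked call "departs" before it "arrives".

    With either report missing there is not enough data to decide,
    so the call is left alone.
    """
    if arrival is None or departure is None:
        return False
    return departure < arrival


# Example: departure reported one minute before the arrival.
ran_fast = failed_to_stop(
    arrival=datetime(2012, 10, 29, 9, 40),
    departure=datetime(2012, 10, 29, 9, 39),
)
```

A call flagged this way can then be shown as not having stopped, even though TRUST never sent a cancellation for it.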

0933 Waterloo to Guildford, ran fast between Vauxhall and Wimbledon

The screenshot also highlights another one of the problems – no reports. I’m working on something that can extrapolate the actual timings where we don’t have any TD data using a similar system to how I predict forward timings.

Predicting train times is another major issue for any website aimed at the passenger. TRUST can’t tell us anything about what it expects a train to do, other than the run time to the next location in the schedule, which it includes when reporting a movement. To make my predictions a little more accurate, I’ve made use of the RailMiles Mileage Engine to “fill” a schedule with all its intermediate passing locations, and I then use TRUST “off route” reports and TD messaging to fill it in. All this gives me more data to predict with.

1605 Waterloo - Weymouth, just passing Fleet

1605 Waterloo - Weymouth, just passing Fleet (reported via TD)

You can see what sort of data a report is using by rolling over the time in the realtime column. For times that are actual (shown in bold), you can see the following:

  • TRUST (SMART) – a report from TRUST, via SMART. SMART is the TD aggregator that TRUST uses to create automatic reports.
  • TRUST (SDR) – a report from TRUST, via SDR. This means that a signaller or other person with direct input has entered the report into TRUST.
  • TD (xx) – a report generated by Realtime Trains from a TD feed in area “xx”. At time of writing, this can be WI (Wimbledon), SU (Surbiton/Woking), BE (Basingstoke), EH (Eastleigh) and ZB (Bournemouth).
  • RTT – an extrapolated report based on other actual realtime data. This doesn’t appear in many locations, and I hope it stays that way!

TD

TD is, arguably, the more exciting feed. TRUST itself, effectively, takes TD data and looks up moves against a database of berth moves. Realtime Trains does the same thing, but with more locations.
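The lookup can be pictured as a table keyed on the TD area and the pair of berths a train steps between. This is only a sketch: the area code, berth numbers, location name and the `BerthReport` type are invented for illustration and are not real signalling data.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class BerthReport(NamedTuple):
    location: str  # the location this berth move corresponds to
    event: str     # "arrival" or "departure"


# Hypothetical berth-move table: (TD area, from berth, to berth) -> report.
BERTH_MOVES: Dict[Tuple[str, str, str], BerthReport] = {
    ("XX", "0101", "0103"): BerthReport("Station A", "arrival"),
    ("XX", "0103", "0105"): BerthReport("Station A", "departure"),
}


def report_for_move(area: str, from_berth: str, to_berth: str) -> Optional[BerthReport]:
    """Return the report a berth step maps to, or None if the step is unmapped."""
    return BERTH_MOVES.get((area, from_berth, to_berth))
```

Most berth steps will not be in the table at all, which is why the lookup returns `None` rather than raising an error.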

By using this data, it’s possible to do a number of things that you simply cannot do with just the TRUST data.

More reports

1339 London Waterloo to Poole, reports at all stations (showing Totton to Poole)

1339 London Waterloo to Poole, reports at all stations. Shown here is Totton to Poole.

Along my route, London Waterloo to Weymouth, TRUST doesn’t report in all places. My aim is, eventually, to map every station to a set of TD moves where they are available. The best place to start was between Brockenhurst and Bournemouth, where no reports are available at intermediate stations.

Platforms at terminal stations in advance

Departures from London Waterloo

Departures from London Waterloo at 1712. Platforms shown in bold are confirmed.

In areas where TD information is available and used by Realtime Trains, it’s possible for it to show confirmed platforms in advance of departure. These, normally, will be in line with when the platforms are advertised on the station departure boards but may be in advance. Realtime Trains confirms the platform when a signaller places the train’s identity in the signalling system on the departure platform. It’s not flawless, so keep an eye on it during disruption, but it works and is accurate most of the time.

Advance notice platforms when a train is running

1650 London Waterloo to Woking

1650 London Waterloo to Woking, with confirmed platforms in advance at Hersham, Walton-on-Thames and Weybridge

As trains are running along, it’s possible to confirm platforms in advance of a train’s arrival by taking advantage of the track layout along the route. In this instance, as the train went onto the slow line after Surbiton, it is able to confirm the platforms up to Weybridge as there is no way of changing the line the train is on until there. Upon departure from Weybridge, the platform will confirm at Byfleet & New Haw – and the same will happen for West Byfleet upon departure from Byfleet & New Haw.
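Here is a rough sketch of that idea, assuming each upcoming stop knows which platform each line serves and whether the train could change line after leaving it. The `Stop` structure, the `crossover_after` flag and the platform numbers are made up for the example.

```python
from typing import Dict, List, NamedTuple


class Stop(NamedTuple):
    name: str
    platform_by_line: Dict[str, str]  # line name -> platform served (illustrative)
    crossover_after: bool             # True if the train could change line after this stop


def confirmed_platforms(stops: List[Stop], line: str) -> Dict[str, str]:
    """Confirm platforms ahead of the train until it could next change line."""
    confirmed: Dict[str, str] = {}
    for stop in stops:
        platform = stop.platform_by_line.get(line)
        if platform is None:
            break  # this line has no platform here, so nothing can be confirmed
        confirmed[stop.name] = platform
        if stop.crossover_after:
            break  # beyond here the line, and so the platform, may change
    return confirmed


# Platform numbers below are invented purely for illustration.
upcoming = [
    Stop("Hersham", {"slow": "2", "fast": "4"}, crossover_after=False),
    Stop("Walton-on-Thames", {"slow": "2", "fast": "4"}, crossover_after=False),
    Stop("Weybridge", {"slow": "2", "fast": "4"}, crossover_after=True),
    Stop("Byfleet & New Haw", {"slow": "2", "fast": "4"}, crossover_after=False),
]
result = confirmed_platforms(upcoming, "slow")
```

With this data the result covers Hersham, Walton-on-Thames and Weybridge only. Byfleet & New Haw is left unconfirmed until the train has departed Weybridge and the function is run again on the remaining stops.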

In the future, I can expand this to determine whether a train can’t call at a station purely by the line it is on. The reason why we have to do this is simple – we don’t have access to the feeds that TOCs use to advertise changes. When we do – most of this will be unnecessary; even then, some trains are changed at very short notice and Realtime Trains will still be able to take advantage of this.

Where does TD currently report?

My aim, as declared earlier, is to cover all the network with TD reports (where available). At the moment, though, there are reports on the following TOCs and lines:

  • CrossCountry – Basingstoke to Bournemouth
  • First Great Western – Romsey to Fareham via Southampton Central and Chandlers Ford
  • Southern – Southampton Central to Fareham
  • South West Trains – London Waterloo to Weymouth, Hampton Court branch, Eastleigh and Southampton Central to Fareham, Eastleigh to Romsey via Chandlers Ford, Lymington branch

Mashing the data together

Using a mixture of TRUST and TD data, I can create snapshots of the network. I currently do this in two ways.

Creating a heatmap

Snapshot of the network as at 1705 29th October

Heatmap of delays on the rail network at 1705 on 29th October

I’ve created a heatmap (available here) that updates every 15 minutes showing the current state of play, with respect to delays, on the network. You can zoom in on your area and the heatmap will redraw within the window available. The brighter the colour, the greater the delays.

Station updates

Kingham railway station, current delays of up to 15 minutes

Kingham station. At 1710, there were delays of up to 15 minutes at this station.

Every station updates itself every 5 minutes to give a fresh snapshot of services at that station. It looks at services running 30 minutes either side of the time the snapshot was generated and takes a look at all the delays and cancellations of services at the station. Through that, an assessment is made as to the current performance and a yellow or red banner may appear summarising the present situation.
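As a sketch of how such an assessment could work: the 30-minute window comes from the description above, but the delay thresholds, the cancellation rule and all the names here are assumptions for the example, not the values the site actually uses.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional


class Service(NamedTuple):
    booked: datetime
    delay_minutes: int
    cancelled: bool = False


WINDOW = timedelta(minutes=30)  # services 30 minutes either side of the snapshot
YELLOW_DELAY = 10               # assumed threshold, in minutes
RED_DELAY = 30                  # assumed threshold, in minutes


def banner(services: List[Service], snapshot_time: datetime) -> Optional[str]:
    """Return "red", "yellow" or None for a station at the snapshot time."""
    in_window = [s for s in services if abs(s.booked - snapshot_time) <= WINDOW]
    if not in_window:
        return None
    worst = max((s.delay_minutes for s in in_window if not s.cancelled), default=0)
    cancelled = sum(1 for s in in_window if s.cancelled)
    # Assumed rule: red for a very late train or when most services are cancelled.
    if worst >= RED_DELAY or cancelled * 2 > len(in_window):
        return "red"
    if worst >= YELLOW_DELAY or cancelled > 0:
        return "yellow"
    return None


snapshot = datetime(2012, 10, 29, 17, 10)
services = [
    Service(datetime(2012, 10, 29, 17, 20), delay_minutes=14),
    Service(datetime(2012, 10, 29, 17, 5), delay_minutes=0),
]
status = banner(services, snapshot)
```

Under these assumed thresholds, a single train running 14 minutes late is enough to raise a yellow banner for the station.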

On the example shown, Kingham station currently has one train that is expected 14 minutes late. Realtime Trains is able to anticipate that other services at the station could be similarly delayed – and while it isn’t predicted on the Worcester train – it shows the banner in order to recognise the possibility.

Where to go in the future?

There’s a lot more data that could be released by different parties (and work is being done by others with it) and this can make the site much more helpful in different ways.

In the meantime, I’m going to continue along the path I’ve chosen and keep trying to reduce reliance on a huge number of different systems. There’s also the small matter of the third year of my degree – which now takes priority.

Technical Detail

The TRUST message types, and the strategy I take with each, are listed below (skip over it if you’re not technically inclined!):

  • 0001 (Train Activation) – these messages, if a train is activated automatically, are sent an hour or two hours before the train departs its origin. They can also be activated manually – the earliest activation I’ve seen is 13 hours ahead of departure. Using the tp_origin_timestamp field (other fields are affected by timezone issues), I transform it into a running date and search for it against the UID. This then ‘activates’ the schedule.
  • 0002 (Train Cancellation) – these messages cancel the remainder of a service’s schedule from a point X. These can be changed using another cancellation message or reversed using a reinstatement.
  • 0003 (Train Movement) – these messages represent a train arriving or departing from a point. For trains passing a location, you will still receive arrival and departure messages. This is the main message where the earlier missing ones (typically those activating trains) cause a problem. The solution I developed during Loco2‘s excellent #trainhack related to automatically activating these services as there is plenty of metadata available from movements. I’ll describe that solution later – or you can look at the source code when it’s available (soon).
  • 0004 (Train Unknown Movement) – I’m not presently dealing with these, and they aren’t that frequent. TRUST is pretty good at tracking trains 🙂
  • 0005 (Train Reinstatement) – reverses a Cancellation message. That’s all you really need to know.
  • 0006 (Trust Change Of Origin) – similar to a cancellation message, it cancels a service from the beginning of its path to a point X. It can be changed and/or reversed using a change of origin message. I made the mistake of assuming a reinstatement message could reverse both initially – it’s confusing.
  • 0007 (Train Change of Identity) – again, not too many of these. These only really apply when TRUST has changed the running identity of a train and typically only relates to freight train services. A great example of how this applies can be seen on Wikipedia on the last example.
  • 0008 (Train Change of Location) – I can honestly say that I don’t think I’ve ever seen *any* of these messages. So I don’t know how to handle them.
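The interplay between cancellations, reinstatements and changes of origin can be sketched as a small piece of state on the running schedule. Only the message type codes come from the list above; the `RunningSchedule` class is hypothetical. I have also assumed that a train cancelled from point X still runs as far as X, and that a change of origin naming the booked origin is what reverses an earlier one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunningSchedule:
    locations: List[str]                  # booked calling points, in order
    cancelled_from: Optional[str] = None  # set by 0002, cleared by 0005
    origin: Optional[str] = None          # set by 0006; None means the booked origin

    def apply(self, msg_type: str, location: Optional[str] = None) -> None:
        if msg_type == "0002":
            # A later cancellation simply replaces an earlier one.
            self.cancelled_from = location
        elif msg_type == "0005":
            # A reinstatement reverses a cancellation only, never a change of origin.
            self.cancelled_from = None
        elif msg_type == "0006":
            # Assumption: naming the booked origin undoes a previous change of origin.
            self.origin = None if location == self.locations[0] else location

    def running_locations(self) -> List[str]:
        start = self.locations.index(self.origin) if self.origin else 0
        if self.cancelled_from:
            end = self.locations.index(self.cancelled_from) + 1
        else:
            end = len(self.locations)
        return self.locations[start:end]
```

Keeping the two kinds of truncation in separate fields avoids the mistake mentioned above, where a reinstatement is wrongly allowed to undo a change of origin.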

How do you solve missing TRUST messages?

The greatest impact of a missing activation message can be seen if there’s disruption early in the morning. The reliance on TRUST for an activation message means that if a train is cancelled, you can’t show it as you don’t know how TRUST is referring to a train. I’ve managed to get around this, partially, by using train movement metadata to refer to a train.

I’ve not yet come up with a solution for cancellations – but my solution for movements is simple and, really, means that you don’t need an activation at all in normal conditions. All I use is the Train ID, Planned Timestamp (bear in mind they’re always in a “local” timezone) and the STANOX. In order to find the train, I look it up in the normal way that I would for creating a location line-up, and then grab the 2nd to 6th characters from the TRUST identity and match against the reporting number and planned departure time (train ID in the schedule). If it’s a match and it hasn’t been TRUST activated, then I’ll pick it, activate it and apply the movement to it.
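A minimal sketch of that matching step follows. It assumes the four-character reporting number sits at zero-based positions 2 to 5 of the ten-character TRUST identity, and that each candidate schedule can be queried by STANOX for its planned time. The `Schedule` class, the function names and the sample identities are all invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Schedule:
    uid: str
    headcode: str               # reporting number from the schedule
    calls: Dict[str, datetime]  # STANOX -> planned time, in local time
    activated: bool = False


def headcode_from_trust_id(train_id: str) -> str:
    """Assumed layout: the reporting number is the slice [2:6] of the TRUST identity."""
    return train_id[2:6]


def match_movement(schedules: List[Schedule], train_id: str,
                   stanox: str, planned: datetime) -> Optional[Schedule]:
    """Find an unactivated schedule matching a movement, and activate it."""
    headcode = headcode_from_trust_id(train_id)
    for schedule in schedules:
        if schedule.activated:
            continue  # already tied to a TRUST identity
        if schedule.headcode == headcode and schedule.calls.get(stanox) == planned:
            schedule.activated = True
            return schedule
    return None
```

The caller would then apply the movement to whichever schedule is returned, and treat `None` as a movement that could not be matched.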