Awkward dates
Ever had an awkward date? Unfortunately, I don’t mean the one-on-one kind involving meeting a potential partner but simply the calendar variety. Working with them in JavaScript can be painful. Thinking about this led me to some deeper problems around representing ambiguous dates and varying degrees of date-precision.
Some context
You may have seen from my blog posts over much of May that I’ve been increasingly interested in data visualisation.
I’m still very much experimenting but of key interest are timelines and capturing narratives through data.
First I tried to create a draft visualisation of my journal-writing habits and I wrote a post on how my journal writing habits have developed over the last half-year.
Then I moved on to using some sample data and wrote a post exploring representing several timelines at once with the use of “swimlane” charts.
My post this week was supposed to be about how I’d developed that idea further. But then I got caught up in what seemed at first to be a little problem.
Timelines rely on dates and times. But I was coming across a number of issues with working with them in my code, not least knowing how to write them.
Date formats
First a note about date formats in general. There are many different ways to write or print a date.
UK vs US dating
The most commonly known and significant difference is between what I’ll temporarily call UK and US date formats. In the UK, we typically write the date in a DMY format with the day of the month first, then the month, then the year. In the US, it’s standard to use MDY, with the month coming before the day.
Both these formats have their own logic. While I think it makes sense to write the date in a format where the units follow sequentially (the day belongs to the month, and the month belongs to the year), I can also see sense in writing a date as you would say it. This is also why in the UK there’s sometimes a tendency to write dates in the ordinal form (for example, “20th June”) rather than the cardinal (“20 June”).
As a side note it’s interesting to see that there are some countries, like Canada, Kenya and the Philippines, according to the Wikipedia page on date format by country, that use both DMY and MDY formats as standard.
Other variants
But, of course, even within these two formats there can be a lot of variance:
- Do you write the month’s name fully, use an abbreviation, or refer to it by number?
- Do you include the weekday name in the date?
- Do you write the year with the last two digits (much more common before 2000 and the millennium bug scare) or with the full four?
- Do you separate the date’s components with slashes or hyphens?
All these questions before even considering the time component of the date.
Orders of magnitude
My preference has always been for something like the ISO 8601 standard which follows a format that goes YYYY-MM-DD
for the date and YYYY-MM-DDThh:mm:ss±hh:mm
for a date-time combination with the timezone shown as how many hours and minutes it deviates from UTC (Coordinated Universal Time).
I think YMD formats, like the ISO format, make more sense than either of the UK or US formats mentioned above because it makes sense to start with the largest unit first. I do have a penchant for representing dates without all the punctuation and delimiters – so instead of the date of today’s post being 2016-06-20
it would simply be an integer 20160620
. I’ll explore that more in detail later.
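As a minimal sketch of these two notations, the snippet below derives both the hyphenated ISO date and the compact integer from a Date object. The helper names toIsoDate and toCompactDate are hypothetical, and working in UTC is an assumption made so the output does not shift with the local timezone.

```javascript
// Hypothetical helpers (the names are illustrative, not from any library).
// Both work in UTC so the result does not depend on the local timezone.

function toIsoDate(date) {
  // toISOString() returns YYYY-MM-DDTHH:mm:ss.sssZ for years 0000-9999,
  // so the first ten characters are the ISO date.
  return date.toISOString().slice(0, 10);
}

function toCompactDate(date) {
  // Strip the hyphens and read the eight digits as an integer.
  return Number(toIsoDate(date).replace(/-/g, ""));
}

const postDate = new Date(Date.UTC(2016, 5, 20)); // months are zero-based: 5 is June
console.log(toIsoDate(postDate));     // "2016-06-20"
console.log(toCompactDate(postDate)); // 20160620
```

One practical upside of the integer form is that such numbers sort in chronological order with an ordinary numeric comparison.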
JavaScript weirdness
JavaScript has always had a lot of odd quirks and strange edge cases to be aware of – whether it’s var
-less variable declarations leading to global scope, the changing nature of this
, or even the weird effects of type coercion and operator overloading.
But I hadn’t fully appreciated until this last week just how weird JavaScript is with dates.
Some of the odd behaviour is noted in Sergei Dorogin’s post on JavaScript dates gotchas.
Fiddly fallbacks
If you create a Date object from a string in JavaScript, the engine will first attempt to interpret the string as an ISO-format date. But, if it’s not in an ISO format, JavaScript then tries parsing it as a number of other formats to try and get a date from the string, as outlined by this MSDN page on JavaScript and date fallbacks.
This leads to some odd effects. Try a string of a single digit:
> new Date("5")
< Tue May 01 2001 00:00:00 GMT+0100 (BST)
Here it reads it as a month – namely, because the only character is 5
, May in 2001. Even though there is an obscure fallback logic behind it, this, to me, is bizarre.
Double-digit dates
Similarly, if we try two digits, at first the result seems similar:
> new Date("05")
< Tue May 01 2001 00:00:00 GMT+0100 (BST)
But with a different numeral, two characters behave very differently:
> new Date("65")
< Fri Jan 01 1965 00:00:00 GMT+0000 (GMT)
In fact, I ran a test for this using the following script and got some unexpected results:
for (var i = 10; i < 100; i += 1) { console.log(i, new Date(`${i}`)) }
I’ve summarised the results of double digit strings in a table for future reference:
String | Result |
"00" | Returns first day of January for 2000. |
"01" - "12" | Returns first day of corresponding month in 2001. |
"13" - "31" | Returns Invalid Date. |
"32" - "99" | Returns first day of January for corresponding year of 20th century (1932-1999). |
As you can see, the logic to this is not immediately obvious and the behaviour comes off as very weird.
Triple-digit dates
If we try three digits, the results seem to make even less sense:
String | Result |
"000" | Returns first day of January for 2001. |
"001" - "012" | Returns first day of corresponding month in 2001. |
"013" - "031" | Returns Invalid Date. |
"032" - "049" | Returns first day of January for corresponding year of 21st century (2032-2049). |
"050" - "099" | Returns first day of January for corresponding year of 20th century (1950-1999). |
"100" - "999" | Returns first day of January for given year (100-999). |
And then, with four digits, we’re in much more comfortable territory with each string being interpreted simply as the year, as with the first four characters of an ISO-format date string.
> new Date("0005")
< Sat Jan 01 5 00:00:00 GMT+0000 (GMT)
But all the foregoing behaviour leaves a lot of uncertainty when dealing with JavaScript dates.
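One way to sidestep the fallbacks, sketched here in plain JavaScript, is to refuse anything that is not a complete YYYY-MM-DD string before a Date is ever built. The function name parseStrictIsoDate is hypothetical, and treating the result as midnight UTC is an assumption.

```javascript
// Hypothetical helper: accept only a full YYYY-MM-DD string and return a Date
// at midnight UTC, or null for anything else (including "5", "05" and "65").
function parseStrictIsoDate(text) {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(text);
  if (!match) return null;

  const year = Number(match[1]);
  const month = Number(match[2]);
  const day = Number(match[3]);

  // Start from the epoch and set the calendar fields explicitly, so the
  // string never passes through the engine's own date parser.
  const date = new Date(0);
  date.setUTCFullYear(year, month - 1, day);

  // Out-of-range values roll over (30 February becomes 1 March), so
  // reject any date whose fields changed after being set.
  const rolledOver =
    date.getUTCFullYear() !== year ||
    date.getUTCMonth() !== month - 1 ||
    date.getUTCDate() !== day;
  return rolledOver ? null : date;
}

console.log(parseStrictIsoDate("5"));          // null
console.log(parseStrictIsoDate("65"));         // null
console.log(parseStrictIsoDate("2016-02-30")); // null
console.log(parseStrictIsoDate("2016-06-20").toISOString()); // "2016-06-20T00:00:00.000Z"
```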
So, apart from noting these peculiarities, why am I bothered by this?
Dickie dates
I became interested in this behaviour when playing with the dates in some sample data on the longevity of kingdoms in Asia from a D3 visualisation and seeing how JavaScript reacted.
But I also became interested in dates in general when typing up some notes on a book.
The book is John Dickie’s Mafia Republic. I became interested in the dates of particular events and the phases that Dickie notes in mafia history.
I started making notes like this:
Note | Start date | End date | Source | Page |
Borghese organised a riot following bombing of a fascist eagle (originally placed to mark Mussolini’s first visit to Reggio Calabria). The bomb was planted by neofascists as a pretext for the riot. | 19691027 | | Mafia Republic | p163 |
Here I’ve noted the date with a YYYYMMDD
format, close to the ISO-format date described earlier but without the hyphens.
However there are parts of the book where Dickie talks about events that happen or the feeling of a particular decade like the 1980s. This led me to ask myself: what if I want to represent the 1980s as a date?
It should be possible. After all, I can specify a particular day without specifying a particular time.
With the sample “kingdoms” data I worked with in my post on improving someone else’s timescale visualisation, only the years were given. In converting the integers in the data to fit my new time axis, I had to fill in the rest of the date information – basically by adding midnight of January 1st to every start year and a minute to midnight on December 31st for every end year.
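The padding described above might look something like this. It is only a sketch: the function names are hypothetical, and using UTC rather than local time is an assumption, not necessarily what my original script did.

```javascript
// Hypothetical helpers for turning a bare year into full start and end dates.

function startOfYear(year) {
  const d = new Date(0);          // 1970-01-01T00:00:00Z
  d.setUTCFullYear(year, 0, 1);   // 1 January; the time stays at midnight
  return d;
}

function endOfYear(year) {
  const d = new Date(0);
  d.setUTCFullYear(year, 11, 31); // 31 December
  d.setUTCHours(23, 59, 0, 0);    // a minute to midnight
  return d;
}

console.log(startOfYear(1980).toISOString()); // "1980-01-01T00:00:00.000Z"
console.log(endOfYear(1989).toISOString());   // "1989-12-31T23:59:00.000Z"
```

Setting the year with setUTCFullYear, rather than passing it to the Date constructor, avoids years from 0 to 99 being mapped onto 1900 to 1999, which matters for data about early kingdoms.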
Why couldn’t I do the same for any unit in the date sequence?
Accurate ambiguity
When recording my notes myself I’ve had to allow for some ambiguity because I haven’t always got exact dates.
For example, if Dickie says something happened in the 1980s, recording exact start and end dates – like 19800101000000
to 19891231235959
doesn’t really capture the ambiguity of the information.
To capture this ambiguity accurately I’ve tended simply to use a shorthand in the start field; in this example, I would simply write 198
which only contains as much information as I have to provide.
Of course I need a way for my script to take this ambiguity and interpret it.
Implementing imprecision
If 198
is to refer to the 1980s then my code needs to expand it to 19800101000000
for the start date and 19891231235959
for the end date.
I thought it would be quite easy to use a template string that could be spliced with my abbreviated date string to complete the rest of the string.
And at first I thought this might be as simple as 00000101000000
for the start and 99991231235959
for the end. So when we have 198
for the 1980s we simply take everything after the first three characters of each template string and append it.
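A minimal sketch of that splicing idea, using the two template strings above. The function name completeWithTemplate is hypothetical.

```javascript
// The two template strings from the text above.
const START_TEMPLATE = "00000101000000";
const END_TEMPLATE = "99991231235959";

// Hypothetical helper: keep the digits we were given and fill in the rest
// from the template, starting at the position where the partial string ends.
function completeWithTemplate(partial, template) {
  return partial + template.slice(partial.length);
}

console.log(completeWithTemplate("198", START_TEMPLATE)); // "19800101000000"
console.log(completeWithTemplate("198", END_TEMPLATE));   // "19891231235959"
```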
But this approach doesn’t quite work for certain parts of the string.
Monthly mishaps
For example, by the logic above, the string 19801
implicitly suggests the last three months of the year 1980 – that is, just those months whose number begins with a 1
, October, November and December. Using numbering for months seems bound to have this odd effect, the idea that 0
could capture the first nine months and 1
the final three. It’s as if a base-12 numeral system would be more appropriate. But with a more human-readable system like this one, we have to deal with these kinds of edge cases.
If the 5th and final digit in the string is 1
I want the next digit I complete the string with to be 2
rather than the nonsensical 9
, which would represent the following July rather than the December of the given year.
However, this still leaves a problem if a string like 19802
is given. This is technically invalid. If completed as 19802001000000
which is what we get if we use our completion string to fill out the rest, it would be interpreted as August 1981. Would the implication then be that the range should stretch to twelve months after that date?
One can start to appreciate the problems that JavaScript has interpreting dates. And it gets even more complicated if we consider the days within a month where the end number we want varies by the month specified.
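To make these edge cases concrete, here is one possible set of rules, sketched as code. Everything in it is an assumption rather than a settled design: the name expandPartialDate is hypothetical, a fifth digit of 0 is read as January to September and 1 as October to December, any other fifth digit is rejected as invalid, and only prefixes of up to six digits (year and month) are handled.

```javascript
// Hypothetical sketch: expand a partial date such as "198" or "19801" into
// 14-digit start and end strings (YYYYMMDDhhmmss). The rules are assumptions.

const pad2 = (n) => String(n).padStart(2, "0");

// Last day of a month (month is 1-12). Day 0 of the following month rolls
// back to the final day of the requested month, which handles leap years.
function daysInMonth(year, month) {
  const d = new Date(0);
  d.setUTCFullYear(year, month, 0);
  return d.getUTCDate();
}

function expandPartialDate(partial) {
  if (!/^\d{1,6}$/.test(partial)) return null; // year and month prefixes only

  if (partial.length <= 4) {
    // Decade, century and so on: pad the year with 0s for the start, 9s for the end.
    return {
      start: partial.padEnd(4, "0") + "0101000000",
      end: partial.padEnd(4, "9") + "1231235959",
    };
  }

  const year = partial.slice(0, 4);
  let firstMonth;
  let lastMonth;
  if (partial.length === 5) {
    // The fifth digit is the tens digit of the month.
    if (partial[4] === "0") { firstMonth = 1; lastMonth = 9; }
    else if (partial[4] === "1") { firstMonth = 10; lastMonth = 12; }
    else return null; // e.g. "19802": no month number starts with 2
  } else {
    firstMonth = lastMonth = Number(partial.slice(4, 6));
    if (firstMonth < 1 || firstMonth > 12) return null;
  }

  const lastDay = daysInMonth(Number(year), lastMonth);
  return {
    start: year + pad2(firstMonth) + "01000000",
    end: year + pad2(lastMonth) + pad2(lastDay) + "235959",
  };
}

console.log(expandPartialDate("198"));   // { start: "19800101000000", end: "19891231235959" }
console.log(expandPartialDate("19801")); // { start: "19801001000000", end: "19801231235959" }
console.log(expandPartialDate("19802")); // null
```

Returning null for an impossible prefix, rather than letting the month roll over into the following year, is a deliberate choice here: it keeps the behaviour strict in the same spirit as rejecting malformed strings outright.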
D3 date support
Luckily, D3 gets around some of the issues with JavaScript by enabling customised but strict date support.
One can read the D3 API page on time formatting to find out more about this. Essentially, one can specify a format as a string and D3 will parse it along those lines.
For example, to interpret dates strictly according to the ISO format, one simply needs to code:
var datetime = d3.time.format("%Y-%m-%dT%H:%M:%SZ")
datetime.parse("2016-06-20T10:00:00Z") // returns date of this blog post with time specified
If any part is missing, even the Z at the end, D3 will return null
instead of a Date object.
Using D3’s inbuilt function for this helps get around the flakiness of JavaScript’s attempt to second-guess the user.
The problem
To state the problem as clearly as possible: I want the ability to note dates precisely where needed and imprecisely in other cases, where the code can still behave strictly and unambiguously.
To do this I will need to establish a set of rules and enact these in code. This is what I intend to do soon.
Also next
I’ve spent the whole of this post talking about the data for one axis on the kind of visualisations I’ve been trying to work with.
In other posts I also want to address the verticality of these visualisations, how lanes stack, how that might be automated. So there’s plenty to consider.