SEO-friendly slugs – Guy Pursey

Last week, I posted on cross-linking within my blog. It was only after I published the post that I realised I’d got some elements of the linking wrong.

The problem

Who would have thought that the simple act of linking from one post to another would be so fraught?

This is an issue of trying to maintain my own versioned, channel-neutral system locally while pushing out to a system that has its own conventions. At the moment that external system is Scriptogram. It turns out that Scriptogram’s slugs, which I built a script around last time, will only take alphanumeric characters, will only represent them in lowercase, and will only allow the hyphen as an additional character to delimit units within the slug.
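Those three rules can be expressed as a single pattern. What follows is only a sketch of the rules as described above, not Scriptogram's actual validation: the function name `is_valid_slug` is made up for illustration, and the ban on leading, trailing or doubled hyphens is an assumption.

```python
import re

# Assumed rule set: lowercase letters and digits, grouped into units
# that are separated by single hyphens. Whether Scriptogram tolerates
# leading, trailing or repeated hyphens is not known; this sketch rejects them.
SLUG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_slug(slug: str) -> bool:
    """Return True if the whole slug fits the assumed pattern."""
    return SLUG_PATTERN.fullmatch(slug) is not None


print(is_valid_slug("crosslinking-blog-posts"))   # lowercase and hyphens only
print(is_valid_slug("Crosslinking_Blog_Posts"))   # uppercase and underscores
```

A check like this could run locally before anything is pushed out, so that a non-conforming name is caught at the source rather than silently altered at the destination.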

I didn’t realise this until after I’d written my script. It doesn’t help that Scriptogram’s own support site has been unavailable for some time. All the more reason to maintain that channel-neutral system, so that I can switch when I’m ready.

The solution

There are a couple of different options here.

I can rename all the folders and files in my local system and push to Scriptogram, or I can rewrite the script that pushes out to Scriptogram to rename them for me as they go out.

Looking into Scriptogram’s constraints on slugs, and thinking about why those constraints were there, got me thinking about my own filenaming conventions again.

Currently, folder and filenames on my local system are prefixed with a hyphen-separated date, an uppercase T, a hyphen-separated time, and an underscore, so that the datetime stamp is clearly delimited from the rest of the slug. Lowercasing the T is not a problem – it’s quite ugly anyway – and replacing the underscore with a hyphen is fairly uncontentious. However, I did wonder if I might want, in future, to vary the level of detail in the datetime stamp. At some point, for example, I might include the seconds in the time part, or I might reduce the detail so that only the date appears but not the time. With multiple hyphens and no clear delimiter, would it then be clear where the datetime part ended and where the rest of the slug began?

Perhaps it doesn’t matter but the whole thing could be tidied up if I simply get rid of all those hyphens from the datetime stamp and have everything before the first hyphen represent datetime, whatever the length of that part of the string.

In this way, last week’s slug …

2014-11-23T19-37_crosslinking-blog-posts

… would become …

201411231937-crosslinking-blog-posts

This looks neater and allows for flexibility whilst being more easily machine-parsable. It also instantly works for Scriptogram.
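To make the renaming concrete, here is a minimal sketch of the conversion and of splitting the result at the first hyphen. The function names are hypothetical. The optional time and seconds groups reflect the possible future variations mentioned above rather than anything currently in use, and the clean-up of the descriptive part is an extra assumption.

```python
import re

# Old convention: YYYY-MM-DD, then optionally T, HH-MM and (hypothetically)
# -SS, then an underscore and the descriptive part.
OLD_STYLE = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})"               # date
    r"(?:T(\d{2})-(\d{2})(?:-(\d{2}))?)?"    # optional time, optional seconds
    r"_(.+)"                                 # descriptive part
)


def convert_slug(old: str) -> str:
    """Turn an old-style name into the compact, hyphen-delimited form."""
    match = OLD_STYLE.fullmatch(old)
    if match is None:
        raise ValueError(f"not an old-style slug: {old!r}")
    *stamp_parts, rest = match.groups()
    stamp = "".join(part for part in stamp_parts if part)
    # Assumption: lowercase the description and collapse anything that is
    # not a letter or digit into a single hyphen.
    rest = re.sub(r"[^a-z0-9]+", "-", rest.lower()).strip("-")
    if not rest:
        raise ValueError(f"no usable description in: {old!r}")
    return f"{stamp}-{rest}"


def split_slug(slug: str) -> tuple[str, str]:
    """Split a new-style slug into (datetime stamp, description)."""
    stamp, sep, rest = slug.partition("-")
    if not sep or not (stamp.isascii() and stamp.isdigit()):
        raise ValueError(f"no datetime prefix in: {slug!r}")
    return stamp, rest


new = convert_slug("2014-11-23T19-37_crosslinking-blog-posts")
print(new)              # 201411231937-crosslinking-blog-posts
print(split_slug(new))  # ('201411231937', 'crosslinking-blog-posts')
```

Because the split only looks for the first hyphen, it does not care whether the stamp is eight digits (date only), twelve (date and time) or fourteen (with seconds).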

Some thoughts on URLs and SEO

Search engine optimisation (SEO) is the main reason for Scriptogram to place the restrictions it does on slugs.

By having a whitelist of alphanumeric characters and hyphens only, Scriptogram avoids problems with the URL being misinterpreted or misparsed in some way and becomes more search engine-friendly. As Moz’s The Beginner’s Guide to SEO in its “URL Construction Guidelines” section states:

Not all web applications accurately interpret separators like underscore “_,” plus “+,” or space “%20,” so use the hyphen “-” character to separate words in a URL, as in google-fresh-factor for URLs example above.

While having SEO-friendly names in my local system isn’t necessary, these general guidelines seem like good advice for whichever channel I might post to, and therefore might as well be implemented at the local level too. It saves duplicate effort being spent later.

I did think about separating out the datetime stamp so that it was in a separate part of the path. For example, Bhagwad Park’s post about WordPress recommends using slashes to structure the URL in terms of years, months, and even categories. While the date aspect of this makes sense to me – it could enable users to delete the rest of the slug to see everything belonging to a particular year – this is not a channel-neutral aspect and I certainly don’t want to structure my local system along these lines, where having all posts date-prefixed but in one folder makes it easier to organise. Categories are another issue altogether and something to be considered later.

StackOverflow seems to work on the principle of having a short numerical ID slug followed by a slash and a more descriptive hyphen-separated slug after it. The descriptive part is optional so that http://stackoverflow.com/questions/47427/why-do-some-websites-add-slugs-to-the-end-of-urls will take you to the same post as http://stackoverflow.com/questions/47427/. The description in theory gives potential visitors an indication of what’s on the page, though the fact that this is optional is both an advantage and a disadvantage – one can easily change the post title at any time but it’s just as easy for anyone sending on a link to hide the true nature of the page’s content.
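The principle can be illustrated from the outside with a small sketch: only the numeric ID is read from the path, and whatever follows it is ignored. This is not StackOverflow's own routing code, and the `question_id` helper is a hypothetical name.

```python
import re
from urllib.parse import urlparse

# Only the numeric ID after /questions/ matters; any descriptive
# slug that follows is ignored for lookup purposes.
ID_IN_PATH = re.compile(r"/questions/(\d+)(?:/|$)")


def question_id(url: str) -> int:
    """Extract the numeric ID from a /questions/<id>/<optional-slug> URL."""
    match = ID_IN_PATH.match(urlparse(url).path)
    if match is None:
        raise ValueError(f"no question ID in: {url!r}")
    return int(match.group(1))


long_url = ("http://stackoverflow.com/questions/47427/"
            "why-do-some-websites-add-slugs-to-the-end-of-urls")
short_url = "http://stackoverflow.com/questions/47427/"
print(question_id(long_url) == question_id(short_url))  # True
```

Since the lookup depends on the ID alone, the descriptive text can be changed, or replaced with something misleading, without breaking the link.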

In any case, this is again a channel-specific feature and not one that my current system on Scriptogram can implement – any slashes get stripped out of the slug by Scriptogram anyway.

What next?

I spent a lot of this week tidying up and at some point I’d like to get away from blog posts about my blog posts. It’s all a bit too recursive and reflective. I have plenty of drafts on other topics waiting to be finished and published.

Specifically though, following on from this, there are SEO guidelines on keywords, which it might be worth exploring at some later point. I need to think about this in relation to what I want my blog to achieve and need to give some more thought to keywords, categories and tags in general.