Having a sitemap for your website is essential. There are lots of plugins and websites out there that can generate a simple sitemap for you. I built this site using NuxtJS and am using @nuxt/sitemap which does a fair job generating a simple sitemap for your site. The problem with this solution is that the sitemap generated is incredibly simple and won’t do much for your site’s SEO. After a couple of hours of experimenting, I was able to enhance the sitemap that the plugin generates and today I will share my solution with you!
Why Not Contribute to the Project
Let’s go ahead and get the obvious question out of the way: “Why didn’t you pull down this library, add your solution, and submit a PR for your changes?” Well, that was my initial intention. If you look at the nuxt-community/sitemap-module GitHub page you will see that there are 5 open issues and a PR open for generating multiple sitemaps (at least at the time of this writing). There are two main contributors working on this project, and they don’t seem to be too active on it. With that kind of queue in front of me, I decided to roll my own solution. I wanted my changes to have an immediate impact on my production environment.
I’ve contributed to open source in the past and will again, but I can also tell you it takes time to get your changes merged. First, you’ve got to wrap your brain around the project’s code to gain a thorough understanding of what it is doing. You don’t want to start throwing code at something without knowing what side effects it can cause, and you have to understand how it is working to make the best contribution possible. If you make it past that hurdle, then you usually have to deal with tests. There is the chance that the developers are using a testing suite you’ve never used which means you will end up reading docs. Even if you have used it, you will still have to take the time to write the tests. Even if you can quickly add your changes and write tests, you then have to submit the PR and wait for the developers to review and merge it. I am trying to get this site to a more polished state quickly and do not want to spend the extra time required to contribute to this project right now. That might be something I circle back to at some point. Call me selfish!
An Overview of My Solution
The solution I came up with is specific to my exact needs but could easily be adapted to your own with a few small changes. Since I created a custom blog setup, my solution is broken into two main parts: generating the XML for the main views and generating the XML for the blogs. If you are doing something similar, then this is the solution for you! Otherwise, you can still use the part that generates the extra sitemap XML for the main views in your own project with some small changes.
I created a NodeJS script that does all the heavy lifting and named it 'sitemapGenerate.js'. The script needs to be run after the sitemap plugin generates the simple sitemap.xml file, as it uses this as a starting point. Approaching things this way means that my script doesn't have to figure out which views exist, as the plugin does that for me. My script only has to add to what is already generated after the 'nuxt generate' task has completed. After reading this article, you might notice that I am generating a lot of XML myself, and it may seem like I didn't need to approach things this way. Honestly, I did it this way in case I add more views to my site later, as I believe the plugin's output will still be helpful then.
Setting Everything Up
As with most modern JavaScript solutions, we need to install some npm packages. Create your script file in the root of your project (I named mine ‘sitemapGenerate.js’) and run the following command:
npm i -S cheerio xml-js
If you have a blog setup in your Nuxt project that uses markdown files to generate your views like I do, then you will also need the following:
npm i -S marked esm
You might already have marked installed if your blog is generated from markdown so ignore that installation if that is the case.
Here’s a brief overview of why we need these modules:
- cheerio: to parse the HTML files generated by Nuxt. Also, if using a markdown driven blog setup, this will parse the HTML generated by your markdown.
- xml-js: to turn your sitemap XML into JSON then to turn your JSON back to XML before you write your sitemap.xml file.
- marked: to parse markdown into HTML.
- esm: If you have ES6 modules exporting your blog details for Nuxt, you will need this to import those modules into the script. The other option is to use Babel to transpile your script, but that requires more setup.
Import/Require Those Dependencies!
Alright, we have to get those dependencies into the script. If you are using a blog setup, then you also need to get your blogs’ modules imported. At the top of the script file, add the following and adjust paths to suit your specific needs:
const esmImport = require('esm')(module); // it was this or a whole babel config
const Coding = esmImport('./content/directory/coding');
const Gaming = esmImport('./content/directory/gaming');
const cheerio = require('cheerio');
const path = require('path');
const fs = require('fs');
const marked = require('marked');
const convert = require('xml-js');
const util = require('util');
// array of coding blogs from content/directory/coding.js
const codingArr = Coding.default().map(item => {
  return { url: `/blog/coding/${item.slug}`, item };
});
// array of gaming blogs from content/directory/gaming.js
const gamingArr = Gaming.default().map(item => {
  return { url: `/blog/gaming/${item.slug}`, item };
});
const posts = { coding: [...codingArr], gaming: [...gamingArr] };
// declare here & assign later because reading now would try to load sitemap.xml before it exists
let sitemapXml, sitemapJson, smUrls;
To briefly go over this, I’m importing 'esm' as const esmImport and invoking it with the 'module' argument. const Coding & const Gaming are my two ES6 blog modules, and I am using esmImport to bring them into a NodeJS script without using Babel. const cheerio, const marked, & const convert bring in the other npm packages we installed earlier. const path, const fs, & const util import modules that are built into NodeJS.
const codingArr & const gamingArr are where I make arrays from my blog modules. The 'url' property holds the full path as a string, and the 'item' property holds the entire item from the array so I have everything I need later. From here I have const posts, which combines the gaming and coding blog arrays into a JS object where the key is 'coding' or 'gaming' for each blog section and the value is the array that contains the posts for that section. Later, I loop over these arrays and flatten them to return the xml-js formatted array of objects that correctly represents the blog posts in the sitemap.xml file.
To end this section I have let sitemapXml, sitemapJson, smUrls because these can’t be assigned yet but need to be available to the whole script. I assign those variables later, once the script is actually invoked. The reason is that I run this from an npm script and, if these variables were assigned at the top of the file, the process would throw an exception whenever the sitemap.xml generated by the Nuxt plugin did not exist yet. Waiting to assign them until the function at the bottom of the script runs, after the ‘nuxt generate’ process has ended, avoids that.
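One way to make the "only read it once the script is invoked" idea explicit is to wrap the read in a function and check that the file exists first. This is a minimal sketch rather than the code from my script: readBaseSitemap and its error message are hypothetical, and the dist directory is passed in as an argument.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: reads the plugin's sitemap only when called,
// so nothing touches the file while the module is being loaded.
function readBaseSitemap(distDir) {
  const file = path.resolve(distDir, 'sitemap.xml');
  if (!fs.existsSync(file)) {
    throw new Error(`Expected ${file} to exist. Did "nuxt generate" run first?`);
  }
  return fs.readFileSync(file, 'utf8');
}
```

Calling readBaseSitemap(path.resolve(__dirname, 'dist')) inside the function that kicks everything off would give the same deferred behavior as the let declaration, with a clearer failure message if the file is missing.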
Generating Sitemap XML from Your Views
Now it is time to write the function that will generate the additional XML for your main views. Go ahead and add:
function makeSitemapForPages() {
  return new Promise(resolve => {});
}
I am returning a promise here because I need this to finish running before I write the generated result to the sitemap.xml file. You can pass reject here and do some error handling or exit the process, but I chose not to considering this will be run in the build process and, if an error is thrown, the build will just fail and I can review the problem and make the necessary changes. Since I am using this for my own static site hosted by Netlify, I get an email if the deployment fails, and I can visit the issue on my own time. If this were being used for something with much more weight (a work project, web application with thousands of users, etc) I would add the reject functionality. I am sometimes lazy with my own personal projects. Isn’t the saying something like “the plumber’s pipes are always clogged”?
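If you do want the reject path, a rough sketch could look like the following. The name makeSitemapForPagesSafe and the buildEntries callback are hypothetical stand-ins for the mapping work shown later in this post. The try/catch is only there for readability, since an error thrown inside a Promise executor rejects the promise anyway.

```javascript
// Hypothetical variant that rejects on failure instead of only resolving.
// buildEntries stands in for the synchronous work that builds the entries.
function makeSitemapForPagesSafe(buildEntries) {
  return new Promise((resolve, reject) => {
    try {
      resolve(buildEntries());
    } catch (err) {
      reject(err);
    }
  });
}

// Usage: report the problem and make the build exit with a non-zero code.
makeSitemapForPagesSafe(() => [{ loc: { _text: 'https://example.com' } }])
  .then(entries => console.log(`built ${entries.length} entries`))
  .catch(err => {
    console.error('sitemap generation failed:', err.message);
    process.exitCode = 1;
  });
```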
At the top of this function, I create a variable I named ‘xml’ and assigned it to this:
// built is path for generated html file; raw is array of component vue files that make up the generated html
const xml = [
  {
    built: 'index.html',
    raw: ['pages/index.vue', 'components/NavBar.vue', 'components/Social.vue']
  },
  {
    built: 'about/index.html',
    raw: [
      'pages/about.vue',
      'components/about/contact.vue',
      'components/about/general.vue',
      'components/about/site.vue',
      'components/about/tech.vue',
      'components/about/work.vue'
    ]
  },
  { built: 'blog/index.html', raw: ['pages/blog/index.vue'] },
  {
    built: 'blog/coding/index.html',
    raw: [
      'pages/blog/coding/index.vue',
      'components/blog/blogComments.vue',
      'components/blog/blogContent.vue',
      'components/blog/blogHeader.vue',
      'components/blog/blogListContainer.vue',
      'components/blog/blogListItem.vue',
      'components/blog/blogListMaster.vue',
      'components/blog/blogPostSlug.vue'
    ]
  },
  {
    built: 'blog/gaming/index.html',
    raw: [
      'pages/blog/gaming/index.vue',
      'components/blog/blogComments.vue',
      'components/blog/blogContent.vue',
      'components/blog/blogHeader.vue',
      'components/blog/blogListContainer.vue',
      'components/blog/blogListItem.vue',
      'components/blog/blogListMaster.vue',
      'components/blog/blogPostSlug.vue'
    ]
  }
]
This is essentially an array of objects where the 'built' key contains the path to the Nuxt generated HTML file for the view and the 'raw' key contains an array of the components that make up that view. The 'raw' array is necessary for getting a date for the <lastmod> tag.
From here, I just chain a .map off of the array definition that looks like this:
…
.map(page => {
  // read the file
  const file = fs.readFileSync(path.resolve(__dirname, 'dist/', page.built));
  // parse html with cheerio
  const $ = cheerio.load(file);
  // build the loc tag
  const route = `https://joeyg.me/${page.built}`;
  const routeSplit = route.split('/');
  routeSplit.pop();
  const routeCleaned = routeSplit.join('/');
  // build the lastmod tag using file stats
  const lastmodDate = page.raw
    .map(file => {
      const stats = fs.statSync(path.resolve(__dirname, file));
      return new Date(util.inspect(stats.mtime));
    })
    .sort((a, b) => {
      return a < b ? 1 : a > b ? -1 : 0;
    })[0];
  // build the images tags
  const images = Array.from($('img'))
    .filter(img => !img.attribs.src.startsWith('data'))
    .map(img => {
      const src = img.attribs.src.startsWith('/')
        ? (img.attribs.src.startsWith('/_nuxt') ? `https://joeyg.me${img.attribs.src}` : `https:${img.attribs.src}`)
        : img.attribs.src;
      const imageTag = { 'image:loc': { _text: src } };
      if (img.attribs.alt) {
        imageTag['image:title'] = img.attribs.alt;
      }
      return imageTag;
    });
  return {
    loc: { _text: routeCleaned },
    lastmod: { _text: lastmodDate.toISOString() },
    'image:image': images
  };
});
resolve(xml);
…
This needs explaining. First, I’m defining const file and setting it to the contents of the file I want to load, using the 'fs' module I loaded at the top and its 'readFileSync' method. Now that the file is loaded, I declare const $ = cheerio.load(file); to load the file into cheerio and allow for jQuery style parsing of the HTML. This will allow me to search for image tags in that view. The next 4 lines take the page.built path, prepend my domain, split the result on '/', and remove the 'index.html' segment from the end, since the URL without it is the canonical route of that view.
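The route cleanup is easy to pull out into its own function if you want to reuse or test it. Here is a small sketch of that step; toCanonicalUrl is a hypothetical name and the base URL is passed in rather than hard-coded.

```javascript
// Hypothetical helper: turns a built file path into the canonical route URL.
function toCanonicalUrl(baseUrl, builtPath) {
  const parts = `${baseUrl}/${builtPath}`.split('/');
  parts.pop(); // drop the trailing "index.html" segment
  return parts.join('/');
}

console.log(toCanonicalUrl('https://example.com', 'index.html'));
// → https://example.com
console.log(toCanonicalUrl('https://example.com', 'blog/coding/index.html'));
// → https://example.com/blog/coding
```

Note that this assumes every built path ends in a file name, as the ones in my array do.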
The next section takes the array in page.raw, which lists the raw .vue files that make up each view. I run a .map on the array and use fs.statSync to get the details of each file. I use the util module I loaded earlier to create a new JavaScript date object containing the last modified date of the file. Now I have an array of last modified dates for the files that make up the view. From here, the simplest approach seemed to be sorting the array of dates in descending order and then taking the first element, since that will be the most recent date. Now I have what I need for the <lastmod> tag.
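If sorting feels like more than is needed just to find the newest date, an alternative is to compare timestamps with Math.max. The sketch below swaps in that technique; latestMtime is a hypothetical helper, it expects resolvable file paths, and it assumes the list is not empty.

```javascript
const fs = require('fs');

// Hypothetical helper: newest modification time among a list of files.
// Uses Math.max on millisecond timestamps instead of sorting Date objects.
function latestMtime(files) {
  const times = files.map(file => fs.statSync(file).mtime.getTime());
  return new Date(Math.max(...times));
}
```

Something like latestMtime(page.raw.map(file => path.resolve(__dirname, file))).toISOString() would then produce the same kind of value the sort-based version does.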
In the next section, I build the image tags. The '$' variable works like a jQuery selector on the HTML file I loaded into cheerio earlier. I use Array.from($('img')) because $('img') returns a jQuery-style collection, on which you would normally call '.each', rather than a plain array. 'Array.from' allows me to iterate using normal JavaScript array methods and, since I am running a filter and a map on the result, this was the easiest thing to do.
The filter call in this section removes the embedded base64 images. These get embedded during my build process and are mostly just icons and decor, so they shouldn’t be included in the sitemap anyway. From here, I map over what remains of the images, grabbing the src and alt attributes from each image. You’ll notice that the src definition checks whether the URL starts with a slash. Paths under '/_nuxt' get my domain prepended, any other URL with a leading slash gets 'https:' prepended (which handles protocol-relative URLs such as //example.com/a.png), and everything else is left as it is. All that is left to do here is create the imageTag variable in a format that can be converted back to XML using xml-js later. I then check to see if the image has an alt attribute and, if so, I add it as the 'image:title' property of the object. The return statement at the end is where it all gets returned in the needed format. Now I have an array of routes, each with an array of images and the last modified date, all formatted for xml-js to convert into XML. Immediately after this block I added resolve(xml); to resolve the promise and return the data I just created.
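The image handling can also be sketched without cheerio by working on plain objects. The helpers below (toAbsoluteSrc, toImageTag) are hypothetical, the site URL is a placeholder argument, and the logic is a slight generalization of mine: it treats '//' as protocol-relative and any other leading slash as root-relative. Wrapping the title in _text is a choice made here for consistency with image:loc.

```javascript
// Hypothetical helper: make an image src absolute.
// siteUrl is an assumption; pass your own origin without a trailing slash.
function toAbsoluteSrc(src, siteUrl) {
  if (src.startsWith('//')) return `https:${src}`; // protocol-relative
  if (src.startsWith('/')) return `${siteUrl}${src}`; // root-relative
  return src; // already absolute
}

// Hypothetical helper: build one image entry in the compact object format.
function toImageTag(img, siteUrl) {
  const tag = { 'image:loc': { _text: toAbsoluteSrc(img.src, siteUrl) } };
  if (img.alt) tag['image:title'] = { _text: img.alt };
  return tag;
}

// Plain objects stand in for the parsed <img> elements.
const tags = [
  { src: '/_nuxt/img/logo.png', alt: 'Logo' },
  { src: '//cdn.example.com/a.jpg' },
  { src: 'data:image/png;base64,AAAA' }
]
  .filter(img => !img.src.startsWith('data:')) // skip embedded base64 images
  .map(img => toImageTag(img, 'https://example.com'));

console.log(JSON.stringify(tags, null, 2));
```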
Parsing the Blog Posts
I’m going to skim over this part because it is very specific to my needs, but could still prove useful for someone out there. I’ll have a link to this file at the bottom of the post if you want to see the whole file.
The main difference with how the sitemap entries for the blog posts are created is that the data is generated from the JS object array I created at the top of the file that contains all the data for my two blogs and the actual MD file for each post that generates the HTML. This section of code is the biggest change in approach:
…
const postMd = fs.readFileSync(
  path.resolve(__dirname, `content/posts/gaming/${post.item.id}.md`),
  'utf8'
);
// parse the md file into html
const md = marked(postMd, {
  breaks: true,
  gfm: true,
  smartypants: true
});
…
Basically this just reads the MD file for each post, then uses the marked library to convert it to HTML. From here I approach things in a very similar fashion, except that I use the created_at date in my blog arrays as the data for the <lastmod> tag.
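To give a feel for the blog side without the markdown and image parsing, here is a stripped-down sketch of turning posts into sitemap entries. It is not the code from my file: postToEntry is a hypothetical helper, the sample posts are made up, and it assumes each post has a slug and a created_at value that the Date constructor can parse.

```javascript
// Hypothetical helper: one sitemap entry per blog post.
// siteUrl is a placeholder for your own origin.
function postToEntry(post, section, siteUrl) {
  return {
    loc: { _text: `${siteUrl}/blog/${section}/${post.slug}` },
    lastmod: { _text: new Date(post.created_at).toISOString() }
  };
}

// Made-up sample data keyed by blog section.
const samplePosts = {
  coding: [{ slug: 'hello-nuxt', created_at: '2019-03-01T12:00:00Z' }],
  gaming: [{ slug: 'first-run', created_at: '2019-04-15T08:30:00Z' }]
};

// Flatten every section into one list of entries.
const entries = Object.entries(samplePosts).flatMap(([section, list]) =>
  list.map(post => postToEntry(post, section, 'https://example.com'))
);

console.log(entries.length); // → 2
```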
Putting It All Together
All that is left to do is write the actual sitemap.xml file and the IIFE that will kick the whole thing off. The IIFE is simple:
(function() {
  // assign here because sitemap.xml doesn't exist on load but will on execute
  sitemapXml = fs.readFileSync(path.resolve(__dirname, 'dist/sitemap.xml'), 'utf8');
  sitemapJson = convert.xml2js(sitemapXml, { compact: true, spaces: 2 });
  smUrls = sitemapJson.urlset.url;
  // once both promises are resolved, call writeSitemap and pass results
  Promise.all([makeSitemapForBlogs(), makeSitemapForPages()]).then(result => {
    writeSitemap({ blog: result[0], pages: result[1] });
  });
})();
Here I am assigning the three variables from the ‘let’ statement at the top, since by now the build process has generated the minimal version of the sitemap.xml file. I load in the sitemap file, convert it to JSON, then get the contents of the urlset from it. From there I have a Promise.all call where I pass an array containing the 2 promise calls. When both promises resolve, I simply call the function that writes to the file and pass it the results.
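One detail worth spelling out is that Promise.all returns results in the order the promises were passed in, not the order they finish, which is why result[0] is always the blog data. The sketch below shows this with two stand-in functions; fakeBlogs, fakePages, collect, and the delays are all hypothetical.

```javascript
// Stand-ins for makeSitemapForBlogs / makeSitemapForPages. The delays are
// arbitrary and only show that completion order does not matter.
const fakeBlogs = () =>
  new Promise(resolve => setTimeout(() => resolve(['blog-entry']), 20));
const fakePages = () =>
  new Promise(resolve => setTimeout(() => resolve(['page-entry']), 5));

async function collect() {
  // Results come back in the order the promises were passed in.
  const [blog, pages] = await Promise.all([fakeBlogs(), fakePages()]);
  return { blog, pages };
}

collect()
  .then(result => console.log(result))
  .catch(err => {
    console.error(err);
    process.exitCode = 1; // make a failed run visible to the build
  });
```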
Writing Your New Sitemap
Now that I’ve generated two JSON arrays of urls with lastmod, image links, and image titles, it is time to overwrite the sitemap.xml file. The objects inside the array are already formatted in such a way for the xml-js library to properly write the sitemap file. It took me a little time to figure out, but the main things to remember with xml-js are:
- The outer object key becomes the tag name. For example,
{'image:image': …}
creates <image:image></image:image>.
- After you have the outer key, there are 2 main keys I use in the inner object: '_text' and '_attributes'.
- '_text' is the key for placing text into the tag. Example:
{'image:image': {'image:loc': {_text: 'https://example.com'}}}
would generate:
…
<image:image>
  <image:loc>https://example.com</image:loc>
</image:image>
…
- '_attributes' adds attributes to the tag. Example:
…
sitemapJsonShell.urlset._attributes = {
  xmlns: `http://www.sitemaps.org/schemas/sitemap/0.9`,
  'xmlns:image': `http://www.google.com/schemas/sitemap-image/1.1`
};
…
would generate:
…
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
…
</urlset>
…
I add attributes to the urlset tag with the above code. From there, I rewrite the urlset contents by simply using the ES6 spread operator on the 2 arrays of carefully formatted objects I passed to this function. I then write the file, passing the combined array through the xml-js function call that converts it all to XML:
…
sitemapJsonShell.urlset.url = [...xmlObj.pages, ...xmlObj.blog];
// write the sitemap
fs.writeFileSync(
  path.join(__dirname, 'dist/sitemap.xml'),
  convert.js2xml(sitemapJsonShell, { compact: true, spaces: 2 })
);
…
At this point, I have overwritten the original, barebones sitemap.xml file generated by the plugin in my nuxt.config.js file with a much more detailed version. I added a console log after the write is finished just so I know it ran when I watch the terminal output in my build process. Now Google can better crawl my site and hopefully place it higher in search results!
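If the compact object format still feels abstract, the toy renderer below may help. It is an illustration only and NOT xml-js: render is a hypothetical function that handles just the pieces used in this post (_text, _attributes, nested objects, and arrays) and skips escaping, XML declarations, and pretty-printing.

```javascript
// Illustration only: a tiny renderer for the subset of the compact format
// used in this post. It is not xml-js and should not replace it.
function render(name, node) {
  // An array repeats the same tag once per item.
  if (Array.isArray(node)) return node.map(item => render(name, item)).join('');
  // A primitive becomes the tag's text.
  if (typeof node !== 'object' || node === null) return `<${name}>${node}</${name}>`;
  const attrs = Object.entries(node._attributes || {})
    .map(([key, value]) => ` ${key}="${value}"`)
    .join('');
  const children = Object.entries(node)
    .filter(([key]) => key !== '_attributes' && key !== '_text')
    .map(([key, value]) => render(key, value))
    .join('');
  const text = node._text !== undefined ? node._text : '';
  return `<${name}${attrs}>${text}${children}</${name}>`;
}

const entry = {
  loc: { _text: 'https://example.com/about' },
  'image:image': [{ 'image:loc': { _text: 'https://example.com/a.png' } }]
};

console.log(render('url', entry));
```

Each key becomes a tag, _text becomes the tag body, and an array under a key repeats that tag, which is why an array of image objects turns into one image:image element per image.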
To trigger the script at the end of my build process, I needed to add a call to it at the end of my npm task and have the two commands run in series. To do this, I simply changed my npm generate task to be:
…
"generate": "nuxt generate && node sitemapGenerate.js",
…
Here I just added the && node sitemapGenerate.js to what was already there. The '&&' tells the shell to run the script only after nuxt generate has finished successfully, which ensures that the sitemap.xml generated by the plugin exists beforehand.
Maybe This Was Helpful?
I’m still very new to blogging about development/writing code. That said, I hope if you made it to this point that you found some usefulness with what I’ve written today. Let me know in the comments below if anything here has been of use to you! In all honesty, I wrote this little script very quickly so I know it could be refactored to be a little DRYer and more concise, but it serves a purpose for me.
You can find the full file on my GitHub.