Large sitemap
Alexandru Puiu
October 10, 2018

We recently built a social network-type site with a large number of public-facing pages and, like any good site, we created a sitemap with links to every page. All was fine at launch: every page we tested finished in under 3 ms of logic processing, and everyone was very happy. A few weeks passed and everything looked great, until all of a sudden our request times went through the roof and the database was maxing out, leaving us scratching our heads over what was causing the load. There were no abnormal requests and only an average number of users on the site; a few minutes later it passed and everything was back to normal. It happened a few more times over the next couple of weeks, until we finally found the root cause. Sitemaps are usually small, so we had never looked at what was actually being generated, since it had never caused problems before.

Well, this time the sitemap contained tens of thousands of links, all in a single giant XML document generated on demand.

We solved this quite quickly by creating a sitemap index file that links to a series of smaller sitemap pages, each containing up to 1,024 links. This allowed search engine bots to request a small chunk at a time as needed, which made the job easier on both the bot and the site. We used RavenDB as the backing document store, so all we needed was the number of sitemap pages, and then to create a node for each.

/// <summary>
/// Generate an XML sitemap index file
/// </summary>
/// <returns></returns>
[HttpGet, Route("~/sitemap.xml")]
[SwaggerResponseRemoveDefaults]
[AllowAnonymous]
[SwaggerResponse(System.Net.HttpStatusCode.OK, Description = "Sitemap index served")]
public async Task<IActionResult> SitemapIndexXml()
{
  try
  {
    XNamespace xmlns = "http://www.sitemaps.org/schemas/sitemap/0.9";
    XElement root = new XElement(xmlns + "sitemapindex");

    using (var session = _store.OpenAsyncSession())
    {
      var usersCount = await session.Query<ApplicationUser>().CountAsync();

      // Page 0 holds the static pages; pages 1..N hold the users, 1,024 per page.
      for (var i = 0; i <= Math.Ceiling(usersCount / 1024.0); i++)
      {
        XElement urlElement = new XElement(
          xmlns + "sitemap",
          new XElement(xmlns + "loc", 
              Uri.EscapeUriString($"https://site.com/sitemap-{i}.xml")),
          new XElement(xmlns + "lastmod", 
              DateTime.Now.ToString("yyyy-MM-ddTHH:mm:sszzz")));
        root.Add(urlElement);
      }
    }

    XDocument document = new XDocument(root);

    return this.Content(document.ToString(), "application/xml", Encoding.UTF8);
  }
  catch (Exception ex)
  {
    _logger.LogError(0, ex, "Error while generating sitemap index");

    throw;
  }
}
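
The arithmetic behind the index can be sketched separately from the controller. This is a minimal illustration, not the site's actual code: `SitemapPaging`, `UserPageCount` and `SkipFor` are hypothetical names, and it assumes page 0 is kept for static pages (as the next listing does), so user pages are numbered from 1.

```csharp
using System;

// Hypothetical helper, not part of the original code: it only illustrates
// the paging arithmetic behind the sitemap index.
public static class SitemapPaging
{
    // Links per user sitemap page, matching the 1,024 used in the article.
    public const int PageSize = 1024;

    // Number of sitemap pages needed to list every user
    // (integer ceiling division, equivalent to Math.Ceiling(count / 1024.0)).
    public static int UserPageCount(int usersCount) =>
        (usersCount + PageSize - 1) / PageSize;

    // Number of records to skip for a given user page. Page 1 is the first
    // user page because page 0 is assumed to hold the static pages.
    public static int SkipFor(int page)
    {
        if (page < 1)
            throw new ArgumentOutOfRangeException(nameof(page), "User pages start at 1.");
        return (page - 1) * PageSize;
    }
}
```

With 2,000 users, for example, `UserPageCount` gives 2, so the index would list `sitemap-0.xml` through `sitemap-2.xml`, and `SkipFor(2)` gives 1,024.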

Next, we added the individual sitemap pages. We had a few static pages, so we decided to stick these into the first sitemap page, but they could have been their own endpoint just as well.

/// <summary>
/// Generate an XML sitemap
/// </summary>
/// <returns></returns>
[HttpGet, Route("~/sitemap-{page}.xml")]
[SwaggerResponseRemoveDefaults]
[AllowAnonymous]
[SwaggerResponse(System.Net.HttpStatusCode.OK, Description = "Sitemap served")]
public async Task<IActionResult> SitemapXml(int page)
{
  try
  {
    List<SitemapNode> nodes = new List<SitemapNode>();

    if (page == 0)
    {
      nodes.Add(new SitemapNode { Url = "https://site.com/", Priority = 1 });
      nodes.Add(new SitemapNode { Url = "https://site.com/about", Priority = 0.9 });
      nodes.Add(new SitemapNode { Url = "https://site.com/contact", Priority = 0.9 });
      nodes.Add(new SitemapNode { Url = "https://site.com/terms", Priority = 0.9 });
      nodes.Add(new SitemapNode { Url = "https://site.com/privacy", Priority = 0.9 });
    }
    else
    {
      using (var session = _store.OpenAsyncSession())
      {
        // Page 0 holds the static pages, so user pages start at 1:
        // page 1 covers users 0-1023, page 2 covers users 1024-2047, and so on.
        var users = session.Query<ApplicationUser>()
                           .Skip((page - 1) * 1024).Take(1024);

        foreach (var user in await users.ToListAsync())
          nodes.Add(new SitemapNode { 
              Url = $"https://site.com/profile/{user.UserName}/about", 
              Frequency = SitemapFrequency.Weekly, 
              Priority = 0.8 
          });
      }
    }

    XNamespace xmlns = "http://www.sitemaps.org/schemas/sitemap/0.9";
    XElement root = new XElement(xmlns + "urlset");

    foreach (var sitemapNode in nodes)
    {
      XElement urlElement = new XElement(xmlns + "url",
        new XElement(xmlns + "loc", Uri.EscapeUriString(sitemapNode.Url)),
        sitemapNode.LastModified == null ? null : new XElement(xmlns + "lastmod",
          sitemapNode.LastModified.Value.ToLocalTime().ToString("yyyy-MM-ddTHH:mm:sszzz")),
        sitemapNode.Frequency == null ? null : new XElement(xmlns + "changefreq",
          sitemapNode.Frequency.Value.ToString().ToLowerInvariant()),
        sitemapNode.Priority == null ? null : new XElement(xmlns + "priority",
          sitemapNode.Priority.Value.ToString("F1", CultureInfo.InvariantCulture)));
      root.Add(urlElement);
    }

    XDocument document = new XDocument(root);

    return this.Content(document.ToString(), "application/xml", Encoding.UTF8);
  }
  catch (Exception ex)
  {
    _logger.LogError(0, ex, "Error while generating sitemap");

    throw;
  }
}
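
The listing uses `SitemapNode` and `SitemapFrequency` without showing them. A minimal sketch of what they might look like, inferred only from how the code above reads their properties; the real types may differ, and the enum members here are an assumption based on the `changefreq` values defined by the sitemap protocol.

```csharp
using System;

// Assumed set of change frequencies; the names mirror the <changefreq>
// values of the sitemap protocol, since the listing lower-cases them.
public enum SitemapFrequency
{
    Always,
    Hourly,
    Daily,
    Weekly,
    Monthly,
    Yearly,
    Never
}

// Assumed shape of SitemapNode. The optional fields are nullable so that
// the XML builder can leave out elements that were never set.
public class SitemapNode
{
    public string Url { get; set; }
    public DateTime? LastModified { get; set; }
    public SitemapFrequency? Frequency { get; set; }
    public double? Priority { get; set; }
}
```

Keeping `LastModified`, `Frequency` and `Priority` nullable is what lets the `== null ? null : new XElement(...)` checks in the listing skip those elements, because `XElement` ignores null content.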

