Sitemap generation on microservices

Yana Sidorova

Middle backend developer, Hawking Bros

We continue our series of articles about microservice architecture. Last time we talked about switching to a microservice architecture. This time we will talk about generating the sitemap.xml file and about our solution to this problem.

Introduction

What is a Sitemap?

A sitemap is an XML file that contains information about the site's pages: URLs, file types, publication dates of articles, and so on. You provide this file to search engines so that the site's pages are indexed correctly.
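For illustration, here is a minimal sketch of what such a file looks like and how it can be built. It is written in Python rather than the PHP stack used in the article, and the function name `build_sitemap` and the example URLs are assumptions made for the example. The namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, lastmod) pairs; lastmod may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes special characters such as "&" in the text.
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages, for illustration only.
sitemap_xml = build_sitemap([
    ("https://shop.example/", "2024-03-01"),
    ("https://shop.example/search?q=lamp&page=2", None),
])
```

Each page becomes one `<url>` element with a required `<loc>` and an optional `<lastmod>`.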

When is a Sitemap needed?

If a site is built on a microservice architecture, it almost certainly has far more pages than its top menu links to. Most likely it is a complex system with a large amount of content, and we want that content to show up when users search on Google.

What is the problem?

Automatic generators exist, but they work well only with static, rarely updated content. So you have to write your own method: collect all static URLs, generate the dynamic ones from the data in the database, then create the file and fill it in, keeping to the structure of the XML document.
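The collection step can be sketched roughly as follows. This is an illustration in Python, not the article's Laravel code; an in-memory SQLite database stands in for the real one, and the base URL, the static paths, and the `products` table with its columns are all hypothetical.

```python
import sqlite3

BASE_URL = "https://shop.example"             # hypothetical site address
STATIC_PATHS = ["/", "/about", "/contacts"]   # hypothetical static pages


def collect_urls(conn):
    """Return (url, lastmod) pairs: static pages first, then one per product."""
    entries = [(BASE_URL + path, None) for path in STATIC_PATHS]
    rows = conn.execute("SELECT slug, updated_at FROM products ORDER BY slug")
    entries += [(f"{BASE_URL}/products/{slug}", updated) for slug, updated in rows]
    return entries


# Demo data in an in-memory database (a stand-in for the real one).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (slug TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("kettle", "2024-03-01"), ("lamp", "2024-03-05")],
)
urls = collect_urls(conn)
```

Static addresses come from constants, and dynamic ones are derived from database rows, so the list stays current whenever it is regenerated.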

The problem with a microservice architecture is that each microservice's database is autonomous and isolated, so its data has to be fetched with a separate request. Where should the sitemap generation method live? Will it send a request to every microservice? How long will data collection take? And where should the sitemap.xml file be stored at all?

There are different ways to solve this problem; here we share our approach.

Scheme and technologies

Situation

In our example, the site is an online store with more than a hundred thousand products. The company has about 50 branches, each with its own contact page, a blog with news and promotions, and so on. All these entities are split across microservices.

Technologies

We use the Laravel PHP framework, Kafka as the message broker, ElasticSearch as the data store, and PostgreSQL as the DBMS. The frontend and backend communicate through a REST API.

The idea

To generate the finished sitemap.xml file, we need to:

Tell the microservices that own the pages to collect their page addresses.
Save the generated data in a shared storage.
Give the command to fetch this data and format it as XML.
Save the generated XML for quick access on later requests.

To serve the file, we need to:

Send an API request to the appropriate microservice.
Get the contents of the file from quick access or from the external storage.
Return the data in the response.

To do this, we will need a message broker and an external storage.

In this example, Kafka is the message broker and ElasticSearch is the external storage.

A message requesting sitemap generation is sent to Kafka (from a button in the administrative panel, from a cron task, or directly from the terminal). Only the microservices that take part in collecting addresses listen for this type of message. Each of them then goes through its static URLs, which are defined as constants, and through its database if required, builds its part of the file, and saves it to a document in a pre-created index.
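The flow described here can be sketched with in-memory stand-ins. The example is in Python for illustration only; `InMemoryBus` replaces Kafka, the `parts_index` dictionary replaces the ElasticSearch index, and the topic name, service names, and URLs are hypothetical. Unlike a real broker, this stand-in delivers messages synchronously.

```python
from collections import defaultdict


class InMemoryBus:
    """Stand-in for the message broker: delivers a message to every subscriber."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self._subscribers[topic]:
            handler(message)


parts_index = {}  # stand-in for the shared index: one document per service


def make_handler(service, collect):
    """Create a handler that stores the service's URLs under its own key."""
    def handle(message):
        parts_index[service] = collect()
    return handle


bus = InMemoryBus()
# Only services that own pages subscribe to the generation message.
bus.subscribe("sitemap.generate",
              make_handler("catalog", lambda: ["/products/kettle", "/products/lamp"]))
bus.subscribe("sitemap.generate",
              make_handler("branches", lambda: ["/branches/1/contacts"]))

# Triggered from an admin button, a cron task, or the terminal.
bus.publish("sitemap.generate", {"triggered_by": "cron"})
```

Because every service writes to its own document, a later run simply overwrites that service's part without touching the others.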

Figure: Sitemap generation

Now, to take this data and make a document out of it, a designated microservice (seo, for example) accepts an API request.

To return a response, the seo microservice first looks in the cache. The cache lifetime depends on how often the data changes; in our case it is one day. If the data is not in the cache, the microservice looks in its own database for a copy updated within the last day. If there is none, the following process runs.
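This lookup order (cache, then database, then a full rebuild) can be sketched as below. The code is an illustrative Python sketch, not the article's implementation: the class name `SitemapProvider` is hypothetical, and plain attributes stand in for the real cache and the database row.

```python
import time

ONE_DAY = 24 * 60 * 60  # seconds; the cache lifetime used in the article


class SitemapProvider:
    def __init__(self, rebuild, clock=time.time):
        self._rebuild = rebuild  # callable that assembles XML from stored parts
        self._clock = clock      # injectable so the sketch can be tested
        self._cache = None       # (xml, stored_at), stand-in for the cache
        self._db_row = None      # (xml, updated_at), stand-in for the database

    def _fresh(self, record):
        return record is not None and self._clock() - record[1] < ONE_DAY

    def get(self):
        if self._fresh(self._cache):
            return self._cache[0]
        if self._fresh(self._db_row):
            # Refill the cache, keeping the original timestamp so both
            # copies expire together.
            self._cache = self._db_row
            return self._db_row[0]
        xml = self._rebuild()
        self._db_row = self._cache = (xml, self._clock())
        return xml
```

The expensive rebuild therefore runs at most once per day, while an evicted cache entry is restored from the database copy.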

The microservice queries the ElasticSearch index, retrieves the parts of the file, combines them, and applies the necessary formatting. The file contents are then saved to the database, cached for quick access, and returned in the response. The file is never physically stored; its contents are assembled from the parts in external storage. We configured our web server so that a request to /sitemap.xml is passed to the backend, which returns the generated XML.
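A rough sketch of the assembly step follows, in Python for illustration. It assumes each service stored its part as a string of ready-made `<url>` fragments; the function name `assemble`, the service names, and the dictionary standing in for the index are hypothetical.

```python
import logging
import xml.etree.ElementTree as ET

log = logging.getLogger("sitemap")

HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
)
FOOTER = "</urlset>\n"


def assemble(parts_index, expected_services):
    """Concatenate stored <url> fragments into one sitemap document."""
    body = []
    for service in expected_services:
        fragment = parts_index.get(service)
        if fragment is None:
            # A missing part does not block the whole file; it is only logged.
            log.warning("no sitemap part from %s; skipping", service)
            continue
        body.append(fragment)
    document = HEADER + "\n".join(body) + ("\n" if body else "") + FOOTER
    # Raises xml.etree.ElementTree.ParseError if a fragment is malformed.
    ET.fromstring(document.encode("utf-8"))
    return document


# Hypothetical stored parts: the "blog" service has not produced one.
parts = {"catalog": "<url><loc>https://shop.example/products/kettle</loc></url>"}
document = assemble(parts, ["catalog", "blog"])
```

Note that the sitemaps.org protocol limits a single sitemap file to 50,000 URLs, so a catalog of the size described here would in practice be split into several files referenced from a sitemap index; the sketch leaves that out.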

Figure: Getting a sitemap

Conclusion

Thus, when a browser requests sitemap.xml, the web server sends an API request to the seo microservice and returns the response it receives.

This scheme makes it easy to add new microservices to the file generation process. To do this, subscribe them to the message and implement the address-collection methods inside them.

Using a message queue means you do not have to wait for sequential requests to each microservice. Parts of the file are generated asynchronously, which avoids the 504 Gateway Timeout error.

If a microservice is unavailable or its generation ends with an error, some of the addresses will simply be missing from the overall list, and the error is easy to trace in the logs. But sitemap.xml itself is always available, even while addresses are being updated, because a separate request is responsible for serving the file.
