Creating multiple crawlers for a custom index in Sitecore

When we discuss indexes for a Sitecore solution (indexing strategies, number of indexes, application performance, and so on), there are a few things we should review and discuss first:

  1. What are the search requirements? These depend mostly on the business requirements.
  2. Do we really need indexes?
  3. Can we use the out-of-the-box Sitecore APIs to get the data from Sitecore, or is an extra index setup required?
  4. If an index is required, can we use the out-of-the-box Sitecore indexes (Lucene/SOLR or Coveo)?
  5. Do we need to set up custom indexes?
  6. If custom indexes are needed, what different content do you want to add to your index?

These are some of the questions we should think through before discussing anything further or making any decision about indexes and their setup.

In this post, though, I would like to focus on the case where the business justifies the need for indexes, and custom indexes to be more specific.

Question: Can we use Sitecore's out-of-the-box indexes for a custom implementation?

I would say no to this, as we should always try to keep the solution as loosely coupled as possible. It does depend on the requirement, though: we shouldn't create a new custom index when the custom need is not that complex and doesn't affect the performance of the application as a whole.

Following are a couple of things to consider:

  1. How the index update strategy will work under specific conditions and scenarios.
  2. Any configuration changes to the index that the custom implementation needs.

In scenarios where we want to keep the implementation as clean as possible and make sure it doesn't affect maintenance and updates, I think we can definitely go with custom indexes and change their settings based on the requirement.

Let’s look at the configuration of the default SOLR index:

[Screenshot: solr_default_web_index]

If you look at the locations tag, it contains a crawler that indexes the application's content from a specific database and a specific location within Sitecore. In this case it will index everything under the sitecore root item, because that is the value set in the Root node.
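
As a rough sketch, the locations section of a Sitecore SOLR index definition typically looks something like the fragment below. It is reconstructed from memory rather than copied from the screenshot, so the index id, the type strings, and the omitted surrounding elements are assumptions that may differ between Sitecore versions:

```xml
<!-- Sketch only: other index settings (params, configuration, strategies) are omitted. -->
<index id="sitecore_web_index"
       type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
  <locations hint="list:AddCrawler">
    <!-- A single crawler that indexes everything below the root item of the web database -->
    <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
      <Database>web</Database>
      <Root>/sitecore</Root>
    </crawler>
  </locations>
</index>
```

The Database and Root elements are the two values that decide what the crawler picks up, so they are the ones to change when narrowing an index to a specific part of the content tree.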

Now, let’s think about a scenario where we want to index only two locations:

  1. All page items (which have presentation added to them), and
  2. All media items, or any page items that are structured outside of the home node.

To be more generic, we are trying to add two different locations to the index; they can exist within or outside the home node.

To support this scenario, we can add multiple crawlers to the same index. See the config below for a sample:

[Screenshot: solr_multiple_crawlers_index]

From the above screenshot you can see that we have the same index, but with two crawlers that point to two different locations in Sitecore.

Also, we have to make sure that the two crawlers use different names, each meaningful for the content type it indexes. We have used “PageCrawler”, which indexes all page items under the home node, and “MediaCrawler”, which indexes all media items under the media library folder.
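
A minimal sketch of an index with these two crawlers might look like the following. The index id custom_content_index is hypothetical, the Root paths assume the standard home and media library locations, the web database is assumed, and carrying the crawler names in a name attribute is an assumption about how the original config applied them (it may have used differently named elements instead). Check the config shipped with your Sitecore version before relying on it:

```xml
<!-- Sketch only: other index settings (params, configuration, strategies) are omitted. -->
<!-- "custom_content_index" is a hypothetical id used for illustration. -->
<index id="custom_content_index"
       type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
  <locations hint="list:AddCrawler">
    <!-- Crawler 1: page items under the home node (path assumed) -->
    <crawler name="PageCrawler"
             type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
      <Database>web</Database>
      <Root>/sitecore/content/Home</Root>
    </crawler>
    <!-- Crawler 2: media items under the media library (path assumed) -->
    <crawler name="MediaCrawler"
             type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
      <Database>web</Database>
      <Root>/sitecore/media library</Root>
    </crawler>
  </locations>
</index>
```

Each crawler contributes its own Root to the same index, so a single query against that index can return both page and media items.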

Following are the advantages of having multiple crawlers in the same index:

  1. Helps boost Sitecore/application performance, since only the required locations are indexed.
  2. Helps in applying business rules, such as pagination and sorting of pages, directly in the query, since the different types of items now come from the same index.
  3. Easy maintenance.

Whether to set up a single crawler or go with multiple crawlers is a point of discussion, and it mainly depends on the requirements of the application. In general, we should always keep application performance and maintenance in mind before wiring any solution into the system.

Reference(s):

  1. https://community.sitecore.net/developers/f/5/t/2112

 

I hope this post helps anyone who has questions on where and how to implement multiple crawlers in Sitecore, or on indexing in general.

Happy learning 🙂