Searching the module catalogue with Funnelback

Nick Mullen
Tuesday 19 January 2021

We have recently revamped the University’s module catalogue to take advantage of a number of Funnelback features, delivering a more sophisticated search algorithm and returning higher quality results.

In this blog post, I am going to discuss the steps taken to set up the key search features used in the module catalogue. This article is aimed at people who have some experience using Funnelback v15.16. I don’t go into great technical detail here, but I have added links to the relevant Funnelback documentation so you can dig deeper if required.

All the data used by the search comes from one XML data feed. The feed contains details of all the modules that the university offers. The search web pages are published by our TerminalFour (T4) content management system.

1 – Mapping the XML data

Before doing anything with the XML data in Funnelback, I first mapped out each data element to be used. This enables Funnelback to convert the XML feed into variables that can be accessed within the search process. I created a new ‘Module Catalogue’ web collection, pointing the collection at the XML data feed by setting the starting URL and updating the collection.

I then configured the XML processor to identify each XML item and mapped out each XML node using the Configure metadata mapping function.

Screenshot of Funnelback’s metadata mapping: each element within the XML is mapped using the mapping functionality.

2 – Right-hand filter options

The module catalogue has a number of options to filter the search results. I created new facets under the ‘Customise Faceted Navigation’ option, remembering to select Publish on the facet list. From the facet list you can define which metadata to use as a filter, control the type of selection (such as checkbox or radio button) and set the display order.

Screenshot of Funnelback’s facet list: each navigation option is added to the facet list.

The number that appears to the right of each filter is the count of results that would be returned if that filter were applied. Early feedback showed that these counts could confuse users, especially when several filters were applied for the same metadata type. We therefore restricted each filter group to a single selection by using radio buttons.
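The counting behaviour can be illustrated with a minimal JavaScript sketch. It is not Funnelback's implementation; it assumes each result is a plain object with one value per facet, and the facet names used here (such as `school`) are hypothetical. Because the facets use radio buttons, choosing a value replaces the current choice for that facet, so the facet's own selection is ignored when its counts are worked out.

```javascript
// Sketch of facet counts with single-select (radio button) facets.
// Results are plain objects; the facet names used here are hypothetical.

// Keep only the results that match every selected facet value.
function applyFilters(results, selected) {
  return results.filter((r) =>
    Object.entries(selected).every(([facet, value]) => r[facet] === value)
  );
}

// Count, for each value of `facet`, how many results would remain if that value
// were chosen. A radio button replaces the current choice for the same facet,
// so that facet's own selection is ignored while counting.
function facetCounts(results, selected, facet) {
  const others = { ...selected };
  delete others[facet];
  const counts = {};
  for (const r of applyFilters(results, others)) {
    counts[r[facet]] = (counts[r[facet]] || 0) + 1;
  }
  return counts;
}

// Choosing a value replaces any earlier value for that facet.
function selectValue(selected, facet, value) {
  return { ...selected, [facet]: value };
}
```

With Maths already selected, `facetCounts(results, { school: "Maths" }, "school")` still reports a count for every school, because picking another school would replace Maths rather than combine with it.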

Screenshot of the navigation facets.


3 – Collection configuration settings

Over the course of this development, I tried a wide range of setting combinations in the collection.cfg file. Through trial and error, I settled on the following options.

Option: auto-completion.auto-completion.triggers=s.result.metaData["modulecode"]!,s.result.metaData["title"]!
Description: Creates auto-completion triggers for the module code and the module title.

Option: indexer_options=-mdsfml40000 -forcexml
Description: Increases the maximum size of metadata fields to 40000 and processes all documents as XML.

Option: query_processor_options=-sco=2[modulecode,title] -wmeta.title=0.7
Description: Uses the custom metadata fields for module code and module title as part of the ranking algorithm, and reduces the module title weighting to 0.7 so that it is lower than the module code, which keeps the default weighting of 1.
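To show why a lower title weighting matters, here is a deliberately simplified JavaScript model. It is an assumption-laden toy, not Funnelback's ranking algorithm (which combines many more signals): each metadata field that contains a query word as a whole word adds its weight to the score.

```javascript
// Deliberately simplified model of per-field weighting. It is NOT Funnelback's
// ranking algorithm, which combines many more signals.
const FIELD_WEIGHTS = {
  modulecode: 1.0, // left at the default weighting
  title: 0.7,      // mirrors -wmeta.title=0.7
};

// Add a field's weight for every query word found as a whole word in that field.
function toyScore(result, query) {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  let score = 0;
  for (const [field, weight] of Object.entries(FIELD_WEIGHTS)) {
    const tokens = String(result[field] || "").toLowerCase().split(/\W+/);
    for (const word of words) {
      if (tokens.includes(word)) score += weight;
    }
  }
  return score;
}
```

For a record with code BL5115 and title "Mathematical and statistical modelling for Biologists", a search for `BL5115` scores 1 (a code match) while `biologists` scores 0.7 (a title match), so in this model a code match outranks a title match.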

4 – Set the result URL with a hook script

One problem with the XML feed is that it does not contain the URL needed to send the user to a module page when a result is selected. To fix this, I used a hook_post_datafetch Groovy script to generate the URL. Funnelback runs this hook each time a search is performed, after the results have been fetched from the index but before they are processed for display. The script loops over the results, builds a URL from each result’s metadata and then updates the result with the new URL.

transaction?.response?.resultPacket?.results?.each { result ->
    // Build the module page URL from the module code and academic year metadata
    def url = 'https://www.st-andrews.ac.uk/search/module-search/catalogue/?code=' + result.metaData["modulecode"] + '&academic_year=' + result.metaData["ayrs"]
    result.metaData["URL"] = url
    // Point every URL Funnelback uses for this result at the module page
    result.liveUrl = url
    result.displayUrl = url
    result.cacheUrl = url
}
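The same per-result rewrite can be sketched in JavaScript to make the logic easy to test outside Funnelback. This is an illustration only (the real hook is the Groovy script above); results are modelled as plain objects with a `metaData` map, and the helper names `buildModuleUrl` and `rewriteResultUrls` are hypothetical. Unlike the Groovy version, it percent-encodes the query-string values.

```javascript
// JavaScript sketch of the post-datafetch rewrite; the real hook is Groovy.
// Results are plain objects with a metaData map, mirroring the Groovy code.
const CATALOGUE_BASE = "https://www.st-andrews.ac.uk/search/module-search/catalogue/";

function buildModuleUrl(code, academicYear) {
  const url = new URL(CATALOGUE_BASE);
  // URLSearchParams percent-encodes values, e.g. "2020/1" becomes "2020%2F1".
  url.searchParams.set("code", code);
  url.searchParams.set("academic_year", academicYear);
  return url.toString();
}

// Set the same target URL on every URL field of each result.
function rewriteResultUrls(results) {
  for (const r of results || []) {
    const target = buildModuleUrl(r.metaData.modulecode, r.metaData.ayrs);
    r.metaData.URL = target;
    r.liveUrl = target;
    r.displayUrl = target;
    r.cacheUrl = target;
  }
  return results;
}
```

Encoding is a defensive choice: the Groovy script concatenates the raw values, which works for codes such as BL5115 but would break if a value ever contained a character such as `&`.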

5 – Auto complete concierge

Screenshot of the module catalogue concierge.

The autocomplete drop-down is split into two columns. The first (left) column contains search suggestions: words and phrases found within the content that match the entered text. Selecting one of these options reloads the search with the selected word or phrase.

The second (right) column contains modules found within a predefined data structure. These suggestions are shown in a standard layout (module code – module title). Selecting one of these links takes users directly to that module’s details page, bypassing the search results.

Suggested search terms

Suggested search terms are straightforward to implement using the standard T4 functionality. All that is needed is a little JavaScript in the front end to configure the autocomplete, adding the collection and profile names used to generate the suggestions.

(function($) {
  $('input.query').autocompletion({
    datasets: {
      organic: {
        collection: 'collection_name',
        profile   : '_default',
        program   : 'http://funnelback.server/s/suggest.json',
      }
    },
    length: 3
  });
})(jQuery); 
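Behind this configuration the widget requests suggestions from `suggest.json`. The sketch below builds such a request URL. It assumes that `length: 3` is the minimum number of characters typed before suggestions are requested, and that the parameter names `collection`, `profile`, `partial_query` and `show` match your Funnelback version; check both against the Funnelback documentation. The function name `buildSuggestUrl` is hypothetical.

```javascript
// Sketch of a suggest.json request URL. Parameter names are assumptions to
// check against the Funnelback documentation for your version.
const MIN_LENGTH = 3; // assumed meaning of `length: 3` in the plugin configuration

function buildSuggestUrl(program, collection, profile, partialQuery, show = 10) {
  // Too few characters typed: do not send a request yet.
  if (partialQuery.trim().length < MIN_LENGTH) return null;
  const url = new URL(program);
  url.searchParams.set("collection", collection);
  url.searchParams.set("profile", profile);
  url.searchParams.set("partial_query", partialQuery);
  url.searchParams.set("show", String(show));
  return url.toString();
}
```

For example, `buildSuggestUrl('http://funnelback.server/s/suggest.json', 'collection_name', '_default', 'math')` asks for up to ten suggestions, while typing only `ma` produces no request.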

Structured data

Structured data is more complex to implement but gives much greater control over the results.

I went through several iterations before settling on a custom approach.

For my first attempt, I created a PHP script to generate a CSV file from the XML feed. Details of the CSV file format can be found in the Funnelback documentation.

I then uploaded the CSV file to a new ‘auto-completion’ profile and configured Funnelback to use the CSV file as the structured data source for the auto complete.

jQuery('input.query').autocompletion({
  datasets: {
    staff: {
      name: 'Staff members',
      collection: 'staff-members',
      profile: 'auto-completion',
      show: 10,
      template: {
        suggestion: '{{label.firstName}} {{label.lastName}} ({{label.metaData.role}}){{label.summary}}'
      }
    }
  },
  length: 3
});
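The CSV generation step can be sketched in JavaScript (the original script was PHP). This sketch assumes the Funnelback auto-completion CSV column order is key, weight, display, display type, category, category type, action, action type, with `T` meaning plain-text display and `U` meaning a URL action; verify these against the Funnelback documentation. The record fields `code`, `title` and `url` and the weight of 70 are hypothetical.

```javascript
// Sketch of turning module records into auto-completion CSV rows.
// Assumed column order (check the Funnelback docs for your version):
// key, weight, display, display type, category, category type, action, action type.
// The record fields (code, title, url) are hypothetical.

// Quote a field if it contains a comma, quote or newline (RFC 4180 style).
function csvField(value) {
  const s = String(value);
  return /[",\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// One row per trigger, so typing either the code or the title finds the module.
function moduleToCsvRows(mod, weight = 70) {
  const display = `${mod.code} - ${mod.title}`;
  return [mod.code, mod.title].map((key) =>
    // "T" = plain-text display, "U" = URL action (assumed codes)
    [key, weight, display, "T", "module", "", mod.url, "U"].map(csvField).join(",")
  );
}
```

Writing one row per trigger keeps the file simple; the cost, as noted below, is that the file must be regenerated whenever the feed changes.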

This approach worked, but it required the CSV file to be updated manually on a regular basis.

For my second attempt, I uploaded a script to the Funnelback server to create the CSV file automatically each time the collection is crawled. I created a new ‘auto-completion’ profile and added an .ftl script that outputs the results data in CSV format. I then added a post-index command to run the .ftl script as part of the indexing process. Full details of the CSV generation process can be found on the Funnelback website.

This worked and removed the need to upload a CSV file, but the results were inconsistent: I am not sure why, but it worked well for some search terms and not for others.

By my third attempt I was getting fed up. I had spent days on this and still could not get a reliable solution, so I decided to build my own structured data search. In the end it took a couple of hours to create a script that reliably returns module suggestions: a PHP script that searches the XML feed and returns JSON in the same format used by Funnelback.

[
  {
  "key":"Mathematical and statistical modelling for Biologists",
  "disp":"BL5115 - Mathematical and statistical modelling for Biologists",
  "disp_t":"H",
  "wt":"70",
  "cat":"module",
  "cat_t":"",
  "action":"https://www.st-andrews.ac.uk/search/catalogue/?code=BL5115&academic_year=2020/1",
  "action_t":"U",
  "modulecode":"BL5115"
  },
  etc..
]
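The lookup behind that JSON can be sketched in JavaScript (the original script was PHP). In this sketch `modules` stands in for the parsed XML feed, and its field names `code`, `title` and `url` are hypothetical; the output objects copy the field names and fixed values from the JSON sample above. Because the sample uses `"disp_t":"H"` (HTML display), the display text is HTML-escaped.

```javascript
// Sketch of the custom suggestion lookup; the original was a PHP script.
// `modules` stands in for the parsed XML feed; its field names are hypothetical.

function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Return up to `show` modules whose code or title contains the typed text,
// shaped like the Funnelback suggestion JSON.
function suggestModules(modules, partialQuery, show = 10) {
  const q = partialQuery.trim().toLowerCase();
  if (!q) return [];
  return modules
    .filter((m) => m.code.toLowerCase().includes(q) || m.title.toLowerCase().includes(q))
    .slice(0, show)
    .map((m) => ({
      key: m.title,
      disp: escapeHtml(`${m.code} - ${m.title}`),
      disp_t: "H",
      wt: "70",
      cat: "module",
      cat_t: "",
      action: m.url,
      action_t: "U",
      modulecode: m.code,
    }));
}
```

Serving `JSON.stringify(suggestModules(modules, query))` from an endpoint gives the widget the same shape of response that Funnelback's own script returns.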

Then it was simply a case of changing the JavaScript configuration so that my script is called instead of Funnelback’s. Now, when the user searches, the search term is sent to my PHP script, which returns suggestions as JSON, removing the need to generate or upload CSV files.

datasets: {
  organic: {
    name: 'Suggested search terms',
    collection: 'uosa-module-catalogue',
    profile: '_default',
    show: 5,
    program: 'https://www.st-andrews.ac.uk/s/suggest.json'
  },
  modules: {
    name: 'Modules found',
    profile: '_default',
    show: 10,
    sort: 0,
    program: 'https://www.st-andrews.ac.uk/php/module-catalogue/'
  }
}

I never did get to the bottom of why Funnelback’s structured auto-completion was so inconsistent. My guess is that I was hitting a limit somewhere in the process, possibly a hash table size or a file size limit.


6 – Wildcard search (*)

At the University, each module has a module code created using a specific formula. The first two characters denote the module domain and the first number is the module’s level; for example, a module code that starts with “MT3” is a level 3 maths (MT) module. Our user research has shown that students know this formula and use the code to search for modules. For example, a student wishing to view all maths modules may simply search for “MT”. This type of search is called a wildcard search: it tries to find results using only part of a word (or, in this case, part of a reference code).
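The formula can be captured in a few lines of JavaScript. This sketch assumes a code is exactly two letters followed by digits, with the first digit giving the level; codes that do not fit that pattern return `null`. The function name `parseModuleCode` is hypothetical.

```javascript
// Sketch of the module code convention: two letters for the domain, then
// digits whose first digit is the level. Assumes no other code shapes exist.
function parseModuleCode(code) {
  const m = /^([A-Z]{2})(\d)\d*$/.exec(code.trim().toUpperCase());
  if (!m) return null;
  return { domain: m[1], level: Number(m[2]) };
}
```

`parseModuleCode("MT3802")` gives domain `MT` and level 3, and so does the partial code `MT3`, which is exactly the kind of prefix students type.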

Funnelback works by searching for whole words and does not support wildcard searching out of the box. Wildcard searching can be problematic: searching a collection in this manner is processing-heavy and can lead to poor search performance. It can also reduce the quality of results. For example, if you searched for modules whose code starts with “TH”, the search would return a result for every instance of “th” it finds in the collection, matching every ‘this’, ‘then’, ‘the’ and so on. This would swamp the results, rendering the search useless. The trick is to combine a normal search with a separate wildcard search that covers only the module code. In this way a wildcard search is performed only on the module code, and the remaining data is searched normally.

To do this I created a PHP script that takes the XML feed and turns it into one web page listing all the module codes. I then created a new Funnelback collection and configured it to index my new web page. This gives me a new Funnelback collection that only contains module codes.
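The code-list page can be sketched in JavaScript (the original generator was PHP). Here `codes` stands in for the module codes read from the XML feed, and the page layout (one list item per code, duplicates removed, sorted) is an assumption; any simple page that lists each code once would serve.

```javascript
// Sketch of the "module codes only" page; the original generator was PHP.
// `codes` stands in for the module codes read from the XML feed.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build one HTML page listing each distinct module code once, in sorted order.
function buildCodeListPage(codes) {
  const unique = [...new Set(codes)].sort();
  const items = unique.map((code) => `    <li>${escapeHtml(code)}</li>`);
  return [
    "<!DOCTYPE html>",
    "<html>",
    "<head><title>Module codes</title></head>",
    "<body>",
    "  <ul>",
    ...items,
    "  </ul>",
    "</body>",
    "</html>",
  ].join("\n");
}
```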

Following a process described in the Funnelback forum, I used a pre-process hook script in the module catalogue collection to modify the search term. The script passes the search term to the suggested search terms function of the second (module codes only) collection. This is the same suggested search terms functionality that I used in step 5, only this time on the collection that contains nothing but module codes. What comes back are module codes that contain the search term.

Let us say the user searches for ‘MT’ and the module code collection returns the suggestions MT3802 and MT5099. These are appended to the original search term, so when the user searches for MT, what Funnelback actually searches for is “MT MT3802 MT5099”. To the user it looks like a wildcard search is being performed on the module code, but what is really happening is that Funnelback performs a standard search with a modified search term.
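The query rewrite can be sketched as a small JavaScript function. It is an illustration, not the hook itself: the real pre-process hook is a Groovy script that gets its codes from the suggestions of the module-code collection, whereas here an in-memory `codes` array stands in for those suggestions. Prefix matching and the `limit` cap on appended codes are assumptions; the function name `expandQuery` is hypothetical.

```javascript
// Sketch of the query rewrite done by the pre-process hook. `codes` stands in
// for the suggestions returned by the module-code collection; prefix matching
// and the cap on appended codes are assumptions.
function expandQuery(query, codes, limit = 50) {
  const term = query.trim();
  if (!term) return query;
  const upper = term.toUpperCase();
  const matches = codes
    // Skip a code identical to the term so it is not repeated.
    .filter((c) => c.toUpperCase().startsWith(upper) && c.toUpperCase() !== upper)
    .slice(0, limit);
  return [term, ...matches].join(" ");
}
```

`expandQuery("MT", ["MT3802", "MT5099", "BL5115"])` returns `"MT MT3802 MT5099"`, the modified search term from the example. Capping the number of appended codes keeps the query short when a prefix matches hundreds of modules.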

The modified search term is used to search the original module catalogue collection and, hey presto, we now have a list of all the modules that have ‘MT’ in the module code or the whole word “MT” within the module text.

We will be monitoring our web traffic to see whether these updates have made it easier for students and course advisers to find the most relevant modules. In future updates we will explore how the module catalogue works in conjunction with the course catalogue, looking at how users move from course information to module pages.
