There was a time when I used to read quite a bit. These days, two young kids at home and a young, growing company fill up most of my time. Except at this time of the year! The last two weeks of December are when work slows down a little and the schools are off, so I finally find some time to catch up on my reading.
This year, I have picked up a few interesting books for my end-of-year reading, but one of my favorite books of all time is Douglas Adams’ The Hitchhiker’s Guide to the Galaxy. All the characters in that book – from Marvin the Paranoid Android to the elevators that can see into the future – are amazing.
One of the species in that book is the Babel fish: “The Babel fish is small, yellow, leech-like, and probably the oddest thing in the universe. It feeds on brain wave energy, absorbing all unconscious frequencies and then excreting telepathically a matrix formed from the conscious frequencies and nerve signals picked up from the speech centres of the brain, the practical upshot of which is that if you stick one in your ear, you can instantly understand anything said to you in any form of language: the speech you hear decodes the brain wave matrix.” In plain English, when you stick a Babel fish in your ear, you can understand and make sense of any language spoken in the universe.
We needed a Babel fish for eCommerce earlier this year. One of our customers operates a number of online stores in Europe, selling everything from bike parts to bathroom fixtures. They wanted to learn their shoppers’ vocabulary so that the search experience could improve automatically. Usually we would handle something like this with lemmatization and stemming, but their websites were not in English, which posed a problem.
So, we came up with a solution that could automatically learn from what people do when they are not satisfied with their search results. Here is a snapshot of what we learnt:
If you look at the bottom of this graph, you can see that people search for ‘reflexvast’, which is Swedish for reflective vest. Since they do not find the right results, they change their query to ‘reflex’ (reflective) or ‘vast’ (vest). What we learn from this behavior is that a lot of customers write ‘reflex vast’ as the single word ‘reflexvast’. So, the system learnt to handle that automatically by creating synonyms of its own.
From the top of the list, we also learnt that when people search for ‘skoskydd’ (shoe covers), they end up changing the search keywords to ‘skooverdrag’ (shoe over draw). They did the same thing for ‘skooverdrag vinter’ (shoe covers for winter). That’s another set of automatically learnt synonyms.
Threshold limits on how often a rewrite occurred helped us ensure that we only added synonyms that were getting a lot of traction and ignored one-off cases. The result was a system that automatically started learning a language that we had not dealt with before. Since then, we have used this algorithm for a lot of our customers to help them automatically improve search relevance.
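A minimal sketch of the idea in Python might look like this. It is an illustration, not the actual implementation: the function name `learn_synonyms`, the `min_count` threshold, and the session data are all hypothetical, and it makes the simplifying assumption that any consecutive query change within a session signals an unsatisfying result.

```python
from collections import Counter


def learn_synonyms(sessions, min_count=3):
    """Learn synonym candidates from query rewrites.

    sessions: iterable of lists, each holding the queries one visitor
              typed, in order (hypothetical input format).
    min_count: how many times a rewrite must be seen before it is kept
               (hypothetical threshold value).
    """
    pair_counts = Counter()
    for queries in sessions:
        normalized = [q.strip().lower() for q in queries]
        # Count each (original query -> next query) rewrite.
        for original, rewrite in zip(normalized, normalized[1:]):
            if original and rewrite and original != rewrite:
                pair_counts[(original, rewrite)] += 1

    # Keep only rewrites that clear the threshold; drop one-off cases.
    synonyms = {}
    for (original, rewrite), count in pair_counts.items():
        if count >= min_count:
            synonyms.setdefault(original, set()).add(rewrite)
    return synonyms


# Made-up sessions that reuse the search terms from the examples above.
sessions = (
    [["reflexvast", "reflex"]] * 3
    + [["skoskydd", "skooverdrag"]] * 4
    + [["cykel", "hjalm"]]  # seen once, so it stays below the threshold
)
learned = learn_synonyms(sessions, min_count=3)
```

With this made-up data, `learned` maps ‘reflexvast’ to ‘reflex’ and ‘skoskydd’ to ‘skooverdrag’, while the one-off rewrite is ignored. A real system would also need a better signal for dissatisfaction than "the query changed", such as no clicks on the first result page.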
We also gave this algorithm a code name: Babel fish for eCommerce.