There was a time when I used to read quite a bit. Two young kids at home and a young, growing company fill up most of my time these days. Except for this time of the year! The last two weeks of December are when work slows down a little and the schools are off. I finally find some time to catch up on my reading during these days.
This year, I have picked up a few interesting books for my end-of-year reading, but one of my favorite books of all time is Douglas Adams’ The Hitchhiker’s Guide to the Galaxy. All the characters in that book, from Marvin the Paranoid Android to the elevators that can see into the future, are amazing.
One of the species in that book is the Babel fish: “The Babel fish is small, yellow, leech-like, and probably the oddest thing in the universe. It feeds on brain wave energy, absorbing all unconscious frequencies and then excreting telepathically a matrix formed from the conscious frequencies and nerve signals picked up from the speech centres of the brain, the practical upshot of which is that if you stick one in your ear, you can instantly understand anything said to you in any form of language: the speech you hear decodes the brain wave matrix.” In plain English: when you stick a Babel fish in your ear, you can understand and make sense of any language spoken in the universe.
We were in need of a Babel fish for eCommerce search earlier this year. One of our customers operates a number of online stores in Europe, selling everything from bike parts to bathroom fixtures. They wanted to learn their shoppers’ vocabulary so that the search experience could improve automatically. Usually we handle something like this with lemmatization and stemming, but since their websites are not in English, that posed a problem.
So, we came up with a solution that could automatically learn from what people do when they are not satisfied with their search results. Here is a snapshot of what we learned:
If you look at the bottom of this graph, you can see that people search for ‘reflexvast’, which is Swedish for reflective vest. Since they do not find the right results, they change their query to ‘reflex’ (reflective) or ‘vast’ (vest). What we learn from this behavior is that a lot of customers write ‘reflex vast’ as the single word ‘reflexvast’. So the system learns to handle that automatically by creating synonyms of its own.
From the top of the list, we also learn that when people search for ‘skoskydd’ (shoe covers), they end up changing the search keywords to ‘skooverdrag’ (shoe over draw). They did the same thing for ‘skooverdrag vinter’ (shoe covers for winter). That’s another set of automatically learned synonyms.
The threshold limits help us ensure that we only add synonyms that are getting a lot of traction and ignore one-off cases. The result is a system that automatically started learning a language we had not dealt with before. Since then, we have used this algorithm for many of our customers to help them automatically improve search relevance.
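For the curious, here is a rough sketch of the idea in Python. It is not our production code, just a minimal illustration under a few assumptions: each session is a time-ordered list of (query, clicked) pairs, a search with no click counts as "not satisfied", and a single hypothetical min_count threshold decides which rewrites become synonyms. The function name, the data layout and the threshold value are all made up for this example.

```python
from collections import Counter


def learn_synonyms(sessions, min_count=3):
    """Learn query -> [rewritten queries] from search sessions.

    sessions: list of sessions; each session is a time-ordered list of
    (query, clicked) tuples. This layout is an assumption for the sketch.
    min_count: hypothetical threshold; rarer rewrites are ignored.
    """
    rewrites = Counter()
    for session in sessions:
        # Look at each search together with the search that followed it.
        for (query, clicked), (next_query, _) in zip(session, session[1:]):
            original = query.strip().lower()
            rewritten = next_query.strip().lower()
            # Only an unsatisfied search (no click) followed by a
            # different query counts as a rewrite.
            if clicked or original == rewritten:
                continue
            rewrites[(original, rewritten)] += 1

    synonyms = {}
    for (original, rewritten), count in rewrites.items():
        if count >= min_count:  # drop one-off cases
            synonyms.setdefault(original, []).append(rewritten)
    return synonyms


# Toy data modeled on the examples above; the counts are invented.
sessions = [
    [("reflexvast", False), ("reflex", True)],
    [("reflexvast", False), ("reflex", True)],
    [("reflexvast", False), ("reflex", True)],
    [("skoskydd", False), ("skooverdrag", True)],
    [("skoskydd", False), ("skooverdrag", True)],
    [("skoskydd", False), ("skooverdrag", True)],
    [("skoskydd", False), ("cykel", True)],  # a one-off rewrite
]

print(learn_synonyms(sessions, min_count=3))
```

With the threshold at 3, the one-off rewrite from ‘skoskydd’ to ‘cykel’ is dropped, while the repeated rewrites survive as synonym candidates. A real system would likely also look at result counts, time between searches and how often the original query succeeds, none of which this sketch attempts.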
We also gave a code name to this algorithm – babel fish for eCommerce.