Autocomplete Fuzzy Matching
Thursday, July 8th, 2010
Autocomplete widgets live in nearly every system that requires filtering items via input against large amounts of data. This includes address books, email contacts, restaurants, even social graphs. However, in most matching algorithms, Engineers don’t take into account that people don’t know how to spell AND/OR are lazy. Thus here is a really simple solution to work around this problem, and in my own opinion, will vastly improve the user experience.
Look ahead matching
Let’s say you have five people. Daniel, Dustin, David, Damarcus, and Russ. Now let’s say a user types in dus. We would match Dustin and Damarcus. Likewise, if we typed in us, we would get an output of Dustin, Damarcus, and Russ.
Enter RegExp
At this point, it’s sort of a no-brainer. Your input-based regular expression can be created as such:
var people = ['Daniel', 'Dustin', 'David', 'Damarcus', 'Russ'];
function matchPeople(input) {
var reg = new RegExp(input.split('').join('\\w*').replace(/\W/, ""), 'i');
return people.filter(function(person) {
if (person.match(reg)) {
return person;
}
});
}
Voila! Now you never have to type another vowel again! (At least in spirit… you know what I mean). In addition to this, you can do further post-process sorting based on Levenshtein distance which will graduate better matches to the top.
An Aside: Caching
Since this tends to do be heavy on regular expression construction, a good idea would be to cache your compiled expressions. This can easily be done using my Cache Provider as such:
var regCache = new CacheProvider;
function matchPeople(input) {
var reg = regCache.get(input) || regCache.set(input, new RegExp(input.split('').join('\\w*').replace(/\W/, ""), 'i'));
// etc...
}
Now any subsequent calls to the same input will run against a pre-compiled expression. And that folks, is all I got for the night. Cheers.
10 Responses to “Autocomplete Fuzzy Matching”
find it
recent
- Autocomplete Fuzzy Matching
- JavaScript Cache Provider
- JavaScript Animate
- Asynchronous method queue chaining in JavaScript
- Something changed
- Unofficial Twitter Widget Documentation
- Twita@talinkahashify your tweets
- Me on Photography and JavaScript
- RegEx Brain Teaser Part II
- Get your Gmail Stickers
- Parenthetical back matching in regular expressions
- The Skinny on Doctypes
i am dustin diaz

July 8th, 2010 at 7:30 am
[...] Fuzzy Matching: “Let’s say you have five people. Daniel, Dustin, David, Damarcus, and Russ. Now let’s say a user types in dus. We would match Dustin and Damarcus. Likewise, if we typed in us, we would get an output of Dustin, Damarcus, and Russ.” [...]
July 8th, 2010 at 8:04 am
[...] This post was mentioned on Twitter by Dustin Diaz, 1Marc and Srdjan Pejic. Srdjan Pejic said: RT @ded: New post: Autocomplete fuzzy matching – http://bit.ly/9bTUYa <– really nice [...]
July 8th, 2010 at 11:01 am
Good stuff, but there are a few gotchas in your example.
You’ll want to escape regex characters in the input string, or characters like dots and parens will result in unexpected behavior or syntax errors.
Also, while you describe typing in “dus” to match “Dustin”, this won’t work with the given example, since it uses a case-sensitive match.
Finally, it may be worth mentioning that Array.prototype.filter is part of ES5 and won’t work in old browsers.
July 8th, 2010 at 1:29 pm
Hi Ryan,
I’ve added the ‘i’ operator (I forgot to add it) to the regex. And good catch on the escaping. Perhaps a simple replace(/\W/, “”); on the input should take care of it.
Also, I think most folks on this blog are aware of .filter() – I have a post on that.
July 8th, 2010 at 1:53 pm
One more round of code golf. :)
The replace needs to happen before the split/join, and it needs to be global:
var people = ['Daniel', 'Dustin', 'David', 'Damarcus', 'Russ'];
function matchPeople(input) {
var reg = new RegExp(input.replace(/\W/g, '').split('').join('\\w*'), 'i');
return people.filter(function(person) {
if (person.match(reg)) {
return person;
}
});
}
July 8th, 2010 at 1:53 pm
Pardon the lack of whitespace. Your blog seems to have eaten it.
July 9th, 2010 at 12:27 am
I like this idea.
I’m wondering whether you can’t just use the captures of the join and get rid of the filter alltogether… Something like this:
var people = ['Daniel', 'Dustin', 'David', 'Damarcus', 'Russ'];
var input = “ds”;
var search = ‘@’+people.join(‘@’)+’@';
var target = ‘@([^@]*?’+input.split(”).join(‘[^@]*?’)+’[^@]*?)@’;
var reg = new RegExp(target, ‘ig’);
var matches = search.match(reg);
I can’t seem to get rid of the delimiters though (the capturing groups… i’m not sure why they won’t work), but you can do something like this in the end..
matches.join(”).replace(/@@/g, ‘@’).split(‘@’).slice(1,-1);
to get an array with the hits.
July 11th, 2010 at 1:21 pm
Peter,
You’ve introduced two new regular expressions – not to mention some crypitic looking code. And in the end, it doesn’t even work (make “us” your input).
I think between the code golf between Ryan and myself, we’ve got something good.
July 13th, 2010 at 12:28 am
Dustin,
There’d actually only be one regular expression. And I’m sure I can tune it to remove the additional garble, I just can’t seem to get it right at the moment.
Even so, I wonder whether the second (cachable) regex will be slower than a call to each…
Anyways, the example might indeed be flawed (sorry ’bout that ;) but my point was that a single regex could do what you’re now doing with a loop. Please excuse me if I don’t feel like tweaking for an hour to get that point into a better poc ;)
July 13th, 2010 at 12:29 am
each… i mean filter.