i am dustin diaz

and while you're at it...

boosh.

don't worry about it.

Autocomplete Fuzzy Matching

Thursday, July 8th, 2010

Autocomplete widgets live in nearly every system that requires filtering items via input against large amounts of data. This includes address books, email contacts, restaurants, even social graphs. However, in most matching algorithms, Engineers don’t take into account that people don’t know how to spell AND/OR are lazy. Thus here is a really simple solution to work around this problem, and in my own opinion, will vastly improve the user experience.

Look ahead matching

Let’s say you have five people. Daniel, Dustin, David, Damarcus, and Russ. Now let’s say a user types in dus. We would match Dustin and Damarcus. Likewise, if we typed in us, we would get an output of Dustin, Damarcus, and Russ.

Enter RegExp

At this point, it’s sort of a no-brainer. Your input-based regular expression can be created as such:

var people = ['Daniel', 'Dustin', 'David', 'Damarcus', 'Russ'];

function matchPeople(input) {
  var reg = new RegExp(input.split('').join('\\w*').replace(/\W/, ""), 'i');
  return people.filter(function(person) {
    if (person.match(reg)) {
      return person;
    }
  });
}

Voila! Now you never have to type another vowel again! (At least in spirit… you know what I mean). In addition to this, you can do further post-process sorting based on Levenshtein distance which will graduate better matches to the top.

An Aside: Caching

Since this tends to do be heavy on regular expression construction, a good idea would be to cache your compiled expressions. This can easily be done using my Cache Provider as such:

var regCache = new CacheProvider;

function matchPeople(input) {
  var reg = regCache.get(input) || regCache.set(input, new RegExp(input.split('').join('\\w*').replace(/\W/, ""), 'i'));
  // etc...
}

Now any subsequent calls to the same input will run against a pre-compiled expression. And that folks, is all I got for the night. Cheers.

10 Responses to “Autocomplete Fuzzy Matching”

  1. Bram.us » Javascript Autocomplete Fuzzy Matching

    [...] Fuzzy Matching: “Let’s say you have five people. Daniel, Dustin, David, Damarcus, and Russ. Now let’s say a user types in dus. We would match Dustin and Damarcus. Likewise, if we typed in us, we would get an output of Dustin, Damarcus, and Russ.” [...]

  2. Tweets that mention Autocomplete Fuzzy Matching -- Topsy.com

    [...] This post was mentioned on Twitter by Dustin Diaz, 1Marc and Srdjan Pejic. Srdjan Pejic said: RT @ded: New post: Autocomplete fuzzy matching – http://bit.ly/9bTUYa <– really nice [...]

  3. Ryan Grove

    Good stuff, but there are a few gotchas in your example.

    You’ll want to escape regex characters in the input string, or characters like dots and parens will result in unexpected behavior or syntax errors.

    Also, while you describe typing in “dus” to match “Dustin”, this won’t work with the given example, since it uses a case-sensitive match.

    Finally, it may be worth mentioning that Array.prototype.filter is part of ES5 and won’t work in old browsers.

  4. Dustin Diaz

    Hi Ryan,
    I’ve added the ‘i’ operator (I forgot to add it) to the regex. And good catch on the escaping. Perhaps a simple replace(/\W/, “”); on the input should take care of it.

    Also, I think most folks on this blog are aware of .filter() – I have a post on that.

  5. Ryan Grove

    One more round of code golf. :)

    The replace needs to happen before the split/join, and it needs to be global:


    var people = ['Daniel', 'Dustin', 'David', 'Damarcus', 'Russ'];

    function matchPeople(input) {
    var reg = new RegExp(input.replace(/\W/g, '').split('').join('\\w*'), 'i');
    return people.filter(function(person) {
    if (person.match(reg)) {
    return person;
    }
    });
    }

  6. Ryan Grove

    Pardon the lack of whitespace. Your blog seems to have eaten it.

  7. Peter van der Zee

    I like this idea.

    I’m wondering whether you can’t just use the captures of the join and get rid of the filter alltogether… Something like this:

    var people = ['Daniel', 'Dustin', 'David', 'Damarcus', 'Russ'];
    var input = “ds”;
    var search = ‘@’+people.join(‘@’)+’@';
    var target = ‘@([^@]*?’+input.split(”).join(‘[^@]*?’)+’[^@]*?)@’;
    var reg = new RegExp(target, ‘ig’);
    var matches = search.match(reg);

    I can’t seem to get rid of the delimiters though (the capturing groups… i’m not sure why they won’t work), but you can do something like this in the end..

    matches.join(”).replace(/@@/g, ‘@’).split(‘@’).slice(1,-1);

    to get an array with the hits.

  8. Dustin Diaz

    Peter,
    You’ve introduced two new regular expressions – not to mention some crypitic looking code. And in the end, it doesn’t even work (make “us” your input).

    I think between the code golf between Ryan and myself, we’ve got something good.

  9. Peter van der Zee

    Dustin,

    There’d actually only be one regular expression. And I’m sure I can tune it to remove the additional garble, I just can’t seem to get it right at the moment.

    Even so, I wonder whether the second (cachable) regex will be slower than a call to each…

    Anyways, the example might indeed be flawed (sorry ’bout that ;) but my point was that a single regex could do what you’re now doing with a loop. Please excuse me if I don’t feel like tweaking for an hour to get that point into a better poc ;)

  10. Peter van der Zee

    each… i mean filter.

Leave a Reply

Phone Number:

If you're about to post code in your comment, please wrap your code with the tag-combo <pre><code>. Also please escape your html entities - otherwise they will be stripped out. I recommend using postable.

Comments will be closed on September 6, 2010.

find it

recent

me on twitter

this is who i am

Hi, my name is Dustin Diaz and I'm an Engineer @Twitter, author of JavaScript Design Patterns, and a photographer. This is my website. Welcome!

On this site I write about JavaScript. I appreciate you dropping by.