Wednesday, January 21, 2009

Data Structures: Try to Trie

(Over the past six months, I've conducted over twenty in-house interviews at Amazon. What follows is the answer to one of the questions I've commonly asked, which is now retired from my list. This makes it imminently boring to non-technical types.)

Lead-in:Can you tell me the best way to store a dictionary?

The candidate usually identifies why we want to store the dictionary correctly - to look up words later - and invariably they come to the same conclusion as previous candidates before them: a hashtable.

At this point, I we talk about the benefits of hashtables - the average lookup time is great. The worst lookup time is horrible, as are the space requirements, which makes a good solution only for certain cases. More conversation is had about different architectures (server, desktop, small machine, mobile, etc) and their differing requirements.

Then I drop a twist on them: the dictionary data structure must fit in less memory than the space the dictionary file uses on disk. After a few minutes of staring at me, the candidate's jaw usually begins to slacken.

At this point, we talk about tries, a data structure that is designed to store information about sets of things (sets of letter in the case of words), especially their existence or non-existence. Essentially, it's a tree of nodes, with each node having a map that connects a letter with a child node. In any given node, you can mark it as an 'end of word' (or 'end of word' in the dictionary case) - any search that ends on a marked node should return 'true' for the existence of that word.

A trie has properties allow you to take advantage of the fact that most words start with a common prefix (think 'presume', 'pressure', 'present', 'presidential' - they all start with 'pres'.) All of the words with common prefixes get compressed down into one path on the tree, which means you can fit a lot of words in relatively little space.

Follow-up: OK, so we know what a trie is - can you code one on the whiteboard?

I've had varying degrees of success with this question - most people can't code on a whiteboard and it shows when they try. That's a topic for another time- right now, let's concentrate on implementing the trie in Java.

Start with the basics - a Trie must be able to do two things - insert a new word and look up the existence of a word. Let's prototype those methods:


public class Trie {
public void insert(String data) {
}
public boolean contains(String data) {
return false;
}
}


So far so, good. Some candidates choose to make Trie and Node a separate class - that's a matter of style the I don't distinguish as different. I choose to implement it in one class because it is easier overall.

Let's take part of the description and make it real: "a trie has a map of characters to child nodes" and "has an 'end of word' marker"


public class Trie {

private Map paths=new HashMap();
private boolean isWord=false;

public void insert(String data) {
}
public boolean contains(String data) {
return false;
}
}


Here, I usually dock the candidate if they don't use byte, don't use a boolean for the marker, and especially if they don't know how to build a map in Java. Generics are optional, but the bonus on this problem requires it and most candidates forget the cast from Object to Trie when they look up nodes later.

Next, we have to implement the insert method. This is essentially picking off the first byte, looking up to see if there is a path present, creting one if not, and repeating until we hit the end of the string. Once we're at the end, we set the marker to true.


public class Trie {

private Map paths=new HashMap();
private boolean isWord=false;

public void insert(String data) {
if (data==null) {
return;
}
byte [] bytes=data.getBytes();
Trie current=this;
for (byte b: bytes) {
if (!current.paths.containsKey(b)) {
Trie next=new Trie();
current.paths.put(b, next);
current=next;
} else {
current=current.paths.get(b);
}
}
current.isWord=true;
}

public boolean contains(String data) {
return false;
}
}


This does what it says - iterate over the bytes and when you get to the end, mark it. Penalties are given for not checking for null and no points are lost if recustion is used instead of iteration.

Next, we do lookup:


public class Trie {

private Map paths=new HashMap();
private boolean isWord=false;

public void insert(String data) {
if (data==null) {
return;
}
byte [] bytes=data.getBytes();
Trie current=this;
for (byte b: bytes) {
if (!current.paths.containsKey(b)) {
Trie next=new Trie();
current.paths.put(b, next);
current=next;
} else {
current=current.paths.get(b);
}
}
current.isWord=true;
}

public boolean contains(String data) {
if (data==null) {
return false;
}
byte [] bytes=data.getBytes();
Trie current=this;
for (byte b: bytes) {
if (!current.paths.containsKey(b)) {
return false;
} else {
current=current.paths.get(b);
}
}
return current.isWord;
}
}


That's it. One working Trie, usable on strings.

Bonus:: Ok, so let's make it work on any arbitrary set of data, such as an array of Integers, but not limited to that. I want a generic Trie.

This is an exercise left to the reader - needless to say, it's not hard and getting it perfect will give you a Trie that will work for every future use. Post your solution in the comments below!

Saturday, January 10, 2009

Thoughts on Service Economies, Economic Depression, and Huiman Awareness

When I moved to the Northwest, I made a journey to a place south of Seattle I had only heard about - Fry's Electronics. Reading about it in Coupland's Microserfs, I fell in love with that place and Ikea, and the two became the things I associated with the Northwest - I think I had gone to Fry's twice before I ever went to Pike Place Market or the Space Needle.

Every couple of months, I make a trip down there - it's about a 30 minute drive, sometimes longer with traffic - usually with no specific reason, just to see what's on sale and maybe pick up some gear for fun. Today I went in to get a second LCD monitor - I found one I had been looking at for a while and it was 50 bucks cheaper than Amazon's list price. It was a great find and I was very excited - finally, some more real estate for projects at home!

Fourty-five minutes later, I walked out of Fry's with no monitor, frustrated and annoyed.

In the time since I saw what I wanted, no salesperson approached me. I looked for a stack of the LCD monitor I wanted and found none - this means a salesperson has to get it from the back and bring it to you for purchase. After the first five minutes, I started making eye contact with a few salespeople, some of whom were tied up with other customers.

I then started making eye contact and smiling, which also didn't work for the next few minutes, while touching and examining the merchandise to show interest. This didn't work.

I finally got a salesperson to acknowledge my presence; she was very nice, smiled, and unfortunately, did not work in that department. She walked over to a salesperson who was busy helping an elderly couple decide on a monitor - she pointed me out, he nodded. And he never got back to me, even after the couple had left.

I switched to folding my arms and pacing around slowly, which is a signal of impatience. One salesperson, whose eye I had caught twenty minutes earlier, came over within three feet,. I made eye contact, he avoided me, and adjusted the monitors, checking some numbers. He walked off without even asking if I had been helped.

I waited five more minutes and left. I couldn't have given Fry's my money that day - I tried and they failed to take it from me.

The point of this isn't to slam this bad service - it's to point out how it could have been better. In a service industry in an economic depression or downturn, sales performers rise to the top and those that do not perform fall to the bottom. It's a simple fact.

By being aware of their surroundings and their customers, each salesperson I saw or saw me could have made a ten-minute sale of two-hundred dollars. That's US$20/minute or US$1200/hour if sustained throughout the store - that should pay for the workforce for a 20-person store easily.

In downturns, companies that have excellent service outshine the others - if people are spending less in general, you want to be known as a service leader. This means that those that buy from you expect quality and will statistically pick you over an unknown or negatively-viewed competitor.

It all starts with the worker - the individual who interacts with the customer.

So, how can workers obtain this awareness? Observe behaviors - people who have not made up their minds usually display generally neutral body language. In business, someone who is overly friendly towards someone they don't know often wants something - the only thing I want from a salesperson when I'm friendly is some quick assistance.

Similarly, pensive customers are sales lost - those who are going to be leaving your store soon and maybe not coming back. By catching them on their way out, you might be able to at least get them to come back to the store, instead of never coming back at all.

The benefit to the worker is enormous as well. Citing a number of sales on a review is going to give you more leverage in getting promoted or a raise. Being able to show how much a department will lack when you leave is a reason to keep you around. And should your reputation as a salesperson get outside your store or to the regional or national management, you may not be working the floors much longer, but training your replacement when you move into management.


"Be civil to all; sociable to many; familiar with few; friend to one; enemy to none."
Benjamin Franklin