Thursday, November 15, 2012

LOOKUP, VLOOKUP, and HLOOKUP in Excel

One of the litmus tests of knowing Excel and actually knowing Excel is the LOOKUP family of functions. If you ever go to a job interview where there is a heavy focus on Excel, you are more than likely to find a question about the LOOKUP function. That's not to say the function is super powerful or anything, in fact, it basically is the Find function in Excel just in equation form (sorta). The LOOKUP family of functions are not famous for what they do but for what role they play when trying to do pretty advance stuff in Excel. In other words, the LOOKUP family of functions by themselves aren't all that useful, but they are critical if you want to implement advance functionality without the use of VBA code. The COUNTIF and SUMIF function are pretty much up there with the LOOKUP family, in that they provide some useful functionality alone, but their true power doesn't come about until you use them in combination with other things. Well anyway let's look at the family of LOOKUPs.
Excel Function Short little description
LOOKUPProvides 1-to-1 matching of data
VLOOKUPUses the left most column of a table to provide matching
HLOOKUPUses the top most row of a table to provide matching
So what does this all mean? Let's begin with looking at a simple Excel sheet.
As you can see you have a Part, Price, and Contact column. Now with this table let's see what result we get from each function, let begin with looking at the parameters for LOOKUP. =LOOKUP(value,lookup_vector,result_vector) value, which is required, is the value that you want to find. lookup_vector, which is also required, is the list of items (aka range) that you want to search for a match. result_vector, which is optional, is the list of of items that should be returned once a match has been found. The catch to this function is that the lookup_vector MUST be in ascending order or else you will get some funky results. So let's say you do that, we will sort the Part column in ascending order and then write our function as such in cell E1. =LOOKUP(E2,A2:A11,B2:B11) Right now the return value will be #N/A, but if you type "Hand Saw" into cell E2, E1 will equal 18.95. Let's see why, this is the table sorted by column A, which is our Part column:
r/cABC
1PartPriceContact
21 gal. Paint$8.95 Bob
31/2 gal. Paint$5.89 Bob
415 ft. Ladder$22.95 Jane
52" Screw$0.20 Earl
62.5" Screw$0.22 Earl
73 ft PVC Piping$11.43 Betty
87 ft. Ladder$28.96 Jane
9Electric Saw$39.95 Joe
10Hand Saw$18.95 Joe
11T-Joint PVC$1.73 Betty
The lookup formula says to use the value in E2 to find a match in cells A2 through A11. When we set E2 = "Hand Saw" it finds a match in A9 which is the 8th line that we searched. Thus the 8th result in B2:B11, which is $18.95, will be returned. Again, this only applies when the lookup_vector is sorted in ascending order. The result_vector is optional, if you omit it, then it will return the match to you, in other words it would act like you had typed: =LOOKUP(E2,A2:A11,A2:A11) Which doesn't seem like it would be useful but is if you are doing something like checking if a person is on a list or not, which could be done with COUNTA but I'll talk about that later on down the road. Again, the end result doesn't seem so spectacular, especially given that you can always use CTRL+F to find what you are looking for. Repeat with me, "It's not about the direct results but what you can do with it with other functions." LOOKUP has another, different, syntax that uses an array. I will cover that syntax in a later post, but it basically does the same thing except instead of providing a range of cells to look at, you provide a static list of items to check against. Also, LOOKUP not only works looking down but can also look across, let's transpose our table and look at it like this.
r/cABCDEFGHIJK
1Part1 gal. Paint1/2 gal. Paint15 ft. Ladder2" Screw2.5" Screw3 ft PVC Piping7 ft. LadderElectric SawHand SawT-Joint PVC
2Price$8.95 $5.89 $22.95 $0.20 $0.22 $11.43 $28.96 $39.95 $18.95 $1.73
3ContactBobBobJaneEarlEarlBettyJaneJoeJoeBetty
This method is a little harder because Excel doesn't give you an easy way to sort a row in ascending order, but this is what it looks like. You can now do the exact same thing using this code and placing this formula in cell A4: =LOOKUP(A5,B1:K1,B2:K2) Typing "Hand Saw" in A5 will result in A4 = 18.95 This is useful for times when you have things where each column is a month and you want to look at a single value per each month.
Let's look at VLOOKUP, VLOOKUP is an explicit LOOKUP going up and down. LOOKUP can go left to right or up and down, VLOOKUP is restricted to just up and down. However, VLOOKUP is a little better because things do not need to be sorted in ascending order (if preconditions are met), just each item in the lookup vector has to be unique (well that's not a requirement, just that if there are two of the same thing in the result vector, only the first match will be returned.) Also, VLOOKUP is a little faster than LOOKUP itself. That's because with LOOKUP Excel has to determine which direction the lookup is happening in before evaluating. So let's take a look at VLOOKUP. =VLOOKUP(lookup_value, table_array, col_index_num, range_lookup) lookup_value, which is required, the the value you want to lookup to find a match. table_array, which is also required, is the table that you want to lookup in. col_index_num, which you guessed it is requires, is the column from that table that you want the result to return. range_lookup, which is optional, is a little more difficult to define so I'll cover that in more detail below. The lookup_value parameter should already be obvious to you. That's basically what you want to compare. However, the next one, table_array may have you a bit miffed. This is a single to multiple column range. So A1:A123 and A1:E123 are both valid inputs. You might be asking yourself, if we have multiple columns, how does Excel know which column to search to find a match? The answer is that VLOOKUP will always search the first (left-most) column [I'm not sure if that logic is reversed for countries that read right-to-left] for the value that is given in the lookup_value parameter. The third parameter is the col_index_num. If a match is found, then it is this column number in the table_array that is returned. This is what usually confuses people because the column being searched in 1, the next column is 2, and so on. People sometimes think, this is the fifth column in the worksheet, but it may only be the third column in the table_array. The last parameter, range_lookup, is a hard one to remember because it changes how the whole function will work. Basically the parameter accepts a TRUE or FALSE value. Paradoxically, TRUE means to NOT do an exact match and FALSE means to do an exact match. I've heard it a million different ways. Basically, TRUE means find exact or close to exact match, FALSE means find exact match only. How you answer this will affect how the functions works. Let's look at if we state TRUE. Here we MUST have the list in ascending order. Also, you cannot use any kind of wildcard, or at least use a wildcard and expect some results. Multiple entries of the same value will return the first value in the list. So if you have "Popcorn Cheese" and "Popcorn flavors" and your lookup_value = "Popcorn". You are going to get "Popcorn Cheese". FALSE changes a few things. The list can be in any order! You are allowed to use wildcards! That means you can lookup "Ba?" and it will look for any three letter words which begin with "Ba" In the event of multiple entries that matching what you are searching for, only the first, AS THE LIST IS CURRENT LAYOUT, will be returned. The catch here is that if you resort the list differently, you'll get a different result. That last part can be the most confusing. So basically a VLOOKUP allows us to use a table and if we use the FALSE option, then you don't have to worry about the table being in order. Most of the time you will want to use VLOOKUP, because you don't have to worry about the list being in order. However, TRUE or FALSE, you do have to have some concern about duplicate entries within the table. You can always use COUNTIF to determine if duplicates exist and if so how you want to handle that kind of situation.
Now I would go into detail about HLOOKUP but it is pretty much VLOOKUP but looks left to right as opposed to top to bottom as VLOOKUP. So there isn't much more to cover about it. One of the big questions I get about HLOOKUP is, "What do you use it for?" VLOOKUP seems pretty useful because it searches in the same manner as databases tend to spit out information. HLOOKUP doesn't seem so useful since databases tend not to spit information out from left to right. Now I can't speak for everyone, but I use HLOOKUP for zoom-in functionality. Let's say you have a table with factors on the far left and dates as the column headers. Basically what you want to do is allow a drop down that allows the user to select a factor in the far left column and then type in a date. With these two inputs, provide the data at intersection. Basically what we want to do is use HLOOKUP. The first parameter is the date we've typed in, the table is obviously the table we made up, and the return row is the row number that we've selected from the drop down. Since the dates are pretty much in ascending order 01/01/2011 to 01/01/2012 (for example), you don't have to use FALSE, you can use TRUE. So wait a second how exactly is this useful? Imagine you need to extract data quickly out of a spreadsheet that your boss uses to keep track of things. HLOOKUP would be perfect, so long as your boss keeps the format the same all the way across. Things look the way your boss wants it to look, and you're able to pull useful information from the spreadsheet, without having to yank your hair out! It's win, win! That's what make the LOOKUP family of functions so important. They are useful tools in extracting data from complex sets. Once you have the data extracted, you in all your wonderful Excel glory can then begin to normalize it, because we all know that the first step to taking your boss' crazy Excel sheet and placing it into a nice and neat Access file, is to normalize it.

No comments: