Welcome to the wonderful world of hash tables. Curiously
enough, hash has nothing to do with this article at all, and I have no
idea why they call it a hash table anyway. Ironically enough, Webster's
Dictionary defines hash as a synonym for muddle or "to confuse";
hodgepodge was the exact wording if I recall. But don't let that
discourage you. Let me get back on topic. :) (/icebreaker)
Hash tables, hash tables... just what the heck are they? There are
many different organization styles for a hash table, but this article
will focus on one: chaining. It's a hybrid between an array and a
linked list, and its efficiency over either of those two depends on the
programmer's ability to create a good hashing function - don't worry,
I'll explain everything.
A hash table is an array of pointers, each of which points to the
beginning of a linked list. That's it! The efficiency of the system
comes from the fact that array lookups are performed at godly speeds
compared to linked list iterations, but if the first array lookup
doesn't satisfy the job, the linked list at that location is iterated
until the requested information is found! If your array is designed
well, you minimize the number of lookups. Basically, it tackles the
issue of speed associated with linked lists and the issue of memory
usage associated with arrays. Here's a picture for you:
[Picture: an array of table slots, each pointing to a chain (linked list) of records.]
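To make that layout concrete, here is a minimal sketch of the structure in code. The names Record, Node, g_table and TABLE_SIZE are mine (not from the article), and the string sizes are arbitrary:

#define TABLE_SIZE 26            // one slot per letter, matching the example below

struct Record                    // the "associated data" being stored
{
    char word[32];
    char definition[256];
};

struct Node                      // one link in a chain
{
    Record data;
    Node  *next;                 // next record that hashed to the same index
};

// the hash table itself: an array of pointers, each the head of a linked list
Node *g_table[TABLE_SIZE] = { 0 };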
I mentioned earlier that the programmer plays a big role in
determining the efficiency of the hash table. Hash tables are commonly
used to store records and other database-related things. It follows
that a hash table must have some way of quickly finding and
storing data. But how? This is where the hashing function comes in.
Its sole job is to translate raw data into an index for the hash array.
So you can see that this function must be well designed in order to be
fast, and have an algorithm that produces varied indexes - remember
that we fall back on linked lists if there is more than one record at a
certain index.
The hashing function need not be overly complicated - just enough to
quickly generate a nice range of indexes. Let's look at an example.
Suppose we have a dictionary program that stores the definitions of
words. One record consists of a word string and a definition string.
The most logical organization of this array would be alphabetical;
thus our initial hash table will have 26 pointers, one for each letter
of the alphabet. The following function will take a word (a.k.a. "raw
data") and convert it into an index:
int HashIt(const char *word)
{
    // assumes the first letter is a lowercase 'a'-'z'
    return ((*word) - 'a');
}
This makes the assumption that the word is in lowercase, or at least
the first letter, but you get the idea - the proper index is returned.
Of course, you might want to do some error checking for characters that
would produce a negative index or one past the end of the array, but
that checking should be kept outside our hashing function.
At any rate, once you have the index, you jump to that point in the
array (lookup #1) and iterate through the linked list at that position
until you find the word you are looking for.
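To illustrate, a lookup along those lines might look something like the sketch below, reusing the Record, Node, g_table and HashIt names from earlier (FindWord is my own name, and strcmp comes from <string.h>):

#include <string.h>              // for strcmp

Record *FindWord(const char *word)
{
    int index = HashIt(word);

    // keep the range check outside the hashing function, as noted above
    if (index < 0 || index >= TABLE_SIZE)
        return 0;

    // lookup #1: jump straight to the array slot...
    for (Node *node = g_table[index]; node != 0; node = node->next)
    {
        // ...then walk the chain until the requested word turns up
        if (strcmp(node->data.word, word) == 0)
            return &node->data;
    }

    return 0;                    // the word isn't in the table
}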
That last example wasn't that practical. For example, I can think of
hundreds (ok, maybe only a couple dozen) of words off the top of my
head that start with the letter 's', and they'll all be linked at the
same index. The chances that the desired word will be near the
beginning of the list get slim, and it would degenerate into your
everyday linked list iteration. But one thing's for sure. It beats a
regular linked list for things such as word associations (or whatever
the term I should use is).
Hash table indexes don't really need to rely upon a logical
relationship with the data they're "hashing". As long as the function does its
job, go ahead and use it. Lastly, there's the issue of how large to
initialize the hash array to (hey, stop staring at my dangling
participle!!). For something as simple as a dictionary, 26 works for
basic alphabetizing. But let's suppose you want to store these words
based on the first two letters. That would make our array 26^2
elements long, or 676. It works out that this array would require
about 2.5 K more in memory for the pointers. That's nothing, yet we
would optimize the speed of the hash table by reducing the number of
words per index. Sweet. And if we go for three letters? 17,576 entries, 66 K more - again, additional optimization with an unnoticeable increase in pointer memory.
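For illustration, a two-letter version of the hashing function might look like this (HashIt2 is my own name for it, and it assumes the word has at least two lowercase letters):

int HashIt2(const char *word)
{
    // the first letter selects a block of 26, the second letter a slot within it
    return (word[0] - 'a') * 26 + (word[1] - 'a');   // 0 .. 675
}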
Well, there you have it - hash tables. Not as notorious as linked
lists or arrays (because they require a bit more in terms of
implementation), but they serve their purpose well. Remember, they can
be used for more than just character data. However, don't use them too
much. They should be used primarily when record-style information
(which is also referred to as "associated data" in the context of a hash
table) is being stored and high-speed access is required. Like
everything else in C++, and programming in general, they aren't the new
solution to all our data storage needs. However, when used appropriately,
you can really whip up some nice applications.
Recommended Reading: Hash Tables
article created on Sat Feb 09, 2002 / 09:04pm PST
this item has been viewed 3198 times
[general / coding]
user comments
1. Nicholas "Starwolf" Pitino
Tue Nov 12, 2002 / 09:29am PST
Well, that was all Greek to me but nevertheless, good article Zippy.
2. Fraktur
Wed Jan 08, 2003 / 06:08am PST
Uhm!! The size of the table doesn't
depend on your hash function. Your example with the first 2 letters of a
word gives 26^2 possible keys, but you don't need a table that big.
It may be smaller but not bigger; simply divide the key by the table
size. Just ensure that your hash function is able to produce more keys
than you'll need, and divide. Also be sure that your hash function can
produce every key, so that every table position is usable.
And you
don't even need the lists. If you get a key that is already occupied, you
search for the next free array element, or use Fibonacci to determine the next
element to use, to spread the data better. And don't forget to take the key modulo the
table size after adding something to it, or you'll get a core dump. Also, only use that if you know the maximum amount of records you need to store.
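For what it's worth, a bare-bones sketch of the open-addressing idea Fraktur describes above (linear probing with a modulo wrap) might look like this; the names and the table size are mine, not his:

#define PROBE_TABLE_SIZE 101

const char *g_probeTable[PROBE_TABLE_SIZE] = { 0 };

// returns the slot used, or -1 if the table is full
int ProbeInsert(const char *key, unsigned int hash)
{
    for (int i = 0; i < PROBE_TABLE_SIZE; i++)
    {
        int index = (hash + i) % PROBE_TABLE_SIZE;   // wrap with modulo
        if (g_probeTable[index] == 0)                // found a free element
        {
            g_probeTable[index] = key;
            return index;
        }
    }
    return -1;   // only safe if you know the maximum number of records up front
}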
3. Skyler 'Zipster' York
Wed Jan 08, 2003 / 09:32am PST
The concept behind a hash table is
to find a reasonable balance between the number of array indices and
the speed of a lookup. True, the table could be less than 26^2 keys; it
could follow my first example and only be 26 keys. Heck, why not make
that just plain 2? However, the fewer keys you have, the
longer it will take to find any given key on average. That is why I
suggested you should evaluate just how much memory you are going to use
for the pointers as you increase the size of the table, because it turns
out that you can generate quite a large table and still not use that
much memory.
"The size of the table doesn't depend on your hash function." True, it
doesn't depend on the hash function. Once again, it all comes down to
speed vs. storage. The reason it seems that the hash function is so
closely related to the size of the table is because the hash function is
designed to produce a range of indices that spread across the entire
table, just like you said. If a collision does occur, then you fall
back on the linked list at that index. The only assumption made is that
there are as many words beginning with "aa" as there are beginning with
"zz", or "nn". To generate a better distribution would take some sort
of active knowledge of the language. For example, if you know there are
more "bb" words than "xx" words, then you spread out the "bb" words
across multiple indices with some clever hash. But without any
knowledge of this, the best you can do is generate the widest range of
indices possible and hope that it averages out.
4. Brian 'Yamazaki' Dorner
Wed Jan 08, 2003 / 12:07pm PST
To further derail the discussion
of hash tables, I could give my quick solution to alphabetical data: a 26-pointer
tree (I forget the scientific term for this, n-tree?), each
pointer being for a specific letter in the alphabet. Time to search the
tree structure and find any particular word is equal to the length of
the word and has nothing to do with the number of words in the tree. I
used this cheap trick on a class project. Everyone else had 2-pointer
trees and their 60,000-word dictionaries took 30 minutes to create and sort;
mine took 30 seconds. Saved me days of time when everyone else would
run it, walk off for a coffee and return... and mine had already
finished several times over and I had the chance to debug it. I even got
a bonus point for specifically designing the tree to match the data it
was designed to hold (a dictionary), as opposed to using the generic one
(binary).
And that holds true for hashing (I managed to return to
the topic! Yay!). Data structures for games should be specifically
designed for the data they're meant to hold. You won't find generic
insertion and deletion functions in a BSP tree, everything will be
centered on quickly parsing the tree for geometry... not handling
generic chunks of data.
If you've got a game with several hundred
AI controlled units in play, and you need to quickly find a specific
unit and work with it somehow, you'll probably write up a hash function
that can quickly find that unit based upon its type (air, ground, sea),
behaviour (enemy, friend, neutral, non-AI, etc.) or whatever parameters
would be best used to sort all your units.
And so on.
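As a side note, the 26-pointer tree Yamazaki describes is usually called a trie (or prefix tree); a bare-bones version might look like the sketch below, with names of my own choosing and assuming lowercase words:

struct TrieNode
{
    TrieNode *children[26];      // one pointer per letter 'a'-'z'
    bool      isWord;            // true if the path from the root spells a word
};

// lookup time depends only on the length of the word, not on how many words are stored
bool Contains(const TrieNode *root, const char *word)
{
    const TrieNode *node = root;
    for (; node != 0 && *word != '\0'; word++)
        node = node->children[*word - 'a'];
    return node != 0 && node->isWord;
}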
5. Skyler 'Zipster' York
Wed Jan 08, 2003 / 03:55pm PST
Ahh, so they all used a binary
tree while you used a smarter, 26 pointer tree. That's quite clever,
and would make it much faster. 
In
retrospect, I find that maybe the dictionary example wasn't the best
usage of a hash table, but it's certainly the easiest way to explain
what a hash table is and how it can be used.
6. Fraktur
Thu Jan 09, 2003 / 03:46am PST
Right. Know the data and you're
able to speed up, e.g. perfect hashing. If you know that every key is
unique (e.g. IP numbers), finding one in a hash table is O(1). I first
learned perfect hashing. After I understood it, I found the problem that
different keys may produce the same hash values. And the solution is no big
deal.
I like the 26-pointer tree.
Let's say you want to store phone numbers; then you need more RAM for
pointers than for phone numbers. Take such an empty tree and put
Yamazaki's phone# in. Then you have 208 pointers. All except 8 are NULL.
comment modified on Thu Jan 09, 2003 / 03:56am PST