Sunday, December 18, 2005
Looking at the project that I mentioned earlier this month, I began to wonder whether I could use Soundex to identify possible matches.

I first came across Soundex in the late 70s while working on a hospital IT system. The system had a surname index that used Soundex to generate the initial key into the index (the index then had the patient ID number as a secondary key, which considerably reduced the time taken to search for patients by surname).

The reason for using the Soundex algorithm is that terms that are often misspelled can be a problem for database designers. Names, for example, are variable length, can have strange spellings, and are not unique. Many names have a wide range of ethnic origins, which can give us names that are pronounced the same way but spelled differently, and vice versa.

To solve this problem, we need some method of coding names that can find similar-sounding ones. Just such a family of coding algorithms exists; they are called Soundex, after the original version developed by Robert C. Russell and Margaret King Odell and patented in 1918 and 1922.

A Soundex search algorithm takes a word, such as a person's name, as input and produces a character string which identifies a set of words that are (roughly) phonetically alike. It is very handy for searching large databases when the user has incomplete data.

The algorithm that I used in the late 70s is actually fairly straightforward to code and requires just a single pass over the input word, as can be seen from the steps shown below :-

1. Capitalize all letters in the word and drop all punctuation marks. Pad the word with rightmost blanks as needed during each procedure step.
2. Retain the first letter of the word.
3. Change all occurrences of the following letters to '0' (zero):
'A', 'E', 'I', 'O', 'U', 'H', 'W', 'Y'.
4. Change letters from the following sets into the digit given:
1 = 'B', 'F', 'P', 'V'
2 = 'C', 'G', 'J', 'K', 'Q', 'S', 'X', 'Z'
3 = 'D','T'
4 = 'L'
5 = 'M','N'
6 = 'R'
5. In the string that resulted from step (4), replace each run of identical adjacent digits with a single copy of that digit.
6. Remove all zeros (placed there in step 3) from the string that results from step (5).
7. Pad the string that resulted from step (6) with trailing zeros and return only the first four positions, which will be a letter followed by three digits.
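The seven steps translate almost line for line into code. Below is a minimal Python sketch of the variant described above, in which vowels plus H, W and Y become '0' and are only removed after adjacent duplicates have been collapsed. The names `soundex` and `CODES` are my own, and step 1 simply discards anything that is not a letter A to Z. It is not necessarily identical to the routine used on the hospital system, and some other Soundex variants treat H and W slightly differently.

```python
# Sketch of the seven-step Soundex variant described above.
# Assumption: anything that is not a letter A-Z is dropped in step 1.

# Digit for every letter: step 3 letters map to '0', step 4 letters to 1-6.
CODES = {}
for digit, letters in {
    "0": "AEIOUHWY",
    "1": "BFPV",
    "2": "CGJKQSXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}.items():
    for letter in letters:
        CODES[letter] = digit


def soundex(word: str) -> str:
    # Step 1: capitalise and drop punctuation (and any other non-letters).
    letters = [c for c in word.upper() if c in CODES]
    if not letters:
        return ""
    # Step 2: keep the first letter as it is.
    first, rest = letters[0], letters[1:]
    # Steps 3 and 4: replace every remaining letter with its digit.
    digits = [CODES[c] for c in rest]
    # Step 5: collapse each run of identical adjacent digits to one digit.
    collapsed = []
    for d in digits:
        if not collapsed or collapsed[-1] != d:
            collapsed.append(d)
    # Step 6: remove the zeros introduced in step 3.
    code = "".join(d for d in collapsed if d != "0")
    # Step 7: pad with zeros and keep only the first four positions.
    return (first + code + "000")[:4]
```

With this sketch, `soundex("Mitchell")` and `soundex("Mitchel")` both return `M324`, matching the worked example for my surname, and a short name such as "Lee" is padded out to `L000`.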

To give an example using my surname "Mitchell" :-

1. Becomes MITCHELL

2. Keep the M

3. The rest becomes 0TC00LL (setting aside the first letter and changing the I, H and E to zeros)

4. Becomes M0320044

5. Becomes M03204

6. Becomes M324

7. Final result is M324 and this is stored in the index

If someone were searching for "Mitchel", the rules above would return the same result, increasing the chance of finding the right person.
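To tie this back to the hospital surname index, here is a rough Python sketch of an in-memory index keyed first on the Soundex code and then on the patient ID. The `SurnameIndex` class, its methods and the patient IDs in the example are all hypothetical, and a dictionary stands in for whatever on-disk index structure the real system used. The block repeats a compact version of the Soundex steps so that it runs on its own.

```python
from collections import defaultdict

# Compact copy of the seven Soundex steps so this block is self-contained.
# Position in the tuple is the digit: vowels/H/W/Y -> 0, B/F/P/V -> 1, etc.
_GROUPS = ("AEIOUHWY", "BFPV", "CGJKQSXZ", "DT", "L", "MN", "R")
_CODE = {ch: str(i) for i, group in enumerate(_GROUPS) for ch in group}


def soundex(word):
    letters = [c for c in word.upper() if c in _CODE]  # steps 1 and 2
    if not letters:
        return ""
    out = []
    for d in (_CODE[c] for c in letters[1:]):  # steps 3 and 4
        if not out or out[-1] != d:  # step 5: collapse runs
            out.append(d)
    # Steps 6 and 7: drop zeros, pad, keep four positions.
    return (letters[0] + "".join(d for d in out if d != "0") + "000")[:4]


class SurnameIndex:
    """Hypothetical index: Soundex code as primary key, patient ID second."""

    def __init__(self):
        self._by_code = defaultdict(dict)  # code -> {patient_id: surname}

    def add(self, patient_id, surname):
        self._by_code[soundex(surname)][patient_id] = surname

    def search(self, surname):
        # All patients whose surname shares the code, in patient ID order.
        bucket = self._by_code.get(soundex(surname), {})
        return sorted(bucket.items())


# Example with made-up patient IDs.
index = SurnameIndex()
index.add(1042, "Mitchell")
index.add(2001, "Mitchel")
index.add(1500, "Marshall")  # codes to M624, so it is not a match
matches = index.search("Mitchel")  # -> [(1042, 'Mitchell'), (2001, 'Mitchel')]
```

Keeping the patient ID as the second key means a search only has to scan the small bucket of patients who share a code, which is the saving described for the hospital system.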

There are many uses for Soundex, and it need not be used only for names. It could also be used for addresses or any other free-format alphabetic string that needs to be searched quickly and efficiently.

Friday, December 09, 2005
As part of a project for a prospective client I have been asked to look at the state of their client information. This is held across several databases and they want to combine the information into a master data-set.

I know from experience of health authority patient administration systems and financial fund management systems that there are more than likely going to be duplicates across the different sources of data.

Having looked, briefly, into the possibility of doing the work myself, I decided that time would be better spent if this could be contracted out to a third party.

This company can provide both deduplication software and services to merge and de-dupe data from different databases, even in different formats, although for this exercise there would be no need to use them for the Excel spreadsheet data.

I'll get a cost from them and build this into the quote for the prospect (I think I'll even mention the fact that the work is being carried out by experts in the field - although at this stage I obviously will not tell them who it is).
Tuesday, December 06, 2005
It's now only 20 days to Christmas, and time is running out for getting presents sorted.

Yesterday I found just the thing for one of the grandchildren and was able to order it on-line for collection from a local store here in Peterborough. The only problem was that when I went to pay for and collect it, the store was too busy to cope with the number of customers at the checkouts. Now, this store has some machines that allow you to enter the reservation number or the product code, insert your credit card and pay for the item without going anywhere near the checkouts. "Great," I thought, "I'll do that." Unfortunately, of the 4 machines, one had run out of paper to print the receipt, two others were taking people's card details and printing blank receipts, and the 4th had a queue of people that reached the back of the store and halfway back to the front.

Guess what I did - yes, that's right, I walked out of the store and into the toy shop down the road. They had the same item at the same price and no queues !

Technology is great, but when it goes wrong - boy does it screw up the process !
Monday, December 05, 2005
Do you know how the confidential login details of an online account owner (for example, for PayPal or online banking) can be stolen and misused?

1. Being careless with your information: This type of fraud can be committed very easily and does not require much effort on the part of the fraudster. Users very often write down their login details for various websites for fear of forgetting them. Anyone with access to these written details can log in to the online account and treat it as if it were his own. Another thing that can easily open an online banking account to fraud is a very simple password that can be easily guessed, such as the user's first name. Fraudsters only need to make a few guesses before they arrive at the correct password. These are the simplest ways in which fraud can be committed, and they require no email scam at all.

2. Identity theft through an email scam: Phishing, a common route to identity theft, is an attempt by a fraudster to extract the login details of an account from the account's actual owner. Armed with these details, the fraudster can be very dangerous, as full control of the account can be obtained. In this case, emails are sent at random to many email addresses, informing the recipient of a problem with their account. For these email scams to work, the recipient needs to log in to his account by clicking a link in the email. The exact contents of each email scam may differ ("we need you to confirm your details", "we have noticed strange transactions", etc.) but the objective of all of them remains the same. Once the user clicks the link in the email, he is taken to a web page that closely resembles a regular login page, even down to having the correct logos and login form. This page is, however, a fake and is hosted by the fraudster (not the bank, PayPal, etc.) with the sole purpose of collecting confidential login details from the real owner of the account. If the owner of the account falls for this trick, their account will soon be operated (and probably emptied) by the fraudster. Attempts to phish online accounts have become quite common (you may get several each day), and each time a fraudster unleashes his cruel trick a number of innocent account holders become victims.

The above two methods account for a major share of the frauds and email scams committed in recent times. It is not very difficult to stay clear of these frauds, however :-

1. Choose a password that is not easy to guess. Using your first or last name as your password is not a good idea. Fraud is also easy to commit if you write your password down in places that are accessible to others. Remember to change your password periodically, and certainly change it if you suspect that you have become a victim of an email scam or other type of fraud.

2. Never click links in emails to access your account. Always open your web browser and type in the full address of the website to log in. All email scams urge you to click a link in the email to access your account (you may notice that hovering over the link displays a different site in the browser's status bar). The login information is then sent to a website that is not the real one, which allows fraudsters to log in to your online account and make transactions on it. When you are on the login page, make sure that the page shows the locked padlock in the status bar and that the address starts with https://
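The "does the link really go where it claims?" check can also be expressed in code. The Python sketch below uses only the standard library's `urllib.parse.urlsplit` to test whether a link's real destination uses https:// and belongs to the domain you expect. The function name `link_looks_genuine` and the rule that subdomains of the expected domain are acceptable are my own assumptions. It is only a rough illustration of what to look for: it does not check certificates, and typing the address yourself is still the safer habit.

```python
from urllib.parse import urlsplit


def link_looks_genuine(href: str, expected_domain: str) -> bool:
    """Rough check that href points at expected_domain over HTTPS.

    Hypothetical helper for illustration; it does not verify certificates.
    """
    parts = urlsplit(href)
    host = (parts.hostname or "").lower()  # real host, ignoring user@ tricks
    expected = expected_domain.lower()
    # The address must start with https:// ...
    if parts.scheme != "https":
        return False
    # ... and the host must be the domain itself or one of its subdomains,
    # so a lookalike such as paypal.com.example.net is rejected.
    return host == expected or host.endswith("." + expected)
```

For example, `https://paypal.com.example.net/login` fails because its real host is `paypal.com.example.net`, even though the address begins with the PayPal name, and a plain `http://` link fails the https test.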

3. Log in to your account periodically and look for any strange or unexpected transactions. The transactions could relate to either a receipt or a payment of money. If you notice any abnormal movement in your account, consider it a possible fraud and inform the bank or PayPal immediately. Also change your password immediately to reduce the chance of further damage.

4. If you are in the habit of logging in to your account and then leaving it open in a minimized browser window, you could be making it very easy for someone to commit fraud on your online account, especially if you walk away from your PC. Such security lapses do not require email scams or other methods. Always log out of your account once you have finished working on it or when you will not be using it for a couple of minutes. To be on the safe side, close the browser window, and if you are using a computer in public (a library or internet café, for example), reboot the system when you have finished.

More tips about buying safely on-line are in this article.