(Refine) Two columns have the same information but jumbled up

This is to illustrate one way in which you can approach faceting data.

A complicated expression is created piece-by piece from simpler ones.


I’ve tidied up a list of names so that it’s all nice and tidy: LASTNAME FIRSTINITIAL.

Now I’ve found another list of names which I am almost certain contains exactly the same names. If the values are the same then the second column can just be forgotten about.

But this second list has the initials removed half the time and the names sometimes in different orders, so simply filtering by value of column one == value of column two won’t work.

Look at these lines, they’re small enough that you can see the surnames are the same. But if there were tens of thousands of entries which were the same in this way you’d never be able to tell.

showing the columns

The following are all Custom Facets. I’m not trying to change anything, just checking to see if they’re the same. Start by getting a fingerprint() of the surnames

#Isolate the surnames

First I replaced anything that isn’t a letter with a space value.replace("."," "), for any occasions where there were two initials with a period between them.

Then I got rid of any initials by using split(" ") to separate the values at each space:

replace periods and then split at spaces

I was using this to make sure that double initials get separated out and don’t get stuck together and end up being mistaken for a very short surname. You can see that the ampersand & is still there - fingerprint() below will discard it.

Then, each of these separated out values which was only a single character long was replaced with nothing "".

This is done using forEach():

First of all describe an array. We’ve done this with value.replace("."," ").split(" "). Each of the bits created by this split is an element in the array. forEach() will go through them one by one doing whatever you specify.

Then allocate a variable. This gives you a easy thing to refer to when you’re describing what forEach() should do with each element in the array. It can be anything, let’s make it v.

Then describe what should happen to each element of the array, using v to represent the element. As we want to remove elements which are only a single character long we will ise if()

if() gets three arguments. The first argument has to be an expression. if the expression is true it returns the second argument if the expression is false it returns the third argument.

So to get rid of any elements which are just a single character long: if(length(v)==1,"",v)

replace each element that's only a single character long with nothing

Use join(" ") to reattach the remaining bits back together with a space between them.

fingerprint() takes a string composed of a number of words and rearanges the words in alphabetical order. It also discards anything that’s not a letter or a number. This version of the string is easier to use for comparisons.

stick back together and get fingerprint

This gets a fingerprint of the surnames in one of the columns.

Now we just need to do the same for the other column and compare them.

#Compare the columns

The result true will be given for entries where the same surnames occur in both columns.

	value.replace("."," ").split(" "),v,if(length(v)==1,"",v)
	).join(" ")
	cells.Name1.value.replace("."," ").split(" "),v,if(length(v)==1,"",v)
	).join(" ")

The only change between the two lines is that value in the first is replaced with cells.Name2.value in the second

new column with true or false

To illustrate what happens when they’re not the same:

new column with true or false with false entry