# Algorithm: Selecting common elements of a collection

I recently had to write the following algorithm:

Given a group of tags, and a group of blog posts, where a blog post may contain zero-to-many tags, return the tags common to all posts.

This comparison is being done in-memory - accessing either collection does not cause a trip across the network (ie., to a database, etc).

Also, the `Tags` collection does not have a reference to `BlogPosts` that contain it. `BlogPosts` have a collection of `Tags` they contain.

Below is my implementation. It performs just fine, but I'm curious if there was a better way to implement it.

My implementation is in Actionscript, but I'm curious more from a algorithim perspective, so examples in any language is fine. (But if I don't know the language, I may ask you to clarify some aspects)

Any examples of improvements would be greatly received.

``````    private function getCommonTags(blogPosts:Vector.<BlogPost>):Vector.<Tag>
{
var commonTags:Vector.<Tag> = new Vector.<Tag>();
if (!blogPosts || blogPosts.length == 0)
return commonTags;

var blogPost:BlogPost = blogPosts[0];
if (!blogPost.tags || blogPost.tags.length == 0)
return commonTags;

commonTags = Vector.<Tag>(blogPosts[0].tags);

for each (var blogPost:BlogPost in blogPosts)
{
if (!blogPost.tags || blogPost.tags.length == 0 || commonTags.length == 0)
// Updated to fix bug mentioned below
// Optomized exit - there are no common tags
return new Vector.<Tag>();

for each (var tag:Tag in commonTags)
{
if (!blogPost.containsTagId(tag.id))
{
commonTags.splice(commonTags.indexOf(tag),1);
}
}
}
return commonTags;
}
``````

Well, you just need an efficient algorithm for computing the intersection of two sets because you can just repeatedly invoke the algorithm for more than two sets.

A quick algorithm is to add the items of the first set to a hash table and then iterate through the second set checking the hash table to see if it is present; if it is you add it to the list of items to be returned in the intersection.

I can state this program in an English sentence: "Return all tags that occur uniformly among the posts."

Giving the name `all_tags` to the list of tags and `post_tags` to the list of tags-associated-to-posts, I can say the same thing with this sentence in the J programming language

``````   all_tags #~ (#=+/) all_tags e.&>"_ 0 post_tags
``````

Looking at this in some detail:

• `#~` means "copy where"

• `(# = +/)` means "tally equals sum"

• `e.` means "exists in"

• `&>"_ 0` modifies `e.` so the entirety of `all_tags` is compared with each of the tag-sets in `post_tags`

If we want to make this a program that takes two arguments, rather than a program that is specific to these named lists, the corresponding program could be:

``````   common_to=:  [ #~ [:(#=+/) [ e.&>"_ 0 ]
``````

Running that program with the same data names would look like this:

``````   all_tags common_to post_tags
``````

It seems worth noting that we don't actually need the comprehensive list of tags, as that can be derived. (The calculation is `~. ; post_tags`.) That means we could write the program to take only a single argument. But since the problem presumes we already have the `all_tags` list computed, there's no need to compute it again.