Algorithm: Selecting common elements of a collection

0 votes
asked Dec 2, 2010 by marty-pitt

I recently had to write the following algorithm:

Given a group of tags, and a group of blog posts, where a blog post may contain zero-to-many tags, return the tags common to all posts.

This comparison is being done in-memory - accessing either collection does not cause a trip across the network (ie., to a database, etc).

Also, the Tags collection does not have a reference to BlogPosts that contain it. BlogPosts have a collection of Tags they contain.

Below is my implementation. It performs just fine, but I'm curious if there was a better way to implement it.

My implementation is in Actionscript, but I'm curious more from a algorithim perspective, so examples in any language is fine. (But if I don't know the language, I may ask you to clarify some aspects)

Any examples of improvements would be greatly received.

    private function getCommonTags(blogPosts:Vector.<BlogPost>):Vector.<Tag>
        var commonTags:Vector.<Tag> = new Vector.<Tag>();
        if (!blogPosts || blogPosts.length == 0)
            return commonTags;

        var blogPost:BlogPost = blogPosts[0];
        if (!blogPost.tags || blogPost.tags.length == 0)
            return commonTags;

        commonTags = Vector.<Tag>(blogPosts[0].tags);

        for each (var blogPost:BlogPost in blogPosts)
            if (!blogPost.tags || blogPost.tags.length == 0 || commonTags.length == 0)
                // Updated to fix bug mentioned below
                // Optomized exit - there are no common tags
                return new Vector.<Tag>();

            for each (var tag:Tag in commonTags)
                if (!blogPost.containsTagId(
        return commonTags;

2 Answers

0 votes
answered Dec 2, 2010 by jason

Well, you just need an efficient algorithm for computing the intersection of two sets because you can just repeatedly invoke the algorithm for more than two sets.

A quick algorithm is to add the items of the first set to a hash table and then iterate through the second set checking the hash table to see if it is present; if it is you add it to the list of items to be returned in the intersection.

0 votes
answered Dec 3, 2010 by kaleidic

I can state this program in an English sentence: "Return all tags that occur uniformly among the posts."

Giving the name all_tags to the list of tags and post_tags to the list of tags-associated-to-posts, I can say the same thing with this sentence in the J programming language

   all_tags #~ (#=+/) all_tags e.&>"_ 0 post_tags

Looking at this in some detail:

  • #~ means "copy where"

  • (# = +/) means "tally equals sum"

  • e. means "exists in"

  • &>"_ 0 modifies e. so the entirety of all_tags is compared with each of the tag-sets in post_tags

If we want to make this a program that takes two arguments, rather than a program that is specific to these named lists, the corresponding program could be:

   common_to=:  [ #~ [:(#=+/) [ e.&>"_ 0 ]

Running that program with the same data names would look like this:

   all_tags common_to post_tags

It seems worth noting that we don't actually need the comprehensive list of tags, as that can be derived. (The calculation is ~. ; post_tags.) That means we could write the program to take only a single argument. But since the problem presumes we already have the all_tags list computed, there's no need to compute it again.

Welcome to Q&A, where you can ask questions and receive answers from other members of the community.
Website Online Counter