SQL Precedence Matching

0 votes
asked Dec 22, 2008 by charlie-white

I'm trying to do precedence matching on a table within a stored procedure. The requirements are a bit tricky to explain, but hopefully this will make sense. Let's say we have a table called books, with id, author, title, date, and pages fields.

We also have a stored procedure that will match a query with ONE row in the table.

Here is the proc's signature:

create procedure match
  @pAuthor varchar(100)
  ,@pTitle varchar(100)
  ,@pDate varchar(100)
  ,@pPages varchar(100)

 as

 ...

The precedence rules are as follows:

  • First, try and match on all 4 parameters. If we find a match return.
  • Next try to match using any 3 parameters. The 1st parameter has the highest precedence here and the 4th the lowest. If we find any matches return the match.
  • Next we check if any two parameters match and finally if any one matches (still following the parameter order's precedence rules).

I have implemented this case-by-case. Eg:

 select @lvId = id 
 from books
 where
  author = @pAuthor 
 ,title = @pTitle 
 ,date = @pDate 
 ,pages = @pPages

if @@rowCount = 1 begin
  select @lvId
  return
end

 select @lvId = id 
  from books
 where
  author = @pAuthor 
 ,title = @pTitle 
 ,date = @pDate 

 if @@rowCount = 1 begin
  select @lvId
  return
end

....

However, for each new column in the table, the number of individual checks grows by an order of 2. I would really like to generalize this to X number of columns; however, I'm having trouble coming up with a scheme.

Thanks for the read, and I can provide any additional information needed.


Added:

Dave and Others, I tried implementing your code and it is choking on the first Order by Clause, where we add all the counts. Its giving me an invalid column name error. When I comment out the total count, and order by just the individual aliases, the proc compiles fine.

Anyone have any ideas?

This is in Microsoft Sql Server 2005

5 Answers

0 votes
answered Dec 22, 2008 by tom-h

You don't explain what should happen if more than one result matches any given set of parameters that is reached, so you will need to change this to account for those business rules. Right now I've set it to return books that match on later parameters ahead of those that don't. For example, a match on author, title, and pages would come before one that just matches on author and title.

Your RDBMS may have a different way of handling "TOP", so you may need to adjust for that as well.

SELECT TOP 1
     author,
     title,
     date,
     pages
FROM
     Books
WHERE
     author = @author OR
     title = @title OR
     date = @date OR
     pages = @pages OR
ORDER BY
     CASE WHEN author = @author THEN 1 ELSE 0 END +
     CASE WHEN title = @title THEN 1 ELSE 0 END +
     CASE WHEN date = @date THEN 1 ELSE 0 END +
     CASE WHEN pages = @pages THEN 1 ELSE 0 END DESC,

     CASE WHEN author = @author THEN 8 ELSE 0 END +
     CASE WHEN title = @title THEN 4 ELSE 0 END +
     CASE WHEN date = @date THEN 2 ELSE 0 END +
     CASE WHEN pages = @pages THEN 1 ELSE 0 END DESC
0 votes
answered Dec 22, 2008 by dave-costa

I don't have time to write out the query, but I think this idea would work.

For your predicate, use "author = @pAuthor OR title = @ptitle ...", so you get all candidate rows.

Use CASE expressions or whatever you like to create virtual columns in the result set, like:

SELECT CASE WHEN author = @pAuthor THEN 1 ELSE 0 END author_match,
       ...

Then add this order by and get the first row returned:

ORDER BY (author_match+title_match+date_match+page_match) DESC,
         author_match DESC,
         title_match DESC,
         date_match DESC
         page_match DESC

You still need to extend it for each new column, but only a little bit.

0 votes
answered Dec 22, 2008 by joshberke
      select id, 
               CASE WHEN @pPages = pages 
                    THEN 1 ELSE 0 
               END
             +  Case WHEN @pAuthor=author 
                    THEN 1 ELSE 0 
                END AS 
             /* +  Do this for each attribute. If each of your 
attributes are just as important as the other 
for example matching author is jsut as a good as matching title then 
leave the values alone, if different matches are more 
important then change the values */ as MatchRank  
        from books 

        where  author = @pAuthor OR
               title = @pTitle OR
               date = @pDate

     ORDER BY  MatchRank DESC

Edited

When I run this query (modified only to fit one of my own tables) it works fine in SQL2005.

I'd recommend a where clause but you will want to play around with this to see performance impacts. You will need to use an OR clause otherwise you will loose potential matches

0 votes
answered Dec 23, 2008 by dkretz

Try this:

ALTER PROCEDURE match  
  @pAuthor varchar(100)  
 ,@pTitle varchar(100)  
 ,@pDate varchar(100)  
 ,@pPages varchar(100)  
-- exec match 'a title', 'b author', '1/1/2007', 15  
AS

SELECT  id,

        CASE WHEN author = @pAuthor THEN 1 ELSE 0 END
        + CASE WHEN title = @pTitle THEN 1 ELSE 0 END
        + CASE WHEN bookdate = @pDate THEN 1 ELSE 0 END
        + CASE WHEN pages = @pPages THEN 1 ELSE 0 END AS matches,

        CASE WHEN author = @pAuthor THEN 4 ELSE 0 END
        + CASE WHEN title = @pTitle THEN 3 ELSE 0 END
        + CASE WHEN bookdate = @pDate THEN 2 ELSE 0 END
        + CASE WHEN pages = @pPages THEN 1 ELSE 0 END AS score
FROM books
WHERE author = #pAuthor 
    OR title = @pTitle 
    OR bookdate = @PDate 
    OR pages = @pPages
ORDER BY matches DESC, score DESC

However, this of course causes a table scan. You can avoid that by making it a union of a CTE and 4 WHERE clauses, one for each property - there will be duplicates, but you can just take the TOP 1 anyway.

EDIT: Added the WHERE ... OR clause. I'd feel more comfortable if it were

SELECT ... FROM books WHERE author = @pAuthor
UNION
SELECT ... FROM books WHERE title = @pTitle
UNION
...
0 votes
answered Dec 23, 2008 by charlie-white

In regards to the Order By clause failing to compile:

As recursive said(in a comment), alias' may not be within expressions which are used in Order By clauses. to get around this I used a subquery which returned the rows, then ordered by in the outer query. In this way I am able to use the alias' in the order by clause. A little slower but a lot cleaner.

Welcome to Q&A, where you can ask questions and receive answers from other members of the community.
Website Online Counter

...