STL algorithm for merge with addition

0 votes
asked Jul 17, 2009 by sdg

I was using std::merge to combine two sorted collections into one.

But my object has a natural key and a defined addition semantic, so what I am after is a merge_and_sum. Rather than just merging the two collections into a single collection of length N+M, it would apply operator+ to any pair of elements for which operator== returns true and write the single summed element instead.

I have implemented it thus:

template<class _InIt1, class _InIt2, class _OutIt> 
_OutIt merge_and_sum(_InIt1 _First1, _InIt1 _Last1, _InIt2 _First2, _InIt2 _Last2, _OutIt _Dest )
{   // copy merging ranges, both using operator<
    for (; _First1 != _Last1 && _First2 != _Last2; ++_Dest)
    {
        if ( *_First2 < *_First1 )
            *_Dest = *_First2, ++_First2;
        else if ( *_First2 == *_First1)
            *_Dest = *_First2 + *_First1, ++_First1, ++_First2;
        else
            *_Dest = *_First1, ++_First1;
    }
    _Dest = copy(_First1, _Last1, _Dest);   // copy any tail
    return (copy(_First2, _Last2, _Dest));
}

But I was wondering whether I have reinvented something that can be composed from the other algorithms.

3 Answers

0 votes
answered Jul 17, 2009 by jonathan-graehl

It sounds like your collections are like multisets with duplicates collapsed by your + operator (maybe just summing the multiplicities instead of keeping redundant copies). I assume so, because you're not changing the sorting order when you +, so + isn't affecting your key.

You should use your implementation. There's nothing in the STL that will do it as efficiently. The closest semantic I can think of is standard merge followed by unique_copy, but that only drops the duplicates instead of summing them. You could almost get unique_copy to work with a comparison operator that has side effects, but that would be extremely ill advised: the implementation doesn't promise to compare the elements in place rather than via a value-copied temporary, or to make any particular number of comparisons.
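To illustrate why merge followed by unique_copy is only "closest" and not a match, here is a minimal sketch. The Item type (a key plus a count, compared by key only) and the function name merge_then_unique are assumptions made up for the example, not anything from the question.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical element type: ordered and compared by key only.
struct Item {
    int key;
    int count;
};
bool operator<(const Item& a, const Item& b) { return a.key < b.key; }
bool operator==(const Item& a, const Item& b) { return a.key == b.key; }

// Merge two sorted ranges, then collapse runs of equal keys.
// unique_copy keeps the first element of each run; nothing is summed.
std::vector<Item> merge_then_unique(const std::vector<Item>& a,
                                    const std::vector<Item>& b) {
    // Both inputs must already be sorted by key.
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));

    std::vector<Item> merged;
    std::merge(a.begin(), a.end(), b.begin(), b.end(),
               std::back_inserter(merged));

    std::vector<Item> out;
    std::unique_copy(merged.begin(), merged.end(), std::back_inserter(out));
    return out;
}
```

For a key that appears in both inputs, the result holds the count from the first range only, because std::merge is stable and unique_copy keeps the first element of each run. It also needs an intermediate N+M buffer, which the hand-written loop avoids.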

Your type and variable names are unpleasantly long ;)

0 votes
answered Jul 17, 2009 by rlbond

Well, your other option would be to use set_symmetric_difference to get the elements that appear in only one of the sets, then use set_intersection twice to get the ones that appear in both (once taking the copies from each set). Then add the matching pairs together and insert the sums into the first result.

typedef set<MyType, MyComp> SetType;
SetType merge_and_add(const SetType& s1, const SetType& s2)
{
    // Elements whose key appears in only one of the two sets.
    SetType diff;
    set_symmetric_difference(s1.begin(), s1.end(), s2.begin(), s2.end(), inserter(diff, diff.end()), s1.key_comp());
    // Elements present in both sets: same1 holds the copies from s1, same2 those from s2.
    vector<SetType::value_type> same1, same2;
    set_intersection(s1.begin(), s1.end(), s2.begin(), s2.end(), back_inserter(same1), s1.key_comp());
    set_intersection(s2.begin(), s2.end(), s1.begin(), s1.end(), back_inserter(same2), s1.key_comp());
    // Add the matching pairs and insert the sums into the result.
    transform(same1.begin(), same1.end(), same2.begin(), inserter(diff, diff.begin()), plus<SetType::value_type>());
    return diff;
}

Side note! You should stick to one comparison: either operator==, in which case you should use an unordered_set (which also needs a hash function), or operator< for a regular set. A set requires a strict weak ordering, which means 2 entries are deemed equivalent if !(a < b) && !(b < a). So even if your two objects are unequal by operator==, if they satisfy this condition the set will consider them duplicates. So for the function you supplied above I highly recommend refraining from using an == comparison.
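A small sketch of the equivalence point, assuming a made-up Entry type whose ordering looks only at the key while operator== also looks at the count. Entry, ByKey and insert_two are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical element type for illustration.
struct Entry {
    int key;
    int count;
};

// Ordering used by the set: looks at the key only.
struct ByKey {
    bool operator()(const Entry& a, const Entry& b) const {
        return a.key < b.key;
    }
};

// Equality that disagrees with the ordering: it also compares the count.
bool operator==(const Entry& a, const Entry& b) {
    return a.key == b.key && a.count == b.count;
}

// Insert two entries and report how many the set kept.
std::size_t insert_two(const Entry& a, const Entry& b) {
    std::set<Entry, ByKey> s;
    s.insert(a);
    bool second_inserted = s.insert(b).second;
    // The second insert succeeds only if the two are not equivalent,
    // i.e. one orders strictly before the other. operator== is never asked.
    assert(second_inserted == (ByKey()(a, b) || ByKey()(b, a)));
    return s.size();
}
```

Two entries with the same key but different counts are unequal by operator==, yet the set keeps only the first of them.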

0 votes
answered Jul 17, 2009 by steve-jessop

You could use std::merge with an output iterator of your own creation, which does the following in operator=. I think this ends up making more calls to operator== than your version, though, so unless it works out as less code it's probably not worth it.

if ((mylist.size() > 0) && (newvalue == mylist.back())) {
    mylist.back() += newvalue;
} else {
    mylist.push_back(newvalue);
}

(Actually, writing a proper output iterator might be more fiddly than that, I can't remember. But I hope you get the general idea).

mylist is a reference to the collection you're merging into. If the target doesn't have back(), then you'll have to buffer one value in the output iterator, and only write it once you see a non-equal value. Then define a flush function on the output iterator to write the last value, and call it at the end. I'm pretty sure that in this case it is too much mess to beat what you've already done.
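For a target that does have back(), the idea might look roughly like the sketch below. The names summing_back_inserter, Item and merge_and_sum_via_iterator are invented for the example, and it assumes that operator== compares only the key and that += leaves the key unchanged, as in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical element type: compared by key, summed by count.
struct Item {
    int key;
    int count;
    Item& operator+=(const Item& other) { count += other.count; return *this; }
};
bool operator<(const Item& a, const Item& b) { return a.key < b.key; }
bool operator==(const Item& a, const Item& b) { return a.key == b.key; }

// Output iterator that appends to a container, but folds a value into the
// last element (via +=) when the two compare equal.
template <class Container>
class summing_back_inserter {
public:
    typedef std::output_iterator_tag iterator_category;
    typedef void value_type;
    typedef std::ptrdiff_t difference_type;
    typedef void pointer;
    typedef void reference;

    explicit summing_back_inserter(Container& c) : target_(&c) {}

    // Assigning a value is what actually writes to the container.
    summing_back_inserter& operator=(const typename Container::value_type& v) {
        if (!target_->empty() && v == target_->back())
            target_->back() += v;
        else
            target_->push_back(v);
        return *this;
    }

    // Dereference and increment are no-ops, as with std::back_insert_iterator.
    summing_back_inserter& operator*() { return *this; }
    summing_back_inserter& operator++() { return *this; }
    summing_back_inserter operator++(int) { return *this; }

private:
    Container* target_;
};

// Merge two sorted vectors, summing elements whose keys match.
std::vector<Item> merge_and_sum_via_iterator(const std::vector<Item>& a,
                                             const std::vector<Item>& b) {
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    std::vector<Item> out;
    std::merge(a.begin(), a.end(), b.begin(), b.end(),
               summing_back_inserter<std::vector<Item> >(out));
    return out;
}
```

One difference from the hand-written loop: this version also sums equal elements that sit next to each other within a single input range, whereas the loop in the question only sums across the two ranges. It also breaks if += changes what operator== looks at, for example with plain ints.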


...