In C++, is it still bad practice to return a vector from a function?

0 votes
asked Jun 28, 2010 by nate

Short version: It's common to return large objects—such as vectors/arrays—in many programming languages. Is this style now acceptable in C++0x if the class has a move constructor, or do C++ programmers consider it weird/ugly/abomination?

Long version: In C++0x is this still considered bad form?

std::vector<std::string> BuildLargeVector();
...
std::vector<std::string> v = BuildLargeVector();

The traditional version would look like this:

void BuildLargeVector(std::vector<std::string>& result);
...
std::vector<std::string> v;
BuildLargeVector(v);

In the newer version, the value returned from BuildLargeVector is an rvalue, so v would be constructed using the move constructor of std::vector, assuming (N)RVO doesn't take place.

Even prior to C++0x the first form would often be "efficient" because of (N)RVO. However, (N)RVO is at the discretion of the compiler. Now that we have rvalue references it is guaranteed that no deep copy will take place.

Edit: Question is really not about optimization. Both forms shown have near-identical performance in real-world programs. Whereas, in the past, the first form could have had order-of-magnitude worse performance. As a result the first form was a major code smell in C++ programming for a long time. Not anymore, I hope?

7 Answers

0 votes
answered Jan 28, 2010 by motti

If performance is a real issue you should realise that move semantics aren't always faster than copying. For example if you have a string that uses the small string optimization then for small strings a move constructor must do the exact same amount of work as a regular copy constructor.

0 votes
answered Jun 28, 2010 by peter-alexander

Dave Abrahams has a pretty comprehensive analysis of the speed of passing/returning values.

Short answer, if you need to return a value then return a value. Don't use output references because the compiler does it anyway. Of course there are caveats, so you should read that article.

0 votes
answered Jun 28, 2010 by nemanja-trifunovic

Just to nitpick a little: it is not common in many programming languages to return arrays from functions. In most of them, a reference to array is returned. In C++, the closest analogy would be returning boost::shared_array

0 votes
answered Jun 28, 2010 by jerry-coffin

At least IMO, it's usually a poor idea, but not for efficiency reasons. It's a poor idea because the function in question should usually be written as a generic algorithm that produces its output via an iterator. Almost any code that accepts or returns a container instead of operating on iterators should be considered suspect.

Don't get me wrong: there are times it makes sense to pass around collection-like objects (e.g., strings) but for the example cited, I'd consider passing or returning the vector a poor idea.

0 votes
answered Jun 28, 2010 by stinky472

I still think it is a bad practice but it's worth noting that my team uses MSVC 2008 and GCC 4.1, so we're not using the latest compilers.

Previously a lot of the hotspots shown in vtune with MSVC 2008 came down to string copying. We had code like this:

String Something::id() const
{
    return valid() ? m_id: "";
}

... note that we used our own String type (this was required because we're providing a software development kit where plugin writers could be using different compilers and therefore different, incompatible implementations of std::string/std::wstring).

I made a simple change in response to the call graph sampling profiling session showing String::String(const String&) to be taking up a significant amount of time. Methods like in the above example were the greatest contributors (actually the profiling session showed memory allocation and deallocation to be one of the biggest hotspots, with the String copy constructor being the primary contributor for the allocations).

The change I made was simple:

static String null_string;
const String& Something::id() const
{
    return valid() ? m_id: null_string;
}

Yet this made a world of difference! The hotspot went away in subsequent profiler sessions, and in addition to this we do a lot of thorough unit testing to keep track of our application performance. All kinds of performance test times dropped significantly after these simple changes.

Conclusion: we're not using the absolute latest compilers, but we still can't seem to depend on the compiler optimizing away the copying for returning by value reliably (at least not in all cases). That may not be the case for those using newer compilers like MSVC 2010. I'm looking forward to when we can use C++0x and simply use rvalue references and not ever have to worry that we're pessimizing our code by returning complex classes by value.

[Edit] As Nate pointed out, RVO applies to returning temporaries created inside of a function. In my case, there were no such temporaries (except for the invalid branch where we construct an empty string) and thus RVO would not have been applicable.

0 votes
answered Jun 28, 2010 by peterchen

The gist is:

Copy Elision and RVO can avoid the "scary copies" (the compiler is not required to implement these optimizations, and in some situations it can't be applied)

C++ 0x RValue references allow a string/vector implementations that guarantees that.

If you can abandon older compilers / STL implementations, return vectors freely (and make sure your own objects support it, too). If your code base needs to support "lesser" compilers, stick to the old style.

Unfortunately, that has major influence on your interfaces. If C++ 0x is not an option, and you need guarantees, you might use instead reference-counted or copy-on-write objects in some scenarios. They have downsides with multithreading, though.

(I wish just one answer in C++ would be simple and straightforward and without conditions).

0 votes
answered Sep 15, 2017 by boris

Indeed, since C++11, the cost of copying the std::vector is gone in most cases.

However, one should keep in mind that the cost of constructing the new vector (then destructing it) still exists, and using output parameters instead of returning by value is still useful when you desire to reuse the vector's capacity. This is documented as an exception in F.20 of the C++ Core Guidelines.

Compare:

std::vector<int> BuildLargeVector(int i) {
    return std::vector<int>(1000000, i);
}

int main() {
    for (int i = 0; i < 100; ++i) {
        std::vector<int> v = BuildLargeVector(i);
        // [...] do smth with v
    }
}

With:

void BuildLargeVector(/*out*/ std::vector<int>& v, int i) {
    v.assign(1000000, i);
}

int main() {
    std::vector<int> v;
    for (int i = 0; i < 100; ++i) {
        BuildLargeVector(/*out*/ v, i);
        // [...] do smth with v
    }
}

In the first example, there are many unnecessary dynamic allocations/deallocations happening, which are prevented in the second example by using an output parameter the old way, reusing already allocated memory. Whether or not this optimization is worth doing depends on the relative cost of the allocation/deallocation compared to the cost of computing/mutating the values.

Welcome to Q&A, where you can ask questions and receive answers from other members of the community.
Website Online Counter

...