Rusty Russell's Coding Blog | Stealing From Smart People

GCC and C vs C++ Speed, Measured.

March 21, 2013

With the imminent release of gcc 4.8, GCC has finally switched to C++ as the implementation language.  As usual, LWN has excellent coverage.  Those with long memories will remember Linux trying to use g++ back in 1992 and retreating in horror at the larger, slower code.  The main benefit was stricter typechecking, particularly for enums (a great idea: I had -Wstrict-enum patches for gcc about 12 years ago, a superset of the -Wenum-compare we have now, but never got them merged).

With this in mind, and Ian Taylor’s bold assertion that “The C subset of C++ is as efficient as C”, I wanted to test what had changed with some actual measurements.  So I grabbed gcc 4.7.2 (the last release which could do this), and built it with C and C++ compilers:

  1. ../gcc-4.7.2/configure --prefix=/usr/local/gcc-c --disable-bootstrap --enable-languages=c,c++ --disable-multiarch --disable-multilib
  2. ../gcc-4.7.2/configure --prefix=/usr/local/gcc-cxx --disable-bootstrap --enable-languages=c,c++ --disable-multiarch --disable-multilib --enable-build-with-cxx

The C++-compiled binaries are slightly larger, though that’s mostly debug info:

  1. -rwxr-xr-x 3 rusty rusty 1886551 Mar 18 17:13 /usr/local/gcc-c/bin/gcc
    text       data        bss        dec        hex    filename
    552530       3752       6888     563170      897e2    /usr/local/gcc-c/bin/gcc
  2. -rwxr-xr-x 3 rusty rusty 1956593 Mar 18 17:13 /usr/local/gcc-cxx/bin/gcc
    text       data        bss        dec        hex    filename
    552731       3760       7176     563667      899d3    /usr/local/gcc-cxx/bin/gcc
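
(The text/data/bss lines above are binutils’ size output; to reproduce them for both builds in one go:)

    size /usr/local/gcc-c/bin/gcc /usr/local/gcc-cxx/bin/gcc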

Then I used them both to compile a clean Linux kernel 10 times:

  1. for i in `seq 10`; do time make -s CC=/usr/local/gcc-c/bin/gcc 2>/dev/null; make -s clean; done
  2. for i in `seq 10`; do time make -s CC=/usr/local/gcc-cxx/bin/gcc 2>/dev/null; make -s clean; done
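
To capture the times in a file for the statistics step below (assuming bash; the loops above just read time’s normal stderr output), something like this works:

    # Hypothetical capture step: bash's TIMEFORMAT makes the time builtin
    # print bare wall-clock seconds, appended one per run to times-c.txt.
    TIMEFORMAT=%R
    for i in `seq 10`; do
      { time make -s CC=/usr/local/gcc-c/bin/gcc >/dev/null 2>&1; } 2>>times-c.txt
      make -s clean
    done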

Using stats --trim-outliers, which throws away the best and worst results, we have the times for the remaining 8 runs:

  1. real    14m24.359000-35.107000(25.1521+/-0.62)s
    user    12m50.468000-52.576000(50.912+/-0.23)s
    sys    1m24.921000-27.465000(25.795+/-0.31)s
  2. real    14m27.148000-29.635000(27.8895+/-0.78)s
    user    12m50.428000-52.852000(51.956+/-0.7)s
    sys    1m26.597000-29.274000(27.863+/-0.66)s
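
(Each line reads min-max(mean+/-stddev) over the 8 surviving runs. The trimming itself is nothing exotic; a rough awk equivalent, for a file of seconds one per line, would be:)

    # Sketch only, not the actual ccan stats tool: sort the per-run times,
    # drop the best and worst, then print mean +/- standard deviation.
    sort -n times-c.txt | sed '1d;$d' |
      awk '{ s += $1; ss += $1*$1; n++ }
           END { m = s/n; printf "%.4f +/- %.2f\n", m, sqrt(ss/n - m*m) }'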

So the C++-compiled binaries are measurably slower, though not noticeably: it’s about 865 seconds vs 868 seconds, or about 0.3%.  Even if a kernel compile spends half its time linking, statting, etc., that’s under a 1% slowdown.
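
(Checking the arithmetic against the real-time means above, 865.152s vs 867.890s:)

    echo 'scale=4; (867.890 - 865.152) * 100 / 865.152' | bc   # prints .3164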

And it’s perfectly explicable by the larger executable size.  If we strip all the gcc binaries and do another 10 runs of each (…flash forward to the next day… oops, powerfail, make that 2 days later):

  1. real    14m24.659000-33.435000(26.1196+/-0.65)s
    user    12m50.032000-57.701000(50.9755+/-0.36)s
    sys    1m26.057000-28.406000(26.863+/-0.36)s
  2. real    14m26.811000-29.284000(27.1308+/-0.17)s
    user    12m51.428000-52.696000(52.156+/-0.39)s
    sys    1m26.157000-27.973000(26.869+/-0.41)s
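
(For reference, the stripping was nothing fancier than binutils strip over both installs; a sketch:)

    # Strip every executable in both installs; cc1 and friends live under
    # libexec/gcc, not bin, so a blanket find catches them too.
    find /usr/local/gcc-c /usr/local/gcc-cxx -type f -perm -111 \
        -exec strip --strip-unneeded {} \; 2>/dev/null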

Now the difference is 0.1%, pretty much in the noise.

Summary: whether you like C++ or not, the performance argument is moot.


25 Comments for GCC and C vs C++ Speed, Measured.

Max Lybbert | March 21, 2013 at 10:44 am

While I’m glad to hear that GCC compiles C++ code quickly, I expected this article to be about how fast the code runs after it’s compiled.

Then again, the optimizations that Stroustrup brags about being available to C++ code require taking advantage of C++ features (e.g., std::sort can compile to faster code than qsort, but there’s no reason a C++-compiled qsort would ever run any faster than a C-compiled qsort). The big question is whether multiple instantiations of std::sort bloat the cache enough to offset the performance difference, and what kind of empirical test could determine whether that happens?

Author comment by rusty | March 21, 2013 at 10:54 am

> I expected this article to be about how fast the code runs after it’s compiled.

Err, it was. The code was gcc 4.7.2, which can be compiled with a C compiler or a C++ compiler. I did both, then benchmarked the results…

Steven Rostedt | March 21, 2013 at 11:12 am

I think what Max was asking (although it was confusing in how he said it) is: what about the performance of the finished executables?

Run tests on how fast the 2.7.3 compilers compile an old Linux kernel. Compare both the C gcc built 2.7.3 compiler with the C++ gcc built 2.7.3 compiler (both obviously compiled into C), on how fast they can build Linux.

Basically, the question is, does a gcc C++ built compiler still optimize the same way as the gcc C built one.

Does the output from both produce the same result? If so, then you don’t need to do the above tests.

James Henstridge | March 21, 2013 at 12:16 pm

You’ve said GCC 2.7.2 in the text, 4.7.2 in the configure commands and now 2.7.3 in a comment.

Just to be clear, you were testing GCC 4.7.2, right?

Author comment by rusty | March 21, 2013 at 12:43 pm

Fixed, and yes. Thanks!

Author comment by rusty | March 21, 2013 at 12:46 pm

I assumed they produced the same result, yes. I’m rebuilding them now just to be sure.

Of course, the binaries won’t be identical, but I can do sanity checks.

Thanks,
Rusty.

Author comment by rusty | March 21, 2013 at 1:19 pm

Yep… sizes are exactly the same, objdump -d shows only difference is different junk in the .notes section.
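
Something like this, for the curious (exact flags from memory):

    # Disassemble both drivers and diff; only the .note junk differs.
    objdump -d /usr/local/gcc-c/bin/gcc > c.dis
    objdump -d /usr/local/gcc-cxx/bin/gcc > cxx.dis
    diff c.dis cxx.dis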

Max Lybbert | March 21, 2013 at 3:01 pm

I’m sorry. It looks like I took my eye off the ball and forgot that the C++ code in question was code for a compiler, and the way to test its performance would be to compile stuff with it. So I withdraw my earlier comment.

Anonymous | March 21, 2013 at 3:56 pm

C++ code usually becomes slower than the equivalent C code by introducing more abstractions and indirection. GCC already has a giant pile of abstraction and indirection; C++ just systematizes it.

Diego Ongaro | March 21, 2013 at 5:54 pm

Just curious, do you know if the c++ version is compiled with -fno-exceptions? I wonder if that makes a difference.

Svenne | March 21, 2013 at 8:28 pm

If the compilation is I/O bound, your test is flawed.

Mitch | March 21, 2013 at 8:38 pm

I think CPU time was never the problem, per se, even in 1992.

In those days (actually just after the switch back to gcc) I was doing kernel builds on a machine with 8MiB of RAM. That was luxurious — many linux users were doing builds on 4MiB machines. g++ is of course a much larger compiler than gcc, which meant that you lost more disk cache when you used it. The end result was lots more I/O on your MFM hard drive. And of course if it pushed you into swap, things completely fell apart.

I suspect if you took a 1992 vintage g++ and ran it on a machine with 16GiB of RAM you’d find that there wasn’t much of a CPU cost back then either. The pain was merely a result of running an uncomfortably large binary.

The other unknown is how many actual C++-isms will start getting used in the gcc codebase now that they’ve made the switch. Compiling C with g++ is one thing, but if you start going crazy with templates your compiles will slow down.

Margaret | March 21, 2013 at 8:43 pm

By discarding outliers, you have compromised your result. There’s no point in taking anything but the best time when measuring execution time in a preemptive multiprocessing environment. The best time isn’t an outlier, every time other than it is.

Additionally, 0.3% of a difference over 10 iterations is not statistically significant.

Lastly, you have not explored or even enumerated the features in GCC which were (re-)implemented in C++ that would make this measurement reveal an unexpected result. If nothing else, you have reinforced Ian Taylor’s assertion.

Cheers.

Jason P | March 22, 2013 at 3:02 am

Why not test whether the run times are normally distributed, and then do a t-test to see if the difference between the implementations is statistically significant, instead of waxing lyrical?

Author comment by rusty | March 22, 2013 at 9:54 am

Perhaps, in which case you can read the minima:
14m24.359000 vs 14m27.148000 (0.3%)
Then:
14m24.659000 vs 14m26.811000 (0.2%)

And yes, the point was that Ian’s assertion was correct.

Cheers,
Rusty.

Author comment by rusty | March 22, 2013 at 10:03 am

I’d want to do more runs… my conclusion just eyeballing these results was that there’s no significant effect to justify the effort.

Author comment by rusty | March 22, 2013 at 10:04 am

Yes, double-checked, -O2 -fno-exceptions -fno-rtti.

Jay Luek | March 22, 2013 at 4:46 pm

What are the numbers for compiling of GCC w/ a C++ compiler vs. GCC w/ a C compiler? That is to say, does the use of C++ make the compilation of GCC take longer?

Author comment by rusty | March 22, 2013 at 6:11 pm

The numbers I saw said it was about 3x slower to compile with g++, but I didn’t measure it myself…

Paolo Bonzini | March 23, 2013 at 4:14 am

Dear Anonymous who said “C++ code usually becomes slower than the equivalent C code by introducing more abstractions and indirection. GCC already has a giant pile of abstraction and indirection; C++ just systematizes it”,

the LWN article says that converting C hashtables to C++ hashtables sped up the compiler by 1-2%.

Paolo Bonzini | March 23, 2013 at 4:21 am

Compiling GCC with a C++ compiler is slower because the standard C++ library is now part of the bootstrap process. Because of this, it is compiled three times, like the compiler itself. (Strictly speaking, only two builds are necessary; the third is only there to detect bugs in the compiler: the resulting binary is exactly the same as in the second build, and the build fails if it is not.)

If you compile GCC with a C compiler, the C++ library is only built once, at the end of the bootstrapping.

LWN reports a C bootstrap is 30% faster, which means C++ is 40% slower.
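
(That identity check is the “make compare” step at the end of the bootstrap; in spirit it is just the following, with illustrative paths:)

    # Compare the stage2 and stage3 builds of each object; any
    # difference fails the bootstrap.
    cmp stage2/gcc/cc1 stage3/gcc/cc1 || echo "stage2 and stage3 differ"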

Anonymous | March 23, 2013 at 6:27 am

@Paolo Bonzini: If you already have a C hash table, converting it to a C++ hash table implementation may well speed it up. I more meant that C++ tempts you to introduce *more* abstractions, indirections, or complex data structures that you might not otherwise have used.

JNn | March 24, 2013 at 4:10 pm

“If we strip all the gcc binaries”

-> Stripping the binaries is a usual step before release. It’s hardly a problem of C++ compilation.

“but there’s no reason a C++-compiled qsort would ever run any faster than a C-compiled qsort;”

-> There are reasons (and they can be measured)! Did you listen (or read) carefully to what Stroustrup said?

Author comment by rusty | March 27, 2013 at 10:56 am

True, Paolo, but my first thought was that converting those C++ hashtables back into C using ccan/hashtable will probably speed them up even further :)

Jeremy | June 25, 2013 at 10:41 pm

Quick WordPress formatting tip — WordPress likes to eat double dashes (--) and turn them into en dashes (–) in your above code samples.

The way around this is to surround each line in <code> tags. (Handily, WordPress has a button for this if you use the “Text” editing mode.)

Compare:

../gcc-4.7.2/configure –prefix=/usr/local/gcc-c –disable-bootstrap –enable-languages=c,c++ –disable-multiarch –disable-multilib

versus

../gcc-4.7.2/configure --prefix=/usr/local/gcc-c --disable-bootstrap --enable-languages=c,c++ --disable-multiarch --disable-multilib
