Rusty Russell's Coding Blog | Stealing From Smart People



POLLOUT doesn’t mean write(2) won’t block: Part II

My previous discovery that poll() indicating an fd was writable didn’t mean write() wouldn’t block led to some interesting discussion on Google+.

It became clear that there is much confusion over read and write; e.g. Linus thought read() was like write(), whereas I thought (prior to my last post) that write() was like read(). Both wrong…

Both Linux and v6 UNIX always returned from read() once data was available (v6 didn’t have sockets, but they had pipes). POSIX even suggests this:

The value returned may be less than nbyte if the number of bytes left in the file is less than nbyte, if the read() request was interrupted by a signal, or if the file is a pipe or FIFO or special file and has fewer than nbyte bytes immediately available for reading.

But write() is different. Presumably so simple UNIX filters didn’t have to check the return and loop (they’d just die with EPIPE anyway), write() tries hard to write all the data before returning. And that leads to a simple rule.  Quoting Linus:

Sure, you can try to play games by knowing socket buffer sizes and look at pending buffers with SIOCOUTQ etc, and say “ok, I can probably do a write of size X without blocking” even on a blocking file descriptor, but it’s hacky, fragile and wrong.
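The non-hacky alternative is to make the fd non-blocking and let poll() do the waiting. Here is a minimal sketch of that pattern; set_nonblocking() and write_all_nonblock() are hypothetical helper names of mine, not from the discussion, and error handling is kept to the bare minimum.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to non-blocking mode. Returns 0, or -1 on error. */
static int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: write all len bytes to a non-blocking fd.
 * Any waiting happens in poll(), never inside write().
 * Returns 0 on success, -1 on error (errno set by the failing call).
 */
static int write_all_nonblock(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n >= 0) {
			/* Short writes are normal here: advance and retry. */
			p += n;
			len -= (size_t)n;
			continue;
		}
		if (errno == EINTR)
			continue;
		if (errno != EAGAIN && errno != EWOULDBLOCK)
			return -1;

		/* Buffer is full: sleep until the fd is writable again. */
		struct pollfd pfd = { .fd = fd, .events = POLLOUT };
		if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
			return -1;
	}
	return 0;
}
```

A caller would run set_nonblocking(fd) once after creating the fd, then use write_all_nonblock(fd, data, len) wherever it previously called write(). A real event loop would return to its main poll() instead of waiting inline, but the rule is the same: treat POLLOUT as a hint and let O_NONBLOCK supply the guarantee.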

I’m travelling, so I built an Ubuntu-compatible kernel with a printk() in select() and poll() to see who else was making this mistake on my laptop:

cups-browsed: (1262): fd 5 poll() for write without nonblock
cups-browsed: (1262): fd 6 poll() for write without nonblock
Xorg: (1377): fd 1 select() for write without nonblock
Xorg: (1377): fd 3 select() for write without nonblock
Xorg: (1377): fd 11 select() for write without nonblock

This first one is actually OK; fd 5 is an eventfd (which should never block). But the rest seem to be sockets, and thus probably bugs.
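You don't need a patched kernel to audit your own program. A rough userspace analogue of that check is to ask fcntl() whether O_NONBLOCK is set before polling an fd for write. The sketch below is not the kernel patch; fd_is_nonblocking() and warn_if_blocking() are hypothetical names, and the message format simply mimics the log above. Like the printk(), it also flags harmless cases such as the eventfd.

```c
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical helper: 1 if fd has O_NONBLOCK set, 0 if not, -1 on error. */
static int fd_is_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return (flags & O_NONBLOCK) ? 1 : 0;
}

/*
 * Hypothetical helper: call before adding fd to a poll()/select() write set.
 * Prints a warning and returns 1 if the fd is still in blocking mode,
 * otherwise returns 0 (including when the fd cannot be queried).
 */
static int warn_if_blocking(const char *who, int fd)
{
	if (fd_is_nonblocking(fd) != 0)
		return 0;
	fprintf(stderr, "%s: fd %d poll() for write without nonblock\n", who, fd);
	return 1;
}
```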

What’s worse is the Linux select() man page:

       A file descriptor is considered ready if it is possible to
       perform the corresponding I/O operation (e.g., read(2)) without
       ... those in writefds will be watched to see if a write will
       not block...

And poll():

       Writing now will not block.

Man page patches have been submitted…
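To see why "will not block" overstates things, here is a small experiment, sketched under the assumption of a pipe with a kernel buffer much smaller than the write. The function name write_after_pollout() is mine. It waits for POLLOUT on an empty pipe, then attempts a single non-blocking write() and reports how many bytes were accepted. If the count comes back short, a blocking write() of the same size would have had to sleep for the remainder.

```c
#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical experiment: create a pipe, confirm poll() reports POLLOUT,
 * then try to write len bytes in one non-blocking write().
 * Returns the number of bytes accepted, or -1 on any failure.
 */
static ssize_t write_after_pollout(size_t len)
{
	int fds[2];
	int flags;
	ssize_t n = -1;
	char *buf;
	struct pollfd pfd;

	if (pipe(fds) < 0)
		return -1;

	buf = calloc(1, len);
	flags = fcntl(fds[1], F_GETFL);
	if (buf && flags >= 0 &&
	    fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) == 0) {
		pfd.fd = fds[1];
		pfd.events = POLLOUT;
		pfd.revents = 0;
		/* The pipe is empty, so poll() should report it writable. */
		if (poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLOUT))
			n = write(fds[1], buf, len);
	}

	free(buf);
	close(fds[0]);
	close(fds[1]);
	return n;
}
```

On a Linux laptop with default pipe sizes I would expect write_after_pollout(1 << 20) to return well under a megabyte, while a 16-byte write goes through whole. The exact short count depends on the kernel's pipe buffer size, so treat the number as illustrative only.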


5 Comments for POLLOUT doesn’t mean write(2) won’t block: Part II

sara | August 20, 2014 at 3:36 am

Is this specific to Linux? Does FreeBSD have the same behaviour?

Author comment by rusty | August 20, 2014 at 3:38 am

Yes, since POSIX specifies it. You’re welcome to test it, however!

Helge | August 20, 2014 at 7:32 pm

select() or poll() for writability without O_NONBLOCK is fine if you just use it for sendmsg(…, MSG_DONTWAIT) — no?

Jan | August 20, 2014 at 11:14 pm

You are only guaranteed to be able to write SO_SNDLOWAT bytes without blocking. See

> If a descriptor refers to a socket, the implied output
> function is the sendmsg() function supplying an amount of
> normal data equal to the current value of the SO_SNDLOWAT
> option for the socket. If a non-blocking call to the
> connect() function has been made for a socket, and the
> connection attempt has either succeeded or failed leaving a
> pending error, the socket shall be marked as writable.

Sadly, SO_SNDLOWAT is hardcoded to 1 on Linux, forcing you to use non-blocking sockets.

Sam Watkins | February 4, 2015 at 11:40 am

Thanks, this is good to know. I am using non-blocking sockets in my libraries (and web server), so hopefully I am not affected by this.


