My previous discovery that poll() indicating an fd was writable didn’t mean write() wouldn’t block led to some interesting discussion on Google+.
It became clear that there is much confusion over read and write; e.g. Linus thought read() was like write(), whereas I thought (prior to my last post) that write() was like read(). Both wrong…
Both Linux and v6 UNIX always returned from read() once data was available (v6 didn’t have sockets, but they had pipes). POSIX even suggests this:
The value returned may be less than nbyte if the number of bytes left in the file is less than nbyte, if the read() request was interrupted by a signal, or if the file is a pipe or FIFO or special file and has fewer than nbyte bytes immediately available for reading.
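To make the short-read behaviour concrete, here is a minimal sketch (the function name short_read_demo is made up for illustration). It puts 5 bytes into a pipe and then asks read() for 64. On a blocking fd, read() hands back what is there rather than waiting for the rest:

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical demo: put 5 bytes in a pipe, then ask read() for 64.
 * Returns whatever read() returned, or -1 on error. */
ssize_t short_read_demo(void)
{
    int fds[2];
    char buf[64];
    const char msg[] = "hello";   /* 5 bytes; the NUL is not sent */
    ssize_t n;

    if (pipe(fds) != 0)
        return -1;

    if (write(fds[1], msg, strlen(msg)) != (ssize_t)strlen(msg)) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Blocking fd, 64 bytes requested, only 5 available:
     * read() returns as soon as there is any data. */
    n = read(fds[0], buf, sizeof(buf));

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

This is why every careful reader of a pipe or socket loops on read(): the short count is normal, not an error.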
But write() is different. Presumably so simple UNIX filters didn’t have to check the return and loop (they’d just die with EPIPE anyway), write() tries hard to write all the data before returning. And that leads to a simple rule: if write() must not block, the file descriptor has to be non-blocking. Quoting Linus:
Sure, you can try to play games by knowing socket buffer sizes and look at pending buffers with SIOCOUTQ etc, and say “ok, I can probably do a write of size X without blocking” even on a blocking file descriptor, but it’s hacky, fragile and wrong.
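One way to follow that advice, sketched under assumptions: set O_NONBLOCK on the fd, and only ever sleep in poll(), never in write(). The helper names set_nonblocking and write_all_nonblocking are mine, not from any library, and a real event loop would return to its main loop on EAGAIN instead of polling inline as this does:

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: turn on O_NONBLOCK, keeping the other flags. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: write all len bytes to a non-blocking fd.
 * Short writes are continued; on EAGAIN we wait in poll() for POLLOUT.
 * Returns 0 on success, -1 on error with errno set. */
int write_all_nonblocking(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n >= 0) {
            /* Possibly a short write: advance and try again. */
            p += n;
            len -= (size_t)n;
            continue;
        }
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;

        /* Buffer full: sleep here, not inside write(). */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
            return -1;
    }
    return 0;
}
```

With O_NONBLOCK set, a POLLOUT that turns out to be optimistic costs one EAGAIN and another trip through poll(), rather than a thread stuck in write().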
I’m travelling, so I built an Ubuntu-compatible kernel with a printk() into select() and poll() to see who else was making this mistake on my laptop:
cups-browsed: (1262): fd 5 poll() for write without nonblock
cups-browsed: (1262): fd 6 poll() for write without nonblock
Xorg: (1377): fd 1 select() for write without nonblock
Xorg: (1377): fd 3 select() for write without nonblock
Xorg: (1377): fd 11 select() for write without nonblock
This first one is actually OK; fd 5 is an eventfd (which should never block). But the rest seem to be sockets, and thus probably bugs.
What’s worse is the Linux select() man page:
A file descriptor is considered ready if it is possible to perform the corresponding I/O operation (e.g., read(2)) without blocking.
... those in writefds will be watched to see if a write will not block...
And poll():
POLLOUT Writing now will not block.
Man page patches have been submitted…
Is this specific to Linux? Does FreeBSD have the same behaviour?
Yes, since POSIX specifies it. You’re welcome to test it, however!
select() or poll() for writability without O_NONBLOCK is fine if you just use it for sendmsg(…, MSG_DONTWAIT) — no?
You are only guaranteed to be able to write SO_SNDLOWAT bytes without blocking. See http://pubs.opengroup.org/onlinepubs/9699919799/functions/pselect.html:
> If a descriptor refers to a socket, the implied output
> function is the sendmsg() function supplying an amount of
> normal data equal to the current value of the SO_SNDLOWAT
> option for the socket. If a non-blocking call to the
> connect() function has been made for a socket, and the
> connection attempt has either succeeded or failed leaving a
> pending error, the socket shall be marked as writable.
Sadly, SO_SNDLOWAT is hardcoded to 1 on Linux, forcing you to use non-blocking sockets.
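For anyone who wants to check this on their own system, a minimal sketch that reads SO_SNDLOWAT back with getsockopt(). The function name query_sndlowat is made up for illustration, and the AF_UNIX socketpair is just a convenient socket to query; on Linux the value reported should be 1, other systems may differ:

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: return SO_SNDLOWAT for a fresh AF_UNIX
 * stream socket, or -1 if the socket or the query fails. */
int query_sndlowat(void)
{
    int sv[2];
    int lowat = -1;
    socklen_t len = sizeof(lowat);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    if (getsockopt(sv[0], SOL_SOCKET, SO_SNDLOWAT, &lowat, &len) != 0)
        lowat = -1;

    close(sv[0]);
    close(sv[1]);
    return lowat;
}
```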
Thanks, this is good to know. I am using non-blocking sockets in my libraries (and web server), so hopefully I am not affected by this.