Look carefully at how he ran the test: this comparison is extremely unfair.
What do we measure and how?
We use a 16-node cluster running at SICS. We plot throughput vs. parallel load.
Machine 1 has a server (Apache or Yaws).
Machine 2 requests 20 KByte pages from machine 1. It does this in a tight loop, requesting a new page as soon as it has received one from the server. From this we derive a throughput figure, which is plotted on the horizontal scale of the graph. A typical value (800) means the throughput is 800 KBytes/sec.
Machines 3 to 16 generate load.
Each machine starts a large number of parallel sessions.
Each session makes a very slow request to fetch a one-byte file from machine 1. This is done by sending very slow HTTP GET requests (we break up the GET requests and send them a character at a time, with about ten seconds between each character).
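The load sessions described above can be sketched as a minimal Python client. This is an illustration of the technique, not the harness actually used at SICS; the request format (a plain HTTP/1.0 GET with a `Host` header) and the helper names `build_request` and `slow_get` are assumptions made for the example.

```python
import socket
import time


def build_request(path: str, host: str) -> bytes:
    """Build a minimal HTTP/1.0 GET request (format assumed for illustration)."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def slow_get(sock: socket.socket, request: bytes, delay_s: float = 10.0) -> None:
    """Send the request one byte at a time, sleeping delay_s between bytes.

    The connection stays open for roughly (len(request) - 1) * delay_s
    seconds while carrying almost no data.
    """
    for i in range(len(request)):
        sock.sendall(request[i:i + 1])  # exactly one character per send
        if i + 1 < len(request):
            time.sleep(delay_s)
```

With the benchmark's ten-second gap, even the 28-byte request `build_request("/x", "h")` keeps one server connection busy for 27 × 10 = 270 seconds, which is why each session ties up a thread or process on a server that dedicates one to every connection.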
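Apache handles connections by spawning a thread or process per request. With his test method, 80,000 very slow requests force Apache to create a huge number of threads. How many threads it can create depends on the operating system, but that is secondary: large numbers of threads serving active connections cause a flood of thread context switches. No wonder Apache fell over.
An Erlang process, by contrast, amounts to little more than a C data structure (erl_process), so how many you can create depends only on your memory. Large numbers of slow connections are exactly the workload that poll-style event dispatch suits, and epoll handles it easily (I have tested epoll with 300k connections). Rather than measuring web-server performance, this test really measures how well each server can create processes.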
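The event-driven model credited above can be sketched with Python's standard `selectors` module, whose `DefaultSelector` is normally backed by epoll on Linux. One thread watches every connection and does work only when bytes arrive, so each slow connection costs a small buffer rather than a thread. The class `SlowRequestCollector` and its "headers end at a blank line" completion rule are simplifications for illustration; this is not how Yaws or the Erlang VM are implemented.

```python
import selectors
import socket


class SlowRequestCollector:
    """Accumulate partial HTTP requests from many connections on one thread."""

    def __init__(self) -> None:
        # On Linux, DefaultSelector is normally an EpollSelector.
        self.sel = selectors.DefaultSelector()
        self.buffers: dict[socket.socket, bytes] = {}

    def add(self, conn: socket.socket) -> None:
        conn.setblocking(False)
        self.buffers[conn] = b""
        self.sel.register(conn, selectors.EVENT_READ)

    def poll_once(self, timeout: float = 0.0) -> list[tuple[socket.socket, bytes]]:
        """Service every readable connection once.

        Returns (connection, request) pairs whose headers are complete;
        those connections are unregistered and handed back to the caller.
        """
        done = []
        for key, _events in self.sel.select(timeout):
            conn = key.fileobj
            data = conn.recv(4096)
            if not data:  # peer closed before finishing its request
                self.sel.unregister(conn)
                del self.buffers[conn]
                conn.close()
                continue
            self.buffers[conn] += data
            if b"\r\n\r\n" in self.buffers[conn]:  # end of headers
                self.sel.unregister(conn)
                done.append((conn, self.buffers.pop(conn)))
        return done
```

In the benchmark's scenario each `poll_once` call would find only a few connections with a new byte; the idle ones cost nothing until their next character arrives.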
Here is how I tested it:
./yaws --conf yaws.conf --erlarg "+K true +P 1024000" # epoll (+K true), up to 1,024,000 processes (+P); the kernel has already been tuned
Contents of yaws.conf:
auth_log = false
max_num_cached_files = 8000
max_num_cached_bytes = 6000000
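I benchmarked it the way everyone does, with ab -c 1000 -n 1000000 http://192.168.0.98:8000/bomb.gif
Sure enough, Yaws's performance turned out to be quite mediocre as well: only about 3k requests per second.
One look at the strace output shows why: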
accept(10, {sa_family=AF_INET, sin_port=htons(5644), sin_addr=inet_addr("192.168.0.97")}, [16]) = 11
fcntl64(11, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(11, F_SETFL, O_RDWR|O_NONBLOCK) = 0
getsockopt(10, SOL_TCP, TCP_NODELAY, [0], [4]) = 0
getsockopt(10, SOL_SOCKET, SO_KEEPALIVE, [0], [4]) = 0
getsockopt(10, SOL_SOCKET, SO_PRIORITY, [0], [4]) = 0
getsockopt(10, SOL_IP, IP_TOS, [0], [4]) = 0
getsockopt(11, SOL_SOCKET, SO_PRIORITY, [0], [4]) = 0
getsockopt(11, SOL_IP, IP_TOS, [0], [4]) = 0
setsockopt(11, SOL_IP, IP_TOS, [0], 4) = 0
setsockopt(11, SOL_SOCKET, SO_PRIORITY, [0], 4) = 0
getsockopt(11, SOL_SOCKET, SO_PRIORITY, [0], [4]) = 0
getsockopt(11, SOL_IP, IP_TOS, [0], [4]) = 0
setsockopt(11, SOL_SOCKET, SO_PRIORITY, [0], 4) = 0
getsockopt(11, SOL_SOCKET, SO_PRIORITY, [0], [4]) = 0
getsockopt(11, SOL_IP, IP_TOS, [0], [4]) = 0
setsockopt(11, SOL_SOCKET, SO_KEEPALIVE, [0], 4) = 0
setsockopt(11, SOL_IP, IP_TOS, [0], 4) = 0
setsockopt(11, SOL_SOCKET, SO_PRIORITY, [0], 4) = 0
getsockopt(11, SOL_SOCKET, SO_PRIORITY, [0], [4]) = 0
getsockopt(11, SOL_IP, IP_TOS, [0], [4]) = 0
setsockopt(11, SOL_TCP, TCP_NODELAY, [0], 4) = 0
setsockopt(11, SOL_SOCKET, SO_PRIORITY, [0], 4) = 0
recv(11, "GET /bomb.gif HTTP/1.0\r\nUser-Age"..., 8192, 0) = 100
getpeername(11, {sa_family=AF_INET, sin_port=htons(5644), sin_addr=inet_addr("192.168.0.97")}, [16]) = 0
clock_gettime(CLOCK_MONOTONIC, {110242, 326908594}) = 0
stat64("/var/www/html/bomb.gif", {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
access("/var/www/html/bomb.gif", R_OK) = 0
access("/var/www/html/bomb.gif", W_OK) = 0
clock_gettime(CLOCK_MONOTONIC, {110242, 327135982}) = 0
time(NULL) = 1185894828
clock_gettime(CLOCK_MONOTONIC, {110242, 327222643}) = 0
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=405, ...}) = 0
writev(11, [{NULL, 0}, {"HTTP/1.1 200 OK\r\nConnection: clo"..., 231}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096}], 3) = 4327
close(11
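This trace is full of useless, expensive system calls: at least 20 of them, at roughly 10 µs each, waste about 200 µs per request.
The file is checked with access() twice, and there is not even a file cache: every request opens the file, reads it, and then writes it to the socket.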
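The count above can be checked mechanically. This sketch tallies syscall names in strace output pasted one call per line, then applies a flat per-call cost; the 10 µs figure is the post's own rough estimate rather than a measurement, and the helper names are hypothetical.

```python
import re
from collections import Counter

# Matches the syscall name at the start of a strace line, e.g. "getsockopt(".
SYSCALL_RE = re.compile(r"^([a-z_][a-z0-9_]*)\(")


def count_syscalls(lines):
    """Tally syscall names; continuation fragments such as ', 4096}]...' are skipped."""
    counts = Counter()
    for line in lines:
        m = SYSCALL_RE.match(line.strip())
        if m:
            counts[m.group(1)] += 1
    return counts


def estimated_overhead_us(counts, names, cost_us=10.0):
    """Rough cost of the named calls, assuming the same cost (in us) for each call."""
    return sum(counts[n] for n in names) * cost_us
```

Applied to the 4 KB trace, the `getsockopt` and `setsockopt` calls alone number 12 + 8 = 20, which at 10 µs each gives the 200 µs quoted.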
That was the small-file (4 KB) case. Now look at the large-file (40 KB) case:
open("/var/www/html/bomb.gif", O_RDONLY|O_LARGEFILE) = 19
read(19, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 10240) = 10240
writev(16, [{NULL, 0}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 10240}], 2) = 10240
read(19, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 10240) = 10240
writev(16, [{NULL, 0}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 10240}], 2) = 10240
read(19, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 10240) = 10240
writev(16, [{NULL, 0}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 10240}], 2) = 10240
read(19, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 10240) = 10240
writev(16, [{NULL, 0}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 10240}], 2) = 7240
read(19, "", 10240) = 0
close(19) = 0
clock_gettime(CLOCK_MONOTONIC, {110574, 856508319}) = 0
epoll_ctl(3, EPOLL_CTL_DEL, 11, {0, {u32=11, u64=581990243524149259}}) = 0
epoll_ctl(3, EPOLL_CTL_DEL, 12, {0, {u32=12, u64=581990243524149260}}) = 0
epoll_ctl(3, EPOLL_CTL_ADD, 16, {EPOLLOUT, {u32=16, u64=581990243524149264}}) = 0
epoll_wait(3, {}, 256, 0) = 0
clock_gettime(CLOCK_MONOTONIC, {110574, 856677411}) = 0
clock_gettime(CLOCK_MONOTONIC, {110574, 856729274}) = 0
The large number of epoll_ctl and clock_gettime calls is enough on its own to slow the system down considerably.
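Compare this with lighttpd. lighttpd uses a cache and AIO, and is carefully written entirely in C; on small files it handles roughly 10k requests per second. Yaws, handling requests this way, manages about 30% of that at best.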
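The kind of file cache the post credits lighttpd with can be sketched in a few lines: read a file once, keep its bytes in memory, and revalidate with a single stat per request instead of the open/read loop seen in the 40 KB trace. The `FileCache` class is hypothetical and illustrative only; it is not lighttpd's implementation, and a real server would also cap the cache's total size.

```python
import os


class FileCache:
    """Keep file contents in memory; revalidate with one stat() per lookup."""

    def __init__(self) -> None:
        # path -> (mtime_ns, size, data)
        self._entries: dict[str, tuple[int, int, bytes]] = {}
        self.disk_reads = 0  # how many times a file was actually opened and read

    def get(self, path: str) -> bytes:
        st = os.stat(path)
        entry = self._entries.get(path)
        if entry is not None and entry[0] == st.st_mtime_ns and entry[1] == st.st_size:
            return entry[2]  # hit: no open() or read() at all
        with open(path, "rb") as f:  # miss or stale entry: read the whole file once
            data = f.read()
        self.disk_reads += 1
        self._entries[path] = (st.st_mtime_ns, st.st_size, data)
        return data
```

Serving the same 40 KB file a hundred times through this cache costs one open/read plus one stat per request; a change in size or modification time forces a fresh read.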