# First round of timing results
#
# Single processor on 300 MHz master ganesh, no PVM
0.880user 21.220sys 99.9%, 0ib 0ob 0tx 0da 0to 0swp 0:22.11
0.280user 21.760sys 100.0%, 0ib 0ob 0tx 0da 0to 0swp 0:22.04
# Single processor on 400 MHz slave b4 using PVM
0.540user 11.280sys 31.3%, 0ib 0ob 0tx 0da 0to 0swp 0:37.65
0.700user 11.010sys 31.1%, 0ib 0ob 0tx 0da 0to 0swp 0:37.62
# 2x400 MHz (b4, b9) with PVM
1.390user 14.530sys 38.3%, 0ib 0ob 0tx 0da 0to 0swp 0:41.48
# 3x400 MHz (b4, b9, b11) with PVM
1.800user 18.050sys 46.5%, 0ib 0ob 0tx 0da 0to 0swp 0:42.60
#
# OK, this is terrible. We make two changes. First, we eliminate
# printout statements in slaves (which were logging to a file). There
# are a LOT of calls -- this adds significant overhead and corrupts
# the timing runs. Second, we no longer send slices to each host --
# we just send the slice structure along with the weights and await
# the results. Fewer calls, and above all, ONLY multicasts from the
# master. The only REAL question is whether we want to deliberately
# desynchronize the slaves by a tiny bit to keep their returns from
# colliding! We'll do the single processor again because it might
# have changed...
#
# Single processor on 300 MHz master ganesh, no PVM. Guess not.
1.250user 20.630sys 99.9%, 0ib 0ob 0tx 0da 0to 0swp 0:21.90
# Single processor on 400 MHz slave b4 using PVM. Better.
0.350user 10.460sys 32.9%, 0ib 0ob 0tx 0da 0to 0swp 0:32.79
2.380user 8.410sys 32.5%, 0ib 0ob 0tx 0da 0to 0swp 0:33.11
# 2x400 MHz (b4, b9) with PVM
2.260user 11.140sys 37.7%, 0ib 0ob 0tx 0da 0to 0swp 0:35.53
# 3x400 MHz (b4, b9, b11) with PVM
1.630user 11.160sys 40.3%, 0ib 0ob 0tx 0da 0to 0swp 0:31.67
# 4x400 MHz (b4, b9, b11, b12) with PVM
2.720user 14.720sys 42.9%, 0ib 0ob 0tx 0da 0to 0swp 0:40.61
#
# Still no gain, but close. We try increasing the granularity a
# bit by using a bigger dataset.
#
# Single processor on 300 MHz master ganesh, no PVM. Takes longer.
9.270user 207.020sys 99.9%, 0ib 0ob 0tx 0da 0to 0swp 3:36.32
# Single processor on 400 MHz slave b4 using PVM. Better.
4.380user 61.410sys 28.3%, 0ib 0ob 0tx 0da 0to 0swp 3:51.67
# 2x400 MHz (b4, b9) with PVM. At last a distinct benefit!
3.080user 71.420sys 51.1%, 0ib 0ob 0tx 0da 0to 0swp 2:25.73
# 3x400 MHz (b4, b9, b11) with PVM. Still better.
1.270user 70.570sys 58.9%, 0ib 0ob 0tx 0da 0to 0swp 2:01.89
# 4x400 MHz (b4, b9, b11, b12) with PVM. And peak.
6.000user 71.820sys 63.3%, 0ib 0ob 0tx 0da 0to 0swp 2:02.83
# More processors would actually cost speedup at this granularity.
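#
# A minimal sketch (not part of the original runs) of how the speedup
# implied by the larger-dataset elapsed times could be tabulated. It
# assumes the one-slave PVM run on b4 (3:51.67) is the baseline, and
# that efficiency means speedup divided by the number of slaves. The
# names ELAPSED, parse_elapsed and speedups are hypothetical, chosen
# only for illustration.

```python
# Elapsed (wall-clock) times copied from the larger-dataset runs above,
# keyed by the number of 400 MHz PVM slaves used.
ELAPSED = {1: "3:51.67", 2: "2:25.73", 3: "2:01.89", 4: "2:02.83"}


def parse_elapsed(text):
    """Convert an 'm:ss.ss' elapsed-time string to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


def speedups(elapsed):
    """Return {n: (speedup, efficiency)} relative to the one-slave run.

    Assumption: speedup is T(1) / T(n) and efficiency is speedup / n,
    with T(1) taken from the single-slave PVM run, not the master.
    """
    base = parse_elapsed(elapsed[1])
    result = {}
    for n in sorted(elapsed):
        s = base / parse_elapsed(elapsed[n])
        result[n] = (s, s / n)
    return result


if __name__ == "__main__":
    for n, (s, e) in speedups(ELAPSED).items():
        print(f"{n} slave(s): speedup {s:.2f}, efficiency {e:.2f}")
```

# Against that baseline the speedup reaches roughly 1.9 at three
# slaves and is marginally lower at four, which matches the note above
# that more processors would cost speedup at this granularity.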