GitList

doc/comparison
 
          Different Versions of Yale Haskell Compared
          -------------------------------------------
 
 
 There are currently three different platforms running Yale Haskell.
 Yale Haskell runs on Lucid Common Lisp, CMU Common Lisp, and AKCL.  This
 document describes the differences between these systems.
 
 Differences in performance between the different versions of Yale
 Haskell reflect the underlying Lisp systems.  The better the Lisp
 system, the better the Haskell system built on it.  However, getting
 optimal performance from our Haskell system on top of a Common Lisp
 system requires careful attention to the underlying compiler.  Small
 changes in the optimization settings or the addition of crucial
 declarations can make significant differences in performance.  We have
 been doing most of our work using the Lucid system and have tuned it
 more than the others.  These comparisons are greatly influenced by the
 amount of time we have spent tuning the system: the CMU version has
 been tuned only a little and the AKCL version hardly at all.
 
 
   Methodology
 
 The following timings are only approximate.  They were obtained using
 the timing functions provided by the Common Lisp system.  All timings
 were done on an unloaded Sparc 1.  No attempt was made to account for
 garbage collection, differences in heap size, or similar factors.  We
 don't intend these benchmark results to be taken as an exhaustive
 comparison of the different Lisp implementations involved.
 
 
   Portability
 
 We have had no trouble moving our system to different hardware
 platforms under the same Lisp system.  Since the release is in source
 form, we expect that users will be able to build on any hardware
 platform supported by one the Lisps we have ported to.  Probably the
 only real constraint on portability is the requirement for a large
 virtual memory space.  
 
 From the comp.lang.lisp FAQ:
 
   Lucid Common Lisp runs on a variety of platforms, including PCs (AIX),
   Apollo, HP, Sun-3, Sparc, IBM RT, IBM RS/6000, Decstation 3100,
   Silicon Graphics, and Vax.
 
   CMU Common Lisp is free, and runs on Sparcs (Mach and SunOs),
   DecStation 3100 (Mach), IBM RT (Mach) and requires 16mb RAM, 25mb disk.
 
   Kyoto Common Lisp (KCL) is free, but requires a license. Conforms to CLtL1.
   It is available by anonymous ftp from rascal.ics.utexas.edu [128.83.138.20],
   cli.com [192.31.85.1], or [133.11.11.11] (a machine in Japan)
   in the directory /pub.  AKCL is in the file akcl-xxx.tar.Z (take the
   highest value of xxx).  To obtain KCL, one  must first sign and mail a
   copy of the license agreement to: Special Interest Group in LISP,
   c/o Taiichi Yuasa, Department of Computer Science,  Toyohashi
   University of Technology, Toyohashi 441, JAPAN. Runs on Sparc,
   IBM RT, RS/6000, DecStation 3100, hp300, hp800, Macintosh II (under AUX),
   mp386, IBM PS2, Silicon Graphics 4d, Sun3, Sun4, Sequent Symmetry,
   IBM 370, NeXT and Vax. A port to DOS is in beta test as
   math.utexas.edu:pub/beta2.zip.
 
 We have not yet completed ports of Yale Haskell to any other Lisp
 implementations, although we are likely to do so in the future.
 
 
  System Size
 
 The overall size of the Haskell system depends on the size of the
 underlying Lisp system and how much unnecessary Lisp overhead has been
 removed for the system.  We have removed large Lisp packages (like
 CLOS or CLX), but have not attempted to do any tree shaking.  The size
 of the saved images (including the Lisp system, the Haskell compiler,
 and the compiled prelude) is
 
 Image Size:
 
 Lucid   10 meg
 CMU     18 meg
 AKCL    11 meg
 
 The larger size of the CMU system is probably an artifact of their way
 of saving the system.
 
 
   Compilation Time
 
 There are three possible ways to compile a Haskell program.  All
 Haskell programs must be translated into Lisp.  The generated Lisp can
 then be interpreted, using no additional compilation time; compiled
 with a `fast' but nonoptimizing Lisp compiler; or compiled with the
 `slow' compiler that aggressively attempts to perform as many
 optimizations as possible.  
 
 To time the `fast', nonoptimizing compiler, we have been using
 
 (PROCLAIM '(OPTIMIZE (SPEED 3) (SAFETY 0) (COMPILATION-SPEED 3)))
 
 and for the `slow', fully optimizing compiler, we have been using
 
 (PROCLAIM '(OPTIMIZE (SPEED 3) (SAFETY 0) (COMPILATION-SPEED 0)))
 
 so that the only difference is in the COMPILATION-SPEED quality.
 Lucid does, in fact, provide two completely different compilers that
 correspond to these optimize settings.  For all three implementations,
 it appears that that the effect of a higher compilation speed setting
 is primarily in being less aggressive about inlining and making use of
 type declarations.
 
 The Haskell system itself (including the Prelude) is normally built
 with the fully optimizing compiler.
 
 To show just the Haskell to Lisp compilation time, here are the times
 needed to compile the Prelude (about 2500 lines of Haskell code).
 This does not include the time in the Lisp compiler or starting up the
 system.
 
 Time to compile the Prelude into Lisp: (CPU times)
 
 Lucid     111 sec
 CMU       87  sec
 AKCL      576 sec
 
 Running the Lisp compiler on the generated code takes far longer than
 running the Haskell compiler to produce the Lisp code.  For example,
 the optimizing Lucid compiler takes 47 minutes to compile the Prelude
 (about x20 slower than Haskell -> Lisp).  The nonoptimizing compiler
 is significantly faster but generates poorer code.
 
 The following times are the Lisp compilation time for the Prolog
 interpreter (found in the demo directory of our release):
 
 Lucid - interpreted     8.8 sec  Haskell -> Lisp
 Lucid - nonopt         20.0 sec  Lisp -> Machine code
 Lucid - optimizing    320.0 sec  Lisp -> Machine code
 CMU - interpreted      12.4 sec  Haskell -> Lisp
 CMU - nonopt          121.0 sec  Lisp -> Machine code
 CMU - optimizing      152.8 sec  Lisp -> Machine code
 AKCL - interpreted     47.8 sec  Haskell -> Lisp
 AKCL - nonopt          ~180 sec  Lisp -> Machine code
 AKCL - optimizing      ~360 sec  Lisp -> Machine code
 
 The AKCL timings are only approximate, because the Lisp timing
 functions do not capture the time spent in the C compiler.
 
 
 Code Speed
 
 The speed of the Haskell program depends on whether the Lisp code
 has been compiled with the optimizing or nonoptimizing compiler, or
 is running interpretively.
 
 The first benchmark is nfib, which indicates the basic speed of
 function calling and Int arithmetic.
 
 module Main where
 
 nfib :: Int -> Int
 nfib 0 = 1
 nfib 1 = 1
 nfib n = nfib (n-1) + nfib (n-2)
 
 
                              nfib 20            nfib 30    
 Lucid (Interpreted)          116 sec            *
 Lucid (nonopt)               0.14 sec           9.4 sec
 Lucid (optimizing)           0.08 sec           4.8 sec
 CMU (Interpreted)            23.8 sec           *
 CMU (nonopt)                 0.24 sec           6.9 sec
 CMU (optimizing)             0.11 sec           7.0 sec
 AKCL (Interpreted)           141 sec            *
 AKCL (nonopt)                0.20 sec           21.3 sec
 AKCL (optimizing)            0.15 sec           18.2 sec
 
 * Too slow to benchmark
 
 For other data types, there was no significant difference betwen
 optimizing and nonoptimizing compilation in any of the systems.
 Changing the signature of nfib to Integer -> Integer:
 
                              nfib 20            nfib 30    
 Lucid (interpreted)          140 sec            *
 Lucid (compiled)             0.18 sec           10.2 sec
 CMU (interpreted)            24.2 sec           *
 CMU (compiled)               0.16 sec           10.5 sec
 AKCL (interpreted)           145 sec            *
 AKCL (compiled)              1.07 sec           127 sec
 
 Nfib with signature Float -> Float:
 
                              nfib 20            nfib 30    
 Lucid (interpreted)          222  sec            *
 Lucid (compiled)             16.4 sec           2416 sec
 CMU (interpreted)            44.2 sec           *
 CMU (compiled)               1.61 sec           352 sec
 AKCL (interpreted)           161 sec            *
 AKCL (compiled)              103 sec            *
 
 Overloaded functions run considerably slower than nonoverloaded
 functions.  By allowing nfib to remain overloaded, Num a => a -> a,
 and using the Int overloading the same benchmarks run much slower.
 Again, there is no real difference between the different compiler
 optimization settings.
 
                              nfib 15            nfib 20    
 Lucid (interpreted)          14.2 sec           156 sec
 Lucid (compiled)             0.97 sec           9.3 sec
 CMU (interpreted)            23.8 sec           155 sec
 CMU (compiled)               0.89 sec           15.6 sec
 AKCL (interpreted)           30.8 sec           387 sec
 AKCL (compiled)              10.3 sec           119 sec
 
 Basic Haskell data structuring operations (pattern matching and
 construction) can be tested using another version of nfib which uses
 natural numbers:
 
   data Nat = Z | S Nat
 
 The difference betwen CMU and Lucid here is consistent with other
 benchmarks that manipulate structures.
 
                              nfib 10            nfib 15
 Lucid (Interpreted)          1.39 sec           26.7 sec
 Lucid (compiled)             0.26 sec           2.28 sec
 CMU (interpreted)            3.1 sec            <stack overflow>
 CMU (compiled)               0.16 sec           0.54 sec
 AKCL (Interpreted)           4.25 sec           <stack overflow>
 AKCL (compiled)              0.21 sec           13.9 sec
 
 
  A Large Program
 
 For a final benchmark, we use the Prolog interpreter as a way of
 getting a feel for general performance of larger programs.  This
 program is typical of symbolic (as opposed to numeric) computation.
 
 Time to solve append(X,Y,cons(a,cons(b,cons(c,nil)))):
 
 Lucid    12.2 sec
 CMU      12.0 sec
 AKCL     69.1 sec
 
 My interpretation of this result is that although Lucid is a bit
 slower on the previous small benchmarks, it makes up for this is
 larger programs where advantages like better instruction scheduling,
 register allocation, or memory usage may make a difference.  In
 general, Lucid and CMU are very similar in performance for larger
 programs.
 
 
  Conclusions
 
 Briefly stated, the pluses and minuses of each system are as follows:
 
 Lucid (4.0.0):
  + Development (nonoptimizing) compiler is very fast
  + Fast Haskell -> Lisp compilation
  + Generates good code
  + Very robust
  - Costs money
  - Slow floating point code
  - Fairly slow interpreter
  - The production (optimizing) compiler is extremely slow.
 
 CMU (16e):
  + Free
  + As fast as Lucid for Haskell -> Lisp
  + Good floating point performance
  + Generated code is very fast
  + Fast interpreter
  - Slow Lisp -> machine code compilation
  - Doesn't run on many systems
 
 AKCL (1.615):
  + Free
  + Widely portable
  - Slow (generally 3 - 5 times slower, sometimes much worse)
  - Flakey (tends to core dump on errors, choke on large programs, etc.)
 
 Generally, using the fully optimizing compiler seems to be useful only
 in code involving Int arithmetic.
 
 The fast compiler for Lucid is a big advantage, delivering by far the
 fastest compilation to machine code with relatively little loss in
 speed compared to the optimizing compiler.
 
 
             Yale Haskell Group
             September 25, 1992