This post follows Michael Croucher’s post “MATLAB: Repeated multiplication is faster than integer powers”.

He noticed that in MATLAB, element-wise exponentiation of matrices (using the .^ operator) was slower than explicitly multiplying the matrix by itself element-wise. I’m pretty sure this is only true for small exponents, anyway. He also wondered how it was in other languages.

Well, I did my job and tried it in Julia:

function testpow()
    # function to compare integer powers with repeated multiplication

    a = rand(1, 10000000)*10

    println("speed of ^4 using pow")
    tic(); test4p = a.^4; toc()
    println("speed of ^4 using multiply")
    tic(); test4m = a.*a.*a.*a; toc()

    println("maximum difference in results")
    max_diff = maximum(abs(test4p - test4m))
end

Calling testpow() on my Intel(R) Core(TM) i3 M380 @ 2.53GHz laptop, it outputs (after warm-up):

speed of ^4 using pow
elapsed time: 1.074429726 seconds
speed of ^4 using multiply
elapsed time: 0.40391358 seconds
maximum difference in results
1.8189894035458565e-12

So Julia behaves just like MATLAB: repeated multiplication beats .^; only the slowness factor and the absolute timings differ.

But then I wondered: what if I just (re)define the .^ operator for integer exponents to perform chain multiplications? So I wrote this “Cristóvão’s style” Julia code:

import Base.(.^)
 
function .^{T<:FloatingPoint, N}(A::Array{T, N}, x::Integer)
    if abs(x) > 42 # the "Answer to the Ultimate Question of Life, the Universe, and Everything"
        A.^float(x)
    elseif x > 1
        B = similar(A)
        @inbounds for i in 1:length(A)
            B[i] = A[i]
            for k in 1:x-1 B[i] *= A[i] end
        end
        B
    elseif x < 0
        B = similar(A)
        @inbounds for i in 1:length(A)
            B[i] = one(T)
            for k in 1:abs(x) B[i] *= A[i] end
            B[i] \= one(T)  # left-division update: B[i] = one(T)/B[i], i.e., the reciprocal
        end
        B
    elseif x == 1
        copy(A)
    else   #  x == 0
        ones(A)
    end
end

By running this before the testpow() function definition (and not touching it), I get the following output:

speed of ^4 using pow
elapsed time: 0.100119984 seconds <— notice this
speed of ^4 using multiply
elapsed time: 0.40944625 seconds
maximum difference in results
0.0

Cool! A 10x speed-up compared to the default .^.

I think it is really nice to be able to write different algorithms in the very same language and then get changes in other code without having to touch it.
Although I didn’t try it, I guess one cannot achieve this in MATLAB or Python 😛

UPDATE: With real-world real numbers (floating-point numbers), precision can be lost in the chain-multiplication implementation of exponentiation.


I’ve developed a Julia package to manipulate symbolic linear expressions with both scalar and matrix coefficients: https://github.com/cdsousa/LinearExpressions.jl

I created it mainly to manipulate large linear matrix inequalities (LMIs) for SDP optimization, and I’ve been using it whenever PyLMI-SDP is too slow for the job.

Over the past months I’ve been studying how to improve the performance of the Common Subexpression Elimination (CSE) routine of SymPy, so that it can be used in SymPyBotics with acceptable computing times.

This resulted in pull request #2355, which has already been merged into the SymPy master branch (for release in version 0.7.4).

Here is an example comparing the new and previous CSE implementations when applied to the expressions of the “generic torque”, computed with SymPyBotics, for the first 3 joints of a 7-DOF WAM Arm robot.

Caching influences the timings, but we can notice an average performance improvement of about 25x when external (pre and post) optimizations are used. When no external optimizations are used, the average improvement is about 90x. With the new `order='none'` option, the improvement rises to 500x for the non-cached case, and to 1000x for the cached one!

For this particular case, the CSE output is less optimized (it has more operations) when external optimizations are done than when they are not.

How it works now

First, two remarks:

  • expressions are not trees but rather directed acyclic graphs (DAGs). E.g., in the expression sin(x+1)+cos(x+1), the arguments of sin and cos are the same, x+1; indeed, the node x+1 has two parents;
  • SymPy (sub)expressions are nicely and quickly hashable, and thus great to use in sets and dictionaries, as the snippet below illustrates.
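
As a quick illustration of both remarks (a hypothetical snippet of mine, not code from SymPy itself):

from sympy import symbols, sin, cos

x = symbols('x')

# The argument of sin and the argument of cos are the same node, x + 1.
s, c = sin(x + 1), cos(x + 1)
print(s.args[0] == c.args[0])              # True
print(hash(s.args[0]) == hash(c.args[0]))  # True

# Hashability makes sets and dictionaries of subexpressions cheap.
seen = set()
for node in (s.args[0], c.args[0]):
    print(node in seen)                    # False, then True
    seen.add(node)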

The CSE core/raw algorithm:

  1. The core of the new CSE parses the expression, adding each visited subexpression to a seen set. If a subexpression was already seen, it is added to a repeated set and its child nodes are not parsed (there is no need to).
  2. After knowing the repeated subexpressions (the nodes with more than one parent), the core CSE rebuilds the whole tree, using intermediate variables in place of the repeated subexpressions, as sketched below.
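
To make the two passes concrete, here is a minimal sketch in plain SymPy. It is not the actual implementation (which also consults opt_subs, among other details), and all the names in it are mine:

from sympy import Symbol, symbols, sin, cos

def cse_core_sketch(expr):
    seen, repeated = set(), set()

    # Pass 1: mark subexpressions that occur more than once.
    def parse(node):
        if not node.args:      # leaves (symbols, numbers) are skipped
            return
        if node in seen:
            repeated.add(node)
            return             # children were parsed on the first visit
        seen.add(node)
        for arg in node.args:
            parse(arg)

    # Pass 2: rebuild the tree, with intermediate variables standing
    # in for the repeated subexpressions.
    subs, replacements = {}, []
    def rebuild(node):
        if not node.args:
            return node
        new = node.func(*[rebuild(arg) for arg in node.args])
        if node in repeated:
            if node not in subs:
                subs[node] = Symbol('t%d' % len(subs))
                replacements.append((subs[node], new))
            return subs[node]
        return new

    parse(expr)
    return replacements, rebuild(expr)

x = symbols('x')
print(cse_core_sketch(sin(x + 1) + cos(x + 1)))
# e.g. ([(t0, x + 1)], sin(t0) + cos(t0))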

The internal optimizations:

  • Before the core CSE algorithm is performed, the expression is parsed to find optimization opportunities; when an optimizable subexpression is found, it is added to the opt_subs substitutions dictionary.
  • When the core algorithm parses a subexpression, it looks it up in the opt_subs dictionary; if it is there, it parses the substitution instead.
  • The currently implemented internal optimizations are the following:
    • negative signs are stripped out of multiplications, e.g., -2*x is substituted by -1*(2*x)
    • negative signs are stripped out of exponents, e.g., x**(-2*y) is substituted by (x**(2*y))**-1
    • common Add and Mul terms are grouped, e.g., in cos(a+b+c)+sin(a+b+d), a+b+c is substituted by (a+b)+c and a+b+d is substituted by (a+b)+d, so that a+b becomes a single, repeated node (see the example below)
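
The grouping effect can be seen directly through the public cse function; order='none' is the new option mentioned above (the exact replacement symbols and argument ordering may differ):

from sympy import cse, symbols, sin, cos

a, b, c, d = symbols('a b c d')
expr = cos(a + b + c) + sin(a + b + d)

# Term grouping lets CSE extract the common a + b,
# e.g. ([(x0, a + b)], [sin(d + x0) + cos(c + x0)])
print(cse(expr))

# order='none' skips the canonical sorting of Add/Mul arguments,
# trading a bit of output polish for speed on large inputs.
print(cse(expr, order='none'))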

Future work

In my opinion, three things could further improve CSE:

  1. add support for MatrixExprs;
  2. use replacement Symbols which could somehow clone the assumptions of the subexpressions they represent;
  3. implement an optimal Mul/Add term matching system (maybe using Matthew Rocklin’s logpy package).

PyLMI-SDP

30Aug13

Symbolic linear matrix inequalities (LMI) and semi-definite programming (SDP) tools for Python


SymPyBotics

29Mar13

Today I’ve released the SymPyBotics software on GitHub.

This is a Python toolbox to generate and manipulate robot dynamic equations. It depends on the great SymPy library and on SymCode, another library I’m developing.


I’m starting my PhD (one year already done), so I have to write a thesis proposal (they call it a thesis project) for the university.

As a fan of typography, I produce almost all my documents using the LaTeX typesetting system, and when they are long I put the source code under git versioning. This time, following Will Robertson’s idea, I decided to place the code in a public repository: https://gitorious.org/csousa-phd-thesis-project.

I’m not completely sure if this is a good (secure) idea, but quoting Will:

I also believe that all academic research should be made more open. This is a small step towards such a philosophy.

Notice, however, that the work is copyrighted and no rights are given to reproduce or modify it.

Anyway, I don’t expect the content of my thesis proposal to have value for outsiders. Instead, I’m making it public because of its LaTeX source structure.

The key points of this document’s LaTeX source are:

  • the use of the memoir class, with a separate package for customization;
  • the use of the glossaries package for the acronym list;
  • the use of biblatex (yes, not bibtex) for the bibliography (configured to use biber);
  • the use of microtypography;
  • the use of latexmk, with a local rc file for the glossaries files; and
  • being split across several .tex and .sty (package) files.

For this document, pdflatex is used. For others, when I want to play with fonts, I use lualatex with Will’s fontspec package.

As a note, I use bibliography .bib files from Mendeley. I’ve configured it to generate one file per collection, so that for each document I write, I create a dedicated collection and copy the relevant references into it.


Hello world!

23Jun11

I’m starting this because of http://matt.might.net/articles/how-to-blog-as-an-academic/.