Chapter xx Math & Algorithms


Contents


From:jnhall@sat.mot.com (Joseph Hall)
Subject: Re: long integer square root

In article <rayCA61B2.6nA@netcom.com writeth:

estevez@atp3100.tuwien.ac.at (Ernesto Estevez) writes ... Does somebody has a code or algorithm for extracting a long integer square root and returning a integer. I suggest looking at Newton's iteration in any decent CS book on the subject. With a sufficiently good first guess you can do it in about 12 instructions on a 68020+. And yes, I have done it, and no, you can't have it. It belong to the company I work for.

I pulled this one from my personal inventory. I hereby assign it to the public domain. Enjoy.

/*

* ISqrt-- * Find square root of n. This is a Newton's method approximation, * converging in 1 iteration per digit or so. Finds floor(sqrt(n) + 0.5). */

int ISqrt(register int n) { register int i; register long k0, k1, nn; for (nn = i = n, k0 = 2; i > 0; i >>= 2, k0 <<= 1) ; nn <<= 2; for (;;) { k1 = (nn / k0 + k0) >> 1; if (((k0 ^ k1) & ~1) == 0) break; k0 = k1; } return (int) ((k1 + 1) >> 1); }


From: jimc@tau-ceti.isc-br.com (Jim Cathey)
Subject: Re: long integer square root

In article <220o6u$cn@email.tuwien.ac.at> estevez@atp3100.tuwien.ac.at (Ernesto Estevez) writes:

>Does somebody have code or an algorithm for extracting a long integer
>square root and returning an integer?

There was a long series of letters in Dr. Dobbs' Journal a few years back that I was a part of. Here are two 'competing' subroutines for the 68000 that I wrote. One is a Newton-Raphson iterator, the other a hybrid of three different subroutines using two different methods, heavily optimized for speed. The non-Newton one is faster in some cases and slower in others. Choosing one as 'best' depends on what kind of input ranges you expect to see most often. Newton's time doesn't vary much on the average based on the input argument. The non-Newton one ranges from a lot better (small arguments) to a little worse (large arguments), but is always better than Newton's worst case. Don't neglect the fact that on a FPU-equipped machine it may be faster to use floating point. I've done no research into this possibility, nor have I timed this stuff on any 68020+ system. The speed balance is no doubt different then. (For that matter, the 68000 testbed had no wait states nor interrupt system overhead, which doesn't necessarily apply to a 68000 PowerBook, and certainly doesn't apply to a Mac+ or earlier.)

I'm personally fond of the non-Newton version, because the algorithm only uses shifts and adds, so it could be implemented in microcode with about same speed as a divide.

*************************************************************************
*                                                                       *
*       Integer Square Root (32 to 16 bit).                             *
*                                                                       *
*       (Newton-Raphson method).                                        *
*                                                                       *
*       Call with:                                                      *
*               D0.L = Unsigned number.                                 *
*                                                                       *
*       Returns:                                                        *
*               D0.L = SQRT(D0.L)                                       *
*                                                                       *
*       Notes:  Result fits in D0.W, but is valid in longword.          *
*               Takes from 338 cycles (1 shift, 1 division) to          *
*               1580 cycles (16 shifts, 4 divisions)  (including rts).  *
*               Averages 854 cycles measured over first 65535 roots.    *
*               Averages 992 cycles measured over first 500000 roots.   *
*                                                                       *
*************************************************************************

        .globl lsqrt
*                       Cycles
lsqrt   movem.l d1-d2,-(sp) (24)
        move.l d0,d1    (4)     ; Set up for guessing algorithm.
        beq.s return    (10/8)  ; Don't process zero.
        moveq #1,d2     (4)

guess   cmp.l d2,d1     (6)     ; Get a guess that is guaranteed to be
        bls.s newton    (10/8)  ; too high, but not by much, by dividing the
        add.l d2,d2     (8)     ; argument by two and multiplying a 1 by 2
        lsr.l #1,d1     (10)    ; until the power of two passes the modified
        bra.s guess     (10)    ; argument, then average these two numbers.

newton  add.l d1,d2     (8)     ; Average the two guesses.
        lsr.l #1,d2     (10)
        move.l d0,d1    (4)     ; Generate the next approximation(s)
        divu d2,d1      (140)   ; via the Newton-Raphson method.
        bvs.s done      (10/8)  ; Handle out-of-range input (cheats!)
        cmp.w d1,d2     (4)     ; Have we converged?
        bls.s done      (10/8)
        swap d1         (4)     ; No, kill the remainder so the
        clr.w d1        (4)     ; next average comes out right.
        swap d1         (4)
        bra.s newton    (10)

done    clr.w d0        (4)     ; Return a word answer in a longword.
        swap d0         (4)
        move.w d2,d0    (4)
return  movem.l (sp)+,d1-d2  (28)
        rts             (16)

        end
*************************************************************************
*                                                                       *
*       Integer Square Root (32 to 16 bit).                             *
*                                                                       *
*       (Exact method, not approximate).                                *
*                                                                       *
*       Call with:                                                      *
*               D0.L = Unsigned number.                                 *
*                                                                       *
*       Returns:                                                        *
*               D0.L = SQRT(D0.L)                                       *
*                                                                       *
*       Notes:  Result fits in D0.W, but is valid in longword.          *
*               Takes from 122 to 1272 cycles (including rts).          *
*               Averages 610 cycles measured over first 65535 roots.    *
*               Averages 1104 cycles measured over first 500000 roots.  *
*                                                                       *
*************************************************************************

        .globl lsqrt
*                       Cycles
lsqrt   tst.l d0        (4)     ; Skip doing zero.
        beq.s done      (10/8)
        cmp.l #$10000,d0 (14)   ; If is a longword, use the long routine.
        bhs.s glsqrt    (10/8)
        cmp.w #625,d0   (8)     ; Would the short word routine be quicker?
        bhi.s gsqrt     (10/8)  ; No, use general purpose word routine.
*                               ; Otherwise fall into special routine.
*
*  For speed, we use three exit points.
*  This is cheesy, but this is a speed-optimized subroutine!

        page
*************************************************************************
*                                                                       *
*       Faster Integer Square Root (16 to 8 bit).  For small arguments. *
*                                                                       *
*       (Exact method, not approximate).                                *
*                                                                       *
*       Call with:                                                      *
*               D0.W = Unsigned number.                                 *
*                                                                       *
*       Returns:                                                        *
*               D0.W = SQRT(D0.W)                                       *
*                                                                       *
*       Notes:  Result fits in D0.B, but is valid in word.              *
*               Takes from 72 (d0=1) to 504 (d0=625) cycles             *
*               (including rts).                                        *
*                                                                       *
*       Algorithm supplied by Motorola.                                 *
*                                                                       *
*************************************************************************

* Use the theorem that a perfect square is the sum of the first
* sqrt(arg) number of odd integers.

*                       Cycles
        move.w d1,-(sp) (8)
        move.w #-1,d1   (8)
qsqrt1  addq.w #2,d1    (4)
        sub.w d1,d0     (4)
        bpl qsqrt1      (10/8)
        asr.w #1,d1     (8)
        move.w d1,d0    (4)
        move.w (sp)+,d1 (12)
done    rts             (16)

        page
*************************************************************************
*                                                                       *
*       Integer Square Root (16 to 8 bit).                              *
*                                                                       *
*       (Exact method, not approximate).                                *
*                                                                       *
*       Call with:                                                      *
*               D0.W = Unsigned number.                                 *
*                                                                       *
*       Returns:                                                        *
*               D0.L = SQRT(D0.W)                                       *
*                                                                       *
*       Uses:   D1-D4 as temporaries --                                 *
*               D1 = Error term;                                        *
*               D2 = Running estimate;                                  *
*               D3 = High bracket;                                      *
*               D4 = Loop counter.                                      *
*                                                                       *
*       Notes:  Result fits in D0.B, but is valid in word.              *
*                                                                       *
*               Takes from 544 to 624 cycles (including rts).           *
*                                                                       *
*               Instruction times for branch-type instructions          *
*               listed as (X/Y) are for (taken/not taken).              *
*                                                                       *
*************************************************************************

*                       Cycles
gsqrt   movem.l d1-d4,-(sp) (40)
        move.w #7,d4    (8)     ; Loop count (bits-1 of result).
        clr.w d1        (4)     ; Error term in D1.
        clr.w d2        (4)
sqrt1   add.w d0,d0     (4)     ; Get 2 leading bits a time and add
        addx.w d1,d1    (4)     ; into Error term for interpolation.
        add.w d0,d0     (4)     ; (Classical method, easy in binary).
        addx.w d1,d1    (4)
        add.w d2,d2     (4)     ; Running estimate * 2.
        move.w d2,d3    (4)
        add.w d3,d3     (4)
        cmp.w d3,d1     (4)
        bls.s sqrt2     (10/8)  ; New Error term > 2* Running estimate?
        addq.w #1,d2    (4)     ; Yes, we want a '1' bit then.
        addq.w #1,d3    (4)     ; Fix up new Error term.
        sub.w d3,d1     (4)
sqrt2   dbra d4,sqrt1   (10/14) ; Do all 8 bit-pairs.
        move.w d2,d0    (4)
        movem.l (sp)+,d1-d4 (44)
        rts             (16)

        page
*************************************************************************
*                                                                       *
*       Integer Square Root (32 to 16 bit).                             *
*                                                                       *
*       (Exact method, not approximate).                                *
*                                                                       *
*       Call with:                                                      *
*               D0.L = Unsigned number.                                 *
*                                                                       *
*       Returns:                                                        *
*               D0.L = SQRT(D0.L)                                       *
*                                                                       *
*       Uses:   D1-D4 as temporaries --                                 *
*               D1 = Error term;                                        *
*               D2 = Running estimate;                                  *
*               D3 = High bracket;                                      *
*               D4 = Loop counter.                                      *
*                                                                       *
*       Notes:  Result fits in D0.W, but is valid in longword.          *
*                                                                       *
*               Takes from 1080 to 1236 cycles (including rts.)         *
*                                                                       *
*               Two of the 16 passes are unrolled from the loop so that *
*               quicker instructions may be used where there is no      *
*               danger of overflow (in the early passes).               *
*                                                                       *
*               Instruction times for branch-type instructions          *
*               listed as (X/Y) are for (taken/not taken).              *
*                                                                       *
*************************************************************************

*                       Cycles
glsqrt  movem.l d1-d4,-(sp) (40)
        moveq #13,d4    (4)     ; Loop count (bits-1 of result).
        moveq #0,d1     (4)     ; Error term in D1.
        moveq #0,d2     (4)
lsqrt1  add.l d0,d0     (8)     ; Get 2 leading bits a time and add
        addx.w d1,d1    (4)     ; into Error term for interpolation.
        add.l d0,d0     (8)     ; (Classical method, easy in binary).
        addx.w d1,d1    (4)
        add.w d2,d2     (4)     ; Running estimate * 2.
        move.w d2,d3    (4)
        add.w d3,d3     (4)
        cmp.w d3,d1     (4)
        bls.s lsqrt2    (10/8)  ; New Error term > 2* Running estimate?
        addq.w #1,d2    (4)     ; Yes, we want a '1' bit then.
        addq.w #1,d3    (4)     ; Fix up new Error term.
        sub.w d3,d1     (4)
lsqrt2  dbra d4,lsqrt1  (10/14) ; Do first 14 bit-pairs.

        add.l d0,d0     (8)     ; Do 15-th bit-pair.
        addx.w d1,d1    (4)
        add.l d0,d0     (8)
        addx.l d1,d1    (8)
        add.w d2,d2     (4)
        move.l d2,d3    (4)
        add.w d3,d3     (4)
        cmp.l d3,d1     (6)
        bls.s lsqrt3    (10/8)
        addq.w #1,d2    (4)
        addq.w #1,d3    (4)
        sub.l d3,d1     (8)

lsqrt3  add.l d0,d0     (8)     ; Do 16-th bit-pair.
        addx.l d1,d1    (8)
        add.l d0,d0     (8)
        addx.l d1,d1    (8)
        add.w d2,d2     (4)
        move.l d2,d3    (4)
        add.l d3,d3     (8)
        cmp.l d3,d1     (6)
        bls.s lsqrt4    (10/8)
        addq.w #1,d2    (4)
lsqrt4  move.w d2,d0    (4)
        movem.l (sp)+,d1-d4 (44)
        rts             (16)

        end


From: k044477@hobbes.kzoo.edu (Jamie R. McCarthy)
Subject: Re: long integer square root

jnhall@sat.mot.com (Joseph Hall) writes:

> In article <rayCA61B2.6nA@netcom.com> writeth:
estevez@atp3100.tuwien.ac.at (Ernesto Estevez) writes ...
>Does somebody has a code or algorithm for extracting a long integer
> square root and returning a integer.

I suggest looking at Newton's iteration in any decent CS book on the
subject. With a sufficiently good first guess you can do it in about
12 instructions on a 68020+.

And yes, I have done it, and no, you can't have it. It belong to the
company I work for.

I pulled this one from my personal inventory. I hereby assign it to the
public domain. Enjoy.

/*
 * ISqrt--
 *     Find square root of n.  This is a Newton's method approximation,
 * converging in 1 iteration per digit or so.  Finds floor(sqrt(n) +
0.5).
 */
int ISqrt(register int n)
{
    register int i;
    register long k0, k1, nn;

    for (nn = i = n, k0 = 2; i > 0; i >>= 2, k0 <<= 1)
        ;
    nn <<= 2;
    for (;;) {
        k1 = (nn / k0 + k0) >> 1;
        if (((k0 ^ k1) & ~1) == 0)
            break;
        k0 = k1;
    }
    return (int) ((k1 + 1) >> 1);	
}

--

Here's an article I saved from c.s.m.p four months ago. Strangely, it was only distributed in North America. I don't know how fast it works in practice, but there are no multiplications or divisions in its inner loop, which is promising.

From: bruner@sp10.csrd.uiuc.edu (John Bruner)
Subject: Re: Fixed Math sqrt, asin, acos

Here's an integer square root routine.

-----------------------------------------------------------------------

#include <math.h>
 
/*
 * It requires more space to describe this implementation of the manual
 * square root algorithm than it did to code it.  The basic idea is that
 * the square root is computed one bit at a time from the high end.  Because
 * the original number is 32 bits (unsigned), the root cannot exceed 16 bits
 * in length, so we start with the 0x8000 bit.
 *
 * Let "x" be the value whose root we desire, "t" be the square root
 * that we desire, and "s" be a bitmask.  A simple way to compute
 * the root is to set "s" to 0x8000, and loop doing the following:
 *
 *	t = 0;
 *	s = 0x8000;
 *	do {
 *		if ((t + s) * (t + s) <= x)
 *			t += s;
 *		s >>= 1;
 *	while (s != 0);
 *
 * The primary disadvantage to this approach is the multiplication.  To
 * eliminate this, we begin simplying.  First, we observe that
 *
 *	(t + s) * (t + s) == (t * t) + (2 * t * s) + (s * s)
 *
 * Therefore, if we redefine "x" to be the original argument minus the
 * current value of (t * t), we can determine if we should add "s" to
 * the root if
 *
 *	(2 * t * s) + (s * s) <= x
 *
 * If we define a new temporary "nr", we can express this as
 *
 *	t = 0;
 *	s = 0x8000;
 *	do {
 *		nr = (2 * t * s) + (s * s);
 *		if (nr <= x) {
 *			x -= nr;
 *			t += s;
 *		}
 *		s >>= 1;
 *	while (s != 0);
 *
 * We can improve the performance of this by noting that "s" is always a
 * power of two, so multiplication by "s" is just a shift.  Also, because
 * "s" changes in a predictable manner (shifted right after each iteration)
 * we can precompute (0x8000 * t) and (0x8000 * 0x8000) and then adjust
 * them by shifting after each step.  First, we let "m" hold the value
 * (s * s) and adjust it after each step by shifting right twice.  We
 * also introduce "r" to hold (2 * t * s) and adjust it after each step
 * by shifting right once.  When we update "t" we must also update "r",
 * and we do so by noting that (2 * (old_t + s) * s) is the same as
 * (2 * old_t * s) + (2 * s * s).  Noting that (s * s) is "m" and that
 * (r + 2 * m) == ((r + m) + m) == (nr + m):
 *
 *	t = 0;
 *	s = 0x8000;
 *	m = 0x40000000;
 *	r = 0;
 *	do {
 *		nr = r + m;
 *		if (nr <= x) {
 *			x -= nr;
 *			t += s;
 *			r = nr + m;
 *		}
 *		s >>= 1;
 *		r >>= 1;		
 *		m >>= 2;
 *	} while (s != 0);
 *
 * Finally, we note that, if we were using fractional arithmetic, after
 * 16 iterations "s" would be a binary 0.5, so the value of "r" when
 * the loop terminates is (2 * t * 0.5) or "t".  Because the values in
 * "t" and "r" are identical after the loop terminates, and because we
 * do not otherwise use "t"  explicitly within the loop, we can omit it.
 * When we do so, there is no need for "s" except to terminate the loop,
 * but we observe that "m" will become zero at the same time as "s",
 * so we can use it instead.
 *
 * The result we have at this point is the floor of the square root.  If
 * we want to round to the nearest integer, we need to consider whether
 * the remainder in "x" is greater than or equal to the difference
 * between ((r + 0.5) * (r + 0.5)) and (r * r).  Noting that the former
 * quantity is (r * r + r + 0.25), we want to check if the remainder is
 * greater than or equal to (r + 0.25).  Because we are dealing with
 * integers, we can't have equality, so we round up if "x" is strictly
 * greater than "r":
 *
 *	if (x > r)
 *		r++;
 */
 
unsigned int isqrt(unsigned long x)
{
	unsigned long r, nr, m;
 
	r = 0;
	m = 0x40000000;
	do {
		nr = r + m;
		if (nr <= x) {
			x -= nr;
			r = nr + m;
		}
		r >>= 1;
		m >>= 2;
	} while (m != 0);
 
	if (x > r)
		r++;
	return(r);
}
>

--

Jamie McCarthy Internet: k044477@kzoo.edu AppleLink: j.mccarthy


From: kourakos@cardinal.ncsc.org (Alexander W. Kourakos)
Subject: Re: Area of a traingle formula wanted (not the easy one)

Area = 0.5 * (x1 * y2 + x2 * y3 + x3 * y1 - y1 * x2 - y2 * x3 - y2 * x1)

(in standard orthagonal coordinates)

This comes out of a formula for finding the area of any polygon, given the points.

awk


From: cpruett@holonet.net
Subject: Re: Area of a traingle

Originator: cpruett@zen.holonet.net

In a message dated 07-05-93 22:34 jeremy@belsize.demon.co.u wrote to comp.sys.mac.programmer:

Here!s a easy one (but read the WHOLE question before flaming): Anyone have a general purpose formula to calc the area of a triangle - anyone who says 0.5*base length*perp height gets no prize, but anyone who can give me a formula using only the 3 co-ords of the end points (say, (x1,y1),(x2,y2),(x3,y3)) does get a big smile. Ideally, the value returned can also be -ve for a triangle with the end points in a different order (see what I mean ?) Thanks, Jeremy Doig

Here's what I came up with:

                                                    2
Sqrt[(x2 y1 - x3 y1 - x1 y2 + x3 y2 + x1 y3 - x2 y3) ]
------------------------------------------------------
                          2

I tried it out on a couple of test-triangles and it seems to work but you will want to verify it for yourself. I used Heron's formula: A = Sqrt[s(s-a)(s-b)(s-c)] where s = 0.5(a + b + c) and a,b, & c are the lengths of the three sides.

Chris Pruett



Subject: Re: Point in Polygon routine needed

From: fry@zariski.harvard.edu (David Fry)

In article <8fxkxAO00WB7M=nOFO@andrew.cmu.edu> Andrew Lewis Tepper <at15+@andrew.cmu.edu> writes:

>I don't know if this routine is "standard", I just came up with it recently:
>
>For a polygon of points p1...pn, and a point P, make a table as follows:
>
>T(1)= angle from p1 to P to p2
>T(2)= angle from p2 to P to p3
>...
>T(n)= angle from pn to P to p1
>
>express all angles as: -PI < angle < PI.
>
>Add all entries in the table. If the sum = 0, the point is outside. If
>the sum is +/- PI, the point is inside. If the point is +/- xPI, you
>have a strange polygon. If ANY angle was = +/-PI, the point is on the
>border.
>
>Does anyone know if this is considered a good (known?) algorithm? It
>took a long time to come up with!

You have the essence of the "good," standard algorithm, but if you're working with thousands of polygons, those floating point calculations will be too slow and will probably lead to round-off error at some point. Round-off error is a huge pain in computational geometry problems. The hard part is finding ways to avoid it.

The trick is that you don't need to know the angle between the lines, you just need to know its sign, called the *circulation*. For every line AB in the polygon, form the triangle PAB and you want to know does it lie inside the polygon or not. That is, if you were riding a bike counterclockwise around the boundary of the polygon, would P lie on your left or your right when you crossed AB? You can find this by calculating PA x PB = (A.x-P.x)*(B.y-P.y) - (A.y-P.y)*(B.x-P.x). The circulation (the sign of PA x PB) can be found without doing a multiplication in many cases, but this is good enough. Remember, we've oriented the polygon so that you come to A before coming to B as your traverse its boundary.

Then the algorithm is:

In = 0

foreach edge AB in the polygon

if A.y < P.y and B.y >= P.y and PA x PB > 0

In = In + 1

if A.y >= P.y and B.y < P.y and PA x PB < 0

In = In - 1

Then if In = 1, P is inside the polygon, and is outside if I = 0. This doesn't handle the case where P lies on some edge AB of the polygon.

David Fry


From: fixer@faxcsl.dcrt.nih.gov (Chris Mr. Tangerine Man Tate)
Subject: Re: card shuffling algorithm

Reply-To: fixer@faxcsl.dcrt.nih.gov

Here's a little snippet to shuffle a deck of N cards in Theta(N) time (written in C for the Macintosh):

void Shuffle(short deck[N])
{
    short i, j, temp;

    for (i = N; i > 1; i--)
    {
        j = abs(Random()) % i;      // random # from 0 to i-1
        temp = deck[j];
        deck[j] = deck[i - 1];
        deck[i - 1] = temp;         // swap cards at positions i and j
    }
}

Explanation: working backwards from the "top" of the deck, swap each card with a random card from the as-yet-unvisited portion of the deck. Note that since it's possible to "swap" a card with itself, this algorithm ensures that you can generate the null shuffle, i.e. all cards still in their starting order.

If you don't need to "deal out" all the cards at once (e.g. you're shuffling for a poker game rather than for bridge), you can simply do the random swap as each card is needed, and distribute the shuffling time across the dealing.


From: cklarson@flauta.engr.ucdavis.edu (Christopher Klaus Larson)
Subject: Re: Reversing the bytes in a long...

In article <absurd-150293001044@seuss.apple.com> absurd@apple.apple.com (Tim Dierks) writes:

>In article <1993Feb13.175719.4309@kth.se>, d88-jwa@hemul.nada.kth.se (Jon
>Wtte) wrote:
> In <C2DF3w.Guz@ns1.nodak.edu> nicolai@plains.NoDak.edu (Steven Nicolai) writes:
>
> >I've encountered an interesting brain-teaser working on a program. What's
> >the shortest sequence of 68K assembly that will reverse the order of the
> >bytes in a long word. (ie convert a little endian number to big endian.)
>
> Hmm, wouldn't that just be 3 SWAP instructions?
>
[Stuff deleted]

Ain't no SWAP.B; all you can do is swap words. Here's my solution:

And here is a register-parameter inline version of Tim's solution (slow night in Davis):

/* Code Begins */

#pragma parameter __D0 SwapLong (__D1)
unsigned long SwapLong (unsigned long filpMe) =
{
    0x2001, /*  move.l  d1,d0   */
    0x4840, /*  swap    d0      */
    0x1001, /*  move.b  d1,d0   */
    0x4840, /*  swap    d0      */
    0x4841, /*  swap    d1      */
    0x1001, /*  move.b  d1,d0   */
    0xE198  /*  rol.l   #8,d0   */
};

/* Code Ends */

--Chris


From: cconstantine@galaxy.gov.bc.ca
Subject: Re: Reversing the bytes in a long...

In article <1993Feb17.151416.868@aio.jsc.nasa.gov>, mark@pokey.jsc.nasa.gov (Mark Manning) writes:

Keith Rollin of Taligent was kind enough to donate some C code to me for a database program I am writing (it reads PC files & FoxBas+/Mac files) that would do exactly that. Here it is in C:

void ConvertLong(long *num)
{
    long tempLong;
    char* sourcePtr = (char*) num;
    char* destPtr = (char*) &tempLong;

    destPtr[0] = sourcePtr[3];
    destPtr[1] = sourcePtr[2];
    destPtr[2] = sourcePtr[1];
    destPtr[3] = sourcePtr[0];

    *num = tempLong;
}

and here it is in ASM:

ConvertLong1:
00000000: 4E56 FFFC          LINK      A6,#$FFFC
00000004: 48E7 0018          MOVEM.L   A3/A4,-(A7)
00000008: 286E 0008          MOVEA.L   num(A6),A4
0000000C: 47EE FFFC          LEA       tempLong(A6),A3
00000010: 16AC 0003          MOVE.B    $0003(A4),(A3)
00000014: 176C 0002 0001     MOVE.B    $0002(A4),$0001(A3)
0000001A: 176C 0001 0002     MOVE.B    $0001(A4),$0002(A3)
00000020: 1754 0003          MOVE.B    (A4),$0003(A3)
00000024: 206E 0008          MOVEA.L   num(A6),A0
00000028: 20AE FFFC          MOVE.L    tempLong(A6),(A0)
0000002C: 4CDF 1800          MOVEM.L   (A7)+,A3/A4
00000030: 4E5E               UNLK      A6
00000032: 4E75               RTS
00000034

According to Keith, this is also a faster routine than doing a 1 line bitshift operation such as:

void ConvertLong(long *num)
{
    *num = (*num >> 24) | ((*num >> 8) & 0xFF00) | ((*num
<< 8) & 0xFF0000) |
(*num << 24);
}

that one line produces the following in ASM:

00000000: 4E56 0000          LINK      A6,#$0000
00000004: 2F0C               MOVE.L    A4,-(A7)
00000006: 286E 0008          MOVEA.L   $0008(A6),A4
0000000A: 2014               MOVE.L    (A4),D0
0000000C: 7218               MOVEQ     #$18,D1
0000000E: E2A0               ASR.L     D1,D0
00000010: 2214               MOVE.L    (A4),D1
00000012: E081               ASR.L     #$8,D1
00000014: 0281 0000 FF00     ANDI.L    #$0000FF00,D1
0000001A: 8081               OR.L      D1,D0
0000001C: 2214               MOVE.L    (A4),D1
0000001E: E189               LSL.L     #$8,D1
00000020: 0281 00FF 0000     ANDI.L    #$00FF0000,D1
00000026: 8081               OR.L      D1,D0
00000028: 2214               MOVE.L    (A4),D1
0000002A: 7418               MOVEQ     #$18,D2
0000002C: E5A9               LSL.L     D2,D1
0000002E: 8081               OR.L      D1,D0
00000030: 2880               MOVE.L    D0,(A4)
00000032: 285F               MOVEA.L   (A7)+,A4
00000034: 4E5E               UNLK      A6
00000036: 4E75               RTS
00000038

NOTE: it's almost twice as long (if not longer) and therefore about 1/2 as fast!.

Just my (Keith's really) $0.02 worth.

Carl B. Constantine

CCONSTANTINE@galaxy.gov.bc.ca


From: k044477@hobbes.kzoo.edu (Jamie R. McCarthy)
Subject: void comparePStrings();

Here's a little function I wrote that I thought I'd share with y'all. It compares Pascal strings in a manner suitable for sorting. It doesn't actually do alphabetical order. (I figure, in most cases, if you're going to try to do alphabetical order, you should use the Toolbox and properly handle diacriticals.)

I wrote this because I have an app that processes lists of strings that have to be sorted so I can access them with a binary search. I decided IUComparePStrings() or whatever it's called would be too slow, and that I was going to have to roll my own. Here it is.

Public domain.

short comparePStrings(register unsigned char *str1,
   register unsigned char *str2,
   Boolean caseSen)
{
      /*
       * This comparison routine returns 0 if the two strings passed to it
       * are equal, and a negative or positive number otherwise.  The
       * values can be used to sort strings, i.e. the transitivity law
       * applies to this function, i.e. if cPS(s1, s2)<0 and
       * cPS(s2, s3)<0, then it's guaranteed that cPS(s1, s3) is also
<0.
       *
       * But it doesn't exactly sort in alphabetical order.  Specifically,
       * length is more important than the characters in the string.
       * For example, cPS("\pa", "\pb") == -1 because 'a' precedes 'b',
       * but cPS("\paa", "\pb") == 1 because "b" is shorter than "a".
       */
   asm {
         move.b   (str1), D0   // get 1st string's length
         sub.b    (str2)+, D0  // subtract 2nd string's length
         bne      @done2       // if unequal, we're done; return len diff
         movem.l  D1-D3, -(A7) // save registers we'll need
         moveq    #0, D2       // clear out length
         moveq    #32, D3      // store ('a'-'A') in a handy register
         move.b   (str1)+, D2  // put length in its register
         beq.s    @done1       // if length is zero, we're done; return 0
         subq     #1, D2       // subtract one from length, to set up dbra
         tst.b    caseSen      // is this a case-sensitive compare?
         beq.s    @nsLoop      // no, go to that nonsensitive loop
@sLoop:                        // yes, do this nonsensitive loop
         move.b   (str1)+, D0  // get char from 1st string
         sub.b    (str2)+, D0  // subtract off corresponding char from 2nd
         bne      @done1       // if unequal, we're done; return char diff
         dbra     D2, @sLoop   // loop on length
         bra.s    @done1       // we're done looping; return 0
@nsLoop: 
         move.b   (str1)+, D0  // get char from 1st string
         cmp.b    #'Z', D0     // is it greater than 'Z'?
         bgt.s    @nonUC0      // if so, jump to non-upper-case section
         cmp.b    #'A', D0     // is it less than 'A'?
         blt.s    @nonUC0      // if so, jump to non-upper-case section
         add      D3, D0       // it's uppercase;  make it lowercase
@nonUC0: move.b   (str2)+, D1  // get char from 2nd string
         cmp.b    #'Z', D1     // is it greater than 'Z'?
         bgt.s    @nonUC1      // if so, jump to non-upper-case section
         cmp.b    #'A', D1     // is it less than 'A'?
         blt.s    @nonUC1      // if so, jump to non-upper-case section
         add      D3, D1       // it's uppercase;  make it lowercase
@nonUC1: sub.b    D1, D0       // subtract 2nd char from 1st
         bne      @done1       // if unequal, we're done; return char diff
         dbra     D2, @nsLoop  // loop on length
                               // length is zero, we're done; return 0
@done1:  movem.l  (A7)+, D1-D3 // restore registers we used
@done2:  ext.w    D0           // extend byte to a short and return it
   }
}

--

Jamie McCarthy Internet: k044477@kzoo.edu AppleLink: j.mccarthy


From: k044477@hobbes.kzoo.edu (Jamie R. McCarthy)
Subject: C inline swapping functions

After seeing Jon and Matthias' three-instruction code for reversing the order of bytes in a long, I decided to make it a part of my personal library. I also decided to play around with that "#pragma parameter" thing, to make the functions inline. Here's what I ended up with; I'm posting it here because maybe I'll save someone the ten minutes it takes to assemble the code by hand. And also so that, if I did something stupid, someone will point it out. :-)

Public domain, of course.

   /*
    * swapBytes() swaps the bytes of the word at the given location.
    * It doesn't usually need to return a value, but it's easy to do so
    * because of the implementation.  The value returned is equal to
    * the 16-bit word, after it's been swapped.
    */
#pragma parameter _D0 swapBytes(__A0)
short swapBytes(void *theShort) = {
   0x3010,        // move.w (A0), D0
   0xE058,        // ror.w D0
   0x3080         // move.w D0, (A0)
} ;
 
   /*
    * getIBMShort() returns the value of the 16-bit word at the given
    * location, with the lo and hi bytes reversed.  It does the same
    * thing as swapBytes() except it doesn't store the new value into
    * the memory location.
    */
#pragma parameter __D0 getIBMShort(__A0)
short getIBMShort(void *thePtr) = {
   0x3010,        // move.w (A0), D0
   0xE058         // ror.w D0
} ;
 
   /*
    * getIBMLong() returns the value of the 32-bit longword at the
    * given location, with the byte order reversed.
    */
#pragma parameter __D0 getIBMLong(__A0)
long getIBMLong(void *thePtr) = {
   0x2010,        // move.l (A0), D0
   0xE058,        // ror.w D0
   0x4840,        // swap D0
   0xE058,        // ror.w D0
} ;

--

Jamie McCarthy Internet: k044477@kzoo.edu AppleLink: j.mccarthy


From: k044477@hobbes.kzoo.edu (Jamie R. McCarthy)
Subject: Re: Preloading regions? Is that a good idea?

dmoney@magnus.acs.ohio-state.edu (Dean R Money) writes:

>I'm writing a game that's played on a board with 150 small hexagons.

I'm assuming these hexes are laid out in a regular grid...

>Like Risk, when a player moves/attacks, he clicks on a region. I'm
>using polygons when I draw the board in the first place, then when I
>have to identify which hexagon is clicked on, I'm having to open 150
>regions, one by one, and use PtInRgn.

150's a little excessive. You can narrow it down to seven if you want to work hard, or nine if you don't. (I don't.)

I just finished an app that uses a hex grid. As it happens, I needed to outline the hexes at times, as well at detecting clicks in them. So I decided to use regions.

I built one region, the size and shape of one hex. (Wasn't easy, since it had to conform exactly to the bitmapped image our artist drew. I gave up on MoveTo()/LineTo() and went with FrameRect(). If your hexes are large, lines might work best for you; mine were 20x18.) I kept this guy in a global region handle.

When I needed to outline a hex, I'd OffsetRgn() it to the appropriate place and FrameRgn() it. Since OffsetRgn() works relative, not absolute, you have to peek at the region's rgnBBox to move it to an absolute location. Since I did this all the time, I made a routine that would return me the global region, offset to the proper point, and would build the region if it happened to be the first time it was being called:

RgnHandle CSkinPixels::getHexRgn(short offsetV, short offsetH)
{
   if (theOneTrueHexRgn == NULL) {
      Rect r;
      theOneTrueHexRgn = NewRgn();
      OpenRgn();
      // Region-building code deleted.
      // (You don't care how my hexes look.)
      CloseRgn(theOneTrueHexRgn);
   }
   
   OffsetRgn(theOneTrueHexRgn,
      offsetH - (**theOneTrueHexRgn).rgnBBox.left,
      offsetV - (**theOneTrueHexRgn).rgnBBox.top);
   
   return theOneTrueHexRgn;
}

I also needed a routine to turn "hex grid" coordinates into QuickDraw h,v coordinates. The former, I call "loc," the latter, "offset." You probably already have a routine like this, though yours may look different (there's two schools of thought for laying out hex grids).

void CSkinPixels::getHexOffset(short locV, short locH,
   short *vOffset, short *hOffset)
{
   *hOffset = -40 + (locH+locV)*15;
   *vOffset = 92 + (locH-locV)*9;
}

But what you need is a routine that converts from _offset_ to _loc_. The first thing is to solve those equations in reverse. See those two got to turn them around and express them in terms of the other two variables. Yeah, that algebra stuff--you have to add equations together and everything. (I learned how to do this quickly in linear algebra, but I'm afraid I've forgotten it all. Sorry, Dr. Fink.)

My two "inverse" functions ended up being:

   *locH = ((long) 3*hOffset + 5*vOffset - 340) / 90;
   *locV = ((long) 3*hOffset - 5*vOffset + 580) / 90;
But that just gives you a point that's close to the one you want. After you know that, you have to go through all the adjacent hexes and test them to see if they're the proper ones. Here's what my "offset to loc" function ended up looking like:
void CSkinPixels::getHexLocation(short vOffset, short hOffset,
   short *locV, short *locH)
{
   short i, j;
   Point thePt;
   
   *locH = ((long) 3*hOffset + 5*vOffset - 340) / 90;
   *locV = ((long) 3*hOffset - 5*vOffset + 580) / 90;
   
   thePt.v = vOffset;  thePt.h = hOffset;
   
   for (j = -1; j <= 1; ++j) {
      for (i = -1; i <= 1; ++i) {
         RgnHandle theRgn;
         short vO, hO;
         getHexOffset(*locV+j, *locH+i, &vO, &hO);
         theRgn = getHexRgn(vO, hO);
         if (PtInRgn(thePt, theRgn)) {
            goto foundIt;
         }
      }
   }
   DebugStr("\pCouldn't find it!");
   
foundIt:
   *locV += j;  *locH += i;
}

A little more complicated, but it'll take you 150 times less storage, and will run up to 15 times as fast. (My app never hit the debug line, in case you were wondering. :-)

Dean, if you want to chat more about this, email me; the details of this probably don't interest c.s.m.p...

--

Jamie McCarthy Internet: k044477@kzoo.edu AppleLink: j.mccarthy


From: chris@benton.prepress.com (christopher m. knox)
Subject: Re: Bit Manipulation help needed.

In article <rudolph.728896597@unixg.ubc.ca> Christopher E Rudolph, rudolph@unixg.ubc.ca writes:

>First off I'm weak on and'ing and or'ing bits so bear with me if I....

Hi, (we Chris's need to stick together, eh ? :-))

A long integer is 32 bits, a short integer is 16 bits.

Using toolbox calls which allow a long as a bit number, you can set and clear bits in any structure by passing an address to the structure: counting from LEFT to RIGHT and starting at 0 you can set/clear/test bit N in your long value V as follows:

BitSet((Ptr) &V,(long) N);
BitClr((Ptr) &V,(long) N);
bitIsSet = BitTst((Ptr) &V,(long) N);
If you want to go RIGHT to LEFT (as in your example), use (31 - N) instead of N. If it has to be fast, don't use toolbox calls: (from RIGHT to LEFT):

t = 1L << N; /* N = 0 => 1, N = 2 => 2, N = 3 => 4 .... */


V |= t;                                /* set */
V &= ~t;                            /* clear */
bitIsSet = (V & t) ? 1 : 0;  /* test */

If it has to be still faster, then you should READ the value from

LEFT to RIGHT and fill it easily as follows:

BTW, V <<= N shifts V N bits to the left and shifts in 0s at the right.

> right 0000 0000 0000 0001

> right 0000 0000 0000 0011

> left 0000 0000 0000 0011

> right 0000 0000 0000 1011

> left 0000 0000 0000 1011

#define	PushRIGHT(_V)	{(_V) <<= 1; (_V) |= 1;}
#define	PushLEFT(_V)	{(_V) <<= 1; /* (_V) |= 0; */}

V = 0L;			          /* ... 0000 0000 0000 0000 */

PushRIGHT(V);		 /* ... 0000 0000 0000 0001 */
PushRIGHT(V);		 /* ... 0000 0000 0000 0011 */
PushLEFT(V);		   /* ... 0000 0000 0000 0110 */
PushRIGHT(V);		 /* ... 0000 0000 0000 1101 */
PushLEFT(V);		   /* ... 0000 0000 0001 1010 */

and read it back (albeit in reverse order):

#define	PopBIT(_V,_bit)	{_bit = (_V) & 1L; (_V) >>= 1;}

PopBIT(V,isRight); 
if (isRight) {...}
else            {...}

and so on... I hope this helps you some in 'bitsing' around...

chris knox, primary programmer, SpectreTouch

"Omne animal triste est post (or without) coitum" Umberto Eco


From: salmon@cwis.unomaha.edu (David Salmon)
Subject: Re: faster division routine wanted.

bayes@hplvec.LVLD.HP.COM writes:

> I wrote a routine that seems almost correct, and that allows you to
> quickly calculate a set of shifts and add/subs that approximate the
> division by any divisor. I know there's some stuff wrong with it, but
> thought it's worth sending out now, while the iron is hot.

The above mentioned algorithm is essentially correct. However, the implementation:

> 230 ! X/1000 = X>>10 + X>>15 - X>>17 + X>>21
> 240 ! The actual division this performs is very close to:
> 250 ! X/1000.0724845 (as shown by "approximation" printout.)

Does not really reporduce the divide by 1000 correctly. The higher order shifts lose the precision you need for accuracy. You need to shift-add- shift-add as in the following definitions:

/* This definition gives accurate results for n < 1 764 866 177 */

#define	divNum8(n)	((((((((((((((((-n)>>4)-n)>>3) \

+n)>>2)+n)>>3)+n)>>4)-n)>>2)+n)>>5)+n+1)>>10

/* This definition gives accurate results for n < 1 762 037 711 */

#define	divNum7(n)	((((((((((((((-n)>>3) \

+n)>>2)+n)>>3)+n)>>4)-n)>>2)+n)>>5)+n+1)>>10

/* This definition gives accurate results for n < 492 965 001 */

#define	divNum6(n)
(((((((((((n>>2)+n)>>3)+n)>>4)-n)>>2)+n)>>5)+n+1)&t;>10)

/* Use of fewer terms yields unacceptable results */

The definition divNum6 is roughly the equivalent of the x/1000 statement given above, but maintains the precision. The numbers listed above represents the limits for which the macros give the same value as the divide instruction. You must use full registers (32 bits) to get the correct results. On a IIfx, the above macro (divNum8) is about 25% faster than the divide instruction. divNum6 which should be sufficient for your purposes should be faster.

Hope this helps.

ds

--

David C. Salmon

salmon@unomaha.edu


From: bruner@sp10.csrd.uiuc.edu (John Bruner)
Subject: Re: faster division routine wanted.

An alternate way to do division is to multiply by the fixed-point reciprocal. In this case, you can do it with

quotient = (0x8312 * numerator + 0x8000) >> 25;


From: bayes@hplvec.LVLD.HP.COM (Scott Bayes)
Subject: Re: (Q) Need bit manipulation algorithm

> name, but whose initials are M.A.C. And if you want a solution for

> a word, and you don't want a 32K lookup table, you actually have to

> _shift_! Shifting is for weenies.

But for a word, it's just the sum of the bits in each byte. Code something like this (untested) should do it PDQ.

---

* at beginning, d0 contains a word whose "1" bits we wish to count
* at end, d2 contains number of 1 bits in its lower byte

        moveq   #0,d1                   - clear table index register
        move.b  d0,d1                   - lower byte of data word
        move.b  btable(d1.w),d2         - nbits for lower byte (PC relative)
        lsr.w   #8,d0                   - access upper byte of data word
        move.b  d0,d1                   - upper byte of data word
        add.b   btable(d1.w),d2         - d2 contains total bits of word

---

ScottB


From: k044477@hobbes.kzoo.edu (Jamie R. McCarthy)
Subject: Re: faster division routine wanted.

sage.cc.purdue.edu (Broc Seib) writes:

>ctvien@cs.concordia.ca (VIEN cam thanh) wrote:

>>

>> I am looking for a division routine that will divide a 2 bytes number

>> by 1000 faster than M68000's DIV instruction.

First of all: are you really on the 68000? If not, you should be aware that the '020 and later chips are much faster at division. My 68000 manual pegs a signed division at 158 cycles, +/- less than 10%. You can get a few dozen shifts and moves in that time, so custom code might help. But on an '020 or later, the chip will almost certainly beat you.

>Replace your DIV operand with an

>LSR (or is it ASR --- which ever one fills in the upper byte with zeros as

>it shifts.)

LSR. Logical shifts fill with zero, arithmetic shifts copy the sign bit.

>With this operand you can _quickly!_ divide by 1024 (shift to

>the right 10 bits). i.e.

>

> move.w #$operand,d0

> lsr.w 10,d0

Actually, you have to put the 10 into a register; you can only shift up to eight bits in immediate mode. You can shift a longword.

>If dividing by 1000 _exactly_ is of essence, then you may accomplish this with

>a predefined sequence of LSR's and SUBQ's. This, however, may run nearly as

>long (or possibly) longer than the DIV operand itself. You would have to

>count operand cycles (and compare to DIV) to determine this.

>

> [sample multiply-by-ten code deleted]

If I'm not mistaken, hardcoded division is a lot harder than hardcoded multiplication. Dividing by 1000 is dividing by eight and then dividing by five three times. How do you divide by five? To divide by five, you divide by four and take four-fifths of the remainder. How do you take four-fifths of something? You divide by five and subtract the result from the original. You recurse like this until the numbers get small, which is not all that efficient. And it's no fun to write either. This is why microcode engineers get paid lots of money.

If you really need speed, and you're on a 68000, build a 65,536-byte lookup table, so that x/1000==table[x]. Fourteen cycles. Can't beat it!

--

Jamie McCarthy Internet: k044477@kzoo.edu AppleLink: j.mccarthy


From: sage.cc.purdue.edu (Broc Seib)
Subject: Re: faster division routine wanted.

In article <5662@daily-planet.concordia.ca>, ctvien@cs.concordia.ca (VIEN cam thanh) wrote:

I am looking for a division routine that will divide a 2 bytes number by 1000 faster than M68000's DIV instruction.

I am not sure of the application in which you are doing this, but IF you only need an approximate division by 1000, AND this is an operation on a 2-byte integer (not floating point numbers!), then you may take advantage of the processor's speed in "shifting". Replace your DIV operand with an LSR (or is it ASR --- which ever one fills in the upper byte with zeros as it shifts.) With this operand you can _quickly!_ divide by 1024 (shift to the right 10 bits). i.e.

move.w #$operand,d0 lsr.w 10,d0

- Broc Seib, Purdue University

----------------------------------------------------------------------------

If that solution is inadequate, then this may possibly help...

----------------------------------------------------------------------------

If dividing by 1000 _exactly_ is of essence, then you may accomplish this with a predefined sequence of LSR's and SUBQ's. This, however, may run nearly as long (or possibly) longer than the DIV operand itself. You would have to count operand cycles (and compare to DIV) to determine this.

I will show an analogous application that multiplies a number by 10 (before the days of luxurious MUL operand); I will leave the divide by 1000 story as a homework problem for the reader.

The multiply by 10 story is as follows:

	move.w	#$operand,d1
 move.w d1,d0
	lsl.w		2,d0
 add.w		d0,d1
	lsl.w		1,d0

Verify this multiplication by 10 for yourself w/ pencil & paper.

Hint: The divide by 1000 story is simply 3 divide by 10's; just be careful with what you subtract from the "accumulator" register!

(I am a little rusty on my assembly; please excuse my mistakes)

Good Luck.

- Broc Seib


From: k044477@hobbes.kzoo.edu (Jamie R. McCarthy)
Subject: Re: (Q) Need bit manipulation algorithm

Summary: algorithm analysis & gratuitous MicroSoft-bashing

chris_russo@gateway.qm.apple.com (Christopher Russo) writes:

>Matthew T. Russotto, russotto@eng.umd.edu writes:
>>There is a neat trick for counting the number of bits in a word that
>>a Microsoft geek showed me one time. If there are k bits with 1's,
>>it runs in O(k) time.
>
>I learned a cool way in an interview here at Apple. For the number of
>bits in a byte, all you'd need is a 256 byte look-up table where the
>index designated by, say, 00001010 would be index $A into the table which
>would contain the number 2 (that you put there.) It's pretty fast.

Naturally, however, the cool hackers at MicroSoft wouldn't use that! Heavens, a 256-byte lookup table is 1/2500th of the memory below the 640K barrier! And pointers to global variables are so hard to use anyway, unless you want to use one of those wimpy high-level languages, or those spineless computers with the flat memory models that we won't name, but whose initials are M.A.C. And if you want a solution for a word, and you don't want a 32K lookup table, you actually have to _shift_! Shifting is for weenies.

No, despite being ten to fifty times slower, it's much better to write:

   for (nOneBits=(byte?1:0); byte&=(byte-1); ++nOneBits) ;
...than:
   nOneBits = gOneBitsArray[byte];


From: Ray.Arachelian@f204.n2603.z1.ieee.org (Ray Arachelian)
Subject: (Q) Need Bit Manipulation

On 01-06-93, FELCIANO@SUMMIT.STANFORD. wrote to ALL:

F> If I want to match 00110110, there is no exact match. In this case, I
F> want it to return #6, which is only 1 bit off. If #6 weren't there, I
F> would want it to return #9, which is 2 bits off, then #10, which is
F> also 2 bits off, then #2 (1 bit off), then #1 (default)
F>
F> So: I need an algorithm that will find a consistent fuzzy match by
F> subtracting bits. It needs to deal with ambiguities like #9 and #10.
F> Thus if two matches have 90% of the bits in common, I need both of
F> them back, one after the other (the order doesn't matter).

Since string manipulation is a lot easier than bit manipulation from high level languages in terms of searching and matching, you may want to first convert both the search bit pattern and the pattern you're searching for into two strings and then do an in-string search (strstr() function in C)

If you don't come up with a match, take your search string and flip the first bit, do strstr() again. If it fails, flip the first bit back to what it was and flip the next bit. Do the search again, etc.

Once you get this to work, you can think about a way of doing this with just bits, but you should do it with a bit mask and rotations. Why a bit mask? Because bytes, words, and long words are of fixed size. If you're searching for a fixed word, this is fine, but if you're searching for three bits, let's say, then life is much harder. What you'd want to look for is actually more like *11001* where the *'s are like Unix wild cards.

The way to do this is to keep two words, or two longs for your search patter. The pattern itself, and a bit mask. IE:

*11001* in a long word search would become:

11010000 00000000 00000000 00000000-> search string

11110000 00000000 00000000 00000000-> bit mask.

Then you take your candidate element, perform an AND operation between it and your bit mask. Then you compare the result of this with your search string. If it doesn't match, you rotate both the bit mask and the search string to the right by one bit, and try again.

When you've rotated 32 times (ie: your original search string matches your rotated search string) and you still haven't found a match, you can begin flipping bits within your bit mask. This will now all of a sudden become VERY complex. :-)

Tackle this problem with strings first, then turn it in bits, then re-write the whole thing in assembly because it will be a VERY SLOW search indeed.

Here's a bit of a binary fuzzy search code for you. This is off the top of my head, so I don't know if it will work. Bugs are guaranteed. :-) Play with it once you understand the model.

S -> long word containing search string of bits.
M -> long word containing mask of desired bits to search for.
A[i] -> array of long words to search in.
sr -> rotating search long word.
mr -> rotating mask long word.
sf -> search long word with flipped bits.

int fuzzy_bit_search(long &A[],int last_i,
                     unsigned long S, unsigned long M)
{
 unsigned long sr,mr,sf;
 int i;
 int flip=0;
 
 flip=1<<31// set flip to be 2^31, * check this for correctness *
       
 for (i=0; i<=last_i; i++)   // go through every element in the array
 while (flip!=0)             // go through every bit flip
 { sf=S;
   sr=~sf;
   while(sf!=sr)             // go through every rotation.
    {
    sr=sf;
    mr=M;

    test = ((A[i] & mr)==(sr & mr));  // test for match with mask
    if (test) return i;

    sr=sr>>1;
    mr=mr>>1;
    }
   sf=sf ^ flip;
   flip=flip>>1;     
  }
return(-1);                   // sorry, we couldn't find a match.
}

Here are a couple of problems for the above:

1. The first match will be returned, this may be the wrong thing to do because you may have 111011 match with 111001 when you may have an element later on that is 111011. A sorting routine may help, but this will have to be done with strings as doing this with bits gets even more complex.

2. The code is not efficient. It will flip bits that are outside of the mask.

3. It only works for an exact match, or a match that is off by exactly one bit. You may again wish to build a better function that will build a list of candidates and return the best match, and also provide a way of doing a second and third flipped bit routine. (For each level of flipped bits, you'll need another nested loop and another flip variable.) You'll have to decide how many bits you wish at maximum to be off.

4. You have no way of knowing the closeness of the match. IE: How far off the flipped bits are from the real search string.

5. This code is untested, so it's sure to be buggy. :-)

Please post up your source code when you do the full fuzzy search, and

good luck.

rarachel@ishara.poly.edu

Ray.Arachelian@f204.n2603.z1.fidonet.org

* Freddie 1.3b4 * Do infants have as much fun in infancy as adults in

adultery?

--


From: chris_russo@gateway.qm.apple.com (Christopher Russo)
Subject: Re: (Q) Need bit manipulation algorithm

In article <1993Jan08.012806.14863@eng.umd.edu> Matthew T.Russotto, russotto@eng.umd.edu writes:

>>There is a neat trick for counting the number of bits in a word that
>>a Microsoft geek showed me one time. If there are k bits with 1's,
>>it runs in O(k) time.

I learned a cool way in an interview here at Apple. For the number of bits in a byte, all you'd need is a 256 byte look-up table where the index designated by, say, 00001010 would be index $A into the table which would contain the number 2 (that you put there.) It's pretty fast.


From: reinoud@dutecai.et.tudelft.nl (R. Lamberts)
Subject: Julia's Dream algorithm + example code

Reply-To: reinoud@dutecai.et.tudelft.nl (R. Lamberts)

Originator: reinoud@dutecai.et.tudelft.nl

Hi there!

This is for all of you who remember Julia's Dream (JD), a Mac application which calculates Julia sets really fast. Many people asked me how it works, but I'm not that good at explaining these things, so a lot of you must have been dissappointed. I humbly apologise for never having written a decent piece about the algorithm! However, here's a simplified implementation of JD written in C, and before that another attempt at explaining it.

The idea of Julia's Dream is very simple, even though I think it's not easy to explain. Maybe it's best to have a look at the source below if you don't like vague stories ;-). This code is based on the same idea as the Mac version, though the implementation is quite different.

The idea of JD is that with the escape-time algorithm, each iteration is essentially independent of its context. It does not make any difference whether a certain point in an orbit is an intermediate position or it is a starting position corresponding to a pixel. This is true for Julia sets, but not for the Mandelbrot set, where the (c or mu) parameter varies with the starting point (pixel). Now, because for a Julia set each intermediate point in an orbit may also be considered as a starting point, if a pixel is associated with each intermediate orbit point, the calculation of a pixel value effectively requires just one iteration.

There are various algorithms possible to implement this idea. One way is to calculate an orbit for a certain starting point, and assign pixel locations and values to each point in the orbit afterwards. This is used in the code below. The original Mac version of JD did calculations with each orbit point rounded to a pixel location each iteration, allowing for significant speed-up if the CPU is slow relative to memory access.

If you want to see the Mac code, let me know. It's a lot of hairy 68020 assembly code though...

To improve accuracy around the origin, some pixels are calculated using the good old escape-time algorithm there. There are probably more elegant ways to do this, but I didn't exercise the options.

The code below compiles on a SGI Indigo with entry (8-bit) graphics (sorry, I didn't even initialise the colormap). It uses the GL, not X. This is *not* finished code, especially the user interface is a quick hack. I've included the GL stuff with the hope that a concrete example may be easier to understand for some. Sorry about the lack of comments, this was really an experiment to see how well the JD algorithm would work out in C with straightforward FP calculations.

Well, I hope this is of any help (or interest, if you didn't know JD). Many thanks to all who contacted me about this! If you have ideas for (algorithmic) improvement, spiffy programs, or further questions, I'd very much like to know.

Have fun,

- Reinoud

/*
 *  Julia's Dream II  (XP hack version)
 *  
 *  The nightmare continues...
 * 
 *  Copyright (c) 1992 Reinoud Lamberts, all rights reserved
 * 
 *  You are free to use this stuff as you like, as long as
 *  you give credit where credit is due.
 * 
 * 
 *  Reinoud Lamberts
 *  Oldambt 69
 *  3524 BD Utrecht
 *  The Netherlands
 * 
 *  Phone (int'l code 31) 30 896775
 *  
 *  reinoud@duteca.et.tudelft.nl
 *  reinoud@knoware.ruu.nl (UUCP)
 *  reinoud%knoware.ruu.nl@accucx.cc.ruu.nl
 * 
 */


#include <stdlib.h>
#include <gl/gl.h>
#include <gl/device.h>


#define XRES 256		    /* better be power of 2 */
#define YRES 256
#define XLOG 8			    /* shift amount i.s.o multiply */


typedef unsigned char byte;	    /* should be 8 bits (and int 32) */
typedef double fl;		    /* optimise for performance */


byte *ospmap;			    /* things happen in here */


main()
{
    int xi, yi, ospindex, it, i, ospwsiz, bhsiz, jinx;
    fl deltax, deltay, idx, idy;
    fl xc, yc, sqre, sqim, sqyc, re, im, cre, cim;
    int orb[10000];
    int bound, empty, hitval;
    int pro, con;

/*
 *  yeah, let's arrange for some accomodation or something
 */

    ospmap=malloc(YRES*XRES);
    if (ospmap==NULL)
      exit(1);

/*
 *  let's do the window thing now
 */

    prefsize(XRES, YRES);
    winopen("Julia's Dream II  XP hack version  (c) 1992 Reinoud Lamberts");
    pixmode(PM_SIZE, 8);	    /* 8 bits offscreen*/

/*
 *  now let's start talking business
 */


/*  constants */

    deltax=4.0/(XRES-1);		    /* section of c plane is fixed */
    deltay=4.0/(YRES-1);
    idx=1/deltax;
    idy=1/deltay;
    
    ospwsiz=((YRES*XRES)>>2);
    bhsiz=XRES>>4;
    
    do
    {
    
      /* init */
  
      cre=-2+getvaluator(MOUSEX)*deltax;
      cim=-2+getvaluator(MOUSEY)*deltax;
    
      for (i=((YRES>>1)*XRES)>>2; i<ospwsiz; i++)
	((int *) ospmap)[i]=0;
	
      /* enter the black hole */
  
      yc=deltay*0.5;
  
      for (yi=(YRES>>1); yi<(YRES>>1)+bhsiz; yi++)
      {
        xc=-(bhsiz+0.5)*deltax;
	sqyc=yc*yc;
	ospindex=(yi<<XLOG)+(XRES>>1)-1-bhsiz;
	
	for(xi=(XRES>>1)-1-bhsiz; xi<((XRES>>1)+bhsiz); xi++)
	{
	  sqim=sqyc;
	  sqre=xc*xc;
	  re=xc;
	  im=yc;
	  it=0;
	  
	  while( (bound=((sqre+sqim)<4.0))&&(it<64) )
	  {
	   im=2.0*re*im+cim;
	   re=sqre-sqim+cre;
	   sqre=re*re;
	   sqim=im*im;
	   
	   it++;
	  }
	  
	  if (!bound)
	    ospmap[ospindex]=255-it;
	  else
	    ospmap[ospindex]=255;
	  
	  xc+=deltax;
	  ospindex++;
	}
	
	yc+=deltay;
      }

      /* explore outer space */

      ospindex=(YRES>>1)*XRES;
      yc=deltay*0.5;
      
      /* okay, here it goes */
  
      for (yi=(YRES>>1); yi<YRES; yi++)
      {
	xc=-2.0;
	sqyc=yc*yc;
	
	for (xi=0; xi<XRES; xi++)
	{
	  sqim=sqyc;
	  sqre=xc*xc;
	  re=xc;
	  im=yc;
	  it=0;
	  jinx=ospindex;
	  
	  while ( (bound=((sqre+sqim)<4.0)) &&
	          (empty=((hitval=ospmap[jinx])==0)) && (it<10000)
		)
	  {
	    orb[it]=jinx;
	    ospmap[jinx]=255;
	  
	    im=2.0*re*im+cim;
	    re=sqre-sqim+cre;
	    if (im<0)
	    {
	      im=-im;
	      re=-re;
	    }
	    sqre=re*re;
	    sqim=im*im;
	   
	    jinx=( ((int) ((im+2.0)*idy+0.5)) <<XLOG)+(re+2.0)*idx+0.5;
	    it++;	    
	  }
	  if (it!=0)
	  {
	    if ((!empty)&&(hitval!=255))
	      do
	      {
		if (--hitval==0)
		  hitval=254;
		ospmap[orb[--it]]=hitval;
	      }
	      while (it>0);
	    else
	    if (!bound)
	    {
	      hitval=255;
	      do
	      {
		if (--hitval==0)
		  hitval=254;
		ospmap[orb[--it]]=hitval;
	      }
	      while (it>0);
	    }
	  }
	  
	  xc+=deltax;
	  ospindex++;
	}
	
	yc+=deltay;
      }
      
      pro=0;
      con=YRES*XRES-1;
      do
      {
	ospmap[pro++]=ospmap[con--];
      }
      while(pro<=con);
      
      lrectwrite(0, 0, XRES-1, YRES-1, (unsigned long *) ospmap);

    }
    while (1);

    return 0;			    /* easy on the compiler */
}


From: jmatthews@desire.wright.edu
Subject: Detecting double click?

Source:Pascal

Recently I suggested the following way to check for double clicks:

 var gLastMouseUp: EventRecord;

 procedure SaveLastMouseUp(event: EventRecord);
 begin
   gLastMouseUp := event
 end;

 function CheckDoubleClick (event: EventRecord): Boolean;
   var slop: Rect;
 begin
   with gLastMouseUp.where do
     SetRect(slop, h - 2, v - 2, h + 2, v + 2);
   CheckDoubleClick:= ((event.when - gLastMouseUp.when) <=
     GetDblTime) and (PtInRect(event.where, slop))
 end;

In the main event loop, call SaveLastMouseUp in response to mouseUp events, and call CheckDoubleClick in response to mouseDown events. The latter returns true when a double click (as defined in IM vol.I, p.260) occurs.

This definition of a double click can lead to some confusion in subsequent calls to StillDown or WaitMouseUp. An alternative is to measure time between mouseDown events as follows:

 var gLastMouseDown: EventRecord;
 {should initialize gLastMouseDown.when to 0}

 function CheckDoubleClick (event: EventRecord): Boolean;
   var
     slop: Rect;
     test: Boolean;
 begin
   with gLastMouseDown.where do
     SetRect(slop, h - 2, v - 2, h + 2, v + 2);
   test := ((event.when - gLastMouseDown.when) <= GetDblTime) and
            (PtInRect(event.where, slop));
   if test then
     gLastMouseDown.when := 0 {prevent multiple clicks}
   else
     gLastMouseDown := event; {save this one for future testing}
   CheckDoubleClick := test
 end;

How are others doing it?

John B. Matthews


From: russotto@eng.umd.edu (Matthew T. Russotto)
Subject: Re: (Q) Need bit manipulation algorithm

In article <20751@ucdavis.ucdavis.edu> lim@toadflax.cs.ucdavis.edu (Lloyd Lim) writes:

>
>There is a neat trick for counting the number of bits in a word that
>a Microsoft geek showed me one time. If there are k bits with 1's,
>it runs in O(k) time.

They probably use it for job interviews....

ones = 0;
while (k) {
	k=k&(k-1);
	ones++;
}


From: bowman@reed.edu (BoBolicious)
Subject: Re: (Q) Need bit manipulation algorithm

In article <1993Jan6.022430.19469@leland.Stanford.EDU> felciano@summit.stanford.edu (Ramon M. Felciano) writes:

>I've got an interesting bit-manipulation problem for you. I need to
>know how to match a given bit pattern into a stored array of patterns.
>In particular, I need to come up with a "fuzzy match" if no exact one
>is found.

The best way I can see to do this is to XOR your test pattern with each member of the table. Then the number of "on" bits (the Hamming distance?) is the number of bits by which the two patterns differ. I don't know the most efficient way to count the number of "on" bits; there are several ways to do it, so if you need it to be super-fast, you might want to experiment between looping through each bit in the pattern, shifting each time and testing the lowest (or highest, depending on which way you shift) bit. Or you could maintain a table which contained 1,2,4,8,16,... and AND the result of each XOR with each member of *this* table.

Doing the actual bookkeeping to keep track of which one(s) to return and by how many bits they differ from the test pattern should be pretty trivial.


From: lim@toadflax.cs.ucdavis.edu (Lloyd Lim)
Subject: Re: (Q) Need bit manipulation algorithm

Summary: bit counting and gratuitous Microsoft bashing

In article <1993Jan6.032113.597@planets.risc.rockwell.com> sho@risc.rockwell.com (Sho Kuwamoto) writes:

>
>Maybe not the fastest method, but a simple one:
> 1) come up with an algorithm which counts the number of
> 1's in a given bit pattern. You'll either use a loop
> with shifts and increments in it, or you'll use masks.
>[...]

There is a neat trick for counting the number of bits in a word that a Microsoft geek showed me one time. If there are k bits with 1's, it runs in O(k) time.

I don't remember the algorithm, but it involved subtracting 1 and could be written in one line of C. (Yeah, I know... You could write anything, even Microsoft Word, in one very long line of C code. Come to think of it, that's probably what they did.) The subtraction was used to create instant masks: e.g. 0x0100 - 1 = 0x00FF. It went from the highest 1 bit to the lowest, magically avoiding all of the 0's.

That's all I remember. Anyone remember this algorithm?


From: bwilliam@iat.holonet.net (Bill Williams)
Subject: Re: How do you draw a circle/oval?

Re: Request for filled oval graphics primitive.....

/*=********************************************************************=*/ /*=********************************************************************=*/ #define INTEGER_32 long int static void Draw_Filled_Ellipse_Primitive(INTEGER_32 start_X , INTEGER_32 start_Y ,INTEGER_32 axis_A, INTEGER_32 axis_B , PixPatHandle current_Pattern) { register INTEGER_32 active_X, active_Y; register INTEGER_32 axis_A_Squared, axis_B_Squared, axis_A_Squared_Times_2; register INTEGER_32 axis_B_Squared_Times_2, derivitive_X, derivitive_Y, dd; /* END VAR */ /*=--------------------------------------------------------------------=*/ /* Uses dual differential high efficiency technique */ /*=--------------------------------------------------------------------=*/ active_X = axis_A; active_Y = 0; axis_A_Squared = axis_A*axis_A; axis_B_Squared = axis_B*axis_B; axis_A_Squared_Times_2 = axis_A_Squared + axis_A_Squared; axis_B_Squared_Times_2 = axis_B_Squared + axis_B_Squared; derivitive_X = axis_B_Squared_Times_2*axis_A; derivitive_Y = 0L; dd = (axis_B_Squared / 4L) - (axis_B_Squared * axis_A) + axis_A_Squared; /*=--------------------------------------------------------------------=*/ /* Step along Y axis */ /*=--------------------------------------------------------------------=*/ while ( derivitive_X > derivitive_Y ) { Draw_Patterned_Eastward_Line_Subprimitive( (start_X - active_X) , (start_Y - active_Y) , (active_X << 1) +1); Draw_Patterned_Eastward_Line_Subprimitive( (start_X - active_X) ,(start_Y + active_Y) , (active_X << 1) +1); active_Y +=1; derivitive_Y += axis_A_Squared_Times_2; if (dd <= 0L ) dd += (derivitive_Y + axis_A_Squared); else { derivitive_X -= axis_B_Squared_Times_2; active_X -= 1L; dd += (derivitive_Y + axis_A_Squared - derivitive_X ); } } /* while ( derivitive_X > derivitive_Y ) */ /*=--------------------------------------------------------------------=*/ /* Step along X axis */ /*=--------------------------------------------------------------------=*/ dd += ( ( ((3L *(axis_B_Squared-axis_A_Squared)) / 2L) - ( derivitive_X + derivitive_Y ) ) / 2L); while ( active_X > 0L ) { Draw_Patterned_Eastward_Line_Subprimitive( (start_X - active_X),(start_Y - active_Y), (active_X << 1) +1); Draw_Patterned_Eastward_Line_Subprimitive( (start_X - active_X),(start_Y + active_Y), (active_X << 1) +1); active_X -=1; derivitive_X -= axis_B_Squared_Times_2; if (dd > 0L ) dd += (axis_B_Squared - derivitive_X); else { derivitive_Y += axis_A_Squared_Times_2; active_Y += 1L; dd += (derivitive_Y + axis_B_Squared - derivitive_X ); } } /* while ( active_X > 0L ) */ while (active_Y <= axis_B) { Draw_Patterned_Eastward_Line_Subprimitive(start_X ,(start_Y + active_Y) , 1L); /* actually only one pixel */ Draw_Patterned_Eastward_Line_Subprimitive(start_X ,(start_Y - active_Y) , 1L); /* actually only one pixel */ active_Y += 1L; } } /* Draw_Filled_Ellipse_Primitive */ /*=********************************************************************=*/ /*=********************************************************************=*/

I Guarantee this puppy is FAAAAST. I designed it from various sources.

(Draw_Patterned_Eastward_Line_Subprimitive should be written as fast as< you can make it.... or just make it a LineTo.)

Enjoy.
Bill Williams


From: jmunkki@vipunen.hut.fi (Juri Munkki)
Subject: Re: 64-bit multiply and divide on 68000

Keywords: 68000 Assembler Math discrete

Reply-To: jmunkki@vipunen.hut.fi (Juri Munkki)

In article <By3BM7.InD@world.std.com> squeegee@world.std.com (Stephen C. Gilardi) writes: >I need to calculate a linear interpolation rapidly. The equation is
>
> y = x * numer / denom,
>
>where y, numer, and denom are 32-bit unsigned longs and x is a 16-bit
>unsigned short.
>
>On the 68020, I can use the built-in "long" multiply and divide
instructions.
>However on the 68000 it's more involved.
>
>I found 68000 code in a book to do the 64-bit unsigned multiply. I'd
>like to have the divide in assembly language as well.
>
>Does anyone have such a routine, or a reference to where I can find one?

I'm also interested in a good division routine for the 68K. Here's a routine that does essentially what you wanted to do and a bit more. It uses 32 bit fixed point numbers. I would very much like to see an improved division routine. I wasn't able to figure out anything better on my own and since my main target processor is the 68030, this version is not used all that often.

Since your x is a short, you will save quite a bit of time by using that knowledge to optimize the code.

Fixed	FMulDiv68000(a,b,c)
Fixed	a,b,c;
{
asm	{
		movem.l	D3-D5,-(sp)
		move.l	a,D0
		bpl.s	@positive_a
		neg.l	c
		neg.l	D0
@positive_a
		move.l	b,D1
		bpl.s	@positive_b
		neg.l	D1
		neg.l	c
@positive_b
		move.w	D1,D2
		mulu.w	D0,D2		;	D2 = Lo * Lo

		move.w	D1,D3
		swap	D0
		mulu.w	D0,D3		;	D3 = Hi * Lo
		
		swap	D1
		move.w	D1,D4
		mulu.w	D0,D4		;	D4 = Hi * Hi
		
		swap	D0
		mulu.w	D0,D1		;	D1 = Lo * Hi
		
		clr.l	D5
		move.w	D3,D5		
		swap	D5
		clr.w	D3
		swap	D3
		add.l	D5,D2		;	64 bit addition Hi*Hi:Lo*Lo += Hi*Lo
		addx.l	D3,D4
		
		clr.l	D5
		move.w	D1,D5
		swap	D5
		clr.w	D1
		swap	D1
		add.l	D5,D2		;	64 bit addition Hi*Hi:Lo*Lo+Hi*Lo += Lo*Hi
		addx.l	D1,D4		;	Result is now in D4:D2
		
		add.l	D2,D2
		addx.l	D4,D4
	
		move.l	c,D1
		bpl.s	@positive_c
		neg.l	D1
@positive_c
		moveq.l	#1,D0
		bra.s	@divloop
@divok	
		lsl.l	#1,D0
		bcs.s	@divdone
		add.l	D2,D2
		addx.l	D4,D4
@divloop
		sub.l	D1,D4
		bcc.s	@divok

		addx.l	D0,D0
		bcs.s	@divdone

		add.l	D1,D4
		add.l	D2,D2
		addx.l	D4,D4
		bra.s	@divloop

@divdone
		move.l	c,D1
		bpl.s	@positive_result
		
		addq.l	#1,D0
		bra.s	@done
@positive_result
		eor.l	#-1,D0
@done
		movem.l	(sp)+,D3-D5
	}
}

--

Juri Munkki Windsurf: fast sailing

jmunkki@hut.fi Macintosh: fast software


From: bwilliam@iat.holonet.net (Bill Williams)
Subject: Re: looking for a REALLY RaNdOM "RANDOM"

Source:Pascal
Summary:Request for GOOD random number generator in Pascal:

I have lots of random number generator source, most are in "C" two are in 68000 assembly but one of them (My ultimate favorite) is actually in Pascal (The language you requested) here it is:

===========

UNIT Portable_Random_Numbers;
INTERFACE
 
	FUNCTION Fetch_Random_Numbers: real;
 
	PROCEDURE Manually_Initialize_Seeds (value_1, value_2: LONGINT);
 
	PROCEDURE Auto_Initialize_Seeds;
 
 
IMPLEMENTATION
 
	VAR
		Random_Number_Seed_1, Random_Number_Seed_2: LONGINT; {
signed 32-bit integers }
 
	FUNCTION Fetch_Random_Numbers: real;
 
		VAR
			zz, kk: longint;
 
	BEGIN
 
		kk := Random_Number_Seed_1 DIV 53668;
		Random_Number_Seed_1 := 40014 * (Random_Number_Seed_1 - kk
* 53668) - kk * 12211;
		IF Random_Number_Seed_1 < 0 THEN
			Random_Number_Seed_1 := Random_Number_Seed_1 + 2147483563;
 
		kk := Random_Number_Seed_2 DIV 52774;
		Random_Number_Seed_2 := 40692 * (Random_Number_Seed_2 - kk
* 52774) - kk * 3791;
		IF Random_Number_Seed_2 < 0 THEN
			Random_Number_Seed_2 := Random_Number_Seed_2 + 2147483399;
 
		zz := Random_Number_Seed_1 - Random_Number_Seed_2;
 
		IF zz < 1 THEN
			zz := zz + 2147483562;
 
		Fetch_Random_Numbers := zz * 4.65661E-10;
 
	END;
 
 
	PROCEDURE Manually_Initialize_Seeds (value_1, value_2: LONGINT);
 
{ This is for repeatable use of the Random number generator }
{ requirement is that  Random_Number_Seed_1 be in the range [1,
2_147_483_562] and  Random_Number_Seed_2 }
{ be in [1, 2_147_483_398] ( [ -2**31 + 85, 2**31 - 85 ]). }
 
	BEGIN
		Random_Number_Seed_1 := value_1;
		Random_Number_Seed_2 := value_2;
	END;
 
 
	PROCEDURE Auto_Initialize_Seeds;
 
{ This version of Initialize_Seeds is Macintosh dependent. The theoretical }
{ requirement is that  Random_Number_Seed_1 be in the range [1,
2_147_483_562] and  Random_Number_Seed_2 }
{ be in [1, 2_147_483_398] ( [ -2**31 + 85, 2**31 - 85 ]). Here, 
Random_Number_Seed_1 and }
{  Random_Number_Seed_2 are in [0, 32_737]. See reference [1] cited in the
comments of }
{ Fetch_Random_Numbers for more information on generating seeds. }
 
	BEGIN
		GetDateTime(randSeed);
		Random_Number_Seed_1 := abs(Random);
		Random_Number_Seed_2 := abs(Random);
	END;
 
 
 
END. { Portable_Random_Numbers }

-------------

It is a classic Multiplicative Linear Congruential Generator as known about for over 20 years and proven to be the best "shitty" safe and equally distributed, evenly repeating random technique.

{1. L'Ecuyer, Pierre, Efficient and Portable Combined Random Number } { Generators. Communications of the ACM, 31, 6 (June 1988), 742-749, 774 }

L'Ecuyer reports that a comprehensive battery of over 20 different tests were performed on the generator that involved billions of psuedo-random numbers and over 200 hours of VAX-11/780 CPU time. The empirical behavior of the generators is very satisfactory for this class of generator.

Hope that solves it!

Bill Williams


From: db21@cbnewsc.cb.att.com (david.beyerl)
Subject: Re: Easter was: Re: Algorithm for Day-Of-Week NEEDED

Summary: Here's where to locate an algorithm.

In article <DM.92Sep24220600@stekt2.oulu.fi>, dm@stekt2.oulu.fi (Hannu Helminen) writes:

Does anyone have working C-code to determine when it is Easter in a given year? It would be handy since I am currently writing a calendar utility in C. Pointers to such code would also be appreciated.
An algorithm for the date of Easter is presented on pages 155-156 in "The Art of Computer Programming, Fundamental Algorithms, Volume 1, by Knuth. Below is C-source I obtained from someone on the net.

Dave Beyerl

att!ihlpm!db21

---------------------------------------------------------------------

From att!world.std.com!geoff Thu Jan 09 12:31:28 0500 1992
Subject: Re: Easter Sunday algorythm

Here you go; enjoy.

# To unbundle, sh this file
echo easter.c 1>&2
sed 's/^X//' >easter.c <<'!'
X/*
X * easter - compute date of easter
X *	see The Art of Computer Programming, Vol. 1, pp. 155-6.
X */

X# include <stdio.h>
X# include <time.h>
X# include <sys/types.h>

X# define CENTURY 100
X# define LEAP 4
X# define WEEK 7
X# define MARCH_DAYS 31
X# define APRIL 3
X# define METONIC 19
X# define BASEYEAR 1900	/* add to localtime(t)->tm_year */
X# define GREGORY 1582	/* start of gregorian calendar */

main(argc, argv)
char ** argv;
X{
X	if (argc > 2)
X		fprintf(stderr, "usage: easter [year]\n");
X	else if (argc == 2)
X		easter(atoi(argv[1]));
X	else
X		easter(this_year());
X	exit(0);
X}

this_year()
X{
X	register struct tm * lp;
X	time_t t;
X	extern struct tm * localtime();
X	extern time_t time();

X	t = time(&t);
X	lp = localtime(&t);
X	if (lp->tm_mon > APRIL)
X		lp->tm_year++;
X	return(lp->tm_year + BASEYEAR);
X}

easter(year)
register int year;
X{
X	register int epact, fullmoon;
X	int golden, century, noleap;

X	if (year < GREGORY) {
X		printf("%d: before Gregory\n", year);
X		return;
X	}
X	golden = year % METONIC + 1;
X	century = year / CENTURY + 1;
X	noleap = 3 * century / LEAP - 12;
X	epact = (11 * golden + 20 +
X		(8 * century + 5) / 25 - 5	/* moon orbit correction */
X		- noleap) % 30;
X	if (epact < 0)
X		epact += 30;	/* make remainder positive */
X	if (epact == 25 && golden > 11 || epact == 24)
X		++epact;
X	fullmoon = 44 - epact;
X	if (fullmoon < 21)
X		fullmoon += 30;
X	fullmoon += WEEK - (
X		(int)(5L * year / LEAP) - noleap - 10
X		+ fullmoon) % WEEK;	/* advance to Sunday */
X	if (fullmoon > MARCH_DAYS)
X		printf("%d April", fullmoon - MARCH_DAYS);
X	else
X		printf("%d March", fullmoon);
X	printf(" %d\n", year);
X}
!


From: mxmora@unix.sri.com (Matthew Xavier Mora)
Subject: fastQSort.p (Pascal port of fastQsort.c)

Source:Pascal

I don't remember seeing this get posted so I'll post another copy. This is the Pascal version of Hadyn Huntley's fastQsort.c that was posted here long ago. Since I posted a request for sorting code in pascal, I thought I should post this for the net in case anyone else is interested.

It could use a better memswap if someone has one.

Matt


{------------------------------------------------}
{ fastQsort port by Matthew Xavier Mora based on }
{ Hadyn Huntley fastqsort.c                      }
{ posted to C.S.M.P. on 09-28-92                 }
{                                                }
{Researched and written by:                      }
{                                                }
{   Haydn Huntley                                }
{   Haydn's Consulting                           }
{   203 West Stone                               }
{   Fairfield, Iowa  52556                       }
{ (515) 472-7025                                 }
{                                                }    
{ During the school year, I may be reached at the}
{ following address:                             }
{                                                }  
{   Haydn Huntley                                }
{   Eigenmann Hall #289                          }
{   Indiana University                           }
{   Bloomington, IN  47406                       }
{   (812) 857-8620                               }
{                                                }
{ E-mail:  huntley@copper.ucs.indiana.edu        }
{------------------------------------------------}
unit fastqsort;

interface

  procedure FastQSort (base: Ptr; numberofRecords: longint; sizeofRecord:
longint);

implementation


  procedure FastQSort (base: Ptr; numberofRecords: longint; sizeofRecord:
longint);
    var
      gSize: longint;
      tmp, LowerLimit, UpperLimit: ptr;

    procedure memSwap (p, q: ptr; qSize: longint);

    begin
      BlockMove(p, tmp, qSize);
      BlockMove(q, p, qSize);
      BlockMove(tmp, q, qSize);
    end;

    function GetSize (var n: longint; last, first: ptr): integer;
    begin
      n := (ord(last) - Ord(first)) div gsize;
      GetSize := n;
    end;

    function cmpFn (aPtr, bPtr: ptr): integer;
    begin
      cmpFn := IUMagString(ptr(ord(aPtr) + 1), ptr(ord(bPtr) + 1), gsize,
gsize);
    end;

{ ------------- begin doFastQSort ------------------------ }
    procedure doFastQSort (first, last: ptr);
      var
        i, j: Ptr;
        n: longint;
    begin
  { first, last, i, and j are scaled by gSize in this function }

      while ((getsize(n, last, first)) > 1) do
        begin

      { use a random pivot }
          memSwap(ptr(ord(first) + (abs(random) mod n) * gSize), first, gSize);

          i := first;
          j := last;

          while (true) do
            begin
              i := Ptr(ord(i) + gSize);
              while (ord(i) < ord(last)) and (CmpFn(i, first) < 0) do
                i := Ptr(ord(i) + gSize);

              j := Ptr(ord(j) - gSize);
              while (ord(j) > ord(first)) and (CmpFn(j, first) > 0) do
                j := Ptr(ord(j) - gSize);

              if (ord(i) >= ord(j)) then
                leave;

              memSwap(i, j, gSize);
            end;

          if (j = first) then
            begin
              first := ptr(ord(first) + gSize);
              cycle;
            end;

          memSwap(first, j, gSize);
          if (ord(j) - ord(first)) < (ord(last) - (ord(j) + gSize)) then
            begin
              doFastQSort(first, j);
              first := ptr(ord(j) + gSize);
      { equivalent to doFastQSort (j + gSize, last);    }
      { to save stack space               }
            end
          else
            begin
              doFastQSort(ptr(ord(j) + gSize), last);
              last := j;
      { equivalent to doFastQSort (first, j);     }
      { to save stack space               }
            end;
        end;
    end;
{ ------------- begin FastQsort ------------------------ }
  begin
  { setup the global variables used by fastQSort() }

    gSize := sizeofRecord;
    randseed := TickCount;
    tmp := NewPtr(gsize);             {temp ptr for swapping}
  if tmp <> nil then 
    begin 
       { the args for fastQSort () must be scaled by gSize }
        doFastQSort(base, ptr(ord(base) + (numberofRecords) * gSize));
        DisposePtr(tmp);
   end;
  end;
end.


From: steve@taumet.com (Steve Clamage)
Subject: Re: Algorithm for Day-Of-Week NEEDED

barrett@lucy.ee.und.ac.za (Alan P Barrett) writes:

>

In article <2652@bsu-cs.bsu.edu>,
>pickens@bsu-cs.bsu.edu (David B. Pickens) writes:
>> index = ((13 * month - 1) / 5) + day + (year % 100) +
>> ((year % 100) / 4) + ((year / 100 / 4) - 2 *
>> (year / 100) + 77;
>> index = index - 7 * (index / 7);
>Why not index = index % 7 ?
>I see that you try to avoid the mistake that I have seen in many
>implementations of Zeller's Congruence in C. If the +77 were not used
>then the initial index calculation could have a negative result, in which
>case the / and % operators would have an implementation defined meaning.
>With the +77, you are safe until the year 4599 (unless they change the
>rules before then), but it breaks on 1 Mar 4600 (month=1, day=1,
year=4600,
>initial index = 2 + 1 + 0 + 0 + 11 - 92 + 77 = -1).

Here is a C version of Zeller's Congruence which doesn't have that problem:

The day of the week 'W' (0 = Sunday) is given by

W = ((13*M - 1) / 5 + D + Y + Y/4 + C/4 - (2*C)%7 + 7) % 7;

where

C is the century (year/100)

Y is the last two digits of the year (year%100)

D is the day of the month

M is a special month number, where Jan and Feb are taken as

month 11 and 12 of the previous year (subtract 1 from the year)

and each division is truncating division, and cannot be combined.

The properties of modulo arithmetic are such that we may apply it individually to each term of a sum. To guarantee that our sum is positive, we can find (2*C)%7, which is subtracted instead of 2*C, and add an additional 7 before the final %7.


From: fry@tara.harvard.edu (David Fry)
Subject: Re: Better random number than Random()?

In article <1992Sep22.114218.14429@sunfs3.Camex.COM> kent@sunfs3.Camex.COM (Kent Borg) writes:

>In article <1992Sep19.012039.15704@husc3.harvard.edu> fry@tara.harvard.edu (David Fry) writes:

>>In article <1992Sep18.105410.24438@midway.uchicago.edu> hd12@midway.uchicago.edu writes:

>[About getting good random numbers, that Random() didn't seem great.]

>

>>Why do you think it isn't uniform? Are you doing anything strange

>>with the results of Random(), like using a "mod" to get a smaller

>>random number? This decreases the quality of your random numbers,

>>although not as much with Random() as with some other RNG's.

>

>So the obvious next question is how *should* one beat too-big random

>numbers down to the range you want? Should one simply reject

>out-of-range numbers and ask again?

To be technically correct, you should linearly normalize them to the range you want. For instance, if your RNG gives you numbers r between a and b, and you want numbers between x and y, compute

  ( (float)(y-b) )/( (float)(x-a) ) * ( (float)(r-b) ) + (float)y.

For practical purposes this may be too slow, so for many applications and with some RNG's (and the Mac one is pretty good), you can use MOD after all. But it also depends on how large the range you're MOD-ing the numbers into is. For high-brow computer simulations attempting to predict the effects of global warming, I'd use the above formula.

BTW, depending on the type of x and y, you may need to alter the above formula slightly to make sure that your new numbers fit between x and y inclusively or exclusively or however you want it. But be careful to not send too many r's to one end of your range by accident.


From: mjn@pseudo.uucp (Murray Nesbitt)
Subject: Re: Algorithm for Day-Of-Week NEEDED

In-Reply-To: rmh@taligent.com's message of Sat, 19 Sep 1992 01:02:46 GMT

rmh@taligent.com (Rick Holzgrafe) writes:

>

In article <1992Sep18.151922.753@Cigna.COM>, nagel@Cigna.COM (Mark Nagel)
>writes:
>>
>> Here's somthing from my "this-might-be-useful" box:
>>
[Problematic code deleted].
>
>I don't think this is correct... First, running it by hand on today's date
>(Sept. 18, 1992) I get 0 which I presume is "Sunday" - but today is Friday.
>
>I think the leap year calculation is wrong. A year is a leap year if it is
>evenly divisible by 4, but not by 100, unless by 400. (I can never phrase that
>well - but 1900 was not a leap year; 2000 will be; 2100 won't be.)
>
>Even with that, it would be wrong for antique dates. In September of 1752,
>England and its colonies converted from the Julian to the Gregorian calendar.

The following date functions may be helpful. daynumber() is the day

[1-365] within a year, and dayofweek is the day [0-6] within a week.

#define LEAPYEAR(Y) (( !((Y)%4) && ((Y)%100)) || !((Y)%400) )

int dayofyear( int month, int day, int year )
{
int d;

    d = ( ( month - 1 ) * 31 ) + day;
    if( month > 2 ) 
    {
        d -= ( ( 4 * month ) + 23 ) / 10;
        if( LEAPYEAR( year ) ) d++;
    }
    return d;
}

int dayofweek( int month, int day, int year )
{
int weekday;

    weekday = dayofyear( month, day, year );
    weekday--;

    weekday += year + 4 + ( ( year + 3 ) / 4 ); /* Julian Calendar      */
    if ( year > 1800 ) 
    {
        weekday -= ( ( year - 1701 ) / 100 );   /* Clavian correction   */
        weekday += ( ( year - 1601 ) / 400 );   /* Gregorian correction */
    }
    if( year > 1752 ) weekday += 3;             /* Adjust for Gregorian */
    return( weekday % 7 );
}

I've directed Followup-To: comp.programming

--

Murray


From: Bruce.Hoult@bbs.actrix.gen.nz
Subject: Re: Better random number than Random()?

In article <1992Sep18.105410.24438@midway.uchicago.edu> hd12@midway.uchicago.edu writes:

> (I did check FAQ before I post this.)

>

> I need a fast (speed is very important) random number generator. Right now

> I am using Random(), but it seems to me the distribution is not uniform

> at all. Is there a better generator? Will it be faster to use my own

> function than to use toolbox routine?

> Thanks for the help.

The toolbox Random() functions is perfectly OK if you use the *seed* as your random value and ignore the function result. OTOH, it's not exactly fast because of the Trap call overhead.

I've included my own random number generator, which requires a 68020 or better.

I've also included my attempt at a random generator that produces "normally" distributed numbers. It's based on the Central Limits Theorem that says the sum of several independant random variables is approximately normally distributed, no matter what distribution the individual variables have. I found that for my purposes the sum of ten uniformally distributed variables was good enough.

The only mildly clever thing about it is that I do a little parallel computing by working out the random numbers in the CPU and then adding them (to get a greater then 32 bit result) in the coprocessor.

Comment is invited on whether this is a good way to generate a normal distribution, or not.

-- Bruce

---------------

procedure MyRandom(var seed:longint);

inline

$225F, {movea.l (a7)+,a1}

$2011, {move.l (a1),d0}

$4C3C, $0401, $0000, $41A7, {mulu.l #41a7,d1:d0}

$4C7C, $0401, $7FFF, $FFFF, {divu.l #7fffffff,d1:d0}

$2281; {move.l d1,(a1)}

procedure MyNormal(

numToAdd:integer;

var seed:longint;

var result:extended

); external;

------------------------------------------------------

machine mc68020

mc68881

MyNormal proc export

link a6,#0

fmove.w #0,fp0

move.l 8(a6),a1 ;pointer to result

move.l 12(a6),a0 ;pointer to seed

move.l (a0),d0

move.w 16(a6),d2 ;number of loops

bra.b loopend

loop mulu.l #16807,d1:d0 ;7^5

divu.l #$7fffffff,d1:d0

fadd.l d1,fp0

move.l d1,d0

loopend dbra d2,loop

move.l d0,(a0) ;update the seed

fmove.x fp0,(a1) ;return the result

unlk a6

move.l (a7)+,a0

add.l #10,a7

jmp (a0)

endproc

end


From: pickens@bsu-cs.bsu.edu (David B. Pickens)
Subject: Re: Algorithm for Day-Of-Week NEEDED

I don't know if this will help, but in one of my programming classes we had to develop an algorithm to do this and I looked up on one of my backup diskettes. This should be ANSI compatible, but I was using Borland C/C++.

Here 'tis:

/*

Function: datename()

Description: This function returns the day of the week for a date.

It is based upon Zeller's Congruence.

Args: int month the month

int day the day

int year the year

Return: char * a pointer to the day of the week

*/

char *datename(int day, int month, int year)

{

static char *days[] = {"SUN","MON","TUE","WED","THU",

"FRI","SAT"};

int index;

if (year < 100) /* assume 1990's if century missing */

year += 1900;

if (month > 2) { /* begin with february */

month -= 2;

}

else {

month += 10;

year--;

}

index = ((13 * month - 1) / 5) + day + (year % 100) +

((year % 100) / 4) + ((year / 100 / 4) - 2 *

(year / 100) + 77;

index = index - 7 * (index / 7);

return days[index];

}

I hope this works, I know I received a decent grade on it.

Dave


From: sakamoto@sm.sony.co.jp (Tomohiko Sakamoto)
Subject: Re: Algorithm for Day-Of-Week NEEDED

Reply-To: sakamoto@sm.sony.co.jp

In article <1992Sep18.214003.13614@boingo.med.jhu.edu>,

jkeller@ishtar.med.jhu.edu (Jean Keller x5-8318) says:

> I think there is an error in your leap-year calculation - if year%400 == 0, it is not a leap year.

You are wrong. If year%400 == 0, it is a leap year. The following definition will help you.

#define isleap(y) ((y) % 4 == 0 && (y) % 100 != 0 || (y) % 400 == 0)


From: nagel@Cigna.COM (Mark Nagel)
Subject: Re: Algorithm for Day-Of-Week NEEDED

Keywords: day of week algorithm

In <lbhvboINNgui@paducah.cs.utexas.edu> stevem@cs.utexas.edu (Steve Anthony Mariotti) writes:

>I am currently in need of an algorithm that will return the day-of-week when

>passed the year, month, and day. I used to have an algorithm to do this, but

>my hard drive crash did away with it but good.

Here's somthing from my "this-might-be-useful" box:

/*
 * Find and return day of week (0-6)
 *
 * From: moorer@jacobs.CS.ORST.EDU (Rocky Moore)
 */

int day_of_week(int year, int month, int day)
{
     int offsets[13] = { 0,0,3,3,6,1,4,6,2,5,7,3,5 };
     int dw;

     dw=6+year+((year+3)/4)+offsets[month]+day;
     if( ((year%4) ==0) && (month > 2)) dw++;
     if( (year==0) && (month < 3)) dw++;
     dw=(dw%7);
     return(dw);
}


From: Bruce.Hoult@bbs.actrix.gen.nz
Subject: Re: Looking for fast sorting code for pascal

Source:Pascal

In article <Buq1CI.Mnz@world.std.com> aep@world.std.com (Andrew E Page) writes:
>
> As far as the heap collisions are concerned.... The QSort algorithms
> that I've seen in the past depend on recursion. This leads to the
> posibility that your stack is somehow slamming into your heap,
> leading to a Sys Err ID=28.

Recursion is, of course, necessary in QuickSort, but you can easily guarentee that it will go no deeper than log2(n) -- i.e. only 20 levels when sorting a million elements.

Here's my tried and tested Pascal sorting routine (written on the Mac in 1987). I've found it reliable and fast. It tries to be reasonably clever about avoiding bad partitioning, using the minimum of recursion and using the best algorithm for the amount of data to be sorted.

-- Bruce

-------------

procedure sort(
   left, right:longint;
   function less_than(a,b:longint):boolean;
   procedure swap(a,b:longint)
);
var
   i, j, mid, median, small_pos: longint;
begin
   while (right-left) >= 15 do begin
      mid := (left+right) div 2;
      
      {find the Median of left, mid, right}
      if less_than(left,mid) then begin
         if less_than(mid,right) then median := mid
         else if less_than(left,right) then median := right
         else median := left
      end else begin
         if less_than(left,right) then median := left
         else if less_than(mid,right) then median := right
         else median := mid;
      end; {finding the median}
      
      {
         partition the region into three:
         
         left <= x <= j:  those smaller than median
         j < x < i:       those equal to median
         i <= x <= right: those greater than median
      }
      i := left;
      j := right;
      while i <= j do begin
         while (i<right) & less_than(i,median) do i := i + 1;
         while (j>left) & less_than(median,j) do j := j - 1;
         if i <= j then begin
            if i < j then begin
               {watch for median getting moved under us!}
               if median = i      then median := j
               else if median = j then median := i;
               swap(i,j);
            end;
            {no need to look at these two again}
            i := i + 1;
            j := j - 1;
         end;
      end;
      {skip over any items equal to the guess of the median}
      while (i<right) & not less_than(median,i) do i := i + 1;
      while (j>left)  & not less_than(j,median) do j := j - 1;
      
      {now sort the two halves}
      if (j-left) < (right-i) then begin
         {the left half is smaller -- sort it recursively first}
         sort(left,j,less_than,swap);
         left := i; {prepare for next iteration}
      end else begin
         {the right half is smaller -- sort it recursively first}
         sort(i,right,less_than,swap);
         right := j; {prepare for next iteration}
      end;
   end; {while more than xxx elements to sort}
   
   {now selection sort any remaining elements}
   for i := left to right-1 do begin
      small_pos := i;
      for j := i+1 to right do
         if less_than(j,small_pos) then small_pos := j;
      if small_pos <> i then swap(i,small_pos);
   end;
   
end; {sort}