From:jnhall@sat.mot.com (Joseph Hall)
Subject: Re: long integer square root
In article <rayCA61B2.6nA@netcom.com writeth:
estevez@atp3100.tuwien.ac.at (Ernesto Estevez) writes ... Does somebody has a code or algorithm for extracting a long integer square root and returning a integer. I suggest looking at Newton's iteration in any decent CS book on the subject. With a sufficiently good first guess you can do it in about 12 instructions on a 68020+. And yes, I have done it, and no, you can't have it. It belong to the company I work for.
I pulled this one from my personal inventory. I hereby assign it to the public domain. Enjoy.
/*
* ISqrt-- * Find square root of n. This is a Newton's method approximation, * converging in 1 iteration per digit or so. Finds floor(sqrt(n) + 0.5). */
int ISqrt(register int n) { register int i; register long k0, k1, nn; for (nn = i = n, k0 = 2; i > 0; i >>= 2, k0 <<= 1) ; nn <<= 2; for (;;) { k1 = (nn / k0 + k0) >> 1; if (((k0 ^ k1) & ~1) == 0) break; k0 = k1; } return (int) ((k1 + 1) >> 1); }
From: jimc@tau-ceti.isc-br.com (Jim Cathey)
Subject: Re: long integer square root
In article <220o6u$cn@email.tuwien.ac.at> estevez@atp3100.tuwien.ac.at (Ernesto Estevez) writes:
>Does somebody have code or an algorithm for extracting a long integer
>square root and returning an integer?
There was a long series of letters in Dr. Dobbs' Journal a few years back that I was a part of. Here are two 'competing' subroutines for the 68000 that I wrote. One is a Newton-Raphson iterator, the other a hybrid of three different subroutines using two different methods, heavily optimized for speed. The non-Newton one is faster in some cases and slower in others. Choosing one as 'best' depends on what kind of input ranges you expect to see most often. Newton's time doesn't vary much on the average based on the input argument. The non-Newton one ranges from a lot better (small arguments) to a little worse (large arguments), but is always better than Newton's worst case. Don't neglect the fact that on a FPU-equipped machine it may be faster to use floating point. I've done no research into this possibility, nor have I timed this stuff on any 68020+ system. The speed balance is no doubt different then. (For that matter, the 68000 testbed had no wait states nor interrupt system overhead, which doesn't necessarily apply to a 68000 PowerBook, and certainly doesn't apply to a Mac+ or earlier.)
I'm personally fond of the non-Newton version, because the algorithm only uses shifts and adds, so it could be implemented in microcode with about same speed as a divide.
*************************************************************************
* *
* Integer Square Root (32 to 16 bit). *
* *
* (Newton-Raphson method). *
* *
* Call with: *
* D0.L = Unsigned number. *
* *
* Returns: *
* D0.L = SQRT(D0.L) *
* *
* Notes: Result fits in D0.W, but is valid in longword. *
* Takes from 338 cycles (1 shift, 1 division) to *
* 1580 cycles (16 shifts, 4 divisions) (including rts). *
* Averages 854 cycles measured over first 65535 roots. *
* Averages 992 cycles measured over first 500000 roots. *
* *
*************************************************************************
.globl lsqrt
* Cycles
lsqrt movem.l d1-d2,-(sp) (24)
move.l d0,d1 (4) ; Set up for guessing algorithm.
beq.s return (10/8) ; Don't process zero.
moveq #1,d2 (4)
guess cmp.l d2,d1 (6) ; Get a guess that is guaranteed to be
bls.s newton (10/8) ; too high, but not by much, by dividing the
add.l d2,d2 (8) ; argument by two and multiplying a 1 by 2
lsr.l #1,d1 (10) ; until the power of two passes the modified
bra.s guess (10) ; argument, then average these two numbers.
newton add.l d1,d2 (8) ; Average the two guesses.
lsr.l #1,d2 (10)
move.l d0,d1 (4) ; Generate the next approximation(s)
divu d2,d1 (140) ; via the Newton-Raphson method.
bvs.s done (10/8) ; Handle out-of-range input (cheats!)
cmp.w d1,d2 (4) ; Have we converged?
bls.s done (10/8)
swap d1 (4) ; No, kill the remainder so the
clr.w d1 (4) ; next average comes out right.
swap d1 (4)
bra.s newton (10)
done clr.w d0 (4) ; Return a word answer in a longword.
swap d0 (4)
move.w d2,d0 (4)
return movem.l (sp)+,d1-d2 (28)
rts (16)
end
*************************************************************************
* *
* Integer Square Root (32 to 16 bit). *
* *
* (Exact method, not approximate). *
* *
* Call with: *
* D0.L = Unsigned number. *
* *
* Returns: *
* D0.L = SQRT(D0.L) *
* *
* Notes: Result fits in D0.W, but is valid in longword. *
* Takes from 122 to 1272 cycles (including rts). *
* Averages 610 cycles measured over first 65535 roots. *
* Averages 1104 cycles measured over first 500000 roots. *
* *
*************************************************************************
.globl lsqrt
* Cycles
lsqrt tst.l d0 (4) ; Skip doing zero.
beq.s done (10/8)
cmp.l #$10000,d0 (14) ; If is a longword, use the long routine.
bhs.s glsqrt (10/8)
cmp.w #625,d0 (8) ; Would the short word routine be quicker?
bhi.s gsqrt (10/8) ; No, use general purpose word routine.
* ; Otherwise fall into special routine.
*
* For speed, we use three exit points.
* This is cheesy, but this is a speed-optimized subroutine!
page
*************************************************************************
* *
* Faster Integer Square Root (16 to 8 bit). For small arguments. *
* *
* (Exact method, not approximate). *
* *
* Call with: *
* D0.W = Unsigned number. *
* *
* Returns: *
* D0.W = SQRT(D0.W) *
* *
* Notes: Result fits in D0.B, but is valid in word. *
* Takes from 72 (d0=1) to 504 (d0=625) cycles *
* (including rts). *
* *
* Algorithm supplied by Motorola. *
* *
*************************************************************************
* Use the theorem that a perfect square is the sum of the first
* sqrt(arg) number of odd integers.
* Cycles
move.w d1,-(sp) (8)
move.w #-1,d1 (8)
qsqrt1 addq.w #2,d1 (4)
sub.w d1,d0 (4)
bpl qsqrt1 (10/8)
asr.w #1,d1 (8)
move.w d1,d0 (4)
move.w (sp)+,d1 (12)
done rts (16)
page
*************************************************************************
* *
* Integer Square Root (16 to 8 bit). *
* *
* (Exact method, not approximate). *
* *
* Call with: *
* D0.W = Unsigned number. *
* *
* Returns: *
* D0.L = SQRT(D0.W) *
* *
* Uses: D1-D4 as temporaries -- *
* D1 = Error term; *
* D2 = Running estimate; *
* D3 = High bracket; *
* D4 = Loop counter. *
* *
* Notes: Result fits in D0.B, but is valid in word. *
* *
* Takes from 544 to 624 cycles (including rts). *
* *
* Instruction times for branch-type instructions *
* listed as (X/Y) are for (taken/not taken). *
* *
*************************************************************************
* Cycles
gsqrt movem.l d1-d4,-(sp) (40)
move.w #7,d4 (8) ; Loop count (bits-1 of result).
clr.w d1 (4) ; Error term in D1.
clr.w d2 (4)
sqrt1 add.w d0,d0 (4) ; Get 2 leading bits a time and add
addx.w d1,d1 (4) ; into Error term for interpolation.
add.w d0,d0 (4) ; (Classical method, easy in binary).
addx.w d1,d1 (4)
add.w d2,d2 (4) ; Running estimate * 2.
move.w d2,d3 (4)
add.w d3,d3 (4)
cmp.w d3,d1 (4)
bls.s sqrt2 (10/8) ; New Error term > 2* Running estimate?
addq.w #1,d2 (4) ; Yes, we want a '1' bit then.
addq.w #1,d3 (4) ; Fix up new Error term.
sub.w d3,d1 (4)
sqrt2 dbra d4,sqrt1 (10/14) ; Do all 8 bit-pairs.
move.w d2,d0 (4)
movem.l (sp)+,d1-d4 (44)
rts (16)
page
*************************************************************************
* *
* Integer Square Root (32 to 16 bit). *
* *
* (Exact method, not approximate). *
* *
* Call with: *
* D0.L = Unsigned number. *
* *
* Returns: *
* D0.L = SQRT(D0.L) *
* *
* Uses: D1-D4 as temporaries -- *
* D1 = Error term; *
* D2 = Running estimate; *
* D3 = High bracket; *
* D4 = Loop counter. *
* *
* Notes: Result fits in D0.W, but is valid in longword. *
* *
* Takes from 1080 to 1236 cycles (including rts.) *
* *
* Two of the 16 passes are unrolled from the loop so that *
* quicker instructions may be used where there is no *
* danger of overflow (in the early passes). *
* *
* Instruction times for branch-type instructions *
* listed as (X/Y) are for (taken/not taken). *
* *
*************************************************************************
* Cycles
glsqrt movem.l d1-d4,-(sp) (40)
moveq #13,d4 (4) ; Loop count (bits-1 of result).
moveq #0,d1 (4) ; Error term in D1.
moveq #0,d2 (4)
lsqrt1 add.l d0,d0 (8) ; Get 2 leading bits a time and add
addx.w d1,d1 (4) ; into Error term for interpolation.
add.l d0,d0 (8) ; (Classical method, easy in binary).
addx.w d1,d1 (4)
add.w d2,d2 (4) ; Running estimate * 2.
move.w d2,d3 (4)
add.w d3,d3 (4)
cmp.w d3,d1 (4)
bls.s lsqrt2 (10/8) ; New Error term > 2* Running estimate?
addq.w #1,d2 (4) ; Yes, we want a '1' bit then.
addq.w #1,d3 (4) ; Fix up new Error term.
sub.w d3,d1 (4)
lsqrt2 dbra d4,lsqrt1 (10/14) ; Do first 14 bit-pairs.
add.l d0,d0 (8) ; Do 15-th bit-pair.
addx.w d1,d1 (4)
add.l d0,d0 (8)
addx.l d1,d1 (8)
add.w d2,d2 (4)
move.l d2,d3 (4)
add.w d3,d3 (4)
cmp.l d3,d1 (6)
bls.s lsqrt3 (10/8)
addq.w #1,d2 (4)
addq.w #1,d3 (4)
sub.l d3,d1 (8)
lsqrt3 add.l d0,d0 (8) ; Do 16-th bit-pair.
addx.l d1,d1 (8)
add.l d0,d0 (8)
addx.l d1,d1 (8)
add.w d2,d2 (4)
move.l d2,d3 (4)
add.l d3,d3 (8)
cmp.l d3,d1 (6)
bls.s lsqrt4 (10/8)
addq.w #1,d2 (4)
lsqrt4 move.w d2,d0 (4)
movem.l (sp)+,d1-d4 (44)
rts (16)
end
From: k044477@hobbes.kzoo.edu (Jamie R. McCarthy)
Subject: Re: long integer square root
jnhall@sat.mot.com (Joseph Hall) writes:
> In article <rayCA61B2.6nA@netcom.com> writeth:
estevez@atp3100.tuwien.ac.at (Ernesto Estevez) writes ...
>Does somebody has a code or algorithm for extracting a long integer
> square root and returning a integer.
I suggest looking at Newton's iteration in any decent CS book on the
subject. With a sufficiently good first guess you can do it in about
12 instructions on a 68020+.
And yes, I have done it, and no, you can't have it. It belong to the
company I work for.
I pulled this one from my personal inventory. I hereby assign it to the
public domain. Enjoy.
/*
* ISqrt-- * Find square root of n. This is a Newton's method approximation, * converging in 1 iteration per digit or so. Finds floor(sqrt(n) + 0.5). */ int ISqrt(register int n) { register int i; register long k0, k1, nn; for (nn = i = n, k0 = 2; i > 0; i >>= 2, k0 <<= 1) ; nn <<= 2; for (;;) { k1 = (nn / k0 + k0) >> 1; if (((k0 ^ k1) & ~1) == 0) break; k0 = k1; } return (int) ((k1 + 1) >> 1); }--
Here's an article I saved from c.s.m.p four months ago. Strangely, it was only distributed in North America. I don't know how fast it works in practice, but there are no multiplications or divisions in its inner loop, which is promising.
Subject: Re: Fixed Math sqrt, asin, acos
Here's an integer square root routine.
-----------------------------------------------------------------------
#include <math.h>
/*
* It requires more space to describe this implementation of the manual
* square root algorithm than it did to code it. The basic idea is that
* the square root is computed one bit at a time from the high end. Because
* the original number is 32 bits (unsigned), the root cannot exceed 16 bits
* in length, so we start with the 0x8000 bit.
*
* Let "x" be the value whose root we desire, "t" be the square root
* that we desire, and "s" be a bitmask. A simple way to compute
* the root is to set "s" to 0x8000, and loop doing the following:
*
* t = 0;
* s = 0x8000;
* do {
* if ((t + s) * (t + s) <= x)
* t += s;
* s >>= 1;
* while (s != 0);
*
* The primary disadvantage to this approach is the multiplication. To
* eliminate this, we begin simplying. First, we observe that
*
* (t + s) * (t + s) == (t * t) + (2 * t * s) + (s * s)
*
* Therefore, if we redefine "x" to be the original argument minus the
* current value of (t * t), we can determine if we should add "s" to
* the root if
*
* (2 * t * s) + (s * s) <= x
*
* If we define a new temporary "nr", we can express this as
*
* t = 0;
* s = 0x8000;
* do {
* nr = (2 * t * s) + (s * s);
* if (nr <= x) {
* x -= nr;
* t += s;
* }
* s >>= 1;
* while (s != 0);
*
* We can improve the performance of this by noting that "s" is always a
* power of two, so multiplication by "s" is just a shift. Also, because
* "s" changes in a predictable manner (shifted right after each iteration)
* we can precompute (0x8000 * t) and (0x8000 * 0x8000) and then adjust
* them by shifting after each step. First, we let "m" hold the value
* (s * s) and adjust it after each step by shifting right twice. We
* also introduce "r" to hold (2 * t * s) and adjust it after each step
* by shifting right once. When we update "t" we must also update "r",
* and we do so by noting that (2 * (old_t + s) * s) is the same as
* (2 * old_t * s) + (2 * s * s). Noting that (s * s) is "m" and that
* (r + 2 * m) == ((r + m) + m) == (nr + m):
*
* t = 0;
* s = 0x8000;
* m = 0x40000000;
* r = 0;
* do {
* nr = r + m;
* if (nr <= x) {
* x -= nr;
* t += s;
* r = nr + m;
* }
* s >>= 1;
* r >>= 1;
* m >>= 2;
* } while (s != 0);
*
* Finally, we note that, if we were using fractional arithmetic, after
* 16 iterations "s" would be a binary 0.5, so the value of "r" when
* the loop terminates is (2 * t * 0.5) or "t". Because the values in
* "t" and "r" are identical after the loop terminates, and because we
* do not otherwise use "t" explicitly within the loop, we can omit it.
* When we do so, there is no need for "s" except to terminate the loop,
* but we observe that "m" will become zero at the same time as "s",
* so we can use it instead.
*
* The result we have at this point is the floor of the square root. If
* we want to round to the nearest integer, we need to consider whether
* the remainder in "x" is greater than or equal to the difference
* between ((r + 0.5) * (r + 0.5)) and (r * r). Noting that the former
* quantity is (r * r + r + 0.25), we want to check if the remainder is
* greater than or equal to (r + 0.25). Because we are dealing with
* integers, we can't have equality, so we round up if "x" is strictly
* greater than "r":
*
* if (x > r)
* r++;
*/
unsigned int isqrt(unsigned long x)
{
unsigned long r, nr, m;
r = 0;
m = 0x40000000;
do {
nr = r + m;
if (nr <= x) {
x -= nr;
r = nr + m;
}
r >>= 1;
m >>= 2;
} while (m != 0);
if (x > r)
r++;
return(r);
}
>
--
Jamie McCarthy Internet: k044477@kzoo.edu AppleLink: j.mccarthy
From: kourakos@cardinal.ncsc.org (Alexander W. Kourakos)
Subject: Re: Area of a traingle formula wanted (not the easy one)
Area = 0.5 * (x1 * y2 + x2 * y3 + x3 * y1 - y1 * x2 - y2 * x3 - y2 * x1)
(in standard orthagonal coordinates)
This comes out of a formula for finding the area of any polygon, given the points.
awk
From: cpruett@holonet.net
Subject: Re: Area of a traingle
Originator: cpruett@zen.holonet.net
In a message dated 07-05-93 22:34 jeremy@belsize.demon.co.u wrote to comp.sys.mac.programmer:
Here!s a easy one (but read the WHOLE question before flaming): Anyone have a general purpose formula to calc the area of a triangle - anyone who says 0.5*base length*perp height gets no prize, but anyone who can give me a formula using only the 3 co-ords of the end points (say, (x1,y1),(x2,y2),(x3,y3)) does get a big smile. Ideally, the value returned can also be -ve for a triangle with the end points in a different order (see what I mean ?) Thanks, Jeremy Doig
Here's what I came up with:
2
Sqrt[(x2 y1 - x3 y1 - x1 y2 + x3 y2 + x1 y3 - x2 y3) ]
------------------------------------------------------
2
I tried it out on a couple of test-triangles and it seems to work but you will want to verify it for yourself. I used Heron's formula: A = Sqrt[s(s-a)(s-b)(s-c)] where s = 0.5(a + b + c) and a,b, & c are the lengths of the three sides.
Chris Pruett
Subject: Re: Point in Polygon routine needed
From: fry@zariski.harvard.edu (David Fry)
In article <8fxkxAO00WB7M=nOFO@andrew.cmu.edu> Andrew Lewis Tepper <at15+@andrew.cmu.edu> writes:
>I don't know if this routine is "standard", I just came up with it recently:
>
>For a polygon of points p1...pn, and a point P, make a table as follows:
>
>T(1)= angle from p1 to P to p2
>T(2)= angle from p2 to P to p3
>...
>T(n)= angle from pn to P to p1
>
>express all angles as: -PI < angle < PI.
>
>Add all entries in the table. If the sum = 0, the point is outside. If
>the sum is +/- PI, the point is inside. If the point is +/- xPI, you
>have a strange polygon. If ANY angle was = +/-PI, the point is on the
>border.
>
>Does anyone know if this is considered a good (known?) algorithm? It
>took a long time to come up with!
You have the essence of the "good," standard algorithm, but if you're working with thousands of polygons, those floating point calculations will be too slow and will probably lead to round-off error at some point. Round-off error is a huge pain in computational geometry problems. The hard part is finding ways to avoid it.
The trick is that you don't need to know the angle between the lines, you just need to know its sign, called the *circulation*. For every line AB in the polygon, form the triangle PAB and you want to know does it lie inside the polygon or not. That is, if you were riding a bike counterclockwise around the boundary of the polygon, would P lie on your left or your right when you crossed AB? You can find this by calculating PA x PB = (A.x-P.x)*(B.y-P.y) - (A.y-P.y)*(B.x-P.x). The circulation (the sign of PA x PB) can be found without doing a multiplication in many cases, but this is good enough. Remember, we've oriented the polygon so that you come to A before coming to B as your traverse its boundary.
Then the algorithm is:
In = 0
foreach edge AB in the polygonif A.y < P.y and B.y >= P.y and PA x PB > 0
In = In + 1
if A.y >= P.y and B.y < P.y and PA x PB < 0
In = In - 1
Then if In = 1, P is inside the polygon, and is outside if I = 0. This doesn't handle the case where P lies on some edge AB of the polygon.
David Fry
From: fixer@faxcsl.dcrt.nih.gov (Chris Mr. Tangerine Man Tate)
Subject: Re: card shuffling algorithm
Reply-To: fixer@faxcsl.dcrt.nih.gov
Here's a little snippet to shuffle a deck of N cards in Theta(N) time (written in C for the Macintosh):
void Shuffle(short deck[N])
{
short i, j, temp;
for (i = N; i > 1; i--)
{
j = abs(Random()) % i; // random # from 0 to i-1
temp = deck[j];
deck[j] = deck[i - 1];
deck[i - 1] = temp; // swap cards at positions i and j
}
}
Explanation: working backwards from the "top" of the deck, swap each card with a random card from the as-yet-unvisited portion of the deck. Note that since it's possible to "swap" a card with itself, this algorithm ensures that you can generate the null shuffle, i.e. all cards still in their starting order.
If you don't need to "deal out" all the cards at once (e.g. you're shuffling for a poker game rather than for bridge), you can simply do the random swap as each card is needed, and distribute the shuffling time across the dealing.
From: cklarson@flauta.engr.ucdavis.edu (Christopher Klaus Larson)
Subject: Re: Reversing the bytes in a long...
In article <absurd-150293001044@seuss.apple.com> absurd@apple.apple.com (Tim Dierks) writes:
>In article <1993Feb13.175719.4309@kth.se>, d88-jwa@hemul.nada.kth.se (Jon
>Wtte) wrote:
> In <C2DF3w.Guz@ns1.nodak.edu> nicolai@plains.NoDak.edu (Steven Nicolai) writes:
>
> >I've encountered an interesting brain-teaser working on a program. What's
> >the shortest sequence of 68K assembly that will reverse the order of the
> >bytes in a long word. (ie convert a little endian number to big endian.)
>
> Hmm, wouldn't that just be 3 SWAP instructions?
>
[Stuff deleted]
Ain't no SWAP.B; all you can do is swap words. Here's my solution:
And here is a register-parameter inline version of Tim's solution (slow night in Davis):
/* Code Begins */
#pragma parameter __D0 SwapLong (__D1)
unsigned long SwapLong (unsigned long filpMe) =
{
0x2001, /* move.l d1,d0 */
0x4840, /* swap d0 */
0x1001, /* move.b d1,d0 */
0x4840, /* swap d0 */
0x4841, /* swap d1 */
0x1001, /* move.b d1,d0 */
0xE198 /* rol.l #8,d0 */
};
/* Code Ends */
--Chris
From: cconstantine@galaxy.gov.bc.ca
Subject: Re: Reversing the bytes in a long...
In article <1993Feb17.151416.868@aio.jsc.nasa.gov>, mark@pokey.jsc.nasa.gov (Mark Manning) writes:
Keith Rollin of Taligent was kind enough to donate some C code to me for a database program I am writing (it reads PC files & FoxBas+/Mac files) that would do exactly that. Here it is in C:
void ConvertLong(long *num)
{
long tempLong;
char* sourcePtr = (char*) num;
char* destPtr = (char*) &tempLong;
destPtr[0] = sourcePtr[3];
destPtr[1] = sourcePtr[2];
destPtr[2] = sourcePtr[1];
destPtr[3] = sourcePtr[0];
*num = tempLong;
}
and here it is in ASM:
ConvertLong1: 00000000: 4E56 FFFC LINK A6,#$FFFC 00000004: 48E7 0018 MOVEM.L A3/A4,-(A7) 00000008: 286E 0008 MOVEA.L num(A6),A4 0000000C: 47EE FFFC LEA tempLong(A6),A3 00000010: 16AC 0003 MOVE.B $0003(A4),(A3) 00000014: 176C 0002 0001 MOVE.B $0002(A4),$0001(A3) 0000001A: 176C 0001 0002 MOVE.B $0001(A4),$0002(A3) 00000020: 1754 0003 MOVE.B (A4),$0003(A3) 00000024: 206E 0008 MOVEA.L num(A6),A0 00000028: 20AE FFFC MOVE.L tempLong(A6),(A0) 0000002C: 4CDF 1800 MOVEM.L (A7)+,A3/A4 00000030: 4E5E UNLK A6 00000032: 4E75 RTS 00000034
According to Keith, this is also a faster routine than doing a 1 line bitshift operation such as:
void ConvertLong(long *num)
{
*num = (*num >> 24) | ((*num >> 8) & 0xFF00) | ((*num
<< 8) & 0xFF0000) |
(*num << 24);
}
that one line produces the following in ASM:
00000000: 4E56 0000 LINK A6,#$0000 00000004: 2F0C MOVE.L A4,-(A7) 00000006: 286E 0008 MOVEA.L $0008(A6),A4 0000000A: 2014 MOVE.L (A4),D0 0000000C: 7218 MOVEQ #$18,D1 0000000E: E2A0 ASR.L D1,D0 00000010: 2214 MOVE.L (A4),D1 00000012: E081 ASR.L #$8,D1 00000014: 0281 0000 FF00 ANDI.L #$0000FF00,D1 0000001A: 8081 OR.L D1,D0 0000001C: 2214 MOVE.L (A4),D1 0000001E: E189 LSL.L #$8,D1 00000020: 0281 00FF 0000 ANDI.L #$00FF0000,D1 00000026: 8081 OR.L D1,D0 00000028: 2214 MOVE.L (A4),D1 0000002A: 7418 MOVEQ #$18,D2 0000002C: E5A9 LSL.L D2,D1 0000002E: 8081 OR.L D1,D0 00000030: 2880 MOVE.L D0,(A4) 00000032: 285F MOVEA.L (A7)+,A4 00000034: 4E5E UNLK A6 00000036: 4E75 RTS 00000038
NOTE: it's almost twice as long (if not longer) and therefore about 1/2 as fast!.
Just my (Keith's really) $0.02 worth.
Carl B. Constantine
CCONSTANTINE@galaxy.gov.bc.ca
Here's a little function I wrote that I thought I'd share with y'all. It compares Pascal strings in a manner suitable for sorting. It doesn't actually do alphabetical order. (I figure, in most cases, if you're going to try to do alphabetical order, you should use the Toolbox and properly handle diacriticals.)
I wrote this because I have an app that processes lists of strings that have to be sorted so I can access them with a binary search. I decided IUComparePStrings() or whatever it's called would be too slow, and that I was going to have to roll my own. Here it is.
Public domain.
short comparePStrings(register unsigned char *str1,
register unsigned char *str2,
Boolean caseSen)
{
/*
* This comparison routine returns 0 if the two strings passed to it
* are equal, and a negative or positive number otherwise. The
* values can be used to sort strings, i.e. the transitivity law
* applies to this function, i.e. if cPS(s1, s2)<0 and
* cPS(s2, s3)<0, then it's guaranteed that cPS(s1, s3) is also
<0.
*
* But it doesn't exactly sort in alphabetical order. Specifically,
* length is more important than the characters in the string.
* For example, cPS("\pa", "\pb") == -1 because 'a' precedes 'b',
* but cPS("\paa", "\pb") == 1 because "b" is shorter than "a".
*/
asm {
move.b (str1), D0 // get 1st string's length
sub.b (str2)+, D0 // subtract 2nd string's length
bne @done2 // if unequal, we're done; return len diff
movem.l D1-D3, -(A7) // save registers we'll need
moveq #0, D2 // clear out length
moveq #32, D3 // store ('a'-'A') in a handy register
move.b (str1)+, D2 // put length in its register
beq.s @done1 // if length is zero, we're done; return 0
subq #1, D2 // subtract one from length, to set up dbra
tst.b caseSen // is this a case-sensitive compare?
beq.s @nsLoop // no, go to that nonsensitive loop
@sLoop: // yes, do this nonsensitive loop
move.b (str1)+, D0 // get char from 1st string
sub.b (str2)+, D0 // subtract off corresponding char from 2nd
bne @done1 // if unequal, we're done; return char diff
dbra D2, @sLoop // loop on length
bra.s @done1 // we're done looping; return 0
@nsLoop:
move.b (str1)+, D0 // get char from 1st string
cmp.b #'Z', D0 // is it greater than 'Z'?
bgt.s @nonUC0 // if so, jump to non-upper-case section
cmp.b #'A', D0 // is it less than 'A'?
blt.s @nonUC0 // if so, jump to non-upper-case section
add D3, D0 // it's uppercase; make it lowercase
@nonUC0: move.b (str2)+, D1 // get char from 2nd string
cmp.b #'Z', D1 // is it greater than 'Z'?
bgt.s @nonUC1 // if so, jump to non-upper-case section
cmp.b #'A', D1 // is it less than 'A'?
blt.s @nonUC1 // if so, jump to non-upper-case section
add D3, D1 // it's uppercase; make it lowercase
@nonUC1: sub.b D1, D0 // subtract 2nd char from 1st
bne @done1 // if unequal, we're done; return char diff
dbra D2, @nsLoop // loop on length
// length is zero, we're done; return 0
@done1: movem.l (A7)+, D1-D3 // restore registers we used
@done2: ext.w D0 // extend byte to a short and return it
}
}
--
Jamie McCarthy Internet: k044477@kzoo.edu AppleLink: j.mccarthy
After seeing Jon and Matthias' three-instruction code for reversing the order of bytes in a long, I decided to make it a part of my personal library. I also decided to play around with that "#pragma parameter" thing, to make the functions inline. Here's what I ended up with; I'm posting it here because maybe I'll save someone the ten minutes it takes to assemble the code by hand. And also so that, if I did something stupid, someone will point it out. :-)
Public domain, of course.
/*
* swapBytes() swaps the bytes of the word at the given location.
* It doesn't usually need to return a value, but it's easy to do so
* because of the implementation. The value returned is equal to
* the 16-bit word, after it's been swapped.
*/
#pragma parameter _D0 swapBytes(__A0)
short swapBytes(void *theShort) = {
0x3010, // move.w (A0), D0
0xE058, // ror.w D0
0x3080 // move.w D0, (A0)
} ;
/*
* getIBMShort() returns the value of the 16-bit word at the given
* location, with the lo and hi bytes reversed. It does the same
* thing as swapBytes() except it doesn't store the new value into
* the memory location.
*/
#pragma parameter __D0 getIBMShort(__A0)
short getIBMShort(void *thePtr) = {
0x3010, // move.w (A0), D0
0xE058 // ror.w D0
} ;
/*
* getIBMLong() returns the value of the 32-bit longword at the
* given location, with the byte order reversed.
*/
#pragma parameter __D0 getIBMLong(__A0)
long getIBMLong(void *thePtr) = {
0x2010, // move.l (A0), D0
0xE058, // ror.w D0
0x4840, // swap D0
0xE058, // ror.w D0
} ;--
Jamie McCarthy Internet: k044477@kzoo.edu AppleLink: j.mccarthy
From: k044477@hobbes.kzoo.edu (Jamie R. McCarthy)
Subject: Re: Preloading regions? Is that a good idea?
dmoney@magnus.acs.ohio-state.edu (Dean R Money) writes:
>I'm writing a game that's played on a board with 150 small hexagons.
I'm assuming these hexes are laid out in a regular grid...
>Like Risk, when a player moves/attacks, he clicks on a region. I'm
>using polygons when I draw the board in the first place, then when I
>have to identify which hexagon is clicked on, I'm having to open 150
>regions, one by one, and use PtInRgn.
150's a little excessive. You can narrow it down to seven if you want to work hard, or nine if you don't. (I don't.)
I just finished an app that uses a hex grid. As it happens, I needed to outline the hexes at times, as well at detecting clicks in them. So I decided to use regions.
I built one region, the size and shape of one hex. (Wasn't easy, since it had to conform exactly to the bitmapped image our artist drew. I gave up on MoveTo()/LineTo() and went with FrameRect(). If your hexes are large, lines might work best for you; mine were 20x18.) I kept this guy in a global region handle.
When I needed to outline a hex, I'd OffsetRgn() it to the appropriate place and FrameRgn() it. Since OffsetRgn() works relative, not absolute, you have to peek at the region's rgnBBox to move it to an absolute location. Since I did this all the time, I made a routine that would return me the global region, offset to the proper point, and would build the region if it happened to be the first time it was being called:
RgnHandle CSkinPixels::getHexRgn(short offsetV, short offsetH)
{
if (theOneTrueHexRgn == NULL) {
Rect r;
theOneTrueHexRgn = NewRgn();
OpenRgn();
// Region-building code deleted.
// (You don't care how my hexes look.)
CloseRgn(theOneTrueHexRgn);
}
OffsetRgn(theOneTrueHexRgn,
offsetH - (**theOneTrueHexRgn).rgnBBox.left,
offsetV - (**theOneTrueHexRgn).rgnBBox.top);
return theOneTrueHexRgn;
}
I also needed a routine to turn "hex grid" coordinates into QuickDraw h,v coordinates. The former, I call "loc," the latter, "offset." You probably already have a routine like this, though yours may look different (there's two schools of thought for laying out hex grids).
void CSkinPixels::getHexOffset(short locV, short locH,
short *vOffset, short *hOffset)
{
*hOffset = -40 + (locH+locV)*15;
*vOffset = 92 + (locH-locV)*9;
}
But what you need is a routine that converts from _offset_ to _loc_. The first thing is to solve those equations in reverse. See those two got to turn them around and express them in terms of the other two variables. Yeah, that algebra stuff--you have to add equations together and everything. (I learned how to do this quickly in linear algebra, but I'm afraid I've forgotten it all. Sorry, Dr. Fink.)
My two "inverse" functions ended up being:
*locH = ((long) 3*hOffset + 5*vOffset - 340) / 90; *locV = ((long) 3*hOffset - 5*vOffset + 580) / 90;But that just gives you a point that's close to the one you want. After you know that, you have to go through all the adjacent hexes and test them to see if they're the proper ones. Here's what my "offset to loc" function ended up looking like:
void CSkinPixels::getHexLocation(short vOffset, short hOffset,
short *locV, short *locH)
{
short i, j;
Point thePt;
*locH = ((long) 3*hOffset + 5*vOffset - 340) / 90;
*locV = ((long) 3*hOffset - 5*vOffset + 580) / 90;
thePt.v = vOffset; thePt.h = hOffset;
for (j = -1; j <= 1; ++j) {
for (i = -1; i <= 1; ++i) {
RgnHandle theRgn;
short vO, hO;
getHexOffset(*locV+j, *locH+i, &vO, &hO);
theRgn = getHexRgn(vO, hO);
if (PtInRgn(thePt, theRgn)) {
goto foundIt;
}
}
}
DebugStr("\pCouldn't find it!");
foundIt:
*locV += j; *locH += i;
}
A little more complicated, but it'll take you 150 times less storage, and will run up to 15 times as fast. (My app never hit the debug line, in case you were wondering. :-)
Dean, if you want to chat more about this, email me; the details of this probably don't interest c.s.m.p...
--
Jamie McCarthy Internet: k044477@kzoo.edu AppleLink: j.mccarthy
From: chris@benton.prepress.com (christopher m. knox)
Subject: Re: Bit Manipulation help needed.
In article <rudolph.728896597@unixg.ubc.ca> Christopher E Rudolph, rudolph@unixg.ubc.ca writes:>First off I'm weak on and'ing and or'ing bits so bear with me if I....
Hi, (we Chris's need to stick together, eh ? :-))
A long integer is 32 bits, a short integer is 16 bits.
Using toolbox calls which allow a long as a bit number, you can set and clear bits in any structure by passing an address to the structure: counting from LEFT to RIGHT and starting at 0 you can set/clear/test bit N in your long value V as follows:
BitSet((Ptr) &V,(long) N); BitClr((Ptr) &V,(long) N); bitIsSet = BitTst((Ptr) &V,(long) N);If you want to go RIGHT to LEFT (as in your example), use (31 - N) instead of N. If it has to be fast, don't use toolbox calls: (from RIGHT to LEFT):
t = 1L << N; /* N = 0 => 1, N = 2 => 2, N = 3 => 4 .... */
V |= t; /* set */ V &= ~t; /* clear */ bitIsSet = (V & t) ? 1 : 0; /* test */
If it has to be still faster, then you should READ the value from
LEFT to RIGHT and fill it easily as follows:
BTW, V <<= N shifts V N bits to the left and shifts in 0s at the right.
> right 0000 0000 0000 0001
> right 0000 0000 0000 0011
> left 0000 0000 0000 0011
> right 0000 0000 0000 1011
> left 0000 0000 0000 1011
#define PushRIGHT(_V) {(_V) <<= 1; (_V) |= 1;}
#define PushLEFT(_V) {(_V) <<= 1; /* (_V) |= 0; */}
V = 0L; /* ... 0000 0000 0000 0000 */
PushRIGHT(V); /* ... 0000 0000 0000 0001 */
PushRIGHT(V); /* ... 0000 0000 0000 0011 */
PushLEFT(V); /* ... 0000 0000 0000 0110 */
PushRIGHT(V); /* ... 0000 0000 0000 1101 */
PushLEFT(V); /* ... 0000 0000 0001 1010 */
and read it back (albeit in reverse order):
#define PopBIT(_V,_bit) {_bit = (_V) & 1L; (_V) >>= 1;}
PopBIT(V,isRight);
if (isRight) {...}
else {...}
and so on... I hope this helps you some in 'bitsing' around...
chris knox, primary programmer, SpectreTouch
"Omne animal triste est post (or without) coitum" Umberto Eco
From: salmon@cwis.unomaha.edu (David Salmon)
Subject: Re: faster division routine wanted.
bayes@hplvec.LVLD.HP.COM writes:
> I wrote a routine that seems almost correct, and that allows you to
> quickly calculate a set of shifts and add/subs that approximate the
> division by any divisor. I know there's some stuff wrong with it, but
> thought it's worth sending out now, while the iron is hot.
The above mentioned algorithm is essentially correct. However, the implementation:
> 230 ! X/1000 = X>>10 + X>>15 - X>>17 + X>>21
> 240 ! The actual division this performs is very close to:
> 250 ! X/1000.0724845 (as shown by "approximation" printout.)
Does not really reporduce the divide by 1000 correctly. The higher order shifts lose the precision you need for accuracy. You need to shift-add- shift-add as in the following definitions:
/* This definition gives accurate results for n < 1 764 866 177 */ #define divNum8(n) ((((((((((((((((-n)>>4)-n)>>3) \ +n)>>2)+n)>>3)+n)>>4)-n)>>2)+n)>>5)+n+1)>>10 /* This definition gives accurate results for n < 1 762 037 711 */ #define divNum7(n) ((((((((((((((-n)>>3) \ +n)>>2)+n)>>3)+n)>>4)-n)>>2)+n)>>5)+n+1)>>10 /* This definition gives accurate results for n < 492 965 001 */ #define divNum6(n) (((((((((((n>>2)+n)>>3)+n)>>4)-n)>>2)+n)>>5)+n+1)&t;>10) /* Use of fewer terms yields unacceptable results */
The definition divNum6 is roughly the equivalent of the x/1000 statement given above, but maintains the precision. The numbers listed above represents the limits for which the macros give the same value as the divide instruction. You must use full registers (32 bits) to get the correct results. On a IIfx, the above macro (divNum8) is about 25% faster than the divide instruction. divNum6 which should be sufficient for your purposes should be faster.
Hope this helps.
ds
--
David C. Salmon
salmon@unomaha.edu
From: bruner@sp10.csrd.uiuc.edu (John Bruner)
Subject: Re: faster division routine wanted.
An alternate way to do division is to multiply by the fixed-point reciprocal. In this case, you can do it with
quotient = (0x8312 * numerator + 0x8000) >> 25;
From: bayes@hplvec.LVLD.HP.COM (Scott Bayes)
Subject: Re: (Q) Need bit manipulation algorithm
> name, but whose initials are M.A.C. And if you want a solution for
> a word, and you don't want a 32K lookup table, you actually have to
> _shift_! Shifting is for weenies.
But for a word, it's just the sum of the bits in each byte. Code something like this (untested) should do it PDQ.
---
* at beginning, d0 contains a word whose "1" bits we wish to count
* at end, d2 contains number of 1 bits in its lower byte
moveq #0,d1 - clear table index register
move.b d0,d1 - lower byte of data word
move.b btable(d1.w),d2 - nbits for lower byte (PC relative)
lsr.w #8,d0 - access upper byte of data word
move.b d0,d1 - upper byte of data word
add.b btable(d1.w),d2 - d2 contains total bits of word
---
ScottB
From: k044477@hobbes.kzoo.edu (Jamie R. McCarthy)
Subject: Re: faster division routine wanted.
sage.cc.purdue.edu (Broc Seib) writes:
>ctvien@cs.concordia.ca (VIEN cam thanh) wrote:
>>
>> I am looking for a division routine that will divide a 2 bytes number
>> by 1000 faster than M68000's DIV instruction.
First of all: are you really on the 68000? If not, you should be aware that the '020 and later chips are much faster at division. My 68000 manual pegs a signed division at 158 cycles, +/- less than 10%. You can get a few dozen shifts and moves in that time, so custom code might help. But on an '020 or later, the chip will almost certainly beat you.
>Replace your DIV operand with an
>LSR (or is it ASR --- which ever one fills in the upper byte with zeros as
>it shifts.)
LSR. Logical shifts fill with zero, arithmetic shifts copy the sign bit.
>With this operand you can _quickly!_ divide by 1024 (shift to
>the right 10 bits). i.e.
>
> move.w #$operand,d0
> lsr.w 10,d0
Actually, you have to put the 10 into a register; you can only shift up to eight bits in immediate mode. You can shift a longword.
>If dividing by 1000 _exactly_ is of essence, then you may accomplish this with
>a predefined sequence of LSR's and SUBQ's. This, however, may run nearly as
>long (or possibly) longer than the DIV operand itself. You would have to
>count operand cycles (and compare to DIV) to determine this.
>
> [sample multiply-by-ten code deleted]
If I'm not mistaken, hardcoded division is a lot harder than hardcoded multiplication. Dividing by 1000 is dividing by eight and then dividing by five three times. How do you divide by five? To divide by five, you divide by four and take four-fifths of the remainder. How do you take four-fifths of something? You divide by five and subtract the result from the original. You recurse like this until the numbers get small, which is not all that efficient. And it's no fun to write either. This is why microcode engineers get paid lots of money.
If you really need speed, and you're on a 68000, build a 65,536-byte lookup table, so that x/1000==table[x]. Fourteen cycles. Can't beat it!
--
Jamie McCarthy Internet: k044477@kzoo.edu AppleLink: j.mccarthy
From: sage.cc.purdue.edu (Broc Seib)
Subject: Re: faster division routine wanted.
In article <5662@daily-planet.concordia.ca>, ctvien@cs.concordia.ca (VIEN cam thanh) wrote:
I am looking for a division routine that will divide a 2 bytes number by 1000 faster than M68000's DIV instruction.
I am not sure of the application in which you are doing this, but IF you only need an approximate division by 1000, AND this is an operation on a 2-byte integer (not floating point numbers!), then you may take advantage of the processor's speed in "shifting". Replace your DIV operand with an LSR (or is it ASR --- which ever one fills in the upper byte with zeros as it shifts.) With this operand you can _quickly!_ divide by 1024 (shift to the right 10 bits). i.e.
move.w #$operand,d0 lsr.w 10,d0
- Broc Seib, Purdue University
----------------------------------------------------------------------------
If that solution is inadequate, then this may possibly help...
----------------------------------------------------------------------------
If dividing by 1000 _exactly_ is of essence, then you may accomplish this with a predefined sequence of LSR's and SUBQ's. This, however, may run nearly as long (or possibly) longer than the DIV operand itself. You would have to count operand cycles (and compare to DIV) to determine this.
I will show an analogous application that multiplies a number by 10 (before the days of luxurious MUL operand); I will leave the divide by 1000 story as a homework problem for the reader.
The multiply by 10 story is as follows:
move.w #$operand,d1 move.w d1,d0 lsl.w 2,d0 add.w d0,d1 lsl.w 1,d0
Verify this multiplication by 10 for yourself w/ pencil & paper.
Hint: The divide by 1000 story is simply 3 divide by 10's; just be careful with what you subtract from the "accumulator" register!
(I am a little rusty on my assembly; please excuse my mistakes)
Good Luck.
- Broc Seib
From: k044477@hobbes.kzoo.edu (Jamie R. McCarthy)
Subject: Re: (Q) Need bit manipulation algorithm
Summary: algorithm analysis & gratuitous MicroSoft-bashing
chris_russo@gateway.qm.apple.com (Christopher Russo) writes:
>Matthew T. Russotto, russotto@eng.umd.edu writes:
>>There is a neat trick for counting the number of bits in a word that
>>a Microsoft geek showed me one time. If there are k bits with 1's,
>>it runs in O(k) time.
>
>I learned a cool way in an interview here at Apple. For the number of
>bits in a byte, all you'd need is a 256 byte look-up table where the
>index designated by, say, 00001010 would be index $A into the table which
>would contain the number 2 (that you put there.) It's pretty fast.
Naturally, however, the cool hackers at MicroSoft wouldn't use that! Heavens, a 256-byte lookup table is 1/2500th of the memory below the 640K barrier! And pointers to global variables are so hard to use anyway, unless you want to use one of those wimpy high-level languages, or those spineless computers with the flat memory models that we won't name, but whose initials are M.A.C. And if you want a solution for a word, and you don't want a 32K lookup table, you actually have to _shift_! Shifting is for weenies.
No, despite being ten to fifty times slower, it's much better to write:
for (nOneBits=(byte?1:0); byte&=(byte-1); ++nOneBits) ; ...than: nOneBits = gOneBitsArray[byte];
From: Ray.Arachelian@f204.n2603.z1.ieee.org (Ray Arachelian)
Subject: (Q) Need Bit Manipulation
On 01-06-93, FELCIANO@SUMMIT.STANFORD. wrote to ALL:
F> If I want to match 00110110, there is no exact match. In this case, I
F> want it to return #6, which is only 1 bit off. If #6 weren't there, I
F> would want it to return #9, which is 2 bits off, then #10, which is
F> also 2 bits off, then #2 (1 bit off), then #1 (default)
F>
F> So: I need an algorithm that will find a consistent fuzzy match by
F> subtracting bits. It needs to deal with ambiguities like #9 and #10.
F> Thus if two matches have 90% of the bits in common, I need both of
F> them back, one after the other (the order doesn't matter).
Since string manipulation is a lot easier than bit manipulation from high level languages in terms of searching and matching, you may want to first convert both the search bit pattern and the pattern you're searching for into two strings and then do an in-string search (strstr() function in C)
If you don't come up with a match, take your search string and flip the first bit, do strstr() again. If it fails, flip the first bit back to what it was and flip the next bit. Do the search again, etc.
Once you get this to work, you can think about a way of doing this with just bits, but you should do it with a bit mask and rotations. Why a bit mask? Because bytes, words, and long words are of fixed size. If you're searching for a fixed word, this is fine, but if you're searching for three bits, let's say, then life is much harder. What you'd want to look for is actually more like *11001* where the *'s are like Unix wild cards.
The way to do this is to keep two words, or two longs for your search patter. The pattern itself, and a bit mask. IE:
*11001* in a long word search would become:
11010000 00000000 00000000 00000000-> search string
11110000 00000000 00000000 00000000-> bit mask.
Then you take your candidate element, perform an AND operation between it and your bit mask. Then you compare the result of this with your search string. If it doesn't match, you rotate both the bit mask and the search string to the right by one bit, and try again.
When you've rotated 32 times (ie: your original search string matches your rotated search string) and you still haven't found a match, you can begin flipping bits within your bit mask. This will now all of a sudden become VERY complex. :-)
Tackle this problem with strings first, then turn it in bits, then re-write the whole thing in assembly because it will be a VERY SLOW search indeed.
Here's a bit of a binary fuzzy search code for you. This is off the top of my head, so I don't know if it will work. Bugs are guaranteed. :-) Play with it once you understand the model.
S -> long word containing search string of bits.
M -> long word containing mask of desired bits to search for.
A[i] -> array of long words to search in.
sr -> rotating search long word.
mr -> rotating mask long word.
sf -> search long word with flipped bits.
int fuzzy_bit_search(long &A[],int last_i,
unsigned long S, unsigned long M)
{
unsigned long sr,mr,sf;
int i;
int flip=0;
flip=1<<31// set flip to be 2^31, * check this for correctness *
for (i=0; i<=last_i; i++) // go through every element in the array
while (flip!=0) // go through every bit flip
{ sf=S;
sr=~sf;
while(sf!=sr) // go through every rotation.
{
sr=sf;
mr=M;
test = ((A[i] & mr)==(sr & mr)); // test for match with mask
if (test) return i;
sr=sr>>1;
mr=mr>>1;
}
sf=sf ^ flip;
flip=flip>>1;
}
return(-1); // sorry, we couldn't find a match.
}
Here are a couple of problems for the above:
1. The first match will be returned, this may be the wrong thing to do because you may have 111011 match with 111001 when you may have an element later on that is 111011. A sorting routine may help, but this will have to be done with strings as doing this with bits gets even more complex.
2. The code is not efficient. It will flip bits that are outside of the mask.
3. It only works for an exact match, or a match that is off by exactly one bit. You may again wish to build a better function that will build a list of candidates and return the best match, and also provide a way of doing a second and third flipped bit routine. (For each level of flipped bits, you'll need another nested loop and another flip variable.) You'll have to decide how many bits you wish at maximum to be off.
4. You have no way of knowing the closeness of the match. IE: How far off the flipped bits are from the real search string.
5. This code is untested, so it's sure to be buggy. :-)
Please post up your source code when you do the full fuzzy search, and
good luck.
rarachel@ishara.poly.edu
Ray.Arachelian@f204.n2603.z1.fidonet.org
* Freddie 1.3b4 * Do infants have as much fun in infancy as adults in
adultery?
--
From: chris_russo@gateway.qm.apple.com (Christopher Russo)
Subject: Re: (Q) Need bit manipulation algorithm
In article <1993Jan08.012806.14863@eng.umd.edu> Matthew T.Russotto, russotto@eng.umd.edu writes:
>>There is a neat trick for counting the number of bits in a word that
>>a Microsoft geek showed me one time. If there are k bits with 1's,
>>it runs in O(k) time.
I learned a cool way in an interview here at Apple. For the number of bits in a byte, all you'd need is a 256 byte look-up table where the index designated by, say, 00001010 would be index $A into the table which would contain the number 2 (that you put there.) It's pretty fast.
From: reinoud@dutecai.et.tudelft.nl (R. Lamberts)
Subject: Julia's Dream algorithm + example code
Reply-To: reinoud@dutecai.et.tudelft.nl (R. Lamberts)
Originator: reinoud@dutecai.et.tudelft.nl
Hi there!
This is for all of you who remember Julia's Dream (JD), a Mac application which calculates Julia sets really fast. Many people asked me how it works, but I'm not that good at explaining these things, so a lot of you must have been dissappointed. I humbly apologise for never having written a decent piece about the algorithm! However, here's a simplified implementation of JD written in C, and before that another attempt at explaining it.
The idea of Julia's Dream is very simple, even though I think it's not easy to explain. Maybe it's best to have a look at the source below if you don't like vague stories ;-). This code is based on the same idea as the Mac version, though the implementation is quite different.
The idea of JD is that with the escape-time algorithm, each iteration is essentially independent of its context. It does not make any difference whether a certain point in an orbit is an intermediate position or it is a starting position corresponding to a pixel. This is true for Julia sets, but not for the Mandelbrot set, where the (c or mu) parameter varies with the starting point (pixel). Now, because for a Julia set each intermediate point in an orbit may also be considered as a starting point, if a pixel is associated with each intermediate orbit point, the calculation of a pixel value effectively requires just one iteration.
There are various algorithms possible to implement this idea. One way is to calculate an orbit for a certain starting point, and assign pixel locations and values to each point in the orbit afterwards. This is used in the code below. The original Mac version of JD did calculations with each orbit point rounded to a pixel location each iteration, allowing for significant speed-up if the CPU is slow relative to memory access.
If you want to see the Mac code, let me know. It's a lot of hairy 68020 assembly code though...
To improve accuracy around the origin, some pixels are calculated using the good old escape-time algorithm there. There are probably more elegant ways to do this, but I didn't exercise the options.
The code below compiles on a SGI Indigo with entry (8-bit) graphics (sorry, I didn't even initialise the colormap). It uses the GL, not X. This is *not* finished code, especially the user interface is a quick hack. I've included the GL stuff with the hope that a concrete example may be easier to understand for some. Sorry about the lack of comments, this was really an experiment to see how well the JD algorithm would work out in C with straightforward FP calculations.
Well, I hope this is of any help (or interest, if you didn't know JD). Many thanks to all who contacted me about this! If you have ideas for (algorithmic) improvement, spiffy programs, or further questions, I'd very much like to know.
Have fun,
- Reinoud
/*
* Julia's Dream II (XP hack version)
*
* The nightmare continues...
*
* Copyright (c) 1992 Reinoud Lamberts, all rights reserved
*
* You are free to use this stuff as you like, as long as
* you give credit where credit is due.
*
*
* Reinoud Lamberts
* Oldambt 69
* 3524 BD Utrecht
* The Netherlands
*
* Phone (int'l code 31) 30 896775
*
* reinoud@duteca.et.tudelft.nl
* reinoud@knoware.ruu.nl (UUCP)
* reinoud%knoware.ruu.nl@accucx.cc.ruu.nl
*
*/
#include <stdlib.h>
#include <gl/gl.h>
#include <gl/device.h>
#define XRES 256 /* better be power of 2 */
#define YRES 256
#define XLOG 8 /* shift amount i.s.o multiply */
typedef unsigned char byte; /* should be 8 bits (and int 32) */
typedef double fl; /* optimise for performance */
byte *ospmap; /* things happen in here */
main()
{
int xi, yi, ospindex, it, i, ospwsiz, bhsiz, jinx;
fl deltax, deltay, idx, idy;
fl xc, yc, sqre, sqim, sqyc, re, im, cre, cim;
int orb[10000];
int bound, empty, hitval;
int pro, con;
/*
* yeah, let's arrange for some accomodation or something
*/
ospmap=malloc(YRES*XRES);
if (ospmap==NULL)
exit(1);
/*
* let's do the window thing now
*/
prefsize(XRES, YRES);
winopen("Julia's Dream II XP hack version (c) 1992 Reinoud Lamberts");
pixmode(PM_SIZE, 8); /* 8 bits offscreen*/
/*
* now let's start talking business
*/
/* constants */
deltax=4.0/(XRES-1); /* section of c plane is fixed */
deltay=4.0/(YRES-1);
idx=1/deltax;
idy=1/deltay;
ospwsiz=((YRES*XRES)>>2);
bhsiz=XRES>>4;
do
{
/* init */
cre=-2+getvaluator(MOUSEX)*deltax;
cim=-2+getvaluator(MOUSEY)*deltax;
for (i=((YRES>>1)*XRES)>>2; i<ospwsiz; i++)
((int *) ospmap)[i]=0;
/* enter the black hole */
yc=deltay*0.5;
for (yi=(YRES>>1); yi<(YRES>>1)+bhsiz; yi++)
{
xc=-(bhsiz+0.5)*deltax;
sqyc=yc*yc;
ospindex=(yi<<XLOG)+(XRES>>1)-1-bhsiz;
for(xi=(XRES>>1)-1-bhsiz; xi<((XRES>>1)+bhsiz); xi++)
{
sqim=sqyc;
sqre=xc*xc;
re=xc;
im=yc;
it=0;
while( (bound=((sqre+sqim)<4.0))&&(it<64) )
{
im=2.0*re*im+cim;
re=sqre-sqim+cre;
sqre=re*re;
sqim=im*im;
it++;
}
if (!bound)
ospmap[ospindex]=255-it;
else
ospmap[ospindex]=255;
xc+=deltax;
ospindex++;
}
yc+=deltay;
}
/* explore outer space */
ospindex=(YRES>>1)*XRES;
yc=deltay*0.5;
/* okay, here it goes */
for (yi=(YRES>>1); yi<YRES; yi++)
{
xc=-2.0;
sqyc=yc*yc;
for (xi=0; xi<XRES; xi++)
{
sqim=sqyc;
sqre=xc*xc;
re=xc;
im=yc;
it=0;
jinx=ospindex;
while ( (bound=((sqre+sqim)<4.0)) &&
(empty=((hitval=ospmap[jinx])==0)) && (it<10000)
)
{
orb[it]=jinx;
ospmap[jinx]=255;
im=2.0*re*im+cim;
re=sqre-sqim+cre;
if (im<0)
{
im=-im;
re=-re;
}
sqre=re*re;
sqim=im*im;
jinx=( ((int) ((im+2.0)*idy+0.5)) <<XLOG)+(re+2.0)*idx+0.5;
it++;
}
if (it!=0)
{
if ((!empty)&&(hitval!=255))
do
{
if (--hitval==0)
hitval=254;
ospmap[orb[--it]]=hitval;
}
while (it>0);
else
if (!bound)
{
hitval=255;
do
{
if (--hitval==0)
hitval=254;
ospmap[orb[--it]]=hitval;
}
while (it>0);
}
}
xc+=deltax;
ospindex++;
}
yc+=deltay;
}
pro=0;
con=YRES*XRES-1;
do
{
ospmap[pro++]=ospmap[con--];
}
while(pro<=con);
lrectwrite(0, 0, XRES-1, YRES-1, (unsigned long *) ospmap);
}
while (1);
return 0; /* easy on the compiler */
}
From: jmatthews@desire.wright.edu
Subject: Detecting double click?
Source:Pascal
Recently I suggested the following way to check for double clicks:
var gLastMouseUp: EventRecord;
procedure SaveLastMouseUp(event: EventRecord);
begin
gLastMouseUp := event
end;
function CheckDoubleClick (event: EventRecord): Boolean;
var slop: Rect;
begin
with gLastMouseUp.where do
SetRect(slop, h - 2, v - 2, h + 2, v + 2);
CheckDoubleClick:= ((event.when - gLastMouseUp.when) <=
GetDblTime) and (PtInRect(event.where, slop))
end;
In the main event loop, call SaveLastMouseUp in response to mouseUp events, and call CheckDoubleClick in response to mouseDown events. The latter returns true when a double click (as defined in IM vol.I, p.260) occurs.
This definition of a double click can lead to some confusion in subsequent calls to StillDown or WaitMouseUp. An alternative is to measure time between mouseDown events as follows:
var gLastMouseDown: EventRecord;
{should initialize gLastMouseDown.when to 0}
function CheckDoubleClick (event: EventRecord): Boolean;
var
slop: Rect;
test: Boolean;
begin
with gLastMouseDown.where do
SetRect(slop, h - 2, v - 2, h + 2, v + 2);
test := ((event.when - gLastMouseDown.when) <= GetDblTime) and
(PtInRect(event.where, slop));
if test then
gLastMouseDown.when := 0 {prevent multiple clicks}
else
gLastMouseDown := event; {save this one for future testing}
CheckDoubleClick := test
end;
How are others doing it?
John B. Matthews
From: russotto@eng.umd.edu (Matthew T. Russotto)
Subject: Re: (Q) Need bit manipulation algorithm
In article <20751@ucdavis.ucdavis.edu> lim@toadflax.cs.ucdavis.edu (Lloyd Lim) writes:
>
>There is a neat trick for counting the number of bits in a word that
>a Microsoft geek showed me one time. If there are k bits with 1's,
>it runs in O(k) time.
They probably use it for job interviews....
ones = 0;
while (k) {
k=k&(k-1);
ones++;
}
From: bowman@reed.edu (BoBolicious)
Subject: Re: (Q) Need bit manipulation algorithm
In article <1993Jan6.022430.19469@leland.Stanford.EDU> felciano@summit.stanford.edu (Ramon M. Felciano) writes:
>I've got an interesting bit-manipulation problem for you. I need to
>know how to match a given bit pattern into a stored array of patterns.
>In particular, I need to come up with a "fuzzy match" if no exact one
>is found.
The best way I can see to do this is to XOR your test pattern with each member of the table. Then the number of "on" bits (the Hamming distance?) is the number of bits by which the two patterns differ. I don't know the most efficient way to count the number of "on" bits; there are several ways to do it, so if you need it to be super-fast, you might want to experiment between looping through each bit in the pattern, shifting each time and testing the lowest (or highest, depending on which way you shift) bit. Or you could maintain a table which contained 1,2,4,8,16,... and AND the result of each XOR with each member of *this* table.
Doing the actual bookkeeping to keep track of which one(s) to return and by how many bits they differ from the test pattern should be pretty trivial.
From: lim@toadflax.cs.ucdavis.edu (Lloyd Lim)
Subject: Re: (Q) Need bit manipulation algorithm
Summary: bit counting and gratuitous Microsoft bashing
In article <1993Jan6.032113.597@planets.risc.rockwell.com> sho@risc.rockwell.com (Sho Kuwamoto) writes:
>
>Maybe not the fastest method, but a simple one:
> 1) come up with an algorithm which counts the number of
> 1's in a given bit pattern. You'll either use a loop
> with shifts and increments in it, or you'll use masks.
>[...]
There is a neat trick for counting the number of bits in a word that a Microsoft geek showed me one time. If there are k bits with 1's, it runs in O(k) time.
I don't remember the algorithm, but it involved subtracting 1 and could be written in one line of C. (Yeah, I know... You could write anything, even Microsoft Word, in one very long line of C code. Come to think of it, that's probably what they did.) The subtraction was used to create instant masks: e.g. 0x0100 - 1 = 0x00FF. It went from the highest 1 bit to the lowest, magically avoiding all of the 0's.
That's all I remember. Anyone remember this algorithm?
From: bwilliam@iat.holonet.net (Bill Williams)
Subject: Re: How do you draw a circle/oval?
Re: Request for filled oval graphics primitive.....
/*=********************************************************************=*/ /*=********************************************************************=*/ #define INTEGER_32 long int static void Draw_Filled_Ellipse_Primitive(INTEGER_32 start_X , INTEGER_32 start_Y ,INTEGER_32 axis_A, INTEGER_32 axis_B , PixPatHandle current_Pattern) { register INTEGER_32 active_X, active_Y; register INTEGER_32 axis_A_Squared, axis_B_Squared, axis_A_Squared_Times_2; register INTEGER_32 axis_B_Squared_Times_2, derivitive_X, derivitive_Y, dd; /* END VAR */ /*=--------------------------------------------------------------------=*/ /* Uses dual differential high efficiency technique */ /*=--------------------------------------------------------------------=*/ active_X = axis_A; active_Y = 0; axis_A_Squared = axis_A*axis_A; axis_B_Squared = axis_B*axis_B; axis_A_Squared_Times_2 = axis_A_Squared + axis_A_Squared; axis_B_Squared_Times_2 = axis_B_Squared + axis_B_Squared; derivitive_X = axis_B_Squared_Times_2*axis_A; derivitive_Y = 0L; dd = (axis_B_Squared / 4L) - (axis_B_Squared * axis_A) + axis_A_Squared; /*=--------------------------------------------------------------------=*/ /* Step along Y axis */ /*=--------------------------------------------------------------------=*/ while ( derivitive_X > derivitive_Y ) { Draw_Patterned_Eastward_Line_Subprimitive( (start_X - active_X) , (start_Y - active_Y) , (active_X << 1) +1); Draw_Patterned_Eastward_Line_Subprimitive( (start_X - active_X) ,(start_Y + active_Y) , (active_X << 1) +1); active_Y +=1; derivitive_Y += axis_A_Squared_Times_2; if (dd <= 0L ) dd += (derivitive_Y + axis_A_Squared); else { derivitive_X -= axis_B_Squared_Times_2; active_X -= 1L; dd += (derivitive_Y + axis_A_Squared - derivitive_X ); } } /* while ( derivitive_X > derivitive_Y ) */ /*=--------------------------------------------------------------------=*/ /* Step along X axis */ /*=--------------------------------------------------------------------=*/ dd += ( ( ((3L *(axis_B_Squared-axis_A_Squared)) / 2L) - ( derivitive_X + derivitive_Y ) ) / 2L); while ( active_X > 0L ) { Draw_Patterned_Eastward_Line_Subprimitive( (start_X - active_X),(start_Y - active_Y), (active_X << 1) +1); Draw_Patterned_Eastward_Line_Subprimitive( (start_X - active_X),(start_Y + active_Y), (active_X << 1) +1); active_X -=1; derivitive_X -= axis_B_Squared_Times_2; if (dd > 0L ) dd += (axis_B_Squared - derivitive_X); else { derivitive_Y += axis_A_Squared_Times_2; active_Y += 1L; dd += (derivitive_Y + axis_B_Squared - derivitive_X ); } } /* while ( active_X > 0L ) */ while (active_Y <= axis_B) { Draw_Patterned_Eastward_Line_Subprimitive(start_X ,(start_Y + active_Y) , 1L); /* actually only one pixel */ Draw_Patterned_Eastward_Line_Subprimitive(start_X ,(start_Y - active_Y) , 1L); /* actually only one pixel */ active_Y += 1L; } } /* Draw_Filled_Ellipse_Primitive */ /*=********************************************************************=*/ /*=********************************************************************=*/
I Guarantee this puppy is FAAAAST. I designed it from various sources.
(Draw_Patterned_Eastward_Line_Subprimitive should be written as fast as< you can make it.... or just make it a LineTo.)
Enjoy.
Bill Williams
From: jmunkki@vipunen.hut.fi (Juri Munkki)
Subject: Re: 64-bit multiply and divide on 68000
Keywords: 68000 Assembler Math discrete
Reply-To: jmunkki@vipunen.hut.fi (Juri Munkki)
In article <By3BM7.InD@world.std.com> squeegee@world.std.com (Stephen C. Gilardi) writes: >I need to calculate a linear interpolation rapidly. The equation is
>
> y = x * numer / denom,
>
>where y, numer, and denom are 32-bit unsigned longs and x is a 16-bit
>unsigned short.
>
>On the 68020, I can use the built-in "long" multiply and divide
instructions.
>However on the 68000 it's more involved.
>
>I found 68000 code in a book to do the 64-bit unsigned multiply. I'd
>like to have the divide in assembly language as well.
>
>Does anyone have such a routine, or a reference to where I can find one?
I'm also interested in a good division routine for the 68K. Here's a routine that does essentially what you wanted to do and a bit more. It uses 32 bit fixed point numbers. I would very much like to see an improved division routine. I wasn't able to figure out anything better on my own and since my main target processor is the 68030, this version is not used all that often.
Since your x is a short, you will save quite a bit of time by using that knowledge to optimize the code.
Fixed FMulDiv68000(a,b,c)
Fixed a,b,c;
{
asm {
movem.l D3-D5,-(sp)
move.l a,D0
bpl.s @positive_a
neg.l c
neg.l D0
@positive_a
move.l b,D1
bpl.s @positive_b
neg.l D1
neg.l c
@positive_b
move.w D1,D2
mulu.w D0,D2 ; D2 = Lo * Lo
move.w D1,D3
swap D0
mulu.w D0,D3 ; D3 = Hi * Lo
swap D1
move.w D1,D4
mulu.w D0,D4 ; D4 = Hi * Hi
swap D0
mulu.w D0,D1 ; D1 = Lo * Hi
clr.l D5
move.w D3,D5
swap D5
clr.w D3
swap D3
add.l D5,D2 ; 64 bit addition Hi*Hi:Lo*Lo += Hi*Lo
addx.l D3,D4
clr.l D5
move.w D1,D5
swap D5
clr.w D1
swap D1
add.l D5,D2 ; 64 bit addition Hi*Hi:Lo*Lo+Hi*Lo += Lo*Hi
addx.l D1,D4 ; Result is now in D4:D2
add.l D2,D2
addx.l D4,D4
move.l c,D1
bpl.s @positive_c
neg.l D1
@positive_c
moveq.l #1,D0
bra.s @divloop
@divok
lsl.l #1,D0
bcs.s @divdone
add.l D2,D2
addx.l D4,D4
@divloop
sub.l D1,D4
bcc.s @divok
addx.l D0,D0
bcs.s @divdone
add.l D1,D4
add.l D2,D2
addx.l D4,D4
bra.s @divloop
@divdone
move.l c,D1
bpl.s @positive_result
addq.l #1,D0
bra.s @done
@positive_result
eor.l #-1,D0
@done
movem.l (sp)+,D3-D5
}
}
--
Juri Munkki Windsurf: fast sailing
jmunkki@hut.fi Macintosh: fast software
I have lots of random number generator source, most are in "C" two are in 68000 assembly but one of them (My ultimate favorite) is actually in Pascal (The language you requested) here it is:
===========
UNIT Portable_Random_Numbers;
INTERFACE
FUNCTION Fetch_Random_Numbers: real;
PROCEDURE Manually_Initialize_Seeds (value_1, value_2: LONGINT);
PROCEDURE Auto_Initialize_Seeds;
IMPLEMENTATION
VAR
Random_Number_Seed_1, Random_Number_Seed_2: LONGINT; {
signed 32-bit integers }
FUNCTION Fetch_Random_Numbers: real;
VAR
zz, kk: longint;
BEGIN
kk := Random_Number_Seed_1 DIV 53668;
Random_Number_Seed_1 := 40014 * (Random_Number_Seed_1 - kk
* 53668) - kk * 12211;
IF Random_Number_Seed_1 < 0 THEN
Random_Number_Seed_1 := Random_Number_Seed_1 + 2147483563;
kk := Random_Number_Seed_2 DIV 52774;
Random_Number_Seed_2 := 40692 * (Random_Number_Seed_2 - kk
* 52774) - kk * 3791;
IF Random_Number_Seed_2 < 0 THEN
Random_Number_Seed_2 := Random_Number_Seed_2 + 2147483399;
zz := Random_Number_Seed_1 - Random_Number_Seed_2;
IF zz < 1 THEN
zz := zz + 2147483562;
Fetch_Random_Numbers := zz * 4.65661E-10;
END;
PROCEDURE Manually_Initialize_Seeds (value_1, value_2: LONGINT);
{ This is for repeatable use of the Random number generator }
{ requirement is that Random_Number_Seed_1 be in the range [1,
2_147_483_562] and Random_Number_Seed_2 }
{ be in [1, 2_147_483_398] ( [ -2**31 + 85, 2**31 - 85 ]). }
BEGIN
Random_Number_Seed_1 := value_1;
Random_Number_Seed_2 := value_2;
END;
PROCEDURE Auto_Initialize_Seeds;
{ This version of Initialize_Seeds is Macintosh dependent. The theoretical }
{ requirement is that Random_Number_Seed_1 be in the range [1,
2_147_483_562] and Random_Number_Seed_2 }
{ be in [1, 2_147_483_398] ( [ -2**31 + 85, 2**31 - 85 ]). Here,
Random_Number_Seed_1 and }
{ Random_Number_Seed_2 are in [0, 32_737]. See reference [1] cited in the
comments of }
{ Fetch_Random_Numbers for more information on generating seeds. }
BEGIN
GetDateTime(randSeed);
Random_Number_Seed_1 := abs(Random);
Random_Number_Seed_2 := abs(Random);
END;
END. { Portable_Random_Numbers }-------------
It is a classic Multiplicative Linear Congruential Generator as known about for over 20 years and proven to be the best "shitty" safe and equally distributed, evenly repeating random technique.
{1. L'Ecuyer, Pierre, Efficient and Portable Combined Random Number } { Generators. Communications of the ACM, 31, 6 (June 1988), 742-749, 774 }
L'Ecuyer reports that a comprehensive battery of over 20 different tests were performed on the generator that involved billions of psuedo-random numbers and over 200 hours of VAX-11/780 CPU time. The empirical behavior of the generators is very satisfactory for this class of generator.
Hope that solves it!
Bill Williams
From: db21@cbnewsc.cb.att.com (david.beyerl)
Subject: Re: Easter was: Re: Algorithm for Day-Of-Week NEEDED
Summary: Here's where to locate an algorithm.
In article <DM.92Sep24220600@stekt2.oulu.fi>, dm@stekt2.oulu.fi (Hannu Helminen) writes:
Does anyone have working C-code to determine when it is Easter in a given year? It would be handy since I am currently writing a calendar utility in C. Pointers to such code would also be appreciated.An algorithm for the date of Easter is presented on pages 155-156 in "The Art of Computer Programming, Fundamental Algorithms, Volume 1, by Knuth. Below is C-source I obtained from someone on the net.
Dave Beyerl
att!ihlpm!db21
---------------------------------------------------------------------From att!world.std.com!geoff Thu Jan 09 12:31:28 0500 1992
Here you go; enjoy.
# To unbundle, sh this file
echo easter.c 1>&2
sed 's/^X//' >easter.c <<'!'
X/*
X * easter - compute date of easter
X * see The Art of Computer Programming, Vol. 1, pp. 155-6.
X */
X# include <stdio.h>
X# include <time.h>
X# include <sys/types.h>
X# define CENTURY 100
X# define LEAP 4
X# define WEEK 7
X# define MARCH_DAYS 31
X# define APRIL 3
X# define METONIC 19
X# define BASEYEAR 1900 /* add to localtime(t)->tm_year */
X# define GREGORY 1582 /* start of gregorian calendar */
main(argc, argv)
char ** argv;
X{
X if (argc > 2)
X fprintf(stderr, "usage: easter [year]\n");
X else if (argc == 2)
X easter(atoi(argv[1]));
X else
X easter(this_year());
X exit(0);
X}
this_year()
X{
X register struct tm * lp;
X time_t t;
X extern struct tm * localtime();
X extern time_t time();
X t = time(&t);
X lp = localtime(&t);
X if (lp->tm_mon > APRIL)
X lp->tm_year++;
X return(lp->tm_year + BASEYEAR);
X}
easter(year)
register int year;
X{
X register int epact, fullmoon;
X int golden, century, noleap;
X if (year < GREGORY) {
X printf("%d: before Gregory\n", year);
X return;
X }
X golden = year % METONIC + 1;
X century = year / CENTURY + 1;
X noleap = 3 * century / LEAP - 12;
X epact = (11 * golden + 20 +
X (8 * century + 5) / 25 - 5 /* moon orbit correction */
X - noleap) % 30;
X if (epact < 0)
X epact += 30; /* make remainder positive */
X if (epact == 25 && golden > 11 || epact == 24)
X ++epact;
X fullmoon = 44 - epact;
X if (fullmoon < 21)
X fullmoon += 30;
X fullmoon += WEEK - (
X (int)(5L * year / LEAP) - noleap - 10
X + fullmoon) % WEEK; /* advance to Sunday */
X if (fullmoon > MARCH_DAYS)
X printf("%d April", fullmoon - MARCH_DAYS);
X else
X printf("%d March", fullmoon);
X printf(" %d\n", year);
X}
!
From: mxmora@unix.sri.com (Matthew Xavier Mora)
Subject: fastQSort.p (Pascal port of fastQsort.c)
Source:Pascal
I don't remember seeing this get posted so I'll post another copy. This is the Pascal version of Hadyn Huntley's fastQsort.c that was posted here long ago. Since I posted a request for sorting code in pascal, I thought I should post this for the net in case anyone else is interested.
It could use a better memswap if someone has one.
Matt
{------------------------------------------------}
{ fastQsort port by Matthew Xavier Mora based on }
{ Hadyn Huntley fastqsort.c }
{ posted to C.S.M.P. on 09-28-92 }
{ }
{Researched and written by: }
{ }
{ Haydn Huntley }
{ Haydn's Consulting }
{ 203 West Stone }
{ Fairfield, Iowa 52556 }
{ (515) 472-7025 }
{ }
{ During the school year, I may be reached at the}
{ following address: }
{ }
{ Haydn Huntley }
{ Eigenmann Hall #289 }
{ Indiana University }
{ Bloomington, IN 47406 }
{ (812) 857-8620 }
{ }
{ E-mail: huntley@copper.ucs.indiana.edu }
{------------------------------------------------}
unit fastqsort;
interface
procedure FastQSort (base: Ptr; numberofRecords: longint; sizeofRecord:
longint);
implementation
procedure FastQSort (base: Ptr; numberofRecords: longint; sizeofRecord:
longint);
var
gSize: longint;
tmp, LowerLimit, UpperLimit: ptr;
procedure memSwap (p, q: ptr; qSize: longint);
begin
BlockMove(p, tmp, qSize);
BlockMove(q, p, qSize);
BlockMove(tmp, q, qSize);
end;
function GetSize (var n: longint; last, first: ptr): integer;
begin
n := (ord(last) - Ord(first)) div gsize;
GetSize := n;
end;
function cmpFn (aPtr, bPtr: ptr): integer;
begin
cmpFn := IUMagString(ptr(ord(aPtr) + 1), ptr(ord(bPtr) + 1), gsize,
gsize);
end;
{ ------------- begin doFastQSort ------------------------ }
procedure doFastQSort (first, last: ptr);
var
i, j: Ptr;
n: longint;
begin
{ first, last, i, and j are scaled by gSize in this function }
while ((getsize(n, last, first)) > 1) do
begin
{ use a random pivot }
memSwap(ptr(ord(first) + (abs(random) mod n) * gSize), first, gSize);
i := first;
j := last;
while (true) do
begin
i := Ptr(ord(i) + gSize);
while (ord(i) < ord(last)) and (CmpFn(i, first) < 0) do
i := Ptr(ord(i) + gSize);
j := Ptr(ord(j) - gSize);
while (ord(j) > ord(first)) and (CmpFn(j, first) > 0) do
j := Ptr(ord(j) - gSize);
if (ord(i) >= ord(j)) then
leave;
memSwap(i, j, gSize);
end;
if (j = first) then
begin
first := ptr(ord(first) + gSize);
cycle;
end;
memSwap(first, j, gSize);
if (ord(j) - ord(first)) < (ord(last) - (ord(j) + gSize)) then
begin
doFastQSort(first, j);
first := ptr(ord(j) + gSize);
{ equivalent to doFastQSort (j + gSize, last); }
{ to save stack space }
end
else
begin
doFastQSort(ptr(ord(j) + gSize), last);
last := j;
{ equivalent to doFastQSort (first, j); }
{ to save stack space }
end;
end;
end;
{ ------------- begin FastQsort ------------------------ }
begin
{ setup the global variables used by fastQSort() }
gSize := sizeofRecord;
randseed := TickCount;
tmp := NewPtr(gsize); {temp ptr for swapping}
if tmp <> nil then
begin
{ the args for fastQSort () must be scaled by gSize }
doFastQSort(base, ptr(ord(base) + (numberofRecords) * gSize));
DisposePtr(tmp);
end;
end;
end.
From: steve@taumet.com (Steve Clamage)
Subject: Re: Algorithm for Day-Of-Week NEEDED
barrett@lucy.ee.und.ac.za (Alan P Barrett) writes:
>
In article <2652@bsu-cs.bsu.edu>,
>pickens@bsu-cs.bsu.edu (David B. Pickens) writes:
>> index = ((13 * month - 1) / 5) + day + (year % 100) +
>> ((year % 100) / 4) + ((year / 100 / 4) - 2 *
>> (year / 100) + 77;
>> index = index - 7 * (index / 7);
>Why not index = index % 7 ?
>I see that you try to avoid the mistake that I have seen in many
>implementations of Zeller's Congruence in C. If the +77 were not used
>then the initial index calculation could have a negative result, in which
>case the / and % operators would have an implementation defined meaning.
>With the +77, you are safe until the year 4599 (unless they change the
>rules before then), but it breaks on 1 Mar 4600 (month=1, day=1,
year=4600,
>initial index = 2 + 1 + 0 + 0 + 11 - 92 + 77 = -1).
Here is a C version of Zeller's Congruence which doesn't have that problem:
The day of the week 'W' (0 = Sunday) is given by
W = ((13*M - 1) / 5 + D + Y + Y/4 + C/4 - (2*C)%7 + 7) % 7;
where
C is the century (year/100)
Y is the last two digits of the year (year%100)
D is the day of the month
M is a special month number, where Jan and Feb are taken as
month 11 and 12 of the previous year (subtract 1 from the year)
and each division is truncating division, and cannot be combined.
The properties of modulo arithmetic are such that we may apply it individually to each term of a sum. To guarantee that our sum is positive, we can find (2*C)%7, which is subtracted instead of 2*C, and add an additional 7 before the final %7.
From: fry@tara.harvard.edu (David Fry)
Subject: Re: Better random number than Random()?
In article <1992Sep22.114218.14429@sunfs3.Camex.COM> kent@sunfs3.Camex.COM (Kent Borg) writes:>In article <1992Sep19.012039.15704@husc3.harvard.edu> fry@tara.harvard.edu (David Fry) writes:
>>In article <1992Sep18.105410.24438@midway.uchicago.edu> hd12@midway.uchicago.edu writes:
>[About getting good random numbers, that Random() didn't seem great.]
>
>>Why do you think it isn't uniform? Are you doing anything strange
>>with the results of Random(), like using a "mod" to get a smaller
>>random number? This decreases the quality of your random numbers,
>>although not as much with Random() as with some other RNG's.
>
>So the obvious next question is how *should* one beat too-big random
>numbers down to the range you want? Should one simply reject
>out-of-range numbers and ask again?
To be technically correct, you should linearly normalize them to the range you want. For instance, if your RNG gives you numbers r between a and b, and you want numbers between x and y, compute
( (float)(y-b) )/( (float)(x-a) ) * ( (float)(r-b) ) + (float)y.
For practical purposes this may be too slow, so for many applications and with some RNG's (and the Mac one is pretty good), you can use MOD after all. But it also depends on how large the range you're MOD-ing the numbers into is. For high-brow computer simulations attempting to predict the effects of global warming, I'd use the above formula.
BTW, depending on the type of x and y, you may need to alter the above formula slightly to make sure that your new numbers fit between x and y inclusively or exclusively or however you want it. But be careful to not send too many r's to one end of your range by accident.
From: mjn@pseudo.uucp (Murray Nesbitt)
Subject: Re: Algorithm for Day-Of-Week NEEDED
In-Reply-To: rmh@taligent.com's message of Sat, 19 Sep 1992 01:02:46 GMT
rmh@taligent.com (Rick Holzgrafe) writes:
>
In article <1992Sep18.151922.753@Cigna.COM>, nagel@Cigna.COM (Mark Nagel)
>writes:
>>
>> Here's somthing from my "this-might-be-useful" box:
>>
[Problematic code deleted].
>
>I don't think this is correct... First, running it by hand on today's date
>(Sept. 18, 1992) I get 0 which I presume is "Sunday" - but today is Friday.
>
>I think the leap year calculation is wrong. A year is a leap year if it is
>evenly divisible by 4, but not by 100, unless by 400. (I can never phrase that
>well - but 1900 was not a leap year; 2000 will be; 2100 won't be.)
>
>Even with that, it would be wrong for antique dates. In September of 1752,
>England and its colonies converted from the Julian to the Gregorian calendar.
The following date functions may be helpful. daynumber() is the day
[1-365] within a year, and dayofweek is the day [0-6] within a week.
#define LEAPYEAR(Y) (( !((Y)%4) && ((Y)%100)) || !((Y)%400) )
int dayofyear( int month, int day, int year )
{
int d;
d = ( ( month - 1 ) * 31 ) + day;
if( month > 2 )
{
d -= ( ( 4 * month ) + 23 ) / 10;
if( LEAPYEAR( year ) ) d++;
}
return d;
}
int dayofweek( int month, int day, int year )
{
int weekday;
weekday = dayofyear( month, day, year );
weekday--;
weekday += year + 4 + ( ( year + 3 ) / 4 ); /* Julian Calendar */
if ( year > 1800 )
{
weekday -= ( ( year - 1701 ) / 100 ); /* Clavian correction */
weekday += ( ( year - 1601 ) / 400 ); /* Gregorian correction */
}
if( year > 1752 ) weekday += 3; /* Adjust for Gregorian */
return( weekday % 7 );
}
I've directed Followup-To: comp.programming
--
Murray
From: Bruce.Hoult@bbs.actrix.gen.nz
Subject: Re: Better random number than Random()?
In article <1992Sep18.105410.24438@midway.uchicago.edu> hd12@midway.uchicago.edu writes:> (I did check FAQ before I post this.)
>
> I need a fast (speed is very important) random number generator. Right now
> I am using Random(), but it seems to me the distribution is not uniform
> at all. Is there a better generator? Will it be faster to use my own
> function than to use toolbox routine?
> Thanks for the help.
The toolbox Random() functions is perfectly OK if you use the *seed* as your random value and ignore the function result. OTOH, it's not exactly fast because of the Trap call overhead.
I've included my own random number generator, which requires a 68020 or better.
I've also included my attempt at a random generator that produces "normally" distributed numbers. It's based on the Central Limits Theorem that says the sum of several independant random variables is approximately normally distributed, no matter what distribution the individual variables have. I found that for my purposes the sum of ten uniformally distributed variables was good enough.
The only mildly clever thing about it is that I do a little parallel computing by working out the random numbers in the CPU and then adding them (to get a greater then 32 bit result) in the coprocessor.
Comment is invited on whether this is a good way to generate a normal distribution, or not.
-- Bruce
---------------
procedure MyRandom(var seed:longint);
inline
$225F, {movea.l (a7)+,a1}
$2011, {move.l (a1),d0}
$4C3C, $0401, $0000, $41A7, {mulu.l #41a7,d1:d0}
$4C7C, $0401, $7FFF, $FFFF, {divu.l #7fffffff,d1:d0}
$2281; {move.l d1,(a1)}
procedure MyNormal(
numToAdd:integer;
var seed:longint;
var result:extended
); external;
------------------------------------------------------
machine mc68020
mc68881
MyNormal proc export
link a6,#0
fmove.w #0,fp0
move.l 8(a6),a1 ;pointer to result
move.l 12(a6),a0 ;pointer to seed
move.l (a0),d0
move.w 16(a6),d2 ;number of loops
bra.b loopend
loop mulu.l #16807,d1:d0 ;7^5
divu.l #$7fffffff,d1:d0
fadd.l d1,fp0
move.l d1,d0
loopend dbra d2,loop
move.l d0,(a0) ;update the seed
fmove.x fp0,(a1) ;return the result
unlk a6
move.l (a7)+,a0
add.l #10,a7
jmp (a0)
endproc
end
From: pickens@bsu-cs.bsu.edu (David B. Pickens)
Subject: Re: Algorithm for Day-Of-Week NEEDED
I don't know if this will help, but in one of my programming classes we had to develop an algorithm to do this and I looked up on one of my backup diskettes. This should be ANSI compatible, but I was using Borland C/C++.
Here 'tis:
/*
Function: datename()
Description: This function returns the day of the week for a date.
It is based upon Zeller's Congruence.
Args: int month the month
int day the day
int year the year
Return: char * a pointer to the day of the week
*/
char *datename(int day, int month, int year)
{
static char *days[] = {"SUN","MON","TUE","WED","THU",
"FRI","SAT"};
int index;
if (year < 100) /* assume 1990's if century missing */
year += 1900;
if (month > 2) { /* begin with february */
month -= 2;
}
else {
month += 10;
year--;
}
index = ((13 * month - 1) / 5) + day + (year % 100) +
((year % 100) / 4) + ((year / 100 / 4) - 2 *
(year / 100) + 77;
index = index - 7 * (index / 7);
return days[index];
}
I hope this works, I know I received a decent grade on it.
Dave
From: sakamoto@sm.sony.co.jp (Tomohiko Sakamoto)
Subject: Re: Algorithm for Day-Of-Week NEEDED
Reply-To: sakamoto@sm.sony.co.jp
In article <1992Sep18.214003.13614@boingo.med.jhu.edu>,jkeller@ishtar.med.jhu.edu (Jean Keller x5-8318) says:
> I think there is an error in your leap-year calculation - if year%400 == 0, it is not a leap year.
You are wrong. If year%400 == 0, it is a leap year. The following definition will help you.
#define isleap(y) ((y) % 4 == 0 && (y) % 100 != 0 || (y) % 400 == 0)
From: nagel@Cigna.COM (Mark Nagel)
Subject: Re: Algorithm for Day-Of-Week NEEDED
Keywords: day of week algorithm
In <lbhvboINNgui@paducah.cs.utexas.edu> stevem@cs.utexas.edu (Steve Anthony Mariotti) writes:
>I am currently in need of an algorithm that will return the day-of-week when
>passed the year, month, and day. I used to have an algorithm to do this, but
>my hard drive crash did away with it but good.
Here's somthing from my "this-might-be-useful" box:
/*
* Find and return day of week (0-6)
*
* From: moorer@jacobs.CS.ORST.EDU (Rocky Moore)
*/
int day_of_week(int year, int month, int day)
{
int offsets[13] = { 0,0,3,3,6,1,4,6,2,5,7,3,5 };
int dw;
dw=6+year+((year+3)/4)+offsets[month]+day;
if( ((year%4) ==0) && (month > 2)) dw++;
if( (year==0) && (month < 3)) dw++;
dw=(dw%7);
return(dw);
}
From: Bruce.Hoult@bbs.actrix.gen.nz
Subject: Re: Looking for fast sorting code for pascal
Source:Pascal
In article <Buq1CI.Mnz@world.std.com> aep@world.std.com (Andrew E Page) writes:
>
> As far as the heap collisions are concerned.... The QSort algorithms
> that I've seen in the past depend on recursion. This leads to the
> posibility that your stack is somehow slamming into your heap,
> leading to a Sys Err ID=28.
Recursion is, of course, necessary in QuickSort, but you can easily guarentee that it will go no deeper than log2(n) -- i.e. only 20 levels when sorting a million elements.
Here's my tried and tested Pascal sorting routine (written on the Mac in 1987). I've found it reliable and fast. It tries to be reasonably clever about avoiding bad partitioning, using the minimum of recursion and using the best algorithm for the amount of data to be sorted.
-- Bruce
-------------
procedure sort(
left, right:longint;
function less_than(a,b:longint):boolean;
procedure swap(a,b:longint)
);
var
i, j, mid, median, small_pos: longint;
begin
while (right-left) >= 15 do begin
mid := (left+right) div 2;
{find the Median of left, mid, right}
if less_than(left,mid) then begin
if less_than(mid,right) then median := mid
else if less_than(left,right) then median := right
else median := left
end else begin
if less_than(left,right) then median := left
else if less_than(mid,right) then median := right
else median := mid;
end; {finding the median}
{
partition the region into three:
left <= x <= j: those smaller than median
j < x < i: those equal to median
i <= x <= right: those greater than median
}
i := left;
j := right;
while i <= j do begin
while (i<right) & less_than(i,median) do i := i + 1;
while (j>left) & less_than(median,j) do j := j - 1;
if i <= j then begin
if i < j then begin
{watch for median getting moved under us!}
if median = i then median := j
else if median = j then median := i;
swap(i,j);
end;
{no need to look at these two again}
i := i + 1;
j := j - 1;
end;
end;
{skip over any items equal to the guess of the median}
while (i<right) & not less_than(median,i) do i := i + 1;
while (j>left) & not less_than(j,median) do j := j - 1;
{now sort the two halves}
if (j-left) < (right-i) then begin
{the left half is smaller -- sort it recursively first}
sort(left,j,less_than,swap);
left := i; {prepare for next iteration}
end else begin
{the right half is smaller -- sort it recursively first}
sort(i,right,less_than,swap);
right := j; {prepare for next iteration}
end;
end; {while more than xxx elements to sort}
{now selection sort any remaining elements}
for i := left to right-1 do begin
small_pos := i;
for j := i+1 to right do
if less_than(j,small_pos) then small_pos := j;
if small_pos <> i then swap(i,small_pos);
end;
end; {sort}