# Glide mod control



## joris1974 (Nov 24, 2008)

Hi all,

As I understand it, there are no built-in engine parameters to control glide intensity and timing with KSP. One can always assign external CCs to the slider and the knob, respectively, but those do not respond linearly to the CCs; from the looks of it, the response follows some kind of power law. If I want a certain value for the glide slider (cents per octave), how do I calculate the CC value to send? Same question for the glide time knob (the power law seems to be different there). I'm sure somebody has figured it out, and the KSP Math Library should be suitable to do the trick, but I need some guidance here.

Thanks in advance.

Joris


----------



## Nickie Fønshauge (Nov 25, 2008)

You should be able to control intensity with $ENGINE_PAR_INTMOD_INTENSITY and I am guessing you can control timing with $ENGINE_PAR_INTMOD_FREQUENCY.


----------



## joris1974 (Nov 25, 2008)

Nickie,

Thanks for your response. Yes, that would be the logical way to do it. But after trying (apparently, modulation indices start with the Amp module, except for the few source-specific engine parameters) and finding this post:
http://vi-control.net/forum/viewtopic.php?t=8621
it seems that CCs may be the only way to go. To account for the non-linear scaling, I suppose I can record a bunch of (CC, slider value) pairs in arrays and do a little interpolation to get the desired CC in my script? Unless the function linking the two is known... and calculable!

Joris


----------



## Nickie Fønshauge (Nov 25, 2008)

I conducted a small test for you and found that the literal value *195* as the first argument of the _set_engine_par function affects the Glide Speed knob. I haven't been able to find a corresponding parameter mnemonic, though.

Ex:
``_set_engine_par( 195, <speed value>, <group>, <slot>, <generic> )


----------



## joris1974 (Nov 25, 2008)

Nickie,

Thanks for investigating. Curiously, 195 gives me an invalid tag error. However, I managed to find the correct pointer for the glide intensity slider ($ENGINE_PAR_INTMOD_INTENSITY actually worked; I must have screwed up the slot indices in the past...). So, all that's left is the Time/Speed knob control. The other problem is that the values sent scale to the slider position instead of the actual parameter value; e.g. given that 500000 sets glide to 0% and 1000000 sets it to max (12% or 1200 cents), 750000 will set the slider halfway up in position, but to 1.5% (150 cents) in value, instead of the 6% halfway point (600 cents). Any idea what formula translates between the two?

Joris


----------



## Big Bob (Nov 25, 2008)

Hi Joris,



> Any idea what formula translates between the two?



From the 3 data points you have given in your last post, it looks suspiciously like NI's favorite cubic function (see my volume control study).

Try testing some other data pairs against the following:


```
P = 1200*[(N - 500000)/500000]^3
```

The above cubic function satisfies your 3 data points for N = 500000, 750000, and 1000000 (i.e. P = 0, 150, and 1200), so try it with some other pairs. Who knows, maybe we'll get lucky.
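
As a quick sanity check outside KSP (plain floating point, not KSP integer math), Bob's cubic can be sketched in Python and tested against the three data points from the thread:

```python
def slider_to_cents(n):
    # Bob's proposed cubic: slider value N (500000..1000000) -> glide depth P in cents
    return 1200 * ((n - 500000) / 500000) ** 3

# The three data points quoted above
assert slider_to_cents(500000) == 0
assert slider_to_cents(750000) == 150.0    # halfway slider position -> 150 cents
assert slider_to_cents(1000000) == 1200.0
```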

God Bless,

Bob


----------



## Nickie Fønshauge (Nov 26, 2008)

You are ½ right, Joris. It doesn't work in K2.2.4. But it does work in K3, which I used for my test. Another good reason to upgrade :D


----------



## joris1974 (Nov 26, 2008)

Bob,

You are correct and brilliant! I strongly suspected a power law, so I started recording pairs of values by hand into a spreadsheet, and a linear regression on the log-log table did the rest... One question remains: what's the most efficient way to calculate a cube root with KSP? Table, Horner, Newton/Halley iterations? I'll take a look in KSP Math.

Nickie,
Thanks for the clarification. I tried iterating the tag numbers in a loop, just for kicks, and 'invalid' starts at 188 with Kontakt 2.2, which makes you 3/2 right. Thank you so much for your help. Worst case, I'll revert to the controller assignment method as soon as I have the scaling formula figured out (which I will need anyway; see my comment on slider control).

Joris


----------



## TheoKrueger (Nov 26, 2008)

Hi, sorry for jumping into the thread, but I wanted to say a few things too, because this glide conversation reminded me of portamento!! :D

First of all, hi Robert! Nice to see you again! It's really been a long time, and I hope everything is well. I was using SIPS2 the other day and it was really great. I was very happy to see the portamento mode as well, but I made a small change to the code so the portamento time knob starts from -2000 instead of 0; this way I was able to get a much tighter transition between notes, since the default had a bit of initial time which I could not lessen, especially for, let's say, solo violin. I am still wondering, isn't there some way to have a sample stretch from note to note in a selected time? Without any crossfades between the notes, just the same sample being stretched up/down to the target note. I'm really curious about this because I have seen it in action (in other synths/samplers) and it works very well, at least for 4-5 semitones of stretching.

That's all! Sorry for popping in like this :D

Theo.


----------



## Big Bob (Nov 26, 2008)

> One question remains: what's the most efficient way to calculate a cube root with KSP? Table, Horner, Newton/Halley iterations? I'll take a look in KSP Math.



Joris,

The KSP Math Library V205 has a cube root function that was added for the very purpose of 'inversing' NI's cubic curve (which, incidentally, they seem to use instead of the more typical inverse logarithmic, i.e. exponential, function). The Math Library obtains the cube root using Newton's method, as is fully discussed in the Technical Guide. You may also find these threads of some interest.

http://vi-control.net/forum/viewtopic.php?t=8711

http://vi-control.net/forum/viewtopic.p ... trol+study



> I am still wondering, isn't there some way to have a sample stretch from note to note in a selected time? Without any crossfades between the notes, just the same sample being stretched up/down to the target note. I'm really curious about this because I have seen it in action (in other synths/samplers) and it works very well, at least for 4-5 semitones of stretching.



Hi Theo,

Like you say, long time no see! Certainly it's possible to do a glide or bend without adjacent-sample crossfading if you are willing to give up the formant correction.

God Bless,

Bob


----------



## Big Bob (Nov 26, 2008)

Hi Joris,

I just remembered that the cube root function in the Math Library is actually an 'inner' routine of the *VR_to_ep* function. However, it can still be used to compute N as a function of P for your situation with suitable scaling. Here is a sample P-to-N function based on the library routine *VR_to_ep*.

*function* P2N(p,n)
``VR_to_ep(32*abs(p),n)
``*if* p < 0
````n := -n
``*end if*
``n := 500000 + 37*n/73
*end function*

The *P2N* function accepts P values between -1200 and +1200 and generates N from 0 to 1,000,000.

Alternatively, you could 'factor out' the cube root routine from *VR_to_ep* and build a new routine around it to provide the desired *P2N* function.
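
For reference, the mapping *P2N* computes with integer arithmetic is just the closed-form inverse of the cubic; an illustrative floating-point Python sketch (not KSP code) that round-trips Joris's observed pairs:

```python
def p_to_n(p):
    # Invert P = 1200*((N - 500000)/500000)^3 for -1200 <= P <= +1200
    s = -1.0 if p < 0 else 1.0
    return 500000 + s * 500000 * (abs(p) / 1200) ** (1 / 3)

# 150 cents sits at slider value 750000, as Joris observed
assert abs(p_to_n(150) - 750000) < 1
assert abs(p_to_n(1200) - 1000000) < 1
assert abs(p_to_n(-1200) - 0) < 1
assert p_to_n(0) == 500000
```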

God Bless,

Bob


----------



## Big Bob (Nov 27, 2008)

> As long as we're talking cube roots, would the Halley convergence scheme be appropriate?



Hi Joris,

The simple answer to your question is that I really don't know, because I never even considered it :oops: . If you study the algorithm I used, you will no doubt discover that it is sort of a 'hodge-podge'. I only used Newton's method to derive the 'integer' part of the root and then used another approximation to obtain the 'fractional' part. At the time I designed this cube root function, its only known purpose was to 'invert' NI's 18dB/octave volume control curve. Therefore, I embedded the cube root function in *VR_to_ep* and tailored its scaling for that purpose alone. Over the needed argument range 0 < VR < 40000, a maximum of 5 iterations occur, with the '1-sigma' iteration count being 3 (as shown in Table 7 in the Technical Guide). Halley's method might lower this a little on average, but whether or not it would speed up the execution is another matter, since the loop body would need to execute more calculations. I also used a rather crude method to derive the 'initial guess', and perhaps a little more thought about this would help even more to reduce the number of iterations required (but at the expense of more initial calculations).

However, since it now seems that NI has used their cubic function (as a pseudo-exponential) to control things other than volume, in hindsight perhaps it would be best to generalize the cube root function and make it available by itself (rather than using a custom-tailored version in the *VR_to_ep* function). I think I may look into doing this later today.


----------



## Big Bob (Nov 28, 2008)

Hi Joris,

I took a quick look at generalizing the cube root function and I think it will be possible to accept any input value over the entire 32-bit signed integer range and then present the output cube root scaled by 10^6 for reasonable precision in the result.

However, for the iterative portion of the routine, if we want to use Halley's method, I think we would have to reduce the allowed input range (perhaps to one third or less) in order to have enough headroom for the interim calculations. Therefore, if we need to reduce the number of iterations, it might be better to stick with Newton's method but put a little more effort into refining the initial estimate for the root.

I'll try to cobble something together later today if I can. When I have something, I'll post it for you to throw darts at :lol: .

God Bless,

Bob


----------



## Big Bob (Nov 29, 2008)

Hi Joris,

Here's my first-cut at a generalized cube root routine. This is pretty close to what I have in mind for the next Math Library release. So, start throwing darts :lol: 

_{ -------------------------------------------------------------------------------
Root3(x,r) Computes the cube-root of any input integer

x = Input integer, -2^31 <= x < 2^31
r = Cube Root of x, scaled by 10^6

This routine uses a hybrid algorithm which determines the integer
part of the cube root using Newton's method and the fractional
part of the root using a mathematical approximation. A 16-value
lookup table is used to obtain an accurate initial estimate of the
root. This approach allows a single Newton iteration to refine the
initial estimate to within +/-1 of the closest integer root (thus
no iteration 'loop' is needed). }_

*function* Root3(x,r)
``BuildR3Table 
``*declare* S```_{ Normalization, bit-trio shift counter }_
``*declare* A```_{ Absolute value of input X }_
``*declare* r1``_{ Temporary for computing root }_
``*declare* M```_{ Rem/r = A/r - r*r, where Rem = A - r*r*r }_
``*declare* M1``_{ Rem/r1 = A/r1 - r1*r1, where Rem = A - r1*r1*r1 }_
``
``*if* x = 0x80000000```_{ Watch out for -2^31, it has no positive counterpart }_
````r := -1290159154``_{ Return pre-computed, negative cube root }_
``*else*
````A := abs(x)```````_{ Force radicand to positive value }_
````*if* A < 2``````````_{ If radicand is 0 or 1, root = X.000000 }_
``````r := 1000000*x
````*else*``````````````_{ Begin calculation of cube root for 1 < A < 2,147,483,648 }_
``````S := 0``````````_{ Normalize A so that top nibble is from 1 to 7 inclusive }_
``````*while* A .*and*. 0x70000000 = 0
````````A := sh_left(A,3)
````````inc(S)
``````*end while*
``````r1 := R3Tbl[sh_right(A,26) - 4]````_{ Initial estimate for integer part }_
``````r := (2*r1 + A/(r1*r1))/3````_{ One iteration needed to get within +/-1 }_
``````M := A/r - r*r```````_{ Determine M = Rem/r after cubing integer root r }_
``````r1 := r + (sh_right(M,31) .*or*. 1)``_{ r & r1 'straddle' the root }_
``````M1 := A/r1 - r1*r1```_{ Determine M1 = Rem/r1 after cubing integer root r1 }_
``````*if* abs(M) > abs(M1)``_{ Use integer closest to root (above or below) }_
````````r := r1```_{ This is done to keep the maximum magnitude of M < 2000 }_
````````M := M1```_{ Thus the calculation 10^6*M is guaranteed < 2^31 }_
``````*end if*``````_{ Using only the Newton integer causes M > 2147 often }_
``````r1 := 1000000*r + 1000000*M/(3*r) _{ Combine with fractional part of root }_
``````r := sh_right(r1,S)``_{ Denormalize result }_
``````*if* x < 0````````
````````r := -r````````````_{ If input is negative, negate the root }_
``````*end if*
````*end if*
``*end if*
*end function* _{ Root3 }_

_{ ---------------------------------------------------------------------------------
BuildR3Table Data Function to declare the R3Tbl used by the
Root3 function to obtain an initial estimate
of the root for the subsequent Newton iteration.

NOTE: While only 16 estimates are required by Root3, some of those
estimates are duplicated in the table in order to simplify the
code required to access the table. Thus the actual array
contains 28 elements. }_

*function* BuildR3Table``_{ Cube-Root table of initial estimates }_
``*declare* global R3Tbl[28] := (671,717,758,795,829,861,890,917,956,956, ...
````````````````````1002,1002,1045,1045,1084,1084,1121,1121,1156,1156, ...
````````````````````1204,1204,1204,1204,1263,1263,1263,1263)
*end function* _{ BuildR3Table }_

BTW, on my computer this routine executes in under 1.9 usec (with K2)

God Bless,

Bob

Updated 12-4-08


----------



## joris1974 (Dec 1, 2008)

Bob,

Thanks for looking into this. I was working on a recipe of my own. I'm not sure I follow the table lookup thing, but it sounds interesting. Here is a function (without the fool-proofing offered by yours). It finds the integer part first (the exact value of floor(x^(1/3))) and uses an 'explicitation' scheme, solving the equation for the remainder, to get the fractional part. It sounds very flaky, like a lot of explicit methods, but I still get a relative error < 1e-5. I think I got similar benchmarks (my CPU may be slower than yours, your function yielding 26ms on average for 10000 extractions). Here it is:

*function* FastCbrt(x,r)
``*declare* s := 30 
``*declare* a
``*declare* b

``*declare* M
``*declare* rsq
``r:=0
``a:=x
``*while* s>=0 _{calculate integer part of the root}_
````r:= sh_left(r,1)
````b := sh_left((3*r*(r + 1) + 1),s) 
````s := s - 3 
````*if* (a >= b) 
```````a := a - b 
```````inc(r)
````*end if*
``*end while*
``
``rsq:=r*r
``M:=x-r*rsq _{remainder}_
``r:=1000*r+(1000*(M-M*M/(3*r*rsq)))/(3*rsq) _{ The fractional part f is obtained by
solving the quadratic equation 3r.f^2 + 3r^2.f - M = 0. This is done by neglecting the
quadratic term, solving for f, i.e. f = M/3r^2, and reinjecting the quadratic term
M^2/3r^3. Finally, the resulting linear equation is solved: f = (M - M^2/3r^3)/3r^2 }_

*end function*
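
As a cross-check, FastCbrt can be modeled outside KSP. The sketch below is a Python transcription (with the assumption, stated here because Python differs, that KSP integer division truncates toward zero, whereas Python's `//` floors), returning the root scaled by 1000 as in the original:

```python
def tdiv(a, b):
    # KSP-style integer division: truncate toward zero (Python's // floors instead)
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q

def fast_cbrt(x):
    # Model of Joris's FastCbrt for x >= 1; result is cbrt(x) scaled by 1000
    r = 0
    s = 30
    a = x
    while s >= 0:                        # bit-trio loop: exact integer part of the root
        r = r << 1
        b = (3 * r * (r + 1) + 1) << s
        s -= 3
        if a >= b:
            a -= b
            r += 1
    rsq = r * r
    m = x - r * rsq                      # remainder after removing the integer cube
    # fractional part: f = (M - M^2/(3r^3)) / (3r^2), per the comment in the original
    return 1000 * r + tdiv(1000 * (m - tdiv(m * m, 3 * r * rsq)), 3 * rsq)
```

Run against the accuracy table Bob posts later in the thread, this model reproduces the Fastcbrt column exactly (1333 for a radicand of 2, 53133 for 150000, and so on).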

Obviously less accurate or elegant than yours 

Joris


----------



## Big Bob (Dec 1, 2008)

> your function yielding 26ms on average for 10000 extractions).



Hi Joris,

How are you measuring the execution time? For a compute-bound process, the engine uptime is not reliable (if that is what you are using). The way I measured execution time is as follows: put the Root3 function in a loop that runs it through 10,000,000 values of X. To work around the KSP watchdog, arrange it so that for every 10,000 executions of Root3, you call the KSP wait(1) function. This keeps the watchdog from terminating the loop but inserts only a negligible overhead (a few microseconds per 10,000 executions). With this setup on my computer, the 10,000,000 executions take about 16 seconds of wall time (with only K2/3 running). I didn't bother trying to measure the time to run with a null Root3 function, so the execution time is conservative.

Thanks for posting your 'recipe' :lol: I'll study your code later today and post my comments.

God Bless,

Bob


----------



## Big Bob (Dec 1, 2008)

Hi Joris,

I took a look at your 'recipe' and did some comparison shopping. First off, let me say that my crude execution-time estimate of 1.6us was a bit understated, because I only ran a quick test with normalized input numbers; my normalization loop didn't have to run at all. This also allowed me to use a single wait(1) for each 10,000 calls to Root3. Apparently the KSP allows something just over 40,000 'nested' iterations, so when the normalization loop runs to its limit (10 iterations), we can only make about 4000 calls between calls to wait(1).

However, even with unnormalized input numbers, it looks like Root3 still runs in under 2usec on my computer.

Here is the test code I used to compare Fastcbrt with Root3.

*on init*
``*declare* ui_label Info (2,5)
``move_control(Info,2,1)
``*declare* ui_button Calc
``*declare* ui_button Run
``*declare* ui_value_edit X (-2147483648,2147483647,1)
``*declare* ui_label VX (1,1)
``move_control(X,1,1)
``move_control(VX,1,2)
``move_control(Calc,1,3)
``move_control(Run,1,5)
``set_text(VX,'X='&X)
``*declare* r
``*declare* j
``*declare* k
``message('')
*end on*

*on ui_control* (Calc)
``Calc := 0
``FastCbrt(X,r)
``set_text(Info,'root = ' & r)
*end on* _{ Calc }_

*on ui_control* (X)
``set_text(VX,'X='&X)
*end on* _{ X }_

*on ui_control* (Run)
``set_text(Info,'')
``Run := 0
_{ for j := 2 to 10000001 }_
``*for* j := 0x1000000 *to* 0x1000000 + 9999999
````FastCbrt(j,r)
````inc(k)
````*if* k > 4000
``````k := 0
``````wait(1)
````*end if*
``*end for*
``set_text(Info,'Done')
*end on* _{ Run }_

I ran the test over two input argument ranges of 10M extractions each. Test #1 used lower values with x = 2 to 10000001. Test #2 used higher values with x starting at 0x10000000 (268,435,456 decimal). The times I got for the ten million extractions are as follows:

```
Test #1:  Root3 = 19.5 secs          Fastcbrt = 53.5 secs
Test #2:  Root3 = 18.5 secs          Fastcbrt = 52.5 secs
```

In light of this, maybe you might want to rename your function to Slowcbrt  

I also notice that the accuracy of Fastcbrt is not very uniform, especially for smaller numbers. Here are some X values I chose at random in order to compare the accuracy of the two routines.


```
Radicand       Fastcbrt          Root3         HP15C
       2            1.333        1.259921     1.259921050
       5            0.667        1.709976     1.709975847
      12            2.333        2.289428     2.289428485
     100            4.625        4.641588     4.641588833
     954            9.831        9.844253     9.844253562
  150000           53.133       53.132926    53.13292844
  280000           65.421       65.421332    65.42132618
```
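
The HP15C column can be double-checked against ordinary floating-point cube roots; a quick Python sketch over the values quoted in the table above:

```python
# Reference values quoted from the table above (radicand -> cube root)
reference = {
    2: 1.259921050, 5: 1.709975847, 12: 2.289428485, 100: 4.641588833,
    954: 9.844253562, 150000: 53.13292844, 280000: 65.42132618,
}
for x, root in reference.items():
    assert abs(x ** (1 / 3) - root) < 1e-6   # agrees to about 6 decimal places
```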

Let me know if you concur.

God Bless,

Bob


----------



## joris1974 (Dec 1, 2008)

Bob,

Thank you for pointing out the benchmarking method. I did the precision test with random numbers in a spreadsheet, but the sample of test radicands was certainly too 'normal' (pun intended) to exhibit the flaws. I'll try to isolate the bottleneck to see if it's worth going for the exact integer root as a first step. I was trying to get decent results for numbers above 1000 (say I am targeting values between 0 and 1200, then scaled by a factor 1000), so perhaps I got naive with the idea that losing precision below that would be OK. I suppose I could try to reduce the acceptable range and do the integer root over 21 bits (dropping from 10 to 7 iterations). The other idea that I remembered was to boost Newton with a confinement method e.g. max(floor,min(ceiling,iteration term)), but that adds the cost of a min and max. I'll keep you posted when I have a few more tests done. But so far, you're right, Slowcbrt it is :oops: Time for some rest after a 2-hour rehearsal at the symphony!

Joris


----------



## Big Bob (Dec 2, 2008)

Hi Joris,

Here's an interesting data point: I repeated the execution-time tests with K3 (I had been testing with K2 because I still don't completely trust K3), and lo and behold, K3 is noticeably slower, by about 20% :( With the small-number series, K2 runs Root3 in less than 1.9usec while K3 takes 2.3usec (FastCbrt takes 6.7usec). If you have been testing with K3, this might account for your slower execution times



> I was trying to get decent results for numbers above 1000 (say I am targeting values between 0 and 1200, then scaled by a factor 1000), so perhaps I got naive with the idea that losing precision below that would be OK.



By all means, if what you are trying to do is cook up a routine that will convert pitch to ep, FastCbrt may be just fine. It certainly has the benefit of being compact (if not accurate or fast). However, I must have misunderstood what you wanted: since you were suggesting Halley's method for faster convergence, I thought you were interested in speed. For the Math Library, I'm interested in both speed and precision (both of which Root3 seems to provide). It's not as compact as FastCbrt but I don't want to sacrifice speed or precision for compactness. Of course, I still need to run an exhaustive error analysis by plugging Root3 into my Delphi program, which I hope to do later today if time allows. Incidentally, I decided to delete a few frills and thus trimmed down the number of code lines, so I edited my prior post with the latest version. I decided the detection of 'perfect cubes' wasn't of sufficient value to warrant the extra lines of code. I also replaced the alternate integer loop conditionals with a one-line equivalent; the only penalty is less clarity for anyone reading the code.



> The other idea that I remembered was to boost Newton with a confinement method e.g. max(floor,min(ceiling,iteration term)), but that adds the cost of a min and max.



That is essentially what Root3 does: it 'straddles' the root with the two closest integers. Then the integer closest in magnitude to the actual root (whether above or below) is used with a signed offset fraction to complete the calculation. This holds the maximum M/r magnitude to less than 2000, which can then safely be multiplied by 1000000. If only the low integer root is used, the M/r magnitude can easily exceed 2147, and arithmetic overflow becomes a problem when calculating the fractional approximation (if high accuracy is desired).



> I'm not sure I follow the table lookup thing, but it sounds interesting.



Using the Newton iteration loop's exit criterion that two passes in a row agree with each other within +/-1, the higher normalized radicand ranges, 0x60000000 to 0x6FFFFFFF and 0x70000000 to 0x7FFFFFFF, require only a single initial estimate value (1204 and 1263 respectively) to exit the loop in two or fewer passes. Unfortunately, the lower normalized radicands converge a little more slowly and thus require more initial estimate values. For the normalized zones where the first hex digit is 3, 4, or 5, the ranges have to be broken into two sections each. The normalized zones where the first hex digit is 1 or 2 converge the slowest and require that the zones be broken into quarters (to keep the Newton iterations to two or fewer). Thus, the lookup table needs to provide 16 different initial estimates. However, when this is done, the iteration loop never runs more than twice over the entire normalized integer range from 0x10000000 to 0x7FFFFFFF. And since the exit criterion is that two consecutive estimates differ by no more than one, only the first iteration need be performed to find the integer root just above or just below the actual root.

Using Root3, the next revision of the Math Library will then implement VR_to_ep and Pitch_to_ep something like this:

*function* VR_to_ep(VR,N)
``*if* VR <= 0
````N := 0
``*else* *if* VR > 40000
````N := 1000000
``*else*
````Root3(25*VR,N)
````N := (N + 50)/100
``*end if*
*end function* _{ VR_to_ep }_

*function* Pitch_to_ep(P,N)
``*if* abs(P) <= 1200
````Root3(104167*P,N)
````N := N/1000 + 500000
``*else* *if* P < 0
````N := 0
``*else*
````N := 1000000
``*end if* 
*end function* _{ Pitch_to_ep }_
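
To see why the scaling constants work, here is a floating-point Python model of the two proposed routines (a sketch, assuming Root3 returns the cube root scaled by 10^6 and that KSP integer division truncates toward zero); it reproduces Joris's observed pair of 750000 <-> 150 cents:

```python
import math

def root3_model(x):
    # Floating-point stand-in for the KSP Root3: cube root of x, scaled by 10^6
    return round(math.copysign(abs(x) ** (1 / 3), x) * 10**6)

def vr_to_ep(vr):
    # Model of the proposed VR_to_ep
    if vr <= 0:
        return 0
    elif vr > 40000:
        return 1000000
    return (root3_model(25 * vr) + 50) // 100

def pitch_to_ep(p):
    # Model of the proposed Pitch_to_ep; int() truncates toward zero like KSP division
    if abs(p) <= 1200:
        return int(root3_model(104167 * p) / 1000) + 500000
    return 0 if p < 0 else 1000000

assert pitch_to_ep(0) == 500000
assert pitch_to_ep(150) == 750000    # Joris's halfway slider point
assert pitch_to_ep(1200) == 1000000
assert pitch_to_ep(-1200) == 0
```

The constant 104167 is just 1.25e8/1200 rounded, which folds the 500000/cbrt(1200) factor into a single integer multiply before the cube root.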

God Bless,

Bob


----------



## joris1974 (Dec 2, 2008)

Bob,

Thank you for all the clarifications. Hex and bin don't come easily to a former CFD engineer, but I think I got it for the most part. I did my testing with your routine, and the previous figures were flawed. So, I now get comparable exec times (using K2).



> Incidentally, I decided to delete a few frills and thus trimmed down the number of code lines, so I edited my prior post with the latest version. I decided the detection of 'perfect cubes' wasn't of sufficient value to warrant the extra lines of code.



Yes, I realized that the odds of dealing with perfect cubes are small enough that doing the actual computation for those would have a marginal impact. 



> For the Math Library, I'm interested in both speed and precision (both of which Root3 seems to provide). It's not as compact as FastCbrt but I don't want to sacrifice speed or precision for compactness.



Now, regarding compactness, I admit that aspect had a certain appeal, but if it doesn't work... Anyway, I did manage to implement a Halley correction in your routine, to bypass the 'root straddling', and I think the precision is good while being slightly faster (by 3-4 secs with your test), perhaps because it uses 6 multiplications instead of 9 (after the initial guess), and only one more division. So, here it is:

*function* AltRoot3(x,r)``
``BuildR3Table
``*declare* S```_{ Normalization, bit-trio shift counter }_
``*declare* A```_{ Absolute value of input X }_
``*declare* r1``_{ Temporary for computing root }_
``*declare* rr``_{ Temporary for r^2 }_
``*if* x = 0x80000000```_{ Watch out for -2^31, it has no positive counterpart }_
````r := -1290159154``_{ Return pre-computed, negative cube root }_
``*else*
````A := abs(x)```````_{ Force radicand to positive value }_
````*if* A = 0``````````_{ If radicand is 0, root = 0 }_
``````r := 0
````*else*``````````````_{ Begin calculation of cube root for 1 < A < 2,147,483,648 }_
``````S := 0``````````_{ Normalize A so that top nibble is from 1 to 7 inclusive }_
``````*while* A .*and*. 0x70000000 = 0
````````A := sh_left(A,3)
````````inc(S)
``````*end while*
``````r1 := R3Tbl[sh_right(A,26) - 4]``_{ Get estimate for integer part of cube root }_
``````r := (2*r1 + A/(r1*r1))/3``_{ Only one Newton iteration is needed to get within 1 }_
``````rr := r*r
``````r1:=1000000*r+(1000000*(A/r-rr))/(A/rr+2*r) _{Halley for the fractional part}_
``````r := sh_right(r1,S)```_{ Denormalize result }_
``````*if* x < 0````````
````````r := -r```````_{ If input is negative, negate the root }_
``````*end if*
````*end if*
``*end if*
*end function*

Off to some more testing...
Best,

Joris


----------



## Big Bob (Dec 2, 2008)

Hi Joris,

I think the very large errors reported by my Pascal program may be due to KSP arithmetic overflow. The value of A/r - rr can exceed 2147. For example, when the radicand is 368600377, the floor integer is r = 716. Thus, A/r - rr is 2148 using KSP arithmetic. Multiply that by 1000000 and there's your overflow. This being the case, perhaps if we scale up by one million in two steps, we might be able to avoid the overflow but still maintain some reasonable precision.
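
Bob's overflow case is easy to verify with exact integer arithmetic; a quick Python check (Python ints don't overflow, so the 32-bit limit is tested explicitly):

```python
A = 368600377
r = 716
assert r ** 3 <= A < (r + 1) ** 3       # 716 is indeed the floor cube root
m_over_r = A // r - r * r               # A/r - rr in KSP terms (positive operands)
assert m_over_r == 2148
assert 1000000 * m_over_r > 2**31 - 1   # scaling by 10^6 exceeds 32-bit signed range
```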

I'm expecting to have a little more time just before bed tonight and if so, I'll pursue this.

God Bless,

Bob

*Late Edit:* The problem turns out to be essentially the same as I had with *Root3* before I changed to straddling the root with the closest pair of consecutive integers. Basically, when you use the complementary integers and select the closest one to the root (above or below), the maximum value that *M/r* ever reaches is 1935. Whereas, if you only use the floor integer, the value of *M/r* can reach nearly twice that for some radicands.

With your Halley approximation, the culprit is *A/r - rr*. Since my *M = A - rrr*, *M/r* and *A/r - rr* are the same thing. So, since *AltRoot3* only uses one integer (sometimes the ceiling, sometimes the floor), *A/r - rr* can reach as high as 3870, and anything over 2147 will overflow (when multiplied by 10^6). To verify this, I changed the new code line to the following:

``````r1:=1000000*r+((500000*(A/r-rr))/(A/rr+2*r))*2 _{Halley for the fractional part}_ 

And I'm very pleased to report that all the huge errors vanished! :D In fact, the error data is now pretty much on a par with that of *Root3* (*AltRoot3* is in some cases slightly less accurate, but they are very close). So, it looks like we may be able to redeem this thing after all o-[][]-o . When I'm fresh tomorrow, I'll see if I can improve on this in any way (the 2*500000 was just the first thing that occurred to me).
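
The headroom arithmetic behind the fix checks out: taking Bob's reported worst case of 3870 for A/r - rr, halving the scale factor keeps the intermediate product inside the 32-bit signed range. A quick Python check:

```python
INT_MAX = 2**31 - 1   # KSP's 32-bit signed integer ceiling
worst = 3870          # Bob's reported maximum of A/r - rr for the single-integer variant
assert 1000000 * worst > INT_MAX   # the original 10^6 scaling overflows
assert 500000 * worst <= INT_MAX   # 500000*(...) stays in range before the final *2
```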


----------



## joris1974 (Dec 2, 2008)

Bob,

Just a (edit: not-so-) quick note before more details (when I have a bit more time, that is...). 



> The value of A/r - rr can exceed 2147. For example, when the radicand is 368600377, the floor integer is r = 716. Thus, A/r - rr is 2148 using KSP arithmetic. Multiply that by 1000000 and there's your overflow.



I noticed your comment on the condition on M/r to avoid overflow, and tried to massage the Halley formula to get there without trouble. Oops #1



> This being the case, perhaps if we scale up by one million in two steps, we might be able to avoid the overflow but still maintain some reasonable precision.



I had a first attempt that was successful, using a two-step scaling as you indicate: 1000*((1000*x)/y) instead of (1000000*x)/y. I figured I'd be smart and jump to the latter, thus saving a multiplication, assuming that the guaranteed upper bound on M/r would save me from trouble. Oops #2 :oops: 



> I wonder if maybe we could try 2 Halley passes some way without getting into arithmetic headroom problems?



Yes, indeed, I am pursuing the double-Halley idea, and from a few manipulations on paper I also have good hope that a multiple-step scaling (well, let's just call it a potentially clever use of product associativity) would make a quartic convergence (or third-order Householder) scheme possible. One thing that I never asked you, though, is: how big a range for the radicand do you need, and how much absolute precision do you expect? (The rest, mostly speed, is less predictable, so the former should be the primary specification of the algorithm.) That would give me a better way of 'eyeballing' convergence criteria.

Other musings:
1) I'll see if I can shorten the loop in FastCbrt by feeding it your normalized radicand, and see when the 'useful' part of the loop kicks in.

2) Your lookup table could be generalized to be optimal for an n-th order Householder, so a different table for a different initial scheme.

3) In the case of a double Halley, one could reuse the r*r value from the first pass, correcting it by a factor theta for the second pass (hence the process's name, two-step theta explicitation, mostly used for ODEs and PDEs, but where there's convergence, there's a chance).

On those well-intended thoughts, I'll wish you a good night.

Joris


----------



## Big Bob (Dec 3, 2008)

Good Morning Joris,



> One thing that I never asked you, though, is : how big a range for the radicand do you need, and how much absolute precision do you expect ?



Since my intention was to generalize the cube root function, I didn't want to make any more assumptions than necessary. So the blue-sky generalization is that the radicand covers the entire 32-bit integer range. As to precision, since the cube root of the maximum radicand is around 1290, the maximum precision that the root could have would be obtained by scaling it up by a factor of about 1.6M or so. For human convenience I thus chose 10^6 for output scaling. This then sets the upper bound on precision. However, I realize that we may not end up quite that high.
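That ceiling can be checked in a couple of lines (a Python sketch):

```python
max_radicand = 2**31 - 1
max_root = max_radicand ** (1/3)      # ~1290.159
max_scale = max_radicand / max_root   # ~1.66 million

print(round(max_root, 3), round(max_scale))
# Any output scaling above ~1.66*10^6 would push the scaled root past
# 2^31 - 1, so 10^6 is the comfortable round-number choice below it.
```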

I also realize that we might improve our precision if we reduce the radicand range. However, it appears that we are very close to covering the full range with near maximum precision, so I think we should agonize over this a bit more before we yield on either range or precision. If we can achieve something near the 'blue-sky' objectives, we'll have all the bases covered for as yet 'unforeseen uses' in the future. After all, when I cobbled the prior cube root algorithm together, I thought its only purpose would be in the *VR_to_ep *routine and then you popped up with the *Pitch_to_ep *thing. Who knows what might be next, so why not keep this routine as general as possible? Especially since we are so close :D 



> (or third-order Householder) scheme possible.



Boy, you are really making me clean out the cobwebs in my brain :lol: . I retired nearly 20 years ago now, and I hate to tell you how much I have forgotten and how poorly my brain seems to remember new things these days. So I really appreciate the mental exercise you are giving me. :shock: 

And now, while I'm still alert, let me pursue some of the thoughts I had on this just before drifting off to sleep last night :roll: .

Have a great day my friend,

God Bless,

Bob


----------



## Big Bob (Dec 4, 2008)

Hi Joris,

I didn't get as much time to work on this yesterday as I thought I would, but, here's a quick summary of what I did discover.

First some terminology. The algorithm has two phases, the derivation of the integer part and the derivation of the fractional part of the root. For the first phase, let's define two types which I'll name *One Sided *and *Two Sided*. *Root3* as originally posted finds the two consecutive integers that straddle the root and then chooses the integer closest to the root (either above or below). So let's call this technique *Two Sided*. *AltRoot3* finds only one of the two integers without regard to whether it is the closest to the root or whether it's above or below the root, so let's call this technique *One Sided*.

For deriving the fractional part of the root, *Root3* uses a simple binomial approximation whereas *AltRoot3* uses a Halley approximation. So let's just call these two methods *Binomial* and *Halley* respectively.
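To make the two fractional-part methods concrete, here is a quick float comparison on the radicand from the earlier overflow example (a Python sketch; the KSP routines work in scaled integers, but the underlying formulas are the same):

```python
A = 368600377
r = 716                         # floor of cbrt(A)
true_frac = A ** (1/3) - r      # ~0.999069

rem = A - r**3                  # remainder after removing r^3
binomial = rem / (3 * r * r)    # first-order binomial: cbrt(r^3+rem) ~ r + rem/(3r^2)

M = A / r - r * r               # M = rem / r
halley = M / (3 * r + M / r)    # one Halley step for the fractional part

print(binomial - true_frac)     # ~ +1.4e-3
print(halley - true_frac)       # ~ -1.3e-6 -- far closer
```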

Yesterday, I tested a number of combinations of these approaches with various scaling and arithmetic-ordering ideas and, at least so far, the results lean in the following direction. *Two-Sided Binomial *produces the best overall accuracy but is less compact. *One-Sided Halley *runs a close second for accuracy and has the advantage of using fewer code lines (about 6 fewer than *two-sided binomial*) as well as fewer local variables. The accuracy of the *one-sided Halley *can be nudged closer to *Root3* but only by adding additional code lines. I haven't yet repeated any execution-time comparisons but I don't expect any significant differences. Once this all settles down to two clear candidates, I'll run some final benchmarks.

What I would really like to find is a one-sided method that always produces the nearest integer directly without the need to examine the complementary integer and comparing to see which is closest. I tried several ideas along these lines yesterday but without any success.
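For what it's worth, there is an integer-exact way to decide nearness from the floor alone, without forming the complementary integer's cube: cbrt(A) >= lo + 1/2 exactly when 8*A >= (2*lo + 1)^3. A Python sketch (the floor search here is illustrative, not the table/Newton scheme from the actual routine):

```python
def nearest_int_cbrt(A):
    """Nearest integer to cbrt(A), for A >= 0."""
    lo = round(A ** (1/3))           # float guess for the floor...
    while lo**3 > A:                 # ...corrected down if needed
        lo -= 1
    while (lo + 1)**3 <= A:          # ...or up
        lo += 1
    # cbrt(A) >= lo + 1/2  <=>  8*A >= (2*lo + 1)**3, all in exact integers
    return lo + 1 if 8 * A >= (2*lo + 1)**3 else lo

print(nearest_int_cbrt(368600377))   # 717, since cbrt ~ 716.999
```

This still needs the floor first, of course, so it doesn't eliminate the extra work entirely; it only replaces the two-cube comparison with a single midpoint test.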

Today I hope to try a few more things such as using Halley instead of Newton to see if the lookup table size can be reduced. Since only one Newton iteration is used in *Root3*, using Halley won't speed that up but it may broaden the initial estimate coverage and thereby potentially reduce the table size. And, who knows, maybe one of us can dream up an elegant, one-sided, closest integer scheme.

To be continued .........

God Bless,

Bob

BTW I rearranged the arithmetic a little for *Root3* because it improved the accuracy a bit. I updated the original post to reflect the changes.


----------



## Big Bob (Dec 5, 2008)

Hi Joris,

*The overall winner is the one-sided Halley*. The 2-sided binomial has slightly better precision, but not enough to justify the extra code lines and execution-time penalty. Moreover, 6 decimal digits for the fraction is more than we are utilizing (considering the overall best accuracy obtained), so I reduced the output scaling to 10^5.

Even with this output scaling change, the 2-sided binomial still had the best precision overall but the one-sided Halley was a very close second. The 2-sided binomial, while more precise, uses 6 more lines of code and 2 more local variables. Moreover its execution time is on the order of 1.9 to 2.0 usec compared with 1.5 to 1.6 usec for the one-sided Halley. Therefore, I've decided to use the one-sided Halley for the new *Math Library's *standard *Root3* routine.

If there is some situation for which a bit more accuracy is needed, it can easily be obtained by replacing the two code lines devoted to the fractional calculation. Using a two-stage division of *A/r* (to pre-scale *M* by 1000), the precision of the one-sided Halley can be improved sufficiently that it becomes even more precise than the 2-sided binomial. This high-precision (HP) version of the one-sided Halley can be attained with no increase in the code-line count. However, there is an increase in execution time. The *HPRoot3* routine has an execution time on the order of 1.8 usec. But this is still faster than the 2-sided binomial and just as compact as the new *Root3*. Therefore, there would be little point in retaining the 2-sided binomial version.
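One possible reading of that two-stage division (a hypothetical Python sketch of the idea, not the actual library code): divide the remainder of A/r separately, so M arrives pre-scaled by 1000 with no intermediate product exceeding 32 bits.

```python
A, r = 368600377, 716                 # sample radicand and its floor root

q, rem = divmod(A, r)                 # quotient and remainder of A/r
M_x1000 = 1000 * (q - r*r) + (1000 * rem) // r

print(M_x1000)                        # 2148995: M = 2148.995..., 3 extra digits
# Naively, (1000*A)/r would need the product 1000*A ~ 3.7*10^11,
# far past 2^31; both intermediates above stay comfortably in range.
```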

I'll post the updated, *V210 Math Library *as soon as I can find the time to update the Technical Guide. In the meantime, here is the new *Root3* routine.

_{ -------------------------------------------------------------------------------
Root3(x,r) Computes the cube-root of any input integer

x = Input integer, -2^31 <= x < 2^31
r = Cube Root of x, scaled by 10^5 

This routine uses a hybrid algorithm which determines the integer
part of the cube root using Newton's Method and then derives the
fractional part using a single iteration of Halley's Method. By
using a 16-value lookup table to obtain the initial estimate for
the root, only one Newton iteration is required. See the
Technical Guide for a detailed discussion of this algorithm. }_

*function* Root3(x,r)
``BuildR3Table 
``*declare* S```_{ Normalization, bit-trio shift counter }_
``*declare* A```_{ Absolute value of input X }_
``*declare* M```_{ M = Rem/r = A/r - r*r, where Rem = A - r*r*r }_
``
``*if* x = 0x80000000```_{ Watch out for -2^31, it has no positive counterpart }_
````r := -129015915```_{ Return pre-computed, negative cube root }_
``*else*
````A := abs(x)```````_{ Force radicand to positive value }_
````*if* A < 2``````````_{ If radicand is 0 or 1, root = X.000000 }_
``````r := 100000*x
````*else*``````````````_{ Begin calculation of cube root for 1 < A < 2,147,483,648 }_
``````S := 0``````````_{ Normalize A so that top nibble is from 1 to 7 inclusive }_
``````*while* A .*and*. 0x70000000 = 0
````````A := sh_left(A,3)
````````inc(S)
``````*end while*
``````r := R3Tbl[sh_right(A,26) - 4] _{ Initial estimate for integer part }_
``````r := (2*r + A/(r*r))/3`````````_{ Only one iteration needed to converge }_
``````M := A/r - r*r`````````````````_{ M = Rem/r, where -2859 < M < 3790 }_
``````r := 400000*r + 400000*M/(3*r + M/r)``_{ Combine with fractional part }_
``````r := sh_right(sh_right(r,S) + 2,2)``_{ Denormalize and round to 10^5 scale }_
``````*if* x < 0``````_{ Restore the sign of the root }_
````````r := -r
``````*end if*
````*end if*
``*end if*
*end function*

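For readers who want to experiment outside Kontakt, here is a Python sketch of the same hybrid scheme. Two caveats: the 16-value R3Tbl built by BuildR3Table isn't reproduced in this post, so the sketch runs Newton to convergence from a power-of-two seed instead of taking a single table-seeded step; and Python's unbounded integers stand in for KSP's 32-bit arithmetic, so the scaling gymnastics discussed earlier aren't needed here.

```python
def root3(x):
    """Cube root of a 32-bit integer x, scaled by 10^5 (Python sketch)."""
    if x == -2**31:
        return -129015915              # -2^31 has no positive counterpart
    A = abs(x)
    if A < 2:
        return 100000 * x              # root of 0 or +-1 is x.00000
    S = 0
    while A & 0x70000000 == 0:         # normalize so the top bit-trio is nonzero
        A <<= 3                        # each shift doubles the root
        S += 1
    r = 1 << ((A.bit_length() + 2) // 3)   # power-of-two seed >= cbrt(A)
    while True:                            # Newton down to the integer floor
        nr = (2*r + A // (r*r)) // 3       # (table-seeded KSP needs one step)
        if nr >= r:
            break
        r = nr
    M = A // r - r*r                             # M = Rem/r
    r = 400000*r + 400000*M // (3*r + M // r)    # one Halley step for the fraction
    r = ((r >> S) + 2) >> 2                      # denormalize, round, 10^5 scale
    return r if x >= 0 else -r

print(root3(27))           # 300000
print(root3(368600377))    # 71699861, vs. true 71699907 -- within ~5*10^-4
```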

----------



## joris1974 (Dec 5, 2008)

Bob,

I tried, as you suggested, to "predict" the closest root estimate, but after 1) firing up Maxima (one can only have so much fun expanding rational fractions by hand) and 2) downloading FreePascal and dusting off my Pascal skills after a 15-year hiatus (after which I comforted myself with the fact that F77 or K&R/ANSI C were SO much easier for that kind of stuff), I got nowhere :oops: As it turns out, one can back-track from the Newton iteration and establish a distance condition on the two candidates, but it yields an irreducible rational fraction with a 7th-degree numerator... yuck. If I have some spare time, I might get to the bottom of it for future reference (just because I won't be satisfied until I reach that bottom, figuratively speaking). I am glad you liked my suggestion of a one-sided solution, albeit not as precise, and above all very happy to have been of some value. Your contributions to the KSP community certainly dwarf mine, and just being a small part of it is a delight o-[][]-o .

I'll post my instrument/keyswitch range checker as soon as I recover from this one (if a cube root ever pops up in the classes I teach, I will now censor them!).

It is truly a pleasure to work with you. Best,

Joris


----------

