FIRST DIGIT SIMULATION
Author: Roland Engdahl
University of Kalmar
Sweden
mr.engdahl@telia.com
Background
'First Digit' is the first digit, not equal zero, in a number. i e the first significant digit. e g in 0.0003457 the first digit is 3.
HistoryNewcomb noticed in 1881 that in logarithm tables the first pages, containing numbers starting with 1, were more dirty than later pages.Benford rediscovered the properties in 1938. He formulated a law for the occurence of first digits:Benfords Law: The probability that the first digit is k is log(1+1/k) ( log(a) = There are two aspects of Benfords law:1. Application to numerical data2. Numerical calculation
Application to numerical data
On the WEB we can get many applications to numerical data e g physical constants,
stock market, rivers area, population, Am. league, addresses, squareroots, death rate, newspapers.
A more interesting application of Benfords principle is fraud detection in business accounts.
Numerical calculation
Before 1600 it was hard to do the product of several factors, even worse to divide.
John Napier (15501617) invented logarithms. In tables you could find the logarithm for a number.
To multiply numbers you had to add the logarithms and to divide you had to subtract the logarithms.
At last you had to go the other way in the tables, finding the 'antilogarithm'.
This method was common in schools and applications until the seventies.
An important facility was the slide rule. The scale is logarithmic, so the distance between 1 and 2 is the same as the distance between 2 and 4 etc
To multiply is to add distances on the slide rule, to divide is to subtract distances on the slide rule.
After some multiplications or divisions of numbers taken at random the probability to end betwwen 1 and 2 is about 30 percent (log(2)).
If you are the owner of a slide rule compare the distance between 1 and 2 with the distance between 1 and 10.
Simulation with random numbers
Quotation from Donald E. Knuth, The Art of Computer Programming vol 2 p 173
The most prudent policy for a person to follow is to run each Monte Carlo program at least twice
using quite different sources of random numbers, before taking the answers of the program seriously;
this not only will give an indication of the stability of the results,
it also will guard against the danger of trusting in a generator with hidden deficiencies.
(Every random number generator will fail in at least one application.)
Donald E. Knuth. The Art of Computer Programming, vol 2. Ch 3 Random Numbers Ch 4.2.4
Theory about leading digits p 238
You can select different random number generators.
OBSERVE: The result of calling rand() depends of the version of Maple:
Early version of Maple is here Maple versions up to and including 9.5 Eversion
Later version of Maple is here Maple from version 10 Lversion
Simulation with Builtin generators
In RandomTools[Linear Congruence] the generator is of the linear congruence type
x[k+1] = m*x[k] mod p
p = 999999999989 is the greatest prime with 12 digits, m = 427419669081 multiplier
For short we call this generator for single
In Eversion when calling rand() you get version RandomTools[Linear Congruence] i e single generator
In Lversion when calling rand() you get version RandomTools[MersenneTwister]{GenerateInteger]
This generator is not of LinearCongruence type
RandomTools[MersenneTwister][NewGenerator] is applicable in Lversion and gives the same result as rand()
Simulation with triplegenerator
The generator triple consists of three generators of linear congruence type in parallel for 'smoothing out'
In Eversion
Simulation in Builtin generator: single and rand are the same procedure: NewGenerator not applicable
Simulation in triplegenerator: same result as Lversion (LinearCongruence)
Simulatiom:Another approach... not applicable
In Lversion
Simulation in Builtin generator: single and rand are quite different procedures:
Simulation in triplegenerator: same result as Eversion (LinearCongruence)
Simulatiom:Another approach: applicable RandomTools[MersenneTwister][NewGenerator] very fast
We can compare the results from the simulations with different generators. See the results at Comments and results.
If you take only one factor (nrf = 1) in the product the relative frequency of first digits will be (almost) equal for all digits 1..9.
This is an indication of that the random number generator passes the frequency test..
Let the random number generator produce a great many numbers. The computer calculates the product of these random numbers.
At last we look at the first digit (first nonzero digit) in the product. We study the distribution of these first digits.
Choose the number of factors in nrf
Choose the number of trials in trial
Simulation with Builtin generators
The single generator
> 

parameters for random number generator single
single is of the same as the builtin generator in LinearCongruence
> 
m:=427419669081: p:=999999999989: x:=32564: 
> 
single:=proc() local s: global x,m,p:
x:=x*m mod p: s:=x/p: evalf(frac(s));
end proc; 

(5.1.1) 
procedure prodrand gives the first digit with nrf factors in the product and the generator singel
> 
prodrand:=proc( ) local i,pr;
pr:=1; for i from 1 to nrf do pr:=pr*single(); end do;
while pr < 1 do pr:=pr*10 end do; trunc(pr)
end proc; 

(5.1.2) 
Choose nrf : number of factors
Choose trial number of trials
> 
for k from 1 to 9 do u[k]:=0 end: 
> 
for k from 1 to trial do s:=prodrand();
u[s]:=u[s]+1 end do: 
> 
print('i', relfreq); Digits:=5:
for i from 1 to 9 do i,evalf(u[i]/trial) end do; 
The rand generator
> 

pp module for builtin generator rand
procedure prodrand gives the first digit with nrf factors in the product and
the builtin random number generator rand()
> 
prodrand:=proc( ) local i,pr;
pr:=1; for i from 1 to nrf do pr:=pr*rand()/pp; end do;
while pr < 1 do pr:=pr*10 end do; trunc(pr)
end proc; 

(5.2.1) 
Start new experiment from here
Choose nrf : number of factors
Choose trial : number of trials
> 
for k from 1 to 9 do u[k]:=0 end: 
> 
for k from 1 to trial do s:=prodrand();
u[s]:=u[s]+1 end do: 
Record of distribution of first digit of the product
> 
print('i',relfreq);Digits:=5:
for i from 1 to 9 do i, evalf(u[i]/trial) end do; 
NewGenerator
> 
restart: with(RandomTools[MersenneTwister]); 

(5.3.1) 
> 
M:=NewGenerator(range=10^12); 

(5.3.2) 
> 
prodrand:=proc( ) local i,pr;
pr:=1; for i from 1 to nrf do pr:=pr*Float(M(),12); end do;
while pr < 1 do pr:=pr*10 end do; trunc(pr)
end proc; 

(5.3.3) 
Start new experiment from here
Choose nrf : number of factors
Choose trial : number of trials
> 
for k from 1 to 9 do u[k]:=0 end: 
> 
for k from 1 to trial do s:=prodrand();
u[s]:=u[s]+1 end do: 
Record of distribution of first digit of the product
> 
print('i',relfreq);Digits:=5:
for i from 1 to 9 do i, evalf(u[i]/trial) end do; 
Simulation with triple generator
> 

Parameters for random number generator triple.
Produced with numtheory : safeprime (p) and primroot (m) triple is of the same type as the builtin generator singel in LinearCongruence
Triple has three generators in parallel for smooting out for better performance in serial test
> 
m[1]:=413654159661: p[1]:=876543220739: x[1]:=32564:
m[2]:=319894150324: p[2]:=676543219907: x[2]:=536766:
m[3]:=274222336502: p[3]:=555985626419: x[3]:=2321: 
> 
triple:=proc() local s,i: global x,m,p;
s:=0: for i from 1 to 3 do x[i]:=x[i]*m[i] mod p[i]: s:=s+x[i]/p[i]: end do:
evalf(frac(s)); end proc; 

(6.1) 
procedure trust gives the first digit with generator triple and nrf factors
> 
trust:=proc( ) local i,pr;
pr:=1; for i from 1 to nrf do pr:=pr*triple(); end do;
while pr < 1 do pr:=pr*10 end do; trunc(pr)
end proc; 

(6.2) 
Choose nrf : number of factors
Choose trial number of trials
> 
for k from 1 to 9 do u[k]:=0 end: 
> 
for k from 1 to trial do s:=trust();
u[s]:=u[s]+1 end do: 
> 
print('i', relfreq); Digits:=5:
for i from 1 to 9 do i,evalf(u[i]/trial) end do; 
Comments and results from simulations
Comments on Random Number Generators
In what extent can we trust a pseudorandom generator?
In Birger Jansson, Random Number Generators (doctor's dissertation 1966)
and at Donald Knuth we find different kinds of test:
Frequency tests, Serial correlation, Poker tests, Gap tests, Run tests,
Coupon collector's test, The  test, Visual tets.
Summary of results
number of trials : 10 000 000
From the results we can expect that the error is at most 1 promille
The generators can be considered suitable for the purpose!
FD is First Digit
BIN is builtin generator rand() in Lversion triple is triplegenerator
relative frequency
nrf FD BIN triple
2 FD =1 0.24142 0.24125
FD=2 0.18309 0.18313
3 FD=1 0.30057 0.30062
FD=2 0.18839 0.18817
4 FD=1 0.30731 0.30752
FD=2 0.17814 0.17839
5 FD=1 0.30276 0.30287
FD=2 0.17525 0.17518
6 FD=1 0.30081 0.30063
FD=2 0.17557 0.17556
Theory for nrf = 2
nrf2FD1 is probability for FD = 1
nrf2FD2 is probability for FD = 2
We leave to Maple do the work
nrf=2
12

(7.1) 
1020
> 
b1:=int(1010/x,x=1..2)+int(20/x10/x,x=2..10); 

(7.2) 
> 
(nrf2FD1):=evalf((a1+b1)/81); 

(7.3) 
> 
a2:=int(3/x2/x,x=1..2)+int(3/x1,x=2..3); 

(7.4) 
> 
b2:=int(1020/x,x=2..3)+int(30/x20/x,x=3..10); 

(7.5) 
> 
(nrf2FD2):=evalf((a2+b2)/81); 

(7.6) 
The correspondence with simulated and theoretical values are very good.
There are good reasons to trust the simulated values even for nrf > 2
It seems that P(FD=1) has a maximum for nrf = 4 and that P(FD=1) for nrf>2 lies near log(2)
Simulation: Principle Slide Rule
We study the numbers k * r where r is a given irrational number and k passes the natural numbers from 1 to trial c to convert from = exp(x) to
> 
restart: r:=evalf(sqrt(2)/2): s:=0: c:=evalf(ln(10)): 
In sumlog we add the irrational number r to s. s is transformed to (0,1) . Let log(u) = r.
The effect of add r is the same as multiply with u. On the slide rule we add distances when doing multiplication.
> 
sumlog:=proc() global s,r,c;
s:=s+r; while s<0 do s:=s+1 end do;
trunc(exp(s*c))
end proc; 

(8.1) 
> 
for i from 1 to 9 do u[i]:=0 end do: 
> 
for k from 1 to trial do t:=sumlog():
u[t]:=u[t]+1 end do: 
> 
Digits:=5: print(digit,relfreq);
for k from 1 to 9 do k, evalf(u[k]/trial) end do; 
Benfords law
table of log(i+1)log(i) for i = 1..9
> 
Digits:=5:print('i','log10(1+1/i)');
for i from 1 to 9 do i,evalf(log10(1+1/i)) end do; 
Results and comments
slide rule principle Benfords law
FD = 1 0.30103 0.30103
FD = 2 0.17609 0.17609
The agreement between simulation and Benfords law is not surprising.
We study the numbers k*r where r is an irrational number and k passes the the natural numbers
It is a known fact the numbers frac(k*r) lie densely and the distribution is uniform on the interval (0,1)
Let log (u) = r. In the following we assume that is reduced to the interval (0,1)
Let F(x) be the probability that < x. F(x) = P ()
This implies
log() < log(x) and this is equivalent with frac(k*r) < log(x)
So we have F(x) = log(x)
First digit in frac(k*r)) ≤ j implies frac(k*r) < j+1. This occurs with the probability log(j+1)
P(First Digit = h) = log(h+1)log(h) = log(1+1/h)
Slide rule :
To multiply with u repeatedly is the same as add distance r repeatedly. If necessary go 'back to' interval (1,10).
We note that the distribution is dense and uniform
How often do we end in the interval (1,2) on the slide rule?.
The relative frequency must be the ratio on the slide rule between distance (12) and distance (110) i e log(2).
So we may realize that the relative frequency for first digit is equal to 1 is log(2)
Simulation: Another approach
When multiply with a random number the products are accumulated. In this program we only study FD=1
> 
restart: with(RandomTools[MersenneTwister]); 

(11.1) 
> 
M:=NewGenerator(range=10^10); 

(11.2) 
In maple we multiply pr with a random number in the interval (0.01  0.1) (In Float(M(),11)
On the slide rule we start with 1 and multiply repeatedly with random numbers (uniformly distributed) and read the first digit for every new random number.
For every new serie we start with pr=1
> 
maple:=proc() global pr;
pr:=pr*Float(M(),11);
while pr<1 do pr:=pr*10 end do;
trunc(pr)
end proc; 

(11.3) 
Start new experiment from here
Choose number of trials for each serie
> 
trial:=10000: serie :=3: 
> 
for i from 1 to serie do u[i]:=0:
pr:=1:for k from 1 to trial do t:=maple():
if t=1 then u[i]:=u[i]+1 end if end do: end do:
Digits:=5: print(relfreq);
for k from 1 to serie do evalf(u[k]/trial) end do;
print(average);
su:=0: for k from 1 to serie do su :=su + u[k]: end do:
evalf(su/trial/serie); 
COMMENTS From the results we can see the variations in the series of simulations. It seems that the probability for FD=1 is near log(2) as in the other simulations. Theory may be rather complex. Do your own experiments with several values for trial and serie This simulation resembles SIMULATION Principle slide Rule There we added a constant term. Here we perform the simulation by multiplying with a new factor at haphazard every time on the slide rule: How often will the result end up between 1 and 2 on the slide rule? If you wish to write comments to me by email,
you must specify the subject to 'First Digit' (email with other subjects are deleted)
Legal Notice: The copyright for this application is owned by the author(s). Neither Maplesoft nor the author are responsible for any errors contained within and are not liable for any damages resulting from the use of this material. This application is intended for noncommercial, nonprofit use only. Contact the author for permission if you wish to use this application in forprofit activities.