The data developed will be described first, followed by an explanation of how this set of data came about.
The data represents a number of noisy sine waves, as shown in the plot to the right. It has 300 rows and 3 columns.
The rows
The columns
There are 3 columns
# -*- coding: utf-8 -*-
"""
Make Data.py To create test data for Simon's project
2024/01/20
"""
import math
import numpy.random
import statistics


def RandomNormal(mean, sd):
    # One random draw from a normal distribution
    return numpy.random.normal(mean, sd)


def AddNoise(sourceAr, numSD):
    # The noise SD is numSD times the SD of the source data
    sd = statistics.stdev(sourceAr)
    noiseLevel = numSD * sd
    print(sd, numSD, noiseLevel)
    resAr = []
    for i in range(len(sourceAr)):
        resAr.append(sourceAr[i] + RandomNormal(0, noiseLevel))
    return resAr


"""
nCycles, number of cycles, each cycle = 360 degrees, so a pos and a neg wave
nPoints, number of data points for each cycle
toMean and toSD are the mean and SD that the sine values are rescaled to
The length of the data is nCycles x nPoints
"""
def MakeSineWaves(nCycles, nPoints, toMean, toSD):
    order = []
    groupNames = []
    groupNumbers = []
    sines = []
    intv = 360 / nPoints      # degrees between successive data points
    k = intv
    n = 1
    for i in range(nCycles):
        for j in range(nPoints):
            degree = k % 360
            x = math.sin(math.radians(degree))
            # Group T (1) if the sine is above zero, otherwise group F (0)
            grpName = "F"
            grpNum = 0
            if x > 0:
                grpName = "T"
                grpNum = 1
            order.append(n)
            groupNames.append(grpName)
            groupNumbers.append(grpNum)
            sines.append(x)
            k += intv
            n += 1
    # Standardise the sines, then rescale to the requested mean and SD
    mean = statistics.mean(sines)
    sd = statistics.stdev(sines)
    newVals = []
    for v in sines:
        newVals.append(((v - mean) / sd) * toSD + toMean)
    return order, groupNames, groupNumbers, sines, newVals


if __name__ == "__main__":
    nCycles = 5    # number of cycles
    nPoints = 60   # each cycle divided into 60 data points
    toMean = 100   # mean and SD
    toSD = 10
    order, groupNames, groupNumbers, sines, newVals = \
        MakeSineWaves(nCycles, nPoints, toMean, toSD)
    noise_20 = AddNoise(newVals, 2)
    noise_30 = AddNoise(newVals, 3)
    noise_40 = AddNoise(newVals, 4)
    noise_50 = AddNoise(newVals, 5)
    for i in range(len(order)):
        print(order[i], "\t", groupNames[i], "\t", groupNumbers[i], "\t",
              "%.4f" % sines[i], "\t", "%.4f" % newVals[i], "\t",
              "%.4f" % noise_20[i], "\t", "%.4f" % noise_30[i], "\t",
              "%.4f" % noise_40[i], "\t", "%.4f" % noise_50[i])
The Python program that produced the data demonstrated above is shown in the panel to the right. The remainder of this page describes the thinking behind and leading to this program.
It began with the idea of exploring how to clean up and interpret a sequence of numbers sampled from an analog signal. The model is to sample a continuous electrical signal and convert it into digital bipolar values of 0/1.
I have in mind two conceptual processes
It is also envisaged that large quantities of data will be required repeatedly for this exercise, firstly to find a suitable set of data to act as the model while exploring alternative strategies. More importantly, if what appears to be a successful strategy emerges, it will need to be tested repeatedly for robustness (not making wrong interpretations) and sensitivity (being able to detect the underlying signal).
I therefore decided to produce a short program with changeable parameters. As the data is generated by random numbers but controlled by its parameters, similar but different sets of data can thus be generated quickly. The program developed should be able to produce data with the following characteristics
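The idea of a parameterised generator that yields similar but different data sets on demand can be sketched as below. This is only an illustration, not the program in the panel: the function name `make_noisy_sine`, the `noise_sd_multiple` and `seed` parameters, and the use of NumPy's `default_rng` are my assumptions. A fixed seed reproduces a data set exactly, and a different seed gives a fresh set with the same underlying wave.

```python
import numpy as np


def make_noisy_sine(n_cycles=5, n_points=60, to_mean=100.0, to_sd=10.0,
                    noise_sd_multiple=3.0, seed=None):
    """Return (clean, noisy, group) arrays for a rescaled sine wave.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    step = 360.0 / n_points
    # Same sampling scheme as the original program: start one step in
    degrees = (np.arange(1, n_cycles * n_points + 1) * step) % 360
    sines = np.sin(np.radians(degrees))
    group = (sines > 0).astype(int)          # 1 = T, 0 = F
    # Standardise, then rescale to the requested mean and SD
    clean = (sines - sines.mean()) / sines.std(ddof=1) * to_sd + to_mean
    noise_sd = noise_sd_multiple * clean.std(ddof=1)
    noisy = clean + rng.normal(0.0, noise_sd, size=clean.size)
    return clean, noisy, group


# Same seed gives the same data; a new seed gives a new, comparable set
clean_a, noisy_a, group_a = make_noisy_sine(seed=1)
clean_b, noisy_b, group_b = make_noisy_sine(seed=2)
print(len(noisy_a), noisy_a[:3], noisy_b[:3])
```

Keeping the seed as a parameter means that any data set that exposes a weakness in a strategy can be regenerated later for closer inspection.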
| | F | T | All |
|---|---|---|---|
| n | 150 | 150 | 300 |
| Set_0 | | | |
| mean | 91.0201 | 108.9799 | 100 |
| SD | 4.3767 | 4.3767 | 10 |
| Set_3 | | | |
| mean | 92.0824 | 112.5713 | 102.3268 |
| SD | 28.9289 | 26.8896 | 29.7096 |
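A summary of this kind (n, mean and SD for each group and for all rows) can be computed with a few lines of Python. The sketch below is an assumption about how such a table might be produced, not the method actually used for the table above: `summarise` is a hypothetical helper, and the noiseless set is rebuilt here with the same parameters as the program (5 cycles of 60 points, rescaled to mean 100 and SD 10). The group counts and means should agree with the Set_0 rows; the SD values depend on which SD formula is applied.

```python
import math
import statistics


def summarise(values, groups):
    """Return {group: (n, mean, sample SD)} plus an 'All' entry."""
    summary = {}
    for name in sorted(set(groups)):
        subset = [v for v, g in zip(values, groups) if g == name]
        summary[name] = (len(subset),
                         statistics.mean(subset),
                         statistics.stdev(subset))
    summary["All"] = (len(values),
                      statistics.mean(values),
                      statistics.stdev(values))
    return summary


# Rebuild the noiseless set: 300 points, 6 degrees apart
degrees = [(6 * k) % 360 for k in range(1, 301)]
sines = [math.sin(math.radians(d)) for d in degrees]
groups = ["T" if s > 0 else "F" for s in sines]
m, sd = statistics.mean(sines), statistics.stdev(sines)
set_0 = [(s - m) / sd * 10 + 100 for s in sines]

for name, (n, mean, group_sd) in summarise(set_0, groups).items():
    print(f"{name}\t{n}\t{mean:.4f}\t{group_sd:.4f}")
```

The same helper can be applied to any of the noisy columns by passing that column in place of `set_0`.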
Only the data of Set_3 are shown in the second plot to the left. The blue circles are those values that were above the midline (v=100) of the noiseless sine wave, and are designated group true (T). The red circles are those values at or below the midline of the original sine wave, and are designated group false (F). This plot shows how the incorporation of noise increases the overlap between the data in the two groups.
The difference between the F (red) and T (blue) groups is also shown in the normal distribution plot to the right. It can be seen that the spread of the data in the two groups is similar and the modes are different, but there are large overlaps because of the high noise level.
The distributions of data from the two groups (F and T) in the two sets (Set_0 to the left and Set_3 to the right) are further demonstrated in the plot below and to the left, and in the table above.
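It can be seen that the noiseless sine wave data are bounded and not normally distributed (values sampled at equal phase steps cluster towards the peaks rather than spreading evenly), and because the two groups are defined by their values, there is no overlap.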
With the addition of noise, and particularly if the noise is random and normally distributed, the range of measurements is much increased, and the data in the two groups are now more normally distributed and overlap considerably.
Thinking backwards, the real signal inside the noise is likely to be of smaller amplitude than the raw signal, which includes the noise, and the difference between the two poles is very much smaller than what appears in the signal.
I shall therefore proceed with what I have (Set_3), and would be grateful for your comments and suggestions for change.
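The widening can be checked with a little arithmetic. Independent noise adds to the signal in variance, so a signal with SD 10 plus noise with SD 30 should have a combined SD of about sqrt(10² + 30²) ≈ 31.6, which is of the same order as the overall SD reported for Set_3. The sketch below assumes that Set_3 corresponds to a noise level of 3 times the signal SD; the fixed seed and the variable names are mine, and a naive cut at the midline is used only to illustrate how much the two groups overlap.

```python
import math
import numpy as np

signal_sd = 10.0            # SD of the rescaled noiseless sine (toSD)
noise_sd = 3 * signal_sd    # assumption: Set_3 uses numSD = 3

# Independent noise adds in variance, not in SD
expected_total_sd = math.sqrt(signal_sd**2 + noise_sd**2)

rng = np.random.default_rng(0)   # fixed seed only for repeatability
degrees = (np.arange(1, 301) * 6) % 360
sines = np.sin(np.radians(degrees))
clean = (sines - sines.mean()) / sines.std(ddof=1) * signal_sd + 100.0
noisy = clean + rng.normal(0.0, noise_sd, size=clean.size)

# Fraction of noisy values that land on the wrong side of the midline
is_true = sines > 0
wrong_side = np.mean((noisy > 100.0) != is_true)

print(f"expected total SD: {expected_total_sd:.2f}")
print(f"observed total SD: {noisy.std(ddof=1):.2f}")
print(f"fraction on the wrong side of the midline: {wrong_side:.2f}")
```

The observed values will vary from one random draw to the next, which is the reason for wanting many generated data sets.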