Here is a description of some topics for projects in statistics. For more information, contact me:
office: 1136, SB2,
phone: (735) 93539,
Many statistical distributions are characterized by special properties. For example, if the sum and the difference of two independent identically distributed random variables are independent, then these random variables are normally distributed. Similarly, if a positive random variable has the lack-of-memory (no after-effect) property, then it has an exponential distribution. A number of characterizations are known for the Poisson, Cauchy, geometric, uniform, and other distributions. Some statistical goodness-of-fit tests are based on these characterizations. Such a test, however, is effective only if the corresponding characterization is stable, that is, if whenever the property under consideration is “approximately” satisfied, the distribution is close to the one characterized by this property. This project will study qualitative and quantitative aspects of the stability of various statistical characterizations.
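As a small illustration of what "approximately satisfied" can mean, the following Python sketch estimates from a sample how far a distribution is from having the lack-of-memory property P(X > s + t | X > s) = P(X > t). The function name `memory_gap`, the choice of s and t, and the gamma alternative are assumptions made for this example only; it is not a goodness-of-fit test from the literature.

```python
import numpy as np

def memory_gap(sample, s, t):
    """Empirical |P(X > s + t | X > s) - P(X > t)| for one pair (s, t)."""
    sample = np.asarray(sample, dtype=float)
    survived = sample[sample > s]
    if survived.size == 0:
        raise ValueError("no observations exceed s")
    conditional = np.mean(survived > s + t)   # estimate of P(X > s+t | X > s)
    unconditional = np.mean(sample > t)       # estimate of P(X > t)
    return abs(conditional - unconditional)

rng = np.random.default_rng(0)
n = 100_000

# Exponential sample: the property holds exactly in the population.
expo = rng.exponential(scale=1.0, size=n)
# Gamma sample with the same mean: the property fails.
gamma = rng.gamma(shape=3.0, scale=1.0 / 3.0, size=n)

gap_expo = memory_gap(expo, s=1.0, t=0.5)
gap_gamma = memory_gap(gamma, s=1.0, t=0.5)
print(gap_expo, gap_gamma)
```

The gap should be near zero for the exponential sample and clearly positive for the gamma sample. A stability result would quantify how close a distribution must be to the exponential one when such a gap is small for all s and t.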
A nonparametric approach to the estimation of functions is used when no pre-specified functional form of the function to be estimated is available. This situation is quite typical for many problems of statistical data analysis and image processing arising in applications. Many powerful methods of nonparametric estimation have been developed over the last few decades: kernel estimators, projection estimators (including wavelets), etc. There are, however, many problems that are still far from being completely solved. The choice of the smoothing parameter and the quantitative assessment of the accuracy of estimators for finite sample sizes are among them. The project includes some problems of this kind for nonparametric estimation of the probability density function and the characteristic function of a probability distribution. Apart from a theoretical study of the problem, some applications in statistical modelling and in signal and image processing can be investigated, depending on the student's interests.
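To make the role of the smoothing parameter concrete, here is a minimal Python sketch of a kernel density estimator with a Gaussian kernel. The function names `kde_gaussian` and `silverman_bandwidth` are hypothetical, and Silverman's rule of thumb is used only as one common default choice of bandwidth, not as the method studied in the project.

```python
import numpy as np

def kde_gaussian(sample, grid, bandwidth):
    """Kernel density estimate with a Gaussian kernel, evaluated on grid."""
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # u[i, j] = (grid[i] - sample[j]) / h
    u = (grid[:, None] - sample[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / bandwidth

def silverman_bandwidth(sample):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    sample = np.asarray(sample, dtype=float)
    sd = sample.std(ddof=1)
    q75, q25 = np.percentile(sample, [75, 25])
    return 0.9 * min(sd, (q75 - q25) / 1.34) * sample.size ** (-0.2)

rng = np.random.default_rng(1)
sample = rng.normal(size=2000)          # illustrative data
grid = np.linspace(-5.0, 5.0, 201)
h = silverman_bandwidth(sample)
density = kde_gaussian(sample, grid, h)
```

Rerunning the last lines with a much smaller or much larger `h` shows the usual trade-off: a small bandwidth gives a noisy estimate, a large one oversmooths.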
In applications, the problem of estimating a distribution function (or testing a hypothesis about its form) often arises when we observe a sample not from this distribution itself but from its multiple convolution, that is, from the distribution of a sum of several independent random variables having the distribution of interest. Situations leading to this problem arise, for example, in the quality control of products obtained after several sequential procedures, when the total number of rejected products is the sum of the rejects of each procedure. A similar situation occurs in queueing theory when one customer is served by several devices simultaneously: we know only the total number of served customers and have to estimate the distribution of the number of customers served by one device. Another class of examples comes from medicine: such situations often arise in biomedical studies as a result of pooling biospecimens, such as blood samples, in order to reduce the cost of a study. Depending on the student's interests, the project can deal with theoretical aspects of the problem or with simulation.
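For count data, as in the quality control example, the relation between the distribution of one component and that of the observed sum can be sketched in Python as follows. The sketch assumes integer-valued observations, a known number m of summands, and a positive probability of zero; the function names are hypothetical. The inversion uses the standard power-series recursion for the m-th root of a probability generating function and is applied here to exact probabilities only.

```python
import numpy as np

def convolve_power(pmf, m):
    """Probability mass function of the sum of m i.i.d. counts with the given pmf."""
    out = np.array([1.0])
    for _ in range(m):
        out = np.convolve(out, pmf)
    return out

def convolution_root(q, m, size):
    """First `size` probabilities p[k] such that the m-fold convolution of p is q.

    Uses the identity P'(z) Q(z) = (1/m) Q'(z) P(z) for Q = P**m,
    which gives a recursion for p[k] in terms of p[0], ..., p[k-1].
    """
    q = np.asarray(q, dtype=float)
    if q[0] <= 0:
        raise ValueError("the recursion needs q[0] > 0")
    a = 1.0 / m
    p = np.zeros(size)
    p[0] = q[0] ** a
    for k in range(1, size):
        j = np.arange(1, min(k, q.size - 1) + 1)
        p[k] = np.sum((a * j - k + j) * q[j] * p[k - j]) / (k * q[0])
    return p

p_true = np.array([0.5, 0.3, 0.2])       # illustrative single-device pmf
m = 3
q = convolve_power(p_true, m)            # pmf of the observed totals
p_back = convolution_root(q, m, p_true.size)
```

With exact `q`, `p_back` reproduces `p_true` up to rounding. If `q` is replaced by empirical frequencies, the same recursion can amplify sampling noise considerably, which is one reason the estimation problem is nontrivial.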
Binning (prebinning the data on an equally spaced mesh) is a recently developed technique for reducing the computational cost of some statistical procedures, mainly in nonparametric density estimation. Binned statistical procedures are also appropriate in the common situation when the data are available only in a discretized form. The idea of binning is to replace the original observations by prebinned data: each observed value is distributed (with some weights, possibly negative) among the grid points of an equally spaced mesh. The ordinary estimator is then replaced by a binned estimator, which usually has the same form as the ordinary one but is based on the binned data. It turns out that the resulting (binned) estimator often loses little accuracy compared with the original one, while substantially reducing the computational cost. The same idea of prebinning the data can be applied to other statistical problems in which either direct operation on the original sample is computationally expensive or the data are available only in a discretized form. This includes, in particular, nonparametric hypothesis testing when the calculation of a test statistic requires computing a certain function for each individual data point, for example, when the test statistic is based on the empirical characteristic function. This project will construct and investigate various binned nonparametric statistical tests. The investigation can be theoretical and/or by simulation.
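The binning idea can be sketched in Python for density estimation. The example uses linear binning, a simple scheme with nonnegative weights in which each observation is split between its two neighbouring grid points, followed by a Gaussian-kernel estimator computed from the grid counts. The function names, grid limits, and bandwidth are assumptions chosen for illustration.

```python
import numpy as np

def linear_bin_counts(sample, lo, hi, n_grid):
    """Split each observation between its two neighbouring grid points."""
    sample = np.asarray(sample, dtype=float)
    if sample.min() < lo or sample.max() > hi:
        raise ValueError("sample must lie inside [lo, hi]")
    delta = (hi - lo) / (n_grid - 1)
    pos = (sample - lo) / delta
    left = np.clip(np.floor(pos).astype(int), 0, n_grid - 2)
    frac = pos - left                       # share given to the right neighbour
    counts = np.zeros(n_grid)
    np.add.at(counts, left, 1.0 - frac)
    np.add.at(counts, left + 1, frac)
    return counts, delta

def binned_kde(counts, delta, bandwidth):
    """Gaussian kernel density estimate on the grid, using only grid counts."""
    n_grid = counts.size
    n = counts.sum()
    # Kernel values at all possible grid-to-grid distances.
    offsets = np.arange(-(n_grid - 1), n_grid) * delta
    kernel = np.exp(-0.5 * (offsets / bandwidth) ** 2) / np.sqrt(2.0 * np.pi)
    full = np.convolve(counts, kernel)
    # Entry i + n_grid - 1 of the full convolution belongs to grid point i.
    return full[n_grid - 1 : 2 * n_grid - 1] / (n * bandwidth)

rng = np.random.default_rng(2)
sample = rng.normal(size=5000)              # illustrative data
lo, hi, n_grid, bandwidth = -6.0, 6.0, 401, 0.3
counts, delta = linear_bin_counts(sample, lo, hi, n_grid)
grid = lo + delta * np.arange(n_grid)
density_binned = binned_kde(counts, delta, bandwidth)
```

After binning, the cost of the estimator depends on the number of grid points rather than on the sample size, and the direct convolution could be replaced by an FFT-based one. The same grid counts could be reused for other statistics that are sums of a function over the data points.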