Asian Research Journal of Business Management Issue 4 (Vol.4)2017 Issn: Probability and Non Probability Sampling


CODEN: ARJBAX
Asian Research Journal of Business Management

Probability and Non-Probability Sampling

Susan Wanjiru Mwangi
Procurement Manager & Part-time Lecturer, Technical University of Mombasa and Jomo Kenyatta University of Agriculture and Technology: Procurement, Logistics, Supply Chain and Strategic Management

Received: 15 May 2017; Revised: 16 June 2017; Accepted: 20 June 2017

Abstract: This paper looks at probability and non-probability sampling. Probability is a measure of the likelihood that an event will happen. It is quantified as a number between 0 and 1, where, loosely speaking, 0 indicates impossibility and 1 indicates certainty; the higher the probability of an event, the more certain it is that the event will occur. A simple example is the expectation of rain. Since no one can tell when the rain will fall, the two outcomes "rain" and "sun" are treated as equally probable: the probability of "rain" equals the probability of "sun", and since no other outcomes are possible, the probability of each is ½ (which could also be written as 0.5 or 50%). This is a common scenario when you visit a town along the equator. Probability has therefore been defined as the branch of mathematics that deals with calculating the likelihood of a given event, and in most such calculations the outcome of one trial is assumed to have no effect on another.

Keywords: Probability; occurrence; sampling

INTRODUCTION
A population can refer to a particular group or type of people or animals living in a particular area, region or country. Sampling is the selection of a subset of individuals from a population to form the sample for a survey, and it is a very important aspect of studying a population. A sample is therefore a subset, or part, of a larger whole; how it is drawn depends on the work the researcher is doing and on the time and costs involved. The larger whole can be anything out of which a sample is taken.
That "whole" could be a bucket of water, a bag of salt, a group of companies, a group of lecturers, a group of shoe shiners, or a group of nurses in a hospital. The members of such a group must share a common set of characteristics, which gives them their identity as a population [1].

REASONS FOR SAMPLING
I. Cost, Labor and Time: Depending on the research, one may opt to study the total population rather than a sample when cost, labor and time constraints are relatively insignificant. A sample study, however, cuts costs, reduces labor requirements and gathers vital information quickly, although there can be other reasons for sampling as well. Budget and time constraints are common parameters in applied research projects, and many researchers today opt for a sample study because it is pragmatic and, when properly designed, reliable.

II. Quality Management/Supervision: Professional fieldworkers are a scarce commodity. In a large study, rather than employing less qualified staff to cover the whole population, it may be advisable to conduct a sample study and employ highly qualified professional fieldworkers, which can markedly improve the quality of the study. Supervision, record keeping, training and so forth would all be more difficult in a very large study, whereas a small group is easier to manage and can produce quality information.

III. Accurate and Reliable Results: Another major reason for sampling is that samples, if properly selected, are sufficiently accurate in most cases. If the elements of a population are quite similar, only a small sample is necessary to portray the characteristics of interest accurately. Most of us have had blood samples taken from the finger, the arm or another part of the body; the assumption is that blood is sufficiently similar throughout the body, so its characteristics can be determined on the basis of a sample. When the elements of a population are highly homogeneous, samples are highly representative of the population, and under these circumstances almost any sample is as good as another.

Samples may even be more accurate than a census. In a census of a large population there is a greater likelihood of nonsampling errors, that is, mistakes that are unrelated to the selection of people for the study. For example, a response may be coded incorrectly, or the keyboard operator might make a data entry error. In a field survey, a small, well-trained, closely supervised group may do a more careful and accurate job of collecting information than a large group of nonprofessional interviewers trying to contact everyone.

IV. Sampling May Be the Only Way: Many research projects, especially those in quality control testing, require the destruction of the items being tested.
If a manufacturer of firecrackers wished to find out whether each product met a specific production standard, there would be no product left after testing. Similarly, consider the case of electric bulbs: in testing the life of bulbs, if we were to burn every bulb produced, there would be none left to sell. This is destructive sampling.

V. Determine the Period of Study: Interviewing every element of a large population without sampling requires a lot of time, perhaps a year or more, and over such a long period even seasonal variation may influence the response pattern of the respondents. For example, if a study aimed to measure the level of unemployment in a large city, the unemployment rate produced by the survey data would refer neither to the city as of the beginning of interviewing nor as of the end. The researcher may be forced to attribute the unemployment rate to some hypothetical date representing the midpoint of the study period, so it would be difficult to determine the exact time to which the data of the study pertain [2].

TYPES OF SAMPLING
The major alternative sampling plans may be grouped into:
1. Probability techniques
2. Non-probability techniques.

PROBABILITY SAMPLING
According to Dawson [3], in probability sampling every element in the population has a known, non-zero probability of selection. The simple random sample is the best-known probability sample, in which each member of the population has an equal probability of being selected. Probability sampling designs are used when the representativeness of the sample is important in the interest of wider generalizability. When time or other factors, rather than generalizability, become critical, non-probability sampling is generally used.

NON-PROBABILITY SAMPLING
In non-probability sampling, the probability of any particular element of the population being chosen is unknown. The selection of units is quite arbitrary, as researchers rely heavily on personal judgment. There are no appropriate statistical techniques for measuring random sampling error from a non-probability sample, so projecting the data beyond the sample is statistically inappropriate. Nevertheless, there are occasions when non-probability samples are best suited to the researcher's purpose [4].

WHEN TO USE NON-PROBABILITY SAMPLING
This type of sampling can be used to demonstrate that a particular trait exists in the population. It is a common type of sampling and has been widely used in the 21st century. It can also be used when the researcher aims to do a qualitative, pilot or exploratory study, and when randomization is impossible, for example when the population is almost limitless. When the research does not aim to generate results that will be used to make generalizations about the entire population, this is the best tool to use. It is also suitable where the budget and workforce are minimal and the initial study will later be repeated using randomized probability sampling [5].

CHARACTERISTICS OF THE TWO METHODS
PROBABILITY SAMPLING
The following characteristics of probability sampling should not be ignored:
1. You have a complete sampling frame, that is, contact information for the entire population.
2. You can select a random sample from your population. Since all persons (or units) have an equal chance of being selected for your survey, you can select participants randomly without missing entire portions of your audience.
3. You can generalize your results from a random sample. With this data collection method and a decent response rate, you can extrapolate your results to the entire population.
4. It can be more expensive and time-consuming than convenience or purposive sampling.
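The first two characteristics above (a complete sampling frame and an equal chance of selection for every unit) can be illustrated with a minimal Python sketch. It assumes the sampling frame is simply a list of unit labels; the 600-unit frame and the function name `simple_random_sample` are hypothetical, chosen only for illustration. Python's standard `random.sample` draws without replacement, so every unit has the same inclusion probability n/N.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw a simple random sample of size n without replacement.

    `frame` is assumed to be a complete list of population units
    (the sampling frame); every unit has inclusion probability n / N.
    """
    if not 0 <= n <= len(frame):
        raise ValueError("sample size must be between 0 and the frame size")
    rng = random.Random(seed)  # seeded generator so a draw can be reproduced
    return rng.sample(frame, n)


if __name__ == "__main__":
    # Hypothetical sampling frame: 600 numbered population members.
    frame = [f"unit-{i:03d}" for i in range(1, 601)]
    sample = simple_random_sample(frame, 200, seed=42)
    print(len(sample), len(set(sample)))  # 200 distinct units
    print("inclusion probability:", 200 / len(frame))
```

Using a library generator in place of a printed table of random numbers is a design choice for the sketch; the seed only makes a particular draw reproducible and does not change the equal-probability property.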

NON-PROBABILITY SAMPLING
The characteristics of non-probability sampling are outlined as follows:
1. It is used when an exhaustive population list is not available. Some units cannot be selected, so there is no way of knowing the size and effect of sampling error (missed persons, unequal representation, etc.).
2. It is not random.
3. It can be effective when trying to generate ideas and get feedback, but you cannot generalize your results to an entire population with a high level of confidence. Quota samples (males and females, etc.) are an example.
4. It is more convenient and less costly, but it does not meet the requirements of probability theory [4].

FACTORS TO CONSIDER WHEN CHOOSING A SAMPLE DESIGN
According to Wikipedia [5], a researcher who must decide on the most appropriate sample design for a specific project will identify a number of sampling criteria and evaluate the relative importance of each criterion before selecting a sample design. The most common criteria are the following.

a) Degree of Accuracy: Selecting a representative sample is, of course, important to all researchers. However, the acceptable error may vary from project to project, especially when cost savings or another benefit may be traded off against a reduction in accuracy.

Resources: The costs associated with the different sampling techniques vary tremendously. If the researcher's financial and human resources are restricted, this limitation will eliminate certain methods. For a graduate student working on a master's thesis, for example, conducting a national survey is almost always out of the question because of limited resources. Managers who weigh the cost of research against the value of information will often opt to save money by using a non-probability sampling design rather than decide to conduct no research at all.

b) Advance Knowledge of the Population: Advance knowledge of population characteristics, such as the availability of lists of population members, is an important criterion. A lack of an adequate list may automatically rule out any type of probability sampling.

c) National versus Local Project: The geographic proximity of population elements will influence the sample design. When population elements are unequally distributed geographically, cluster sampling may become more attractive.

d) Need for Statistical Analysis: The need for statistical projections based on the sample is often a criterion. Non-probability sampling techniques do not allow the researcher to use statistical analysis to project the data beyond the sample.

e) Sampling Designs and Their Advantages and Disadvantages: The main designs are summarized below by type of sampling, when to use it, its advantages and its disadvantages.

Probability strategies
1. Simple random sampling. When to use it: when the population members are similar to one another on important variables. Advantages: ensures a high degree of representativeness. Disadvantages: time consuming and tedious.
2. Systematic sampling. When to use it: when the population members are similar to one another on important variables. Advantages: ensures a high degree of representativeness, with no need to use a table of random numbers. Disadvantages: less random than simple random sampling.
3. Stratified sampling. When to use it: when the population is heterogeneous and contains several different groups, some of which are related to the topic of the study. Advantages: ensures a high degree of representativeness of all the strata or layers in the population. Disadvantages: time consuming and tedious.
4. Cluster sampling. When to use it: when the population consists of units rather than individuals. Advantages: easy and convenient. Disadvantages: the members of units may be different from one another, decreasing the technique's effectiveness.

Non-probability strategies
5. Convenience sampling. When to use it: when the members of the population are convenient to sample. Advantages: convenient and inexpensive. Disadvantages: the degree of generalizability is questionable.
6. Quota sampling. When to use it: when strata are present and stratified sampling is not possible. Advantages: ensures some degree of representativeness of all the strata in the population. Disadvantages: the degree of generalizability is questionable.

TYPES OF PROBABILITY SAMPLING
Probability samples rely on random processes and require more work than non-probability samples. It is the duty of the researcher to identify, before embarking on the research, the elements from which the sample will be drawn. For example, when conducting an HIV survey, the researcher needs to test a clearly specified sample of people to get a correct result.
Research has confirmed that random samples mostly yield a sample that represents the correct situation on the ground. Random selection also gives the researcher an opportunity to calculate the relationship between the sample and the population, indicating the error. The deviation between a sample result and the population parameter that is due to the random selection process is referred to as random sampling error [4].

I. Simple Random Sample: The simple random sample is the easiest random sample to understand, and the one on which the other types are modeled. The researcher develops an accurate sampling frame, selects elements from the sampling frame according to a mathematically random procedure, and then locates the exact elements that were selected for inclusion in the sample. After numbering all the items in the sampling frame, the researcher uses a list of random numbers to decide which elements to select. As many random numbers are needed as there are elements to be sampled; for a sample of 120, for example, 120 random numbers are needed. The numbers can be obtained from a table of random numbers, which are generated in a mathematically random way; such tables are available in most statistics and research methods books. The numbers must be generated by a purely random process so that each number has an equal probability of appearing in any position. Lists of random numbers can also be produced by suitable computer programs. When a table is used, the researcher must choose a starting point in it before beginning. A random sample is not guaranteed to represent the population perfectly, since any particular sample may differ from the population by chance. The researcher therefore works with the sampling distribution of an estimate, which is the key to calculating sampling error and confidence intervals.

II. Systematic Random Sample: Systematic random sampling is simple random sampling with a shortcut for random selection. The researcher must number each item in the sampling frame at the onset of the exercise.
The researcher then calculates a sampling interval, which serves as a quasi-random selection method (e.g., selecting 1 in k, where k is the sampling interval). The interval tells the researcher how many elements in the sampling frame to skip before selecting one for the sample. Computing the sampling interval is easy once the sample size and population size are known, because the sampling interval is the inverse of the sampling ratio. For 200 names out of 600, the sampling ratio is 200/600 ≈ 0.333, or 33.3 percent, and the sampling interval is 600/200 = 3. The researcher begins with a random start, for instance by pointing blindly at one of the numbers within the first sampling interval, and then selects every third name from that point. When the items in the frame are organized in some kind of cycle or pattern, systematic sampling will not give a representative sample.

III. Stratified Random Sample: With a heterogeneous population, simple random sampling may give inaccurate results, because some of the bigger strata may be over-represented while some of the small ones may be left out entirely. In stratified sampling, the researcher works with variables that are likely to affect the result and divides the population into homogeneous groups (strata) on that basis. The researcher then draws a sub-sample from each stratum using simple random sampling, for example with a table of random numbers. Some of the reasons for stratified random sampling are:
a) For the statistical efficiency of the sample
b) To provide sufficient data for analyzing the various sub-populations
c) To allow different research methods and procedures to be used in different strata.

Stratified sampling is considered at least as efficient as simple random sampling, and usually more efficient. Ideally, each stratum is internally homogeneous and heterogeneous with respect to the other strata. Stratification is also useful when the researcher wants to study the characteristics of certain population subgroups, and when different methods of data collection are applied in different parts of the population; it can save the researcher extra effort in achieving the study's objectives.

Stratification Process: The most efficient basis for stratification is the primary variable under study (the dependent variable), or a characteristic of the population elements known to be related to the dependent variable or to other variables of interest. A list of population elements must be available for each stratum, and the strata should be defined so as to increase homogeneity within each stratum and heterogeneity between strata with respect to the chosen variable. Using a table of random numbers or some other device, the researcher then takes a separate simple random sample within each stratum. The researcher must, however, determine how large a sample to draw from each stratum [6].

Proportionate Versus Disproportionate: A stratified sample is proportionate if the number of sampling units drawn from each stratum is in proportion to the relative population size of the stratum.
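Proportionate allocation can be sketched in Python as follows. This is a minimal illustration, assuming each stratum is given as a list of unit labels; the function names `proportional_allocation` and `stratified_sample`, the hypothetical departments, and the largest-remainder rounding used to make the stratum sample sizes add up exactly to n are illustrative choices, not part of the source. Each stratum h of size N_h receives a share of n × N_h / N, and a separate simple random sample is then drawn within each stratum.

```python
import math
import random


def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to their sizes.

    Uses largest-remainder rounding so the allocations sum exactly to n.
    """
    total = sum(stratum_sizes.values())
    exact = {h: n * size / total for h, size in stratum_sizes.items()}
    alloc = {h: math.floor(share) for h, share in exact.items()}
    shortfall = n - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:shortfall]:
        alloc[h] += 1
    return alloc


def stratified_sample(strata, n, seed=None):
    """Proportionate stratified random sample: simple random sampling within each stratum."""
    rng = random.Random(seed)
    alloc = proportional_allocation({h: len(units) for h, units in strata.items()}, n)
    return {h: rng.sample(strata[h], alloc[h]) for h in strata}


if __name__ == "__main__":
    # Hypothetical population of 1,000 employees in three departments (strata).
    strata = {
        "production": [f"P{i}" for i in range(600)],
        "sales": [f"S{i}" for i in range(300)],
        "administration": [f"A{i}" for i in range(100)],
    }
    sample = stratified_sample(strata, 50, seed=7)
    print({h: len(units) for h, units in sample.items()})
    # prints {'production': 30, 'sales': 15, 'administration': 5}
```

Largest-remainder rounding is only one way to handle fractional shares. A disproportionate design would instead set each stratum's sample size from analytical considerations and pass those sizes directly to the within-stratum draws.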
Sometimes, to ensure an adequate number of sampling units in every stratum, a disproportionate stratified sample is selected, in which the sample size for each stratum is not allocated in proportion to the population size but is dictated by analytical considerations [7].

IV. Cluster Sampling: The purpose of cluster sampling is to sample economically while retaining the features of a probability sample. The researcher chooses groups of items that are clustered together, ideally with heterogeneity among the members within each group. This is in contrast to choosing individual items from the defined population as in simple random sampling, stratifying and then choosing items from the strata as in stratified sampling, or choosing every nth case in the population as in systematic sampling. Ideally, when there are several groups with intra-group heterogeneity and inter-group homogeneity, a random sample of the clusters is drawn and information is gathered from each of the members of the randomly chosen clusters. Cluster samples thus offer more heterogeneity within groups and more homogeneity among groups, the reverse of stratified samples, which offer homogeneity within each group and heterogeneity across groups.

Cluster sampling addresses two problems: the lack of a good sampling frame for a dispersed population, and the high cost of reaching each sampled element. The researcher first samples clusters, each of which contains elements, and then draws a second sample from within the clusters selected in the first stage, so that a sampling frame is needed only for the selected clusters. In addition, the elements within each cluster are physically close to one another, which reduces the cost of reaching them [6].

Double Sampling: This plan is adopted when further information is needed from a subset of the group from which some information has already been collected for the same study. A sampling design in which a sample is first used to collect some preliminary information of interest, and a sub-sample of this primary sample is later used to examine the matter in more detail, is called double sampling [6].

NON-PROBABILITY SAMPLING METHODS
Non-probability sampling methods include:
1) Convenience or haphazard sampling
2) Volunteer sampling
3) Judgment sampling
4) Quota sampling
5) Snowball sampling

CONVENIENCE OR HAPHAZARD SAMPLING
Convenience sampling is sometimes referred to as haphazard or accidental sampling. It is not normally representative of the target population, because sample units are selected only if they can be accessed easily and conveniently. The obvious advantage is that the method is easy to use, but that advantage is greatly offset by the presence of bias. Although useful applications of the technique are limited, it can deliver accurate results when the population is homogeneous. For example, a scientist could use this method to determine whether a lake is polluted. Assuming that the lake water is well mixed, any sample would yield similar information, so the scientist could safely draw water anywhere on the lake without worrying about whether or not the sample is representative [6].

Examples of convenience sampling include:
1) The female moviegoers sitting in the first row of a movie theatre
2) The first 100 customers to enter a department store
3) The first three callers in a radio contest.
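The difference between taking "the first 100 customers to enter a department store" and drawing a simple random sample can be made concrete by comparing inclusion probabilities. The minimal Python sketch below assumes the population is a list of customers in arrival order; the 500 customers and all function names are hypothetical. A convenience sample always takes the first n arrivals, so later arrivals have no chance of selection, whereas a simple random sample gives every customer the same chance n/N.

```python
import random
from collections import Counter


def convenience_sample(arrivals, n, rng=None):
    """Take the n units that are easiest to reach; here, the first n to arrive.

    `rng` is accepted only so both samplers share one interface; it is ignored.
    """
    return arrivals[:n]


def simple_random(population, n, rng):
    """Simple random sample of size n, for comparison."""
    return rng.sample(population, n)


def estimate_inclusion(sampler, population, n, trials=2000, seed=0):
    """Estimate each unit's inclusion probability by repeating a sampler many times."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        counts.update(sampler(population, n, rng))
    return {unit: counts[unit] / trials for unit in population}


if __name__ == "__main__":
    customers = list(range(1, 501))  # hypothetical: 500 customers in arrival order
    conv = estimate_inclusion(convenience_sample, customers, 100)
    srs = estimate_inclusion(simple_random, customers, 100)
    for c in (1, 400):
        # Convenience gives 1.0 to early arrivals and 0.0 to late ones;
        # simple random sampling gives both a value near 100/500 = 0.2.
        print(f"customer {c}: convenience={conv[c]:.2f}, random={srs[c]:.2f}")
```

The zero inclusion probability for late arrivals is the formal reason why results from such a sample cannot be projected to the whole population.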

VOLUNTEER SAMPLING
As the term implies, this type of sampling occurs when people volunteer their services for the study. In psychological experiments or pharmaceutical trials (drug testing), for example, it would be difficult and unethical to enlist random participants from the general public, so the sample is taken from a group of volunteers. Sometimes the researcher offers payment to entice respondents; in exchange, the volunteers accept the possibility of a lengthy, demanding or sometimes unpleasant process. Sampling voluntary participants rather than the general population may introduce strong biases. In opinion polling, for instance, often only the people who care strongly enough about the subject one way or another tend to respond. The silent majority does not typically respond, resulting in large selection bias [7].

JUDGEMENT SAMPLING
This approach is used when a sample is taken based on certain judgments about the overall population. The underlying assumption is that the investigator will select units that are characteristic of the population. The critical issue here is objectivity: how much can judgment be relied upon to arrive at a typical sample? Judgment sampling is subject to the researcher's biases and is perhaps even more biased than haphazard sampling. Since any preconceptions the researcher may have are reflected in the sample, large biases can be introduced if these preconceptions are inaccurate. Statisticians often use this method in exploratory studies such as the pre-testing of questionnaires and focus groups. They also prefer to use it in laboratory settings, where the choice of experimental subjects (i.e., animal, human, vegetable) reflects the investigator's pre-existing beliefs about the population. One advantage of judgment sampling is the reduced cost and time involved in acquiring the sample [7].

QUOTA SAMPLING
This is one of the most common forms of non-probability sampling.
Sampling is done until a specific number of units (quotas) for various sub-populations have been selected. Since there are no rules as to how these quotas are to be filled, quota sampling is really a means of satisfying sample size objectives for certain sub-populations. The quotas may be based on population proportions. For example, if there are 100 men and 100 women in a population and a sample of 20 is to be drawn to participate in a cola taste challenge, you may want to divide the sample evenly between the sexes: 10 men and 10 women. Quota sampling can be considered preferable to other forms of non-probability sampling (e.g., judgment sampling) because it forces the inclusion of members of different sub-populations.

Quota sampling is somewhat similar to stratified sampling in that similar units are grouped together. However, it differs in how the units are selected. In probability sampling the units are selected randomly, while in quota sampling it is usually left up to the interviewer to decide who is sampled, which results in selection bias. Quota sampling is nevertheless often used by market researchers (particularly for telephone surveys) instead of stratified sampling, because it is relatively inexpensive, easy to administer, and has the desirable property of satisfying population proportions. However, it disguises potentially significant bias. As with all other non-probability sampling methods, in order to make inferences about the population it is necessary to assume that the persons selected are similar to those not selected, and such strong assumptions are rarely valid.

The main difference between stratified sampling and quota sampling is that stratified sampling selects the units within each group using a probability sampling method such as simple random sampling or systematic sampling, whereas quota sampling uses no such technique. For example, if a quota called for 15 Grade 10 students, those students might be selected by choosing the first 15 Grade 10 students to enter school on a certain day, or by choosing 15 students from the first two rows of a particular classroom. Keep in mind that students who arrive late or sit at the back of the class may hold different opinions from those who arrived earlier or sat in the front.

The main argument against quota sampling is that it does not meet the basic requirement of randomness. Some units may have no chance of selection, or their chance of selection may be unknown, so the sample may be biased. It is common, but not necessary, for quota samples to use random selection procedures in the beginning stages, much as probability sampling does. For instance, the first step in multi-stage sampling would be to select geographic areas randomly. The difference lies in the selection of units in the final stages of the process. In multi-stage sampling, units are taken from up-to-date lists for the selected areas, and a sample is selected according to a random process.
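The way an interviewer fills quotas can be sketched as follows. This is a minimal Python illustration, assuming respondents arrive as (id, group) pairs in the order the interviewer encounters them and using the cola taste challenge quotas of 10 men and 10 women from the example earlier in this section; the function name `fill_quotas` and the respondent stream are hypothetical. Nothing in the procedure is random: whoever happens to be met first fills the quota, which is where the selection bias comes from.

```python
from typing import Dict, Iterable, List, Tuple


def fill_quotas(respondents: Iterable[Tuple[str, str]],
                quotas: Dict[str, int]) -> Dict[str, List[str]]:
    """Accept respondents in encounter order until every group's quota is met.

    Respondents from a group whose quota is already full (or from a group
    with no quota) are skipped. Selection depends only on who is met first.
    """
    filled: Dict[str, List[str]] = {group: [] for group in quotas}
    for person_id, group in respondents:
        if group in filled and len(filled[group]) < quotas[group]:
            filled[group].append(person_id)
        if all(len(filled[g]) == quotas[g] for g in quotas):
            break  # all quotas satisfied; stop interviewing
    return filled


if __name__ == "__main__":
    # Hypothetical stream: the interviewer happens to meet 15 men before any women.
    stream = [(f"M{i}", "male") for i in range(15)] + [(f"F{i}", "female") for i in range(15)]
    sample = fill_quotas(stream, {"male": 10, "female": 10})
    print(sample["male"][:3], len(sample["male"]), len(sample["female"]))
    # prints ['M0', 'M1', 'M2'] 10 10
```

The quotas guarantee the 10/10 split, but which men and which women end up in the sample is decided entirely by the order of encounter, not by a random process.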
In quota sampling, unlike multi-stage sampling, each interviewer is instructed on how many of the respondents should be men and how many should be women, as well as how many people should represent the various age groups. The quotas are calculated from available data on the population, so that the sexes, age groups or other demographic variables are represented in the correct proportions. Within each quota, however, interviewers may fail to secure a representative sample of respondents. For example, suppose that an organization wants to find out about the occupations of men aged 20 to 25. An interviewer goes to a university campus and selects the first 50 men aged 20 to 25 whom she comes across and who agree to participate in her organization's survey. These 50 men are not necessarily representative of all men aged 20 to 25.

Quota sampling is generally less expensive than random sampling. It is also easy to administer, especially since the tasks of listing the whole population, randomly selecting the sample and following up on non-respondents can be omitted from the procedure. Quota sampling is an effective sampling method when information is urgently required, and it can be carried out independently of existing sampling frames. In many cases where the population has no suitable frame, quota sampling may be the only appropriate sampling method [7].

SNOWBALL SAMPLING
Snowball sampling (also called network, chain referral or reputational sampling) is a method for identifying and sampling (or selecting) the cases in a network. It is based on an analogy to a snowball, which begins small but becomes larger as it is rolled on wet snow and picks up additional snow. It begins with one or a few people or cases and spreads out on the basis of links to the initial cases. This design has been found quite useful where respondents are difficult to identify and are best located through referral networks. In the initial stage of snowball sampling, individuals are discovered and may or may not be selected through probability methods. This group is then used to locate others who possess similar characteristics and who, in turn, identify others, so the "snowball" gathers subjects as it rolls along. For example, a researcher studying friendship networks among teenagers might begin with a few teenagers, each of whom names four close friends. The researcher then goes to the four friends and asks each to name four close friends, then goes to those friends and does the same thing again, and so forth. Before long, a large number of people are involved. Each person in the sample is directly or indirectly tied to the original teenagers, and several people may have named the same person. The researcher eventually stops, either because no new names are given, indicating a closed network, or because the network is so large that it is at the limit of what he or she can study [7].

THE SAMPLING PROCESS
The sampling process comprises several stages:
1. Defining the population of concern
2. Specifying a sampling frame, a set of items or events that it is possible to measure
3. Specifying a sampling method for selecting items or events from the frame
4. Determining the sample size
5. Implementing the sampling plan
6. Sampling and data collection
7. Reviewing the sampling process

REFERENCES
1. D. P. MacKinnon, S. Coxe & A. N. Baraldi, Journal of Business and Psychology, 2012, 27(1).
2. T. Ogwang & A. Abdou, Social Indicators Research, 2003, 64(1).
3. C. Dawson, Practical Research Methods, New Delhi: UBS Publishers' Distributors, 2002.
4. D. K. Kombo & D. L. A. Tromp, Proposal and Thesis Writing: An Introduction.
5. Wikipedia.
6. C. R. Kothari, Research Methodology: Methods and Techniques (3rd ed.), New Delhi: New Age International Ltd.

7. O. M. Mugenda & A. G. Mugenda, Research Methods: Quantitative Approaches, Nairobi, Kenya: ACTS Press, 1999 (revised 2003).

Corresponding Author: Susan Wanjiru Mwangi, Procurement Manager & Part-time Lecturer, Technical University of Mombasa and Jomo Kenyatta University of Agriculture and Technology: Procurement, Logistics, Supply Chain and Strategic Management