SOME ALTERNATIVE SAMPLING TECHNIQUES IN THE MEASUREMENT OF FARM-BUSINESS CHARACTERISTICS*

QUENTIN M. WEST
Inter-American Institute of Agricultural Sciences

Area-segment sampling on a probability basis has spread very rapidly in the few years since it was introduced and perfected to the point of practical applicability. It offers many advantages where, as in most studies of farms, universe lists are expensive to assemble, costs of travel are high, the universe turns over slowly but continuously through time, and geographic location is a significant characteristic of all observational units. The usefulness of area-segment sampling varies, however, with the circumstances under which it is employed. There also are many alternative area-segment designs, some of which are likely to be more satisfactory than others under given circumstances.

The project here reported was undertaken to investigate the suitability of alternative area-segment designs under certain New York State conditions. It represents only a beginning in this direction, in view of the many sampling alternatives, the multiplicity of purposes served by projects that employ sampling, and the agricultural variability from area to area in the state. The project was an attempt to accumulate a body of working experience that may become part of applied statistics in research specialties where farms and farming must be studied in the field.

Experimentation with a real universe played a major part in the project, although the role of theory was by no means unimportant. Statistical theory provided hypotheses for experimental testing as well as some measure of independent evidence in many instances. The complete enumeration data collected for the open-country areas of Seneca County in the years were used throughout the project.
Five hundred fifty-six records were used, these being the records classified as full-time commercial farms (cases where 11.5 months or more of male time, household or hired, were available for farm work and where the farm work accomplished amounted to 100 or more productive-man-work units). Three farm-business characteristics were treated in all phases of the study: total acres operated, acres in crops, and size of business in productive-man-work units. Twelve additional characteristics were introduced in the earlier phase of the study. Universe frequency distributions were constructed and parameters were computed for the 556 full-time farms for all fifteen selected farm-business characteristics. All of the universe frequency distributions were highly skewed to the right, deviating significantly from the normal distribution. The universe distribution for productive-man-work units contained one farm with more than twice as many units as the next smaller farm. The presence of this extremely long, discontinuous tail greatly affected the sampling results for this characteristic.

* Summary of award-winning Ph.D. thesis filed at Cornell University.

Simple Random Sampling (1)

Actual research on the project started with simple random sampling and moved later to area-segment sampling. Simple random samples involve less complicated procedures in selection and can be analyzed in a more direct and simplified manner. Standard sample theory has been built around simple random sampling, while the theory used in estimating sampling precision when observational units are geographically grouped into sampling units involves special adaptations of simple random sample theory.

In the first experiment, 100 samples of 100 full-time farms were drawn at random from the above universe (each approximately a 20 per cent sample). In the second experiment, 100 sub-samples of 25 full-time farms were drawn at random from the farms in the 100-farm samples of the first experiment (each approximately a five per cent sample of the universe). An examination of the empirical distributions of sample means, medians, variances, standard deviations and t values was made in both experiments.

The evidence collected in the simple random sampling phase of this study supports the following general points:

1. The agricultural economist is justified in using tabulated probabilities for the normal distribution in testing the significance of differences between hypothesized universe means and sample means, and in setting confidence limits about sample means when he is working with simple random samples of farms in a general farming area like Seneca County.

2. The use of the chi-square distribution may not be valid in making tests of hypotheses and setting confidence limits for the sample variance under these circumstances.

3. Since the variances of usual interest to agricultural economists are not distributed as expected in normal theory, the suitability of the F distribution may be questioned in testing for relationships in analyses of variance that involve more than one degree of freedom in each variable. Distributions of actual F values for samples from farm-business universes must be investigated before a well-founded working rule can be established on this point. It may be that, through compensating effects, the ratios of two variances are commonly distributed as F, even though the variances themselves are not distributed according to normal theory.

4. As the universe becomes more highly skewed, the median becomes a more valuable statistic than the mean for many practical purposes. This study suggests that the median also has smaller sampling variability than the mean when used to describe some farm universes.

5. It may be worthwhile in selecting farm universes for particular research projects to consider the likelihood that sampling distributions for some statistics are quite sensitive to the shapes of universe frequency distributions. It is usual in agricultural economic research to exclude small farms, out-of-type farms, and often atypically large farms in an effort to buy the most valuable information with the limited funds available. It is possible that further improvement in these choices might result from a more careful consideration of the shapes of resulting frequency distributions and the consequent suitability of standard statistical theory for interpreting the study results.

(1) The following mimeograph presents a more detailed summary of the simple random sampling phase of the report: Quentin M. West, The Results of Applying a Simple Random Sampling Process to Farm Management Data, A.E. 743, Agricultural Economics Department, Cornell University.

Area-Segment Sampling

Five area-segment sample designs were investigated. The particular designs selected were suggested as having practical application in actual farm management and land economic surveys. Three of these designs assumed a knowledge only of the location of open-country households in the county, together with some information on the proportion of full-time farms among the open-country households. For the other two designs, use was made of knowledge of the full-time farms available from the actual survey in the county, and these farms were located on a map in their proper positions.
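The individual-random design is the baseline against which the area-segment designs are judged, so a small simulation of it may help fix ideas before the segment designs are described. The sketch below is illustrative only. The Seneca County records are not reproduced here, so a synthetic right-skewed universe of 556 values (a lognormal draw, which is an assumption) stands in for one farm-business characteristic, and all function names are hypothetical. It draws 100 simple random samples of 100 farms without replacement, takes a 25-farm sub-sample from each, and sets the empirical variance of the sample means beside the textbook value (S^2 / n)(1 - n / N).

```python
import random
import statistics


def make_universe(n_farms=556, seed=1):
    """Synthetic right-skewed universe (assumed lognormal); not the survey data."""
    rng = random.Random(seed)
    return [rng.lognormvariate(5.0, 0.6) for _ in range(n_farms)]


def srs_experiment(universe, n_samples=100, n=100, n_sub=25, seed=2):
    """Draw simple random samples without replacement and a sub-sample of each."""
    rng = random.Random(seed)
    means, medians, sub_means = [], [], []
    for _ in range(n_samples):
        sample = rng.sample(universe, n)   # first experiment: 100 farms
        sub = rng.sample(sample, n_sub)    # second experiment: 25 of those farms
        means.append(statistics.mean(sample))
        medians.append(statistics.median(sample))
        sub_means.append(statistics.mean(sub))
    return means, medians, sub_means


def theoretical_var_of_mean(universe, n):
    """Variance of the mean under SRS without replacement: (S^2 / n) * (1 - n / N)."""
    big_n = len(universe)
    s2 = statistics.variance(universe)     # S^2 uses the divisor N - 1
    return s2 / n * (1 - n / big_n)


if __name__ == "__main__":
    universe = make_universe()
    means, medians, sub_means = srs_experiment(universe)
    print("empirical variance of means, n=100:", statistics.variance(means))
    print("theoretical variance,        n=100:", theoretical_var_of_mean(universe, 100))
    print("empirical variance of means, n=25: ", statistics.variance(sub_means))
    print("theoretical variance,        n=25: ", theoretical_var_of_mean(universe, 25))
    print("empirical variance of medians, n=100:", statistics.variance(medians))
```

With only 100 samples per design the empirical variances are themselves subject to sampling fluctuation, so close but not exact agreement with the theoretical values is the most that should be expected.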
For the first area-segment design, enough open-country residences were included in each segment to make the expected number of farms per segment closely approximate three. One hundred samples consisting of thirty-three segments were drawn at random from the universe, and all elements within the selected segments were enumerated.

In the second area-segment experiment, the number of farms per sample was held as nearly constant as possible, the number of segments included in each sample being allowed to vary. The same segments were used as in the first experiment. Segments were added to the sample until the total number of farms enumerated was approximately equal to 100. One hundred samples were drawn using this design.

The objective of the third method of area-segment sampling was to determine the effect of larger segments on the sampling precision. Segments were defined as in the previous designs, with the exception that the expected number of full-time farms in each segment was six instead of three. One hundred samples were drawn with the number of segments included in each sample varied so as to obtain approximately 100 farms.

In constructing the sample frame for the fourth area-segment design, information at hand on the actual locations of individual farms was used. Segment boundaries were drawn to include three adjacent full-time farms. One hundred samples consisting of 33 segments were drawn and all farms within selected segments were enumerated.

The fifth area-segment sample design involved the systematic selection of sampling units. For this experiment, sampling units consisted of the segments which were delineated for the fourth area-segment design. Segments were numbered in a serpentine fashion following the roads up and down or across the county. Twenty different orderings of the segments were made in this manner. Samples consisted of every fifth segment; thus, there were five possible samples for each pattern of segments. All possible samples were enumerated, so the exact sampling distribution of means was known for each pattern.

Comparative Precision of Sample Designs

The following results were obtained in an empirical evaluation of the six sample designs of approximately 100 farms (Table 1). The systematic area-segment design had somewhat less sampling variability than any of the other designs. Compared to the 20 per cent individual random design, the systematic design showed an average gain (average of three farm-business characteristics) of one per cent, the first area-segment design had a loss of 15 per cent, the second a loss of 44 per cent, the third a loss of 37 per cent, and the fourth area-segment design a loss of 12 per cent.
There was a loss in precision of 27 per cent in the second area-segment design compared to the first area-segment design even though the sample size was held constant in the second design. The third area-segment design, similar to the second except that the segments had twice as many farms on the average, showed a gain in precision of 17 per cent over the second design. The fourth area-segment design, with constant segment size, resulted in less than one per cent gain over the first area-segment design but had a gain of 21 and 16 per cent, respectively, over the second and third designs. In comparing the systematic with the random selection of area-segments, the systematic design showed a 13 per cent gain over the fourth area-segment design.
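The comparisons above rest on variances of sample means under each design. The sketch below, under stated assumptions, shows how such a comparison could be computed for random and systematic selection of three-farm segments. The universe is synthetic: 185 hypothetical segments of three farms each, with values drifting along the numbering route so that neighbouring segments resemble one another. Neither the segment count nor the values come from the study, and the function names are hypothetical. The percentage gain is taken here as the reduction in variance relative to the reference design, which is an assumed definition, since the formula used in the study is not stated.

```python
import random
import statistics


def make_segments(n_segments=185, farms_per_segment=3, seed=3):
    """Synthetic segments in serpentine order with a smooth trend (assumption)."""
    rng = random.Random(seed)
    segments = []
    for i in range(n_segments):
        level = 100.0 + 0.5 * i  # purely illustrative geographic gradient
        segments.append(
            [level * rng.lognormvariate(0.0, 0.5) for _ in range(farms_per_segment)]
        )
    return segments


def mean_of_segments(chosen):
    """Mean over every farm in the chosen segments (complete enumeration)."""
    return statistics.mean(x for seg in chosen for x in seg)


def random_segment_means(segments, n_segments=33, n_samples=100, seed=4):
    """Sample means from repeated random selection of whole segments."""
    rng = random.Random(seed)
    return [
        mean_of_segments(rng.sample(segments, n_segments)) for _ in range(n_samples)
    ]


def systematic_means(segments, interval=5):
    """Means of all possible systematic samples, one per starting position."""
    return [mean_of_segments(segments[start::interval]) for start in range(interval)]


def percent_gain(var_design, var_reference):
    """Assumed definition: percentage reduction in variance versus the reference."""
    return 100.0 * (1.0 - var_design / var_reference)


if __name__ == "__main__":
    segments = make_segments()
    var_random = statistics.variance(random_segment_means(segments))
    # All five systematic samples are enumerated, so the population variance applies.
    var_systematic = statistics.pvariance(systematic_means(segments))
    print("variance of means, random segments:    ", var_random)
    print("variance of means, systematic segments:", var_systematic)
    print("gain of systematic over random (%):    ",
          percent_gain(var_systematic, var_random))
```

In this sketch the systematic samples contain 37 segments against 33 for the random samples, so part of any difference reflects sample size as well as the method of selection. Reordering the synthetic segments before calling `systematic_means` is one way to see how strongly the result depends on the pattern chosen.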

TABLE 1. VARIANCES OF 100 EXPERIMENTAL SAMPLE MEANS, SEVEN SAMPLE DESIGNS, THREE FARM-BUSINESS CHARACTERISTICS, SENECA COUNTY, NEW YORK, 1948

Columns:
  Sample Design
  Average Number of Farms per Sample (number)
  Total Acres Operated (variance)
  Acres in Crops (variance)
  Productive-Man-Work Units (variance)

Rows:
  Twenty per cent individual random
  Five per cent individual random
  First segmented random
  Second segmented random
  Third segmented random
  Fourth segmented random
  Segmented systematic (a)

(a) Averages for 20 distributions of five sample means.

(The numeric entries of Table 1 are not legible in this transcription.)

Conclusions

The evidence recorded in this study provides support for the following conclusions. It is hoped that these may be helpful in further sampling research and may serve as first approximations to satisfactory working guides in farm management and land economic surveys.

1. The extra cost usually involved in obtaining a mapped pre-list of farms appears not to be justified by the slightly increased precision that may be gained through holding area-segments constant in number of farms. If a good map is available showing open-country dwellings and an estimate can be obtained for the proportion of these dwellings that are farms, satisfactory area-segments may be delineated without further information.

2. There is a loss in sampling precision when the number of segments per sample is varied to maintain a constant number of farms. Such a practice also complicates theoretical estimates of sampling variability.

3. In a farming situation similar to Seneca County, with area-segments delineated as they were in this study, it is better to include an average of six farms per segment rather than three. The precision of six-farm segment designs may even exceed that of the three-farm segment designs, and the costs of enumeration very likely would be less with the larger segments.

4. On the average the systematic selection of area-segments is less variable than the random selection of similar segments.
However, this design does not guarantee a gain in sampling precision over a random selection of segments. The sampling variance for some of the patterns used in the systematic sampling was very small, but for others it was relatively large as compared to the variance of random samples. This emphasizes the necessity for careful planning in setting up patterns of sampling units for systematic selection. Soil and land class maps and other available information might well be studied before the sampling units are arranged in patterns, in order to avoid as much as possible periodicities that create large differences among individual samples.

In presenting results of these sampling experiments, it is recognized that they themselves also are subject to sampling fluctuations. The 100 samples drawn for each of the designs constitute a relatively small proportion of the total samples that might have been drawn. Twenty systematic patterns also are but a few of the possible patterns that might have been set up. Sources of variation in the sample designs used are very complex and there is much yet to be learned about them.

Further investigation into the effect of the ordering of sampling units on systematic sample results would be profitable. This should include a study of the systematic selection of individual farms.

The effect of stratification upon the precision of these sample designs needs investigation. The area-segments of this study were delineated within land classes, making it easily possible in future experiments to impose a stratification by land class upon these designs. Unless stratification by land class is used in drawing the sample, the area-segments probably should not be delineated within land class. Various modifications in the methods of delineating area-segments could well be looked into relative to their effect on sample results.

Before these designs can be fully evaluated as to sampling efficiency, an investigation should be made of the relative costs of obtaining the desired information from surveys using the various sample designs.

Caution should be observed in generalizing these results to other areas. If the farming situation is different, the effect of combining groups of farms into area-segments may be different. The effect of the ordering of sampling units upon the precision of systematic samples may also vary with the character of the area in which the study is being made.
General knowledge of farming in New York State suggests that sampling principles evolved for Seneca County might hold reasonably well in other areas of the Lake Plains Region, but may not be fully applicable elsewhere in the state and may well be considered only suggestive in other regions of the country.