A random sample is a selection of records chosen from a larger population in such a way that each record has an equal chance of being chosen. The goal is to ensure that the sample reflects the population as accurately as possible. If any one group is over-sampled or under-sampled, then it can skew the results and lead to inaccurate conclusions. Random sampling is essential for statistical work because it helps to eliminate any bias. This post will show you how to select random samples with and without duplicates. Get your copy of the example workbook used in this post to follow along!

Select a Random Sample with the RAND Function

This method is the most manual but if you only need one sample, then it’s the easiest to set up and use on a one-off basis. For this, you will need to add a helper column to your data that contains random numbers. This can be used to sort the data in random order and the top rows can be taken as your random sample. Here is a list of data for height and weight measurements. Suppose you want to get a random sample of 5 observations from this. Here is what you need to do. The RAND function will generate a random decimal number between 0 and 1 in the column. Now you can sort the data based on this random column. Sorting based on a random number will create a random ordering. 💡 Tip: You can also use the Ctrl + Shift + L keyboard shortcut to add the sort and filter toggles to your data. You can also select the Sort Largest to Smallest option, it doesn’t really matter as either way it will result in a random order for your data. The data will appear in a random order and you can copy and paste the top 5 rows of your data and take this as your random sampling. If you need another random sample, just repeat the process. When you sort the data again, this will cause the RAND function to recalculate and generate new random numbers so you will get a new random order.

Select a Random Sample with Data Analysis Add-in

There is a statistical analysis add-in available to all versions of Excel that has a sampling feature. You can enable the data analysis toolpak add-in to use this random sampling. But be aware, this will only allow you to sample data from a single column of numeric data. It also returns duplicate values in the sample and there is no option to sample without repeating values. Here’s how to use the Data Analysis Toolpak add-in to generate random samples. This will open the Data Analysis menu. This opens the Sampling menu where you can select from a few input and output options. When you press OK, the sampled values are added to your desired location.

Select a Random Sample using an Array Formula

A very interesting approach to getting a sample is using a dynamic array formula to return a random. Dynamic arrays allow you to return more than one value from a single formula, so you could use them to return multiple random rows of your data. The above formula will return 5 random rows without duplicates from the range B3:D17. The ROWS function determines how many rows are in the source data. In this example, range B3:D17 has 15 rows. The RANDARRAY function then creates a column of random values between 0 and 1 with the same number of rows as the range B3:D17. The SORTBY function then sorts the data range B3:D17 based on the column generated from the RANDARRAY function. Since this is a column of random values, the SORTBY function will return a randomly sorted range. The FILTER function is then used on the randomly sorted range to return the top 5 rows as the random sample. This is achieved by filtering based on the sequence from 1 to the number of rows in the data range. The filter condition SEQUENCE(ROWS(B3:D17)) <= 5 results in a column where the first 5 values are TRUE. This cause the FILTER function to return the first 5 rows of the randomly sorted range. This will get you a random sample and you will be able to easily generate new random samples by pressing the F9 key. This will cause the RANDARRAY function to recalculate and new random rows will be returned.

Select a Random Sample using an Array Formula [with Duplicates]

The previous formula method returns a random sample without any repeated rows, but you might want to allow them. This is also possible with a dynamic array approach. The above formula will return a sample with duplicates of 5 rows from the range B3:D17. The ROWS function is also used here to get the number of rows in the data. The RANDARRAY function is then used to generate 5 rows of random integer values ranging from 1 to the number of rows in the. This is the list of row numbers that will be included in the random sample. The RANDARRAY function allows for duplicate values to be returned, so this will result in the random sample potentially having duplicates. The SEQUENCE function will be used to ensure all the columns of the source data are returned by the INDEX function. This generates an array such as {1,2,3} in this example because there are 3 columns in the source data. The INDEX function then returns the given set of rows generated with the RANDARRAY function. This sample can be updated by pressing F9 on your keyboard to recalculate the formula.

Select a Random Sample with Power Query

Power Query is a tool baked into Excel meant for importing and transforming your data into the format you need. This method will use Power Query to get the random samples from the data with duplicates. Suppose your data is in an Excel table, then here are the steps to follow. This will open the Power Query editor. This creates a list with one random value in each row. This will get a single random value into each row of the column. You can also select the Sort Descending option and the result will be the same. The order of the data will be randomized. Now you can filter on the top few rows to get the required sample size. Now you can clean up the data by removing the Random column. Its job is done. You can then load the data back into your Excel workbook. Go to the Home tab and click on the Close and Load command for options to load the data back into a table. You can then generate another random sample by going to the Data tab and selecting the Refresh option.

Select a Random Sample with Power Query [without Duplicates]

This method will use Power Query to get the random samples from the data without duplicates. You will need to load the source data from your table into the Power Query editor as before. Once the data is in the power query editor, you can follow these steps to get the random samples without repeats. The first thing to do is add an index column that identifies the row number of the source data. This creates a column of whole numbers starting at 1 in the first row of the data. This will be used later to join another query that contains a random selection of row numbers. Now you will need to create a new blank query to randomly select row numbers for the sample. This creates a new one where you can build a random selection of row numbers. Replace the 5 with whatever number of items you want in your sample. Because the sample is with replacement, this number can be higher than the actual number of rows in the source data. When you create the list, the power query editor will display a List Tools Transform tab. This will create a random number between 0.5 and 15.5 for each row. These are going to be rounded to the nearest integer and that will result in whole numbers ranging from 1 to 15. This range is based on the total number of rows in the source data. The initial range of 0.5 to 15.5 is needed so that each row number has an equal chance of appearing. Now you have a random selection of row numbers with duplicates. This can be merged with the original data source. This creates a column of Tables. Each table should contain a single row corresponding to the value in the Random column. You can now clean this query and get rid of the columns you don’t need. Now you can load the results to Excel. Go to the Home tab and select Close and Load and load the results to a Table. You will be able to generate a new sample at any time by going to the Data tab and clicking on the Refresh command.

Conclusions

Random sampling is an important technique in statistics. It’s no doubt you will come across this need if you are doing any statistical analysis with Excel. Basic helper column techniques and add-ins can be used easily for one-off cases. But dynamic array formulas or Power Query solutions will be better suited when you require multiple samples from your data. Did you ever need to randomly select a sample from your data in Excel? How did you get this done? Let me know in the comments below!