On reference codes and your privacy / by Benjamin Woo

I’ve received a surprising amount of feedback about one particular feature of the survey: the request for the four letters and two digits to generate a personalized reference code. Some people simply find it too personal, but others have suggested that the questions' similarity to those used for security on credit cards and other accounts represents a risk of identity theft.

In response, I have changed the prompts and—as always—you can choose your own random letters or numbers. If this was stopping you from taking the survey, please feel free to go ahead now.

However, I wanted to take a moment to explain what this step is and why, as well as to say something about how we’re protecting your privacy.

What are these codes for?

Since it is possible to complete the survey totally anonymously, we need a way to find records within the survey data. We would need to do this for one of two reasons:

  1. If you change your mind about participating in the survey and wish to withdraw. This freedom is a key principle of research ethics. If someone chose not to provide any personal information at any point in the survey, it would not be possible to withdraw the information they provided without some identifier. Using personal reference codes, we can do it without needing to record your identity.
  2. The reference codes will be used to put together the second, interview-based phase of the project. At the end of the survey, you are asked if you would be willing to have a follow-up interview, and interviewees will be selected from those who agreed using what’s called quota sampling. That is to say, we’ll decide that we want, for instance, a certain number of people in each occupational role, a certain number of men and women, a certain number of younger and established creators, etc. These people will be randomly selected based on their survey responses. However, contact information is stored separately from the main survey data in order to make it harder for anyone to identify respondents. The codes are used to cross-reference a survey record with that person’s contact information.
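For the technically curious, here is a minimal sketch in Python of how separate storage and cross-referencing by code can work. It is an illustration only, not the study's actual procedure: the field names (`code`, `role`, `interview_ok`), the sample records, the quota sizes, and the functions `pick_interviewees` and `withdraw` are all made up for the example.

```python
import random
from collections import defaultdict

# Hypothetical survey records: no names or emails, only the reference code.
survey = [
    {"code": "ABCD12", "role": "writer", "interview_ok": True},
    {"code": "EFGH34", "role": "writer", "interview_ok": True},
    {"code": "IJKL56", "role": "artist", "interview_ok": True},
    {"code": "MNOP78", "role": "artist", "interview_ok": False},
    {"code": "QRST90", "role": "letterer", "interview_ok": True},
]

# Hypothetical contact list, kept apart from the survey data and keyed by
# the same code. Only people who agreed to an interview appear here.
contacts = {
    "ABCD12": "a@example.com",
    "EFGH34": "b@example.com",
    "IJKL56": "c@example.com",
    "QRST90": "d@example.com",
}


def pick_interviewees(records, quotas, rng):
    """Quota sampling: randomly pick up to quotas[role] willing people per role."""
    by_role = defaultdict(list)
    for record in records:
        if record["interview_ok"]:
            by_role[record["role"]].append(record["code"])
    chosen = []
    for role, wanted in quotas.items():
        pool = by_role.get(role, [])
        chosen.extend(rng.sample(pool, min(wanted, len(pool))))
    return chosen


def withdraw(records, code):
    """Return the survey records with the given reference code removed."""
    return [record for record in records if record["code"] != code]


# Select one writer and one artist, then look up their contact details.
codes = pick_interviewees(survey, {"writer": 1, "artist": 1}, random.Random(0))
emails = [contacts[code] for code in codes if code in contacts]

# A withdrawal needs nothing but the code.
remaining = withdraw(survey, "ABCD12")
```

The point of the design is that the survey answers and the contact list never sit in the same place, and the code is the only thing that links them.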

Both of these functions are important, and that’s why it is required that all survey respondents take the step of generating a reference code. They also shape why I chose the particular prompts that I did. People are just not as good at picking things randomly as we think we are, and basing the reference codes on some piece of personal information makes them more memorable so they can be recreated if lost or forgotten. But they had to be based on pieces of information that were relatively variable so as to reduce the chances of duplicate codes being selected. 
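To give a rough sense of how likely duplicate codes are, here is a small Python sketch using the standard "birthday problem" approximation. It assumes that every combination of four letters and two digits is equally likely, which is not true of codes based on personal information, so the real chance of a duplicate will be somewhat higher than these figures. The respondent counts are hypothetical, and `duplicate_probability` is a name invented for the example.

```python
import math

# Assumption: four letters (A-Z) and two digits (0-9), all equally likely.
CODE_SPACE = 26**4 * 10**2  # 45,697,600 possible codes


def duplicate_probability(respondents, space=CODE_SPACE):
    """Approximate chance that at least two respondents share a code."""
    # Birthday approximation: 1 - exp(-n(n-1) / (2N)).
    n = respondents
    return 1.0 - math.exp(-n * (n - 1) / (2 * space))


# Hypothetical survey sizes, for illustration only.
for n in (100, 1000, 5000):
    print(n, round(duplicate_probability(n), 4))
```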

Protecting your data

Protecting your privacy and the integrity of the information you provide is important to us. It’s another core principle of the policy framework that governs academic research. In particular, I know that many of you are freelancers who may have to watch what you say publicly about your working conditions, and that was taken into account when the study procedures were reviewed and approved by our Institutional Review Board. 

While the survey is open, the information you provide is stored on the servers of our survey provider, FluidSurveys. You can read about their privacy policy here, but this is the key bit:

The surveys and data collected by your surveys and other information you upload is yours. We will not use it or share it in any way.

It’s probably also worth noting that FluidSurveys' servers are physically located in Canada, and so they’re (theoretically!) not subject to Patriot Act snooping.

Meet Siegel and Shuster. 

Once the survey is closed, the data will be downloaded and removed from the online platform. All of the files related to the project will be stored on a pair of encrypted, password-protected hard drives (a working drive and a back-up), which I keep under lock and key.

As mentioned above, the personally identifying contact information (name and email) of those who agreed to grant a follow-up interview will be stripped out. This information will be stored in hard copy and then securely destroyed once the interviews have been arranged. At that point, there will no longer be any way to associate a name with any specific record in the survey data set.

This fully anonymized version of the data is what I’ll be analyzing when I write up the study’s findings. Research reports will also remain anonymous and aggregated, so it should not be possible for any reader to identify individual participants from anything we release.

At the study’s end, I also intend to archive the data set for future researchers to use. Professional archivists will be consulted in order to develop an access policy that will balance participants’ privacy rights with the scholarly benefit that can be gained from later analysis.

I hope this helps to explain a bit about why things were set up as they are and to put your mind at ease. Again, if the particular prompts I chose were keeping you from taking the survey or sharing it with your friends, peers, and collaborators, I hope you’ll reconsider. Every completed survey improves the quality of our results and of the information we will eventually be able to give back to the creator community.