When you run Find Experts, Elicit uses GPT-3 to brainstorm more names and then searches Google, LinkedIn, or MuckRack to get their profiles. GPT-3 will only suggest names that it knows from its training data.

How was GPT-3 trained?

GPT-3 was trained on a dataset of about 499 billion tokens (words or parts of words) gathered by scraping webpages, Reddit, Wikipedia, and internet-based books [1]. All of this training data was collected before 2020.

What does this mean for the names that are suggested?

Because of the data it was trained on, GPT-3:

  1. Doesn't know about people or content that first appeared on the internet in 2020 or later.
  2. Is more likely to suggest people who have a large online presence (as these people show up most frequently in its training data).

What can I do about this?

To get more up-and-coming people:

  1. We're working on a version of Find Experts that pulls data from Google Scholar and will help you find up-and-coming researchers more easily.
  2. Supplement Elicit with other sources to find people who have little or no internet presence.

To get diverse results:

  1. GPT-3 will pick up on patterns in the examples you give. To make sure you get diverse results, include diverse examples. For example, including more names of women or minorities in the initial list will cause Elicit to return more women and minorities in the final list.
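
To see why the examples matter, it helps to picture how a list of names can be turned into a prompt that a language model continues. The sketch below is only an illustration: the function name, the prompt wording, and the placeholder names are all assumptions, and Elicit's actual prompt is not shown in this article.

```python
def build_few_shot_prompt(topic, seed_names):
    """Build a list-completion prompt from a topic and example names.

    Hypothetical format for illustration; not Elicit's real prompt.
    """
    if not seed_names:
        raise ValueError("provide at least one example name")
    lines = [f"Experts on {topic}:"]
    for i, name in enumerate(seed_names, start=1):
        lines.append(f"{i}. {name}")
    # End on the next list number so the model continues the pattern.
    lines.append(f"{len(seed_names) + 1}.")
    return "\n".join(lines)


seeds = ["Expert A", "Expert B", "Expert C"]  # placeholder names
prompt = build_few_shot_prompt("soil microbiology", seeds)
print(prompt)
```

Because the model simply continues the list, whatever the example names have in common (field, seniority, gender, region) tends to carry over into the names it suggests next.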