Experiments Illustrated: How We Optimized Premium Listings on Our Nursing Job Board

Running experiments is a job that often falls to data scientists. If that’s you, congrats! It can be a rewarding and high-impact area of work, but it also requires tools found outside the typical ML-heavy data science curriculum.

Even with the best tools, only a small share of experiments deliver meaningful business value. I’ve been lucky to design and execute many experiments. Of those, I have a few winners. From those, I’m sharing some stories to illustrate key concepts related to experiments.

Background: I work at a company called IntelyCare. We help connect nurses with various work opportunities (full-time, part-time, contracts, per-diem… the whole menu).

  • One of our core offerings is a nursing-only job board. If you take a look in the year 2025, you’ll find two possible ways of sorting jobs: by date and by relevance.

Why it matters: The sort-by-relevance feature is our current best lever to guarantee a good experience for paying customers. It also gives us an opportunity to improve the overall efficiency of our job board by steering eyeballs away from low-quality jobs.

Unfortunately, we can’t put every job at the top of a search result. We face a tradeoff between the quantity of top-page listings and the quality of the experience in the form of increased applies.

How it works: “Relevance” doesn’t mean what it usually means. Sorry!

We give each job a score between 0 and 100. When filling a page with jobs, sorting by relevance means we sort the results by that score. That’s it! For brevity, we’ll say any job with a score greater than 0 is “boosted.”
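
A minimal sketch of that rule in Python. The `Job` class, the job ids, and the tiebreak (newest posting first among equal scores) are assumptions made for illustration; the only parts taken from the text are the 0 to 100 score, the sort on that score, and the definition of “boosted.”

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Job:
    job_id: str          # hypothetical identifier
    posted: date
    score: int = 0       # relevancy score, 0 to 100; 0 is the default


def is_boosted(job: Job) -> bool:
    """A job counts as boosted when its score is greater than 0."""
    return job.score > 0


def sort_by_relevance(jobs: list[Job]) -> list[Job]:
    """Highest score first; newest posting first among equal scores (assumed tiebreak)."""
    return sorted(jobs, key=lambda j: (-j.score, -j.posted.toordinal()))


jobs = [
    Job("rn-icu", date(2025, 1, 3)),
    Job("lpn-clinic", date(2025, 1, 5), score=100),
    Job("cna-nights", date(2025, 1, 4)),
]
ordered = [j.job_id for j in sort_by_relevance(jobs)]
```

Here the one boosted job lands first, and the two unboosted jobs fall back to a chronological order.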

I know what you’re thinking, “This isn’t relevance!” And you’re right, at least in the normal sense of the word. The score doesn’t vary across job-seekers or search terms. A better name would be “relevant to Google.” We’re OK with that because a huge share of our job-board traffic comes from Google, as shown below.

Screenshot of Google search results
“Sort by relevance” here is shorthand for “relevant to Google.” (Image by author)

In Math: We have N jobs. Every day we generate a vector of N integers between 0 and 100. We feed this vector into a black box named Google. If we do a good job, the black box rewards us with many job applications.

By putting the “right” jobs at the top of the page (loaded word there), we can improve upon a chronological sort. Before we can identify the right jobs, we need to know how much Google actually rewards higher-placed jobs.

Day 0: Making progress when you know nothing

Sometimes, just to justify all the simplifying assumptions I’m going to make later, I start a project by writing down the math equation I’d like to solve. I imagine ours looks something like this:
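
In symbols, a version consistent with the definitions in the bullets below might read as follows. This is a sketch: the constraint set is assumed from the 0 to 100 integer scores described earlier, and S* is notation introduced here for the best daily vector.

```latex
S^{*} \;=\; \arg\max_{S \,\in\, \{0, 1, \dots, 100\}^{N}} \operatorname{applies}(S),
\qquad S = (s_1, s_2, \dots, s_N)
```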

  • S is our vector of relevancy scores. There are N jobs, so each s_i (an element of S) corresponds to a different job. A function called applies turns S into a scalar. Each day we’d like to find the S that makes that number as big as possible: the relevancy scores that generate the greatest number of job applications for intelycare.com/jobs.
  • applies is a fine objective function on Day 0. Later on our objective function may change (e.g. revenue, lifetime value). Applies are easy to count, though, and that lets me spend my complexity tokens elsewhere. It’s Day 0, people. We’ll come back to these questions on Day 1.
  • Problem. We know nothing about the applies function until we start feeding it relevancy scores. 😱

First things first: Seeing that we know nothing about the applies function, our first question is, “how do we choose an initial wave of daily S vectors so we can learn what the applies function looks like?”

  • We know (1) which jobs are boosted and when, (2) how many applies each job receives each day. Note the absence of page-load data. It’s Day 0! You might not have all the data you want on Day 0, but if we’re clever, we can make do with what we have.
  • Note the subtle change in our objective. Earlier, our goal was to accomplish some business objective (maximize applies), and eventually, we’ll come back to that goal. We took off our business hat for a minute and put on our science hat. Our only goal now is to learn something. If we can learn something, we can use it (later) to help achieve some business objective. 🤓
  • Since our goal is to learn something, above all we want to avoid learning nothing. Remember it’s Day 0 and we have no guarantee that the Google Monster will pay any attention to how we sort things. We may as well go for broke and make sure this thing works before throwing more time at improving it.

How do we choose an initial wave of daily S vectors? We’ll give every job a score of 0 (default score), and choose a random subset of jobs to boost to 100.

  • Maybe I’m stating the obvious, but it has to be random if you want to isolate the effect of page-position on job applications. We want the only difference between boosted jobs and other jobs to be their relative ordering on the page as determined by our relevance scores. [I can’t tell you how many phone screens I’ve conducted where a candidate doubled down on running an A/B test with the good customers in one group and the bad customers in the other group. In fairness, I’ve also vetted marketing-tech vendors who do the same thing 😭].
  • The randomness will be good later on for other reasons. It’s likely that some jobs benefit from page-placement more than others. We’ll have an easier time identifying those jobs with a big, randomly-generated dataset.
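
The random assignment described above can be sketched in a few lines. The function name `assign_scores`, the `boost_share` parameter, the seed, and the job ids are all hypothetical, and this is not our production code. The point is only that the boosted subset is drawn at random rather than chosen by any property of the job.

```python
import random


def assign_scores(job_ids, boost_share, seed=None):
    """Return {job_id: score}: a random `boost_share` of jobs gets 100, the rest get 0."""
    if not 0.0 <= boost_share <= 1.0:
        raise ValueError("boost_share must be between 0 and 1")
    rng = random.Random(seed)  # seeded so a day's draw can be reproduced
    n_boosted = round(len(job_ids) * boost_share)
    boosted = set(rng.sample(list(job_ids), n_boosted))
    return {job_id: (100 if job_id in boosted else 0) for job_id in job_ids}


job_ids = [f"job-{i}" for i in range(200)]  # made-up ids
scores = assign_scores(job_ids, boost_share=0.10, seed=7)
n_boosted = sum(score == 100 for score in scores.values())
```

Because the draw ignores everything about a job, boosted and unboosted jobs should differ only in page position, on average.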

The plan: Subtle but important details

We know we can’t boost every job. Anytime I put a job at the top of the page, I bump all other jobs down the page (classic example of a “spillover”).

The spillover gets worse as I boost more and more jobs: I impose a greater and greater punishment on all other jobs by pushing them down in the sort (including other boosted jobs).

  • With little exception, nursing jobs are in-person and local, so any boosting spillovers will be limited to other nearby jobs. This is important.
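
A toy calculation makes the spillover concrete. It assumes a single sorted list of jobs in one geography, with every boosted job placed above every unboosted job; `mean_ranks` is a hypothetical helper, and page rank is only a stand-in for whatever Google actually rewards.

```python
def mean_ranks(n_jobs, n_boosted):
    """Mean page rank (1 = top) of boosted and unboosted jobs in one sorted list."""
    if not 0 < n_boosted < n_jobs:
        raise ValueError("need at least one boosted and one unboosted job")
    # Boosted jobs fill ranks 1..n_boosted; the rest fill n_boosted+1..n_jobs.
    boosted_mean = (1 + n_boosted) / 2
    unboosted_mean = (n_boosted + 1 + n_jobs) / 2
    return boosted_mean, unboosted_mean


for share in (0.05, 0.10, 0.25):
    boosted, unboosted = mean_ranks(100, round(100 * share))
    print(f"boost {share:.0%}: boosted mean rank {boosted}, unboosted mean rank {unboosted}")
```

Every extra boosted job pushes both averages further down the page, so boosted jobs lose ground too.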

How do we choose an initial wave of daily S vectors? (final answer) We’ll give every job a score of 0 (default score), and choose a random subset of jobs to boost to 100. The size of the random subset will vary across geographies.

  • We create 4 groups of distinct geographies with roughly the same amount of web traffic in each group. Each group is balanced along the key dimensions we think are important. We randomly boost a different percentage of jobs in each group.
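
One simple way to build traffic-balanced groups is a greedy split: take geographies from busiest to quietest and hand each one to the group with the least traffic so far. This is an illustrative sketch, not necessarily how we did it. It balances traffic only, the geography names and traffic numbers are made up, and the four boost shares are placeholders.

```python
def split_geographies(traffic_by_geo, n_groups=4):
    """Greedy split: the busiest remaining geography goes to the lightest group."""
    groups = [[] for _ in range(n_groups)]
    totals = [0] * n_groups
    for geo in sorted(traffic_by_geo, key=traffic_by_geo.get, reverse=True):
        lightest = totals.index(min(totals))
        groups[lightest].append(geo)
        totals[lightest] += traffic_by_geo[geo]
    return groups, totals


# Made-up weekly visits per geography.
traffic = {"geo-a": 900, "geo-b": 700, "geo-c": 650, "geo-d": 600,
           "geo-e": 400, "geo-f": 350, "geo-g": 300, "geo-h": 100}
groups, totals = split_geographies(traffic)

# Placeholder shares, one per group; every geography in a group gets the same share.
shares = (0.05, 0.10, 0.15, 0.25)
boost_share_by_geo = {geo: share
                      for group, share in zip(groups, shares)
                      for geo in group}
```

Within each geography, the jobs to boost are then drawn at random at that geography’s share.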

Here’s how it looked…

Daily applies for boosted vs. unboosted jobs. Note how boosted jobs do better when there are fewer of them. (Image by author)
  • Each black circle represents a different geography. Its elevation shows the difference in applies-per-job between boosted jobs and all other jobs (measured as a %).
  • While groups are balanced in aggregate, the individual geographies vary considerably. The balance is still important though. Otherwise, what you see in the chart could be an artifact of the mix of urban/rural or big/small geographies in each group. As it is, we’re confident the results come from our relevancy scores.
  • A quick-and-dirty interpretation of this chart is something like, “the 5% of jobs at the top of the page have ~26% more applies per day than the 95% of jobs positioned below. The 10% of jobs at the top of the page have ~21% more applies per day than the 90% of jobs below…” and so on. I’d never be so bold as to say that in real life, but in a perfect-experiment world it would be true.
  • By the time we boost 25% of jobs, the boost experience is entirely averaged out! We diluted the perks of premium placement to almost nothing for the median geography. “And when everyone is super, no one will be! <evil laugh>.” Can you imagine learning this the hard way?
  • There are many other layers to peel back. Perhaps dilution happens more quickly for nursing specialties with many pages of listings? What about states that overlap with our long-standing per-diem staffing business? Many fine questions; we have answers for some, but all more than I can include in this post.
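
For readers who want the arithmetic behind each circle, here is a sketch of the per-geography calculation and the group median. The counts are invented for illustration and `lift_pct` is a hypothetical helper; the real chart was built from our own apply logs.

```python
from statistics import median


def lift_pct(boosted_applies, boosted_jobs, other_applies, other_jobs):
    """Percent difference in applies-per-job, boosted vs. all other jobs."""
    boosted_rate = boosted_applies / boosted_jobs
    other_rate = other_applies / other_jobs
    return 100.0 * (boosted_rate / other_rate - 1.0)


# Invented counts for three geographies in one group:
# (boosted applies, boosted jobs, other applies, other jobs)
geos = {
    "geo-a": (30, 10, 380, 190),
    "geo-b": (25, 10, 380, 190),
    "geo-c": (12, 5, 190, 95),
}
lifts = {geo: lift_pct(*counts) for geo, counts in geos.items()}
group_median = median(lifts.values())
```

Each value in `lifts` corresponds to one circle on the chart, and `group_median` summarizes the group without letting one unusual geography dominate.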

What comes next? Day 1 is when the real fun begins! 🎉

  • We now have guardrails against diluting our premium experience (super important), but what’s the best ~10% of jobs to boost each day? Obviously our paying customers have priority, but then what?
  • Does boost help some jobs more than others? The randomly-generated data from our experiment is well suited to answer this and many other questions. We’ll save those questions for future posts.
  • Once we have a strategy for boosting, is our objective really to maximize the total number of applies? Or do we only care about the applies for boosted jobs? 🤔 (Sometimes I miss the Day 0 days when all the jobs were equally relevant. Might be time to revisit those equations at the top of the post.)

Key takeaways for those who made it this far

  • By being thoughtful about how we generated our initial data, we quickly found a convincing answer to our question, set ourselves up to answer many future questions, and saved ourselves a ton of time trying to build an uplift model on non-existent historical data.
  • Thinking of a test? Go for it! If you execute well, you can see the results clearly in a chart and avoid all the complicated statistics (obligatory xkcd reference). [hmm, maybe *most* of the statistics. I still love a good regression table.]
  • Spillovers are everywhere. Sometimes varying the treatment across an aggregated group can help like it did here. That can quickly axe your sample-size, but I find it better to have a small data set with meaning than a big data set that’s hot garbage.

Bonus: We ran this experiment in 2023. How are things now?

At the time of our little geo-randomized experiment, you see in the charts that our premium job openings performed ~25% better than regular jobs (meaning they had 25% more applies on average).

Why it matters: We’ve taken over a year to develop and iterate our product to ensure our premium listings deliver the best possible experience. Looking at some recent numbers… (literally running the queries as I write this)

  • Boosted job openings receive 425% more applies than regular openings
  • Boosted jobs are 450% more likely to receive at least one apply than regular openings

Not bad! This isn’t randomized, so that 425% includes all kinds of selection bias, extra product work, a crack SEO team, and a successful email operation, all in addition to the incremental effects from premium page position. Importantly, all the extra product and marketing work is focused on a small number of jobs, as our initial testing recommends. 🏆