{"id":36806,"date":"2023-10-06T05:08:24","date_gmt":"2023-10-06T12:08:24","guid":{"rendered":"https:\/\/coderpad.io\/?p=36806"},"modified":"2023-10-23T10:33:12","modified_gmt":"2023-10-23T17:33:12","slug":"8-red-flags-to-watch-out-for-when-hiring-data-scientists","status":"publish","type":"post","link":"https:\/\/coderpad.io\/blog\/data-science\/8-red-flags-to-watch-out-for-when-hiring-data-scientists\/","title":{"rendered":"8 Red Flags To Watch Out For When Hiring Data Scientists"},"content":{"rendered":"\n<figure class=\"wp-block-pullquote has-large-font-size\"><blockquote><p>&#8220;<strong>Analytics is 50% math and 50% communication. If a person cannot express their ideas in written or presentation format, it doesn&#8217;t matter if they can do the math.<\/strong>&#8220;<\/p><cite>Mia Umanos, CEO of Clickvoyant<\/cite><\/blockquote><\/figure>\n\n\n\n<p>A CV filled with impressive credentials can capture attention, but it&#8217;s the subtleties during an interview that reveal the most about a candidate. Unlike many other fields, data science requires a unique blend of technical expertise, business acumen, and interpersonal skills.<\/p>\n\n\n\n<p>Every hiring manager knows the gravity of a wrong hire, especially in a domain as critical as data science.&nbsp;<\/p>\n\n\n\n<p>A misfit can not only hinder project progress but can also disrupt team dynamics, making the interview process all the more crucial.&nbsp;<\/p>\n\n\n\n<p>It&#8217;s not just about assessing the candidate&#8217;s knowledge of algorithms or programming languages, but also understanding their problem-solving approach, communication style, and adaptability.<\/p>\n\n\n\n<p>In this landscape, knowing what to look for during interviews becomes paramount. Red flags can sometimes be subtle, easily masked by a candidate&#8217;s confidence or eloquence. 
However, a keen eye can spot these signs, which often hint at deeper underlying issues.<\/p>\n\n\n\n<p>While no single interview technique guarantees a perfect hire, being aware of potential pitfalls can significantly enhance the hiring process&#8217;s effectiveness.&nbsp;<\/p>\n\n\n\n<p>This post delves into eight red flags that job candidates might display during data science interviews, helping you make informed decisions and secure the best talent for your team.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"build-model\">1. They build models without business context<\/h2>\n\n\n\n<p>Many technical projects for data science interviews involve having the candidate <a href=\"https:\/\/coderpad.io\/blog\/data-science\/from-data-chaos-to-actionable-insights-my-teams-quest-to-hire-our-first-data-scientist\/\">work with real or simulated data to help solve an actual business problem<\/a> that the hiring company may face.<\/p>\n\n\n\n<p>This is a great way to see how a candidate would work on your team and what insights they can derive, given some information about your business.&nbsp;<\/p>\n\n\n\n<p>However, some candidates will ignore the business problem and instead focus on showing off their modeling skills, trying to demonstrate what kind of insights they can deliver with the little bit of information you gave them.<\/p>\n\n\n\n<p>The problem with this is precisely that they\u2019re only working with<em> a little bit <\/em>of information.&nbsp;<\/p>\n\n\n\n<p>Unless you\u2019ve spent a few hours with them going over your business model, the various pieces of data you collect, the nuances of the business, and the relevant business context of the data, their model will be useless at best and will drive harmful business decisions at worst.&nbsp;<\/p>\n\n\n\n<p>When candidates create predictive models without knowing the business, they display a lack of humility and an inclination to jump to conclusions based on possibly faulty 
assumptions.&nbsp;<\/p>\n\n\n\n<p>This careless behavior can waste a lot of resources for your team and your business.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">2. They show a lack of curiosity about stakeholders<\/h2>\n\n\n\n<p>Understanding stakeholders is a requirement for every data role \u2013 a data scientist who doesn\u2019t understand internal stakeholders and customers will fail to produce valuable data insights.<\/p>\n\n\n\n<p>The logic behind this is similar to the first red flag. Without learning how the business operates and who the primary users are, the candidate is forced to rely upon assumptions about the context of the data within the company.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter\"><a href=\"https:\/\/d2h1bfu6zrdxog.cloudfront.net\/wp-content\/uploads\/2023\/10\/img_651d86478d3e4.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/d2h1bfu6zrdxog.cloudfront.net\/wp-content\/uploads\/2023\/10\/img_651d86478d3e4.png\" alt=\"A four-panel comic of a man next to a whiteboard. In the first three panels, he looks confident. In the first, the whiteboard says &quot;i'll create amazing dashboards for your stakeholders&quot;. The next says &quot;they'll use advanced predictive modeling techniques.&quot; The third says &quot;all without stakeholder input&quot;. In the last panel, the man looks unsure as he looks at the whiteboard, which again says &quot;all without stakeholder input&quot;.\"\/><\/a><\/figure>\n<\/div>\n\n\n<p>Without input from the people actually utilizing the data, this candidate would be working in a black box with zero feedback from others. That\u2019s a recipe for disaster and will undoubtedly lead to useless data insights.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">3. 
They seem unwilling to learn and grow<\/h2>\n\n\n\n<p>Data science is a healthy balance of programming, stakeholder communication, good judgment, and some applied statistics.&nbsp;<\/p>\n\n\n\n<p>No matter how senior, the candidate should show a willingness to improve those skills.<\/p>\n\n\n\n<p>You can usually gauge this in an interview by asking what they\u2019re currently learning or what lesson they recently learned from a mistake they made. If they are unwilling to learn or can\u2019t tell you a story about improving on their mistakes, that is a noticeable red flag.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">4. They are unwilling to receive feedback<\/h2>\n\n\n\n<p>Often, a data scientist is a black box to stakeholders. \u201cI don\u2019t know how they do it, but they make these models that predict the future, and it\u2019s basically magic to me, but it works\u201d is a sentiment a data scientist has likely heard at least once.<\/p>\n\n\n\n<p>Data scientists, then, have to accept that stakeholders will regularly ask them to explain their output and conclusions in an easy-to-understand way \u2013 this is especially true when they deliver insights that go against common business intuition.<\/p>\n\n\n\n<p>They will need to be able to field these kinds of curious questions as well as handle constructive feedback from others. If they can\u2019t engage with questions and feedback \u2013 for example, they shut down or react with defensive anger \u2013 it suggests they are unwilling either to defend their ideas or to admit that they might be wrong.<\/p>\n\n\n\n<p>These candidates are also unlikely to show interest in working with other teams or in seeking feedback about their work if they were to join your team. Be careful if you choose to hire them.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">5. 
They are unable to communicate with non-technical stakeholders<\/h2>\n\n\n\n<p>Non-technical stakeholders will always be involved in data work in some capacity \u2013 whether as consumers of the data insights or as the people responsible for sharing the context behind a new data source.<\/p>\n\n\n\n<p>The ability to break down very technical information into a format that won\u2019t overload people is crucial.&nbsp;<\/p>\n\n\n\n<p>Some stakeholders won\u2019t have the knowledge base to understand (or care to understand) the statistical methods behind a conclusion.<\/p>\n\n\n\n<p>Frequently, they\u2019re busy enough that they just want to know what insights a data scientist can provide to make their lives or the finances of the company better.<\/p>\n\n\n\n<p>Data science candidates should be willing and able to break down complex information for teams outside their own \u2013 whether for accounts payable, sales, marketing, or any other department that needs to utilize the information.<\/p>\n\n\n\n<p>If candidates can\u2019t do that, it\u2019s a massive red flag because it means they likely won\u2019t be able to hold on to stakeholder trust for long. If stakeholders don\u2019t trust the source of their new insights, they won\u2019t be willing to act on them, and you have a big problem.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote\">\n<p>\ud83d\udd16&nbsp;<strong>Related resource<\/strong>: <a href=\"https:\/\/coderpad.io\/blog\/data-science\/mastering-jupyter-notebooks-best-practices-for-data-science\/\">Mastering Jupyter Notebooks: Best Practices for Data Science<\/a><\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\">6. 
They cannot justify their technical decisions<\/h2>\n\n\n\n<p>As with communicating to non-technical stakeholders, if a candidate can\u2019t describe and reasonably defend their choices at a technical level, then they will not be able to hold on to the trust of technical stakeholders.<\/p>\n\n\n\n<p>A data scientist should be able to describe steps taken to clean, transform, and operate over data at a reasonably technical level. Some examples candidates could use:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>With the help of a developer or data engineer, they used SQL queries to clean up the data they wanted to use.<\/li>\n\n\n\n<li>They noticed missing values in some of the training data (fewer than roughly 15% of all rows), so they filled those gaps with each column\u2019s median. They explain that this avoids discarding those rows and that the median is less sensitive to outliers than the mean.<\/li>\n\n\n\n<li>They rescaled the metric they want to predict to a range between 0 and 1 (min-max normalization) so that the prediction output is easier to interpret.<\/li>\n\n\n\n<li>They one-hot encoded all categorical columns into a sparse dataset of 0s and 1s to include non-numerical predictors in the model, some of which help raise prediction accuracy.<\/li>\n<\/ul>\n\n\n\n<p>Fortunately, this red flag is pretty easy to detect in an interview \u2013 you set up your question in something like <a href=\"https:\/\/coderpad.io\/use-case\/jupyter-notebook-data-science-interview\/\">a Jupyter Notebook<\/a>, hand it off to the candidate, and then have them walk you through the logic behind their models or algorithms. You can ask them to explain things that don\u2019t make sense to you or that you would have done differently, and if they can\u2019t explain them, you may want to move on to the next candidate.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">7. 
They lack proficiency in SQL and don\u2019t understand databases<\/h2>\n\n\n\n<p>This goes along with the previous point, but <em>anyone<\/em> working in data should understand how to query that data and how it is collected and stored.&nbsp;<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter\"><a href=\"https:\/\/d2h1bfu6zrdxog.cloudfront.net\/wp-content\/uploads\/2023\/10\/img_651d86497bd0d.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/d2h1bfu6zrdxog.cloudfront.net\/wp-content\/uploads\/2023\/10\/img_651d86497bd0d.png\" alt=\"Panel comic. On the top a man is speaking to an audience and says &quot;who wants to be a data scientist?&quot;; everyone in the crowd has their hand raised. On the bottom the speaker now says &quot;who wants to learn sql?&quot;, and no one in the crowd has their hand raised.\"\/><\/a><\/figure>\n<\/div>\n\n\n<p>For junior candidates, you may want to include a few SQL questions in the coding portion of the interview. For more senior candidates, a few verbal questions about database design or query structure should suffice \u2013 they may be insulted if you hand them a technical question that is too simple. It is a waste of their time.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">8. They have shiny object syndrome<\/h2>\n\n\n\n<p>Part of the appeal of getting into data science these days is all the new technologies and tools you work with.<\/p>\n\n\n\n<p>That\u2019s fine. 
In fact, that curiosity can be a boon to your team.&nbsp;<\/p>\n\n\n\n<p>However, data scientist candidates should also be willing to show how they\u2019ve done the tedious but essential grind work that often comes with algorithm and model development.<\/p>\n\n\n\n<p>If they\u2019re always worried about learning the newest technologies at the expense of doing necessary work (i.e., shiny object syndrome), you may want to pass on adding them to your team.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>Detecting these red flags in an interview is easier than you think.<\/p>\n\n\n\n<p>You can easily test the candidate\u2019s ability to understand and communicate data and basic programming skills by using a tool like Jupyter Notebooks in your interviews.<\/p>\n\n\n\n<p>CoderPad has an integration that allows you to do just that \u2013 check out the pad below for an example question you can use in your own data science interviews.<\/p>\n\n\n<div\n\tclass=\"sandbox-embed responsive-embed  sandbox-embed--full-width\"\n\tstyle=\"padding-top: 125%\"\ndata-block-name=\"coderpad-sandbox-embed\">\n\t<iframe src=\"https:\/\/embed.coderpad.io\/sandbox?question_id=257738&#038;use_question_button\" width=\"640\" height=\"800\" loading=\"lazy\" aria-label=\"Try out the CoderPad sandbox\"><\/iframe>\n<\/div>\n\n\n\n<p><em>Some parts of this blog post were written with the assistance of <\/em><a href=\"https:\/\/coderpad.io\/resources\/docs\/interview\/pads\/chatgpt-integration\/\"><em>ChatGPT<\/em><\/a><em>.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Don&#8217;t let these warning signs pass you 
by.<\/p>\n","protected":false},"author":12,"featured_media":36876,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[72],"tags":[],"persona":[27],"blog-programming-language":[],"keyword-cluster":[69],"class_list":["post-36806","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-science"],"acf":[],"_links":{"self":[{"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/posts\/36806","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/users\/12"}],"replies":[{"embeddable":true,"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/comments?post=36806"}],"version-history":[{"count":23,"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/posts\/36806\/revisions"}],"predecessor-version":[{"id":37073,"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/posts\/36806\/revisions\/37073"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/media\/36876"}],"wp:attachment":[{"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/media?parent=36806"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/categories?post=36806"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/tags?post=36806"},{"taxonomy":"persona","embeddable":true,"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/persona?post=36806"},{"taxonomy":"blog-programming-language","embeddable":true,"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/blog-programming-language?post=36806"},{"taxonomy":"keyword-cluster","embeddable":true,"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/keyword-cluster?post=36806"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]
}}