Nikhil Mishra’s Journey to Becoming a Kaggle Grandmaster

Introduction

Have you ever participated in a Kaggle competition? Have you ever wondered what it takes to win one, or to become a Kaggle Grandmaster? H2O.ai’s Senior Data Scientist, Nikhil Kumar Mishra, recently earned the Kaggle Grandmaster title with his fifth Gold in competitions. He spoke to Analytics Vidhya after the win to share his journey, struggles, and milestones, and what it’s like to be a Kaggle Grandmaster.


Key Takeaways

  • Kaggle gives you access to the latest technologies and techniques to try out on all kinds of projects.
  • Kaggle competitions teach you collaboration and help you build a network, create a portfolio, and even find jobs.
  • If you’re wondering how to start on Kaggle, just start and you’ll find your way.
  • The best way to gain knowledge and climb the leaderboard is to go through the solutions of past competitions and practice them on your own.
  • 3 skills to succeed in a Kaggle competition: starting early, mastering resource and time planning, and reading up on research papers and past solutions.
  • Nikhil’s course recommendations: Andrej Karpathy’s CS231n, Andrew Ng’s courses on machine learning and AI, and Gilbert Strang’s videos on Linear Algebra.

And here’s the interview.

Analytics Vidhya (AV): Congratulations on winning yet another Gold after becoming a Kaggle Grandmaster! So how do you feel right now, especially after getting your golden badge?

Nikhil Mishra (NM): Thank you. It’s been a dream for me since I started with data science, which is also when I started participating in competitions. So yeah, it’s finally a dream come true, and I think most data scientists out there feel the same when they become a Grandmaster: it’s just pure happiness and joy.

AV: What was your journey like, and what kept you chasing one dream for 7 years?

NM: I think my journey is similar to that of many data scientists back then. We started with Andrew Ng’s famous Machine Learning course, about which everyone said, ‘If you know this, you probably know more than half the engineers out there,’ which was motivating for us. Around the same time, I discovered that data science competitions were a good way to earn money, though I never actually made any money in the first 3 or 4 years.

There were hackathons going on in college at the time. And although I was not very good at those hackathons, I was interested in data science. So I started participating in data science competitions on platforms like Analytics Vidhya and, of course, Kaggle. That’s where I came across people like Rohan Rao, SRK, Sahil Verma, and Mohsin, who were all No. 1 on Analytics Vidhya at some point back then. I saw them doing well in almost every competition and felt that if they could do it, then maybe I could too. That just kept me going.

I’m not going to lie: initially, it was the money that got me into competing. But even when you lose, you learn something. And when you win, you invest it back: you buy more GPUs, more cloud computing time, or a better system. It’s a cycle of investing and making money out of it.

The other motivation is the chance to try out the latest technology in the field and learn about data science as it evolves. Kaggle competitions let you do that, and they also teach you things you can later use in your work. So, I guess, that’s what keeps me going.

AV: Do you remember your first competition?

NM: I don’t remember my very first competition much, but I do vividly remember one competition, the first one I seriously took part in on Kaggle, for a month and a half. It was the Microsoft Malware Prediction competition, in which we placed 25th. What makes it memorable is that it was the first time I collaborated with so many people, and from different countries at that.

One of my teammates was from Vietnam, another was from England, and the third was from the US. They were also all much more senior than me. Seeing this side of competitions, where you get to collaborate with people all over the world and learn from them, was very motivating for me.

AV: And what did your first win feel like?

NM: My first win was, I think, 4,000 or 5,000 rupees, which felt okay. But seeing yourself at the top of the leaderboard for the first time, after so many days and so many attempts, was something else. I think there were 3 or 4 times before that when I came in the top 2 or top 3, or even No. 1, on the public leaderboard, but then kept dropping on the private leaderboard. So when I finally came out on top of the private leaderboard, it was a surreal feeling. It was like, “Okay, even I can do this!”

AV: What are the three biggest things you’ve learned from all these competitions?

NM: Firstly, as I mentioned, Kaggle competitions are very much about collaboration. When you collaborate with people from different parts of the world or different walks of life, you learn a lot. You get to see through other people’s minds: how they think and how they try to solve problems. And when you fold that into your own methods, I think it makes you 4x or 5x of what you already are.

The second thing I really like about competitions is that you have to try a lot of things in a very short period of time. That really helps you evolve as a data scientist. You see, in most of the projects we do ourselves, we have a lot of time to work, but no leaderboard to race against. So we usually take it slowly. We try a few experiments, see if they work, and start over until we’re satisfied with the results. In competitions, though, you have many different things to try in a very short time. So you learn much more, and much better, in a competition than when you just do things by yourself at work.

The third thing these competitions really help with is your career. At least for me, every job I got along the way came through references from doing well in competitions. People knew me from competitions and saw that I was good at them. Competing helped me build a strong network of helpful data scientists and friends. That’s a great takeaway for beginners and aspiring data scientists.

AV: How similar are Kaggle competitions to real-world data science or AI projects?

NM: As I mentioned earlier, in Kaggle competitions you constantly have to evolve in a very short period of time, because you’re racing against a lot of people and even the smallest differences matter. In the real world, you don’t know where the limits are, and you might be satisfied once your model reaches a certain accuracy. Then you say, ‘Okay, this is enough.’ In a competition, you have to keep trying out a lot of things and keep pushing yourself to be better. And after you compete on several platforms, real-world projects start to feel much simpler, because you know what to try and what will work, since you have tried it before.

Another thing is that on Kaggle, it’s always about state-of-the-art solutions. Even when the problems are simple, the solutions are cutting-edge or even bleeding-edge. You have the best and latest technologies at your fingertips to try out and see if they work. That’s one really big advantage of Kaggle that you don’t get otherwise.

You even get to reinvent some architectures, if we’re talking about deep learning, or try some really fancy method and share it after the competition. So when a problem from a similar domain comes to you at work, it becomes very easy.

AV: How has the level of Kaggle competitions changed over time?

NM: When I started, it was mostly structured data problems, and I think competing was relatively easier than it is now. Not taking anything away from the people who did it before; they worked really hard too. But I think it’s much harder to secure a good position now than it was, say, six or seven years ago. Many more people are actively participating on Kaggle now, which makes it more challenging. Also, the resources available to us back then were very different from what we have now.

AV: You have around 18 solo competitions and 32 team competitions to your name. How different is your preparation or experience in a solo competition compared to working with a team?

NM: In solo competitions, right from the beginning, you have to try things on your own. You have to map out how you want things to go. For instance, in a three-month competition on Kaggle, you have to decide how to progress, what kinds of experiments you want to try, and how you’ll put them together at the end, when you only have one or two weeks left. In solo competitions, all of this depends solely on you.

When you work in a team, if you get stuck somewhere or can’t find something, there’s always a teammate who’ll find it or guide you. It also gives you a lot of exposure to how other people think and how the same problem can be approached differently. Each person on the team has their own way of coding and their own way of thinking, so you learn more. The competition also becomes relatively easier because you split the work and effort, and it’s more exciting to see how everyone’s different ideas come together at the end.

AV: Do you prefer working on structured or unstructured datasets?

NM: When I began participating in data science competitions, most of the problems on Kaggle and even on Analytics Vidhya were on structured data, so I developed a knack for solving those. So it’s not really about preference, but I’m definitely much better at solving structured data problems. That said, I’ve won 2 or 3 gold medals in classic sequence problems, which aren’t entirely structured, so I guess I handle unstructured datasets pretty well too. I definitely want to grow more there, though.

AV: Do you prefer working on a local workstation or a cloud system for your competitions?

NM: In my initial days, say from 2018 to 2021, you could easily manage most competitions on a local workstation, or maybe on a really high-end laptop. But now, most competitions require a lot of resources.

See, the amount of resources you need at the beginning of a competition is very different from what you need towards the end. Towards the end, you want to try a lot of ideas together and run some big experiments, and for that you need bigger resources, like what a cloud setup can provide. But that requires a big investment, which I feel eventually pays off when you win competitions.

AV: There are different phases of a competition, right? First you plan, then you try out multiple things, then you bring together all the ideas that work, and so on. Which part of a competition do you think takes the most time?

NM: If you split a three-month competition by month, the time we spend each month is the same. But in terms of the effort we put in as data scientists, I think it’s highest towards the end of the competition. In the last one or two weeks, our effort is double, triple, or even 10 times what it was during the rest of it.

At the beginning of the competition, we’re all chill, just thinking about which experiments to run. Then we test them out slowly and observe the results. In the middle, we try out different ideas, change some parameters, and figure out what works. But by the end, we have hundreds of ideas to try and only 10 days left! Then it’s mostly just sleepless nights and coffee.

NM: It’s a lot of fun to engage in the Kaggle discussion forums, and even on LinkedIn or Twitter. We share some of our ideas and updates on where we stand on the leaderboard. Sometimes we even challenge each other on social media.

Apart from that, I think the learnings shared by the Kaggle community are completely different from what you find on any other platform. The wealth of knowledge you get from these discussions, and from the solutions shared at the end of competitions, is very valuable. On Kaggle, you can find the latest paper on a state-of-the-art technology or a really fancy technique you might want to try. You’ll also find the results of experiments run by different people and the different approaches they take. All of that adds to who you are as a data scientist. And the best part, I think, is that it’s completely open for anyone to access.

Then again, when you compete, you find teammates from around the world who share their knowledge with you. That also helps with networking and future jobs, which I think is a huge bonus for aspiring and upcoming data scientists.

AV: What advice would you give to beginners who are just starting their journey?

NM: Most beginners keep wondering how to start on Kaggle, and I tell them that the most important part is to start. It’s not about how you start; what matters is that you start. Once you do, you’ll eventually find your way.

The other concern I often hear from beginners is that they get low ranks even though they compete a lot. Hear me out: that’s how it is for most people.

Even if you check my profile, you’ll see that my first few competitions were really bad. But that’s how you start, and from there you evolve. Now, how do you get better? Read solutions from past competitions and try to implement them on your own. Keep doing this and you’ll find that your ranks improve. It definitely requires that effort on your end.

That’s what I did. I’d go crazy experimenting and trying out past solutions. This helped me understand how others think and how they go about solving problems. All of that added to my experience and gradually helped me move up the leaderboard.

AV: In your opinion, what are the three main skills required to succeed in a Kaggle competition?

NM: The first thing is, if you’re entering a Kaggle competition, start early. Most competitions are 3 months long, and starting early gives you ample time to experiment, run tests, and do really well.

The second thing is to plan your time really well. Kaggle competitions are all about running good experiments, and a lot of them. To do that, you need to plan what kinds of experiments you want to try and figure out how to make your iterations faster. You can do that by sampling the data, allocating resources better, and so on.
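To make the data-sampling idea a little more concrete, here is a minimal Python sketch, assuming the training rows fit in a plain list. The helper name `quick_experiment_subset`, the 10% default fraction, and the fixed seed are hypothetical illustrations, not Nikhil’s actual workflow.

```python
import random


def quick_experiment_subset(rows, fraction=0.1, seed=42):
    """Return a reproducible random subset of `rows` for fast iteration.

    Hypothetical helper: the default fraction and seed are illustrative
    assumptions, not values taken from the interview.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not rows:
        return []
    # Keep at least one row so tiny datasets still yield a usable subset.
    k = max(1, int(len(rows) * fraction))
    # A dedicated RNG with a fixed seed means every experiment sees the
    # same subset, so score differences come from the idea, not the sample.
    rng = random.Random(seed)
    return rng.sample(rows, k)


data = list(range(1000))
subset = quick_experiment_subset(data, fraction=0.05)
print(len(subset))  # 50 rows to iterate on instead of 1000
```

Fixing the seed is the design choice that matters here: ideas can be compared quickly on the same small slice, and only the promising ones need a full run on the complete data. For imbalanced targets, a stratified sample would likely be a safer choice than this uniform one.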

The third thing I think you should do is a lot of reading. This could be the latest research papers, solutions to previous problems, or just browsing the internet to see what’s new. And as you read, think about how you can use these new models and techniques in your projects. Keep asking yourself: Can I use that model? Can I train it on my data? What kind of results would I get? And so on.

That being said, no one can stay up to date on everything all the time. You can gain surface-level knowledge of the latest large language models and technologies from reading, and also from the discussion forums on Kaggle. From there, you need to pick which topics to focus on and explore further, depending on your project or work. But even that surface-level knowledge will help you stay ahead in a competition.

AV: You have a full-time job and these competitions on the side. How do you manage it all? What’s your typical day like?

NM: Luckily for me, my company really encourages everyone to participate in competitions, so much so that it has its own team of Grandmasters! So my work and colleagues really encourage me and appreciate it when I do well in competitions.

My usual day during competitions is mostly spent in front of two screens: one for work and the other running experiments for the competition. But during the last part of a competition, it’s just sleep-compete-eat-repeat! During that time, rest and the fun part of life go on hold. That’s the only compromise I have to make.

AV: How often do you compete? How many competitions do you take part in each year?

NM: By now, I’d have participated in over 100 competitions. Now that I’m at H2O, I’m participating more actively, about 20 to 25 competitions per year. Obviously, on Kaggle you can’t take part in more than 5 or 6 competitions a year because of how long they run. But there are platforms with shorter competitions lasting a week or two, or even just a weekend.

AV: Speaking of H2O, what’s it like to work alongside so many other Kaggle Grandmasters?

NM: It’s really motivating to work with people who are much more talented than you, including some who were your idols when you began your journey. Back in 2019, there was a conference near my college where Rohan Rao was one of the speakers and Sanyam Bhutani was an organizer. At the time, they didn’t even know me; I just attended as a college student. And now I collaborate with Rohan regularly.

It’s a different feeling when you get to work alongside such people. They’re constantly pushing the boundaries at work while doing really well in competitions. When you have such a great circle to work with, it definitely pushes you.

AV: Speaking of idols, who do you see as an inspiration in the industry?

NM: For me, like I said, in my early years of competing, Rohan, SRK, Sahil, and Mohsin were the ones who really inspired me. I’ve learned a lot from whatever they’ve posted, be it articles, notebooks, or solutions to problems.

During my college days, there was Josh Starmer, whose short videos helped me learn things quickly and prepare for college exams and interviews. Nowadays there are many good YouTubers, like 3Blue1Brown, who post interesting and informative content. There’s Andrej Karpathy teaching about LLMs, and the world is moving towards open-sourcing the knowledge behind LLMs. So there’s knowledge and inspiration everywhere!

Don’t miss the chance to learn how to build a ChatGPT-style language model from Josh Starmer at the DataHack Summit 2024!

AV: What are the best resources (books, tools, courses) that have helped you expand your knowledge of data science and machine learning?

NM: Apart from reading discussion forums, as I mentioned earlier, I like to read research papers, which is now easier than ever thanks to tools like ChatGPT. That keeps me up to date with the latest developments in machine learning.

I haven’t really read many books, but I’m sure they are great sources of knowledge too. I prefer articles posted on Twitter or Reddit, because you get them as soon as something new comes out.

For courses, I’d definitely recommend Andrej Karpathy’s CS231n and Andrew Ng’s courses on machine learning and AI. I think Gilbert Strang’s videos on Linear Algebra are quite helpful too.

And for competitive data science specifically, I suggest reading the solutions to previous problems and keeping up with the latest research papers.

NM: I don’t think I prepared for this question. Well, I’m generally interested in multimodal LLMs. Apart from that, I read about agentic AI and try to learn how we can use different agents to automate our tasks. And whenever I start a Kaggle competition, I get interested in learning more about the LLMs or generative AI related to that problem.

AV: Now that you’ve finally achieved Grandmaster status, what are your next goals and projects?

NM: I was talking to Nischay about this the other day. He’s a friend, and I compete with him a lot. I was telling him that now that I’ve made it into the top 100, at rank 63, his being fifth in the world pushes me to participate more and get better. So I’m definitely looking forward to more competitions and to pushing myself into the top 10 or top 20 by next year.

I haven’t really set goals for the distant future, but I’d definitely like to keep participating in competitions and build some really good AI products. I also hope to make some good open-source contributions in the future.

Conclusion

With 6 gold, 9 silver, and 1 bronze medal under his belt, Nikhil Kumar Mishra has finally earned his Kaggle Competitions Grandmaster title! In this interview, he told us how Kaggle as a platform helps data scientists showcase their skills, learn from others, and tackle real-world problems. He also shared some great tips and course recommendations for people just starting out on their Kaggle or data science journeys.

However, approaching Kaggle competitions can be overwhelming, especially for beginners with limited domain knowledge. To help you out, we’re bringing you Kaggle Grandmaster Nischay Dhankhar for a GenAI Hack Session on “Mastering Kaggle Competitions: Strategies and Techniques for Success.” Don’t miss this great opportunity at the DataHack Summit 2024!
