Welcome to Part 2 of our NLP series. If you caught Part 1, you'll remember that the challenge we're tackling is translating text into numbers so that we can feed it into our machine learning models or neural networks.
Previously, we explored some basic (and fairly naive) approaches to this, like Bag of Words and TF-IDF. While these methods get the job done, we also saw their limitations: mainly that they don't capture the deeper meaning of words or the relationships between them.
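As a quick refresher, here's a minimal Bag of Words sketch (toy sentences, plain Python, no library assumed) that shows the limitation in action: two sentences that mean almost the same thing end up with vectors that share nothing beyond stopwords.

```python
# Toy Bag of Words: count how often each vocabulary word appears in each sentence.
docs = ["the movie was great", "the film was excellent"]

# Build a sorted vocabulary from all words in the corpus.
vocab = sorted({word for doc in docs for word in doc.split()})

# One count vector per document, one dimension per vocabulary word.
vectors = [[doc.split().count(word) for word in vocab] for doc in docs]

for doc, vec in zip(docs, vectors):
    print(doc, "->", vec)
```

Notice that "great" and "excellent" (and "movie" and "film") occupy completely separate dimensions, so nothing in these vectors tells the model the two sentences are related.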
This is where word embeddings come in. They offer a smarter way to represent text as numbers, capturing not just the words themselves but also their meaning and context.
Let's break it down with a simple analogy that'll make this concept super intuitive.
Imagine we want to represent movies as numbers. Take the movie Knives Out as an example.
We can represent a movie numerically by scoring it across different features, such…
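To make the idea concrete, here's a tiny sketch with made-up feature names and scores (the dimensions and values are purely illustrative, not from the article): a movie becomes a vector, one number per feature.

```python
import numpy as np

# Hypothetical features we might score a movie on (illustrative only):
features = ["mystery", "comedy", "action"]

# Made-up scores for Knives Out on each feature, between 0 and 1.
knives_out = np.array([0.9, 0.6, 0.3])

# The movie is now just a point in 3-dimensional "feature space".
for name, score in zip(features, knives_out):
    print(f"{name}: {score}")
```

Once movies live in the same feature space, comparing them is just comparing vectors, which is exactly the trick word embeddings apply to words.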