Set Up Gensim for Learning word2vec

Sagiruddin Mondal
1 min read · Sep 6, 2018


Hi! This is a short note on how I set up gensim to train word2vec on a small dataset.

I use Anaconda to manage the platform, so the setup starts there.

Installations:

  1. Install Anaconda: https://docs.anaconda.com/anaconda/install/mac-os#macos-graphical-install
  2. conda update conda
  3. conda update anaconda
  4. conda install -c anaconda gensim

I create a separate virtual environment for this work so that its dependencies stay isolated and easy to manage.

Virtual Conda Environment

  1. https://conda.io/docs/user-guide/tasks/manage-environments.html
  2. conda create -n myenv python=3.6
  3. conda install -n myenv gensim
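
Before running any code, it can help to confirm that Python is really running inside the new environment. The following is a minimal sketch, assuming the environment was activated first (for example with `conda activate myenv`). `describe_environment` is a helper name made up for this note; it is not part of conda or gensim.

```python
import os
import sys


def describe_environment():
    """Return basic facts about the interpreter that is currently running."""
    return {
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        # Directory of the active Python installation or environment.
        "prefix": sys.prefix,
        # conda sets CONDA_DEFAULT_ENV when an environment is activated;
        # outside conda the variable is absent, so this falls back to None.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in describe_environment().items():
    print(key, "=", value)
```

If `conda_env` does not show `myenv`, the script is probably running in a different interpreter than the one gensim was installed into.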

External packages I need (pandas and nltk are not pulled in by gensim, so install them into the same environment as well):

  1. pandas
  2. nltk
  3. gensim
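
A quick way to check that all three packages are visible to the current interpreter is sketched below. It uses only the standard library; `missing_packages` is a hypothetical helper name chosen for this note.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


required = ["pandas", "nltk", "gensim"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Anything reported as missing can be installed into the environment with `conda install -n myenv <package>`.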

And here is the code:

import os
import pandas as pd
import nltk
import gensim
from gensim import corpora, models, similarities

os.chdir("/Users/sagir/Documents/data/ai/deeplearning")
df = pd.read_csv('jokes.csv')


x = df['Question'].values.tolist()
y = df['Answer'].values.tolist()

corpus = x + y

print(corpus)

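# word_tokenize needs the NLTK Punkt tokenizer data; download it once with
# nltk.download('punkt') (newer NLTK releases ask for 'punkt_tab' instead).
token_corpus = [nltk.word_tokenize(data) for data in corpus]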

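#-----------------------------------------------
# For training and creating the model
# min_count=5 drops words seen fewer than 5 times; size=32 is the vector length.
# Note: gensim 4.0+ renamed `size` to `vector_size`.
#-----------------------------------------------
model = gensim.models.Word2Vec(token_corpus, min_count=5, size=32)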

#-----------------------------------------------
#For Saving the model
#-----------------------------------------------
model.save('jokeModelSaved')

model = gensim.models.Word2Vec.load('jokeModelSaved')

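# Query through model.wv: model.most_similar was deprecated and is gone in gensim 4.0.
# The word must be in the vocabulary (seen at least min_count times), else KeyError.
print(model.wv.most_similar('Hi'))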
#model.most_similar([0.8904996514320374])
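
To make two pieces of the code above more concrete, here is a small pure-Python sketch of what `min_count` does to the vocabulary and of how a `most_similar` query ranks words by cosine similarity. It only illustrates the idea and is not gensim's actual implementation (which is vectorized). The function names and the toy corpus and vectors are invented for this note.

```python
import math
from collections import Counter


def filter_vocab(token_corpus, min_count):
    """Keep only tokens that occur at least min_count times in the corpus."""
    counts = Counter(token for sentence in token_corpus for token in sentence)
    return {token for token, n in counts.items() if n >= min_count}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=10):
    """Rank every other word by cosine similarity to `word`, best first."""
    query = vectors[word]
    scores = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Toy data, invented for illustration only.
toy_corpus = [
    ["why", "did", "the", "chicken"],
    ["the", "chicken", "did"],
    ["the", "road"],
]
toy_vectors = {
    "chicken": [1.0, 0.0],
    "hen": [0.9, 0.1],
    "road": [0.0, 1.0],
}

print(filter_vocab(toy_corpus, min_count=2))
print(most_similar("chicken", toy_vectors, topn=2))
```

With a small dataset, a `min_count` of 5 can remove a large share of the words, so a query word such as 'Hi' only works if it survived that filter.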

For the full implementation and code, please visit:

https://github.com/beingsagir/basic-ai-practices/blob/master/gensim-basics/main.py
