Setting up Gensim for learning word2vec
Sep 6, 2018
Hi! This is a short, straightforward note on how I set up Gensim to train word2vec on a small dataset. Here it is.
I use Anaconda to manage the platform, so here is the platform setup:
Installations:
- https://docs.anaconda.com/anaconda/install/mac-os#macos-graphical-install
- conda update conda
- conda update anaconda
- conda install -c anaconda gensim
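It is worth checking which Gensim version conda actually installed. Gensim 4.0 renamed the Word2Vec parameter `size` to `vector_size` and removed `model.most_similar` in favour of `model.wv.most_similar`, so code written for 3.x (as in 2018) can break on a fresh install. This is a minimal sketch; the helper name `word2vec_size_kwarg` is my own, not part of Gensim.

```python
def word2vec_size_kwarg(version_string):
    """Return the Word2Vec keyword for the embedding dimension.

    Gensim 4.0 renamed `size` to `vector_size`; pick the right one
    from a version string such as '3.8.3' or '4.3.2'.
    """
    major = int(version_string.split(".")[0])
    return "vector_size" if major >= 4 else "size"


try:
    import gensim

    print("gensim", gensim.__version__, "uses", word2vec_size_kwarg(gensim.__version__))
except ImportError:
    print("gensim is not installed in this environment")
```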
I create a separate virtual environment for this work so its dependencies can be managed independently of other projects.
Virtual Conda Environment
- https://conda.io/docs/user-guide/tasks/manage-environments.html
- conda create -n myenv python=3.6
- conda install -n myenv gensim pandas nltk
- conda activate myenv
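After activating the environment, you can confirm from Python that you are really running inside it. conda sets the `CONDA_DEFAULT_ENV` variable when an environment is activated; the helper `active_conda_env` below is a hypothetical name used only for this sketch.

```python
import os
import sys


def active_conda_env(environ=None):
    """Name of the active conda environment, or None if none is active.

    conda sets CONDA_DEFAULT_ENV when an environment is activated.
    """
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")


# The interpreter path should sit inside the myenv folder when myenv is active
print("Python executable:", sys.executable)
print("Active conda environment:", active_conda_env())
```

If this prints `myenv` and the executable path lies inside that environment's folder, packages installed with `conda install -n myenv ...` are the ones your script will see.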
External packages I need:
- pandas
- nltk
- gensim
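Before running the script, a quick way to confirm these three packages are importable in the current environment is `importlib.util.find_spec`, which returns `None` for a top-level module it cannot find. A minimal sketch; the `missing_packages` helper is my own name.

```python
import importlib.util

REQUIRED = ["pandas", "nltk", "gensim"]


def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    # e.g. conda install -n myenv pandas
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable")
```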
And here is the code:
import os
import pandas as pd
import nltk
import gensim
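os.chdir("/Users/sagir/Documents/data/ai/deeplearning")  # folder that contains jokes.csv
df = pd.read_csv('jokes.csv')
# Drop empty cells and force strings so the tokenizer never receives NaN
x = df['Question'].dropna().astype(str).tolist()
y = df['Answer'].dropna().astype(str).tolist()
corpus = x + y  # every question and every answer becomes one "sentence"
print(corpus)
nltk.download('punkt')  # tokenizer data for word_tokenize (newer NLTK releases may ask for 'punkt_tab')
token_corpus = [nltk.word_tokenize(data) for data in corpus]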
#-----------------------------------------------
# For training and creating the model
#-----------------------------------------------
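# min_count=5 ignores words seen fewer than 5 times; lower it for a very small dataset.
# Gensim 4.x calls the embedding dimension vector_size (Gensim 3.x used size=32).
model = gensim.models.Word2Vec(token_corpus, min_count=5, vector_size=32)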
#-----------------------------------------------
# For saving and reloading the model
#-----------------------------------------------
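model.save('jokeModelSaved')
model = gensim.models.Word2Vec.load('jokeModelSaved')
# Query the word vectors through model.wv (model.most_similar was removed in Gensim 4.0).
# 'Hi' must be in the vocabulary (seen at least min_count times), otherwise this raises KeyError.
print(model.wv.most_similar('Hi'))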
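To make the steps of the script concrete without installing anything, here is a dependency-free sketch of what happens along the way: questions and answers are combined into one corpus, tokenized, pruned by `min_count`, and neighbours are ranked by cosine similarity, the measure Gensim's `most_similar` is based on. Assumptions: the CSV rows are hypothetical, a regex tokenizer stands in for `nltk.word_tokenize` (similar on simple sentences, not identical), the 2-d vectors are made up rather than trained, and the helper names are mine, not Gensim's API.

```python
import csv
import io
import math
import re
from collections import Counter

# Hypothetical rows in the same shape as jokes.csv (Question and Answer columns)
SAMPLE_CSV = """Question,Answer
Why did the chicken cross the road?,To get to the other side.
What do you call a fake noodle?,An impasta.
"""


def load_corpus(text):
    """Return all questions followed by all answers, like corpus = x + y."""
    rows = list(csv.DictReader(io.StringIO(text)))
    questions = [row["Question"] for row in rows if row["Question"]]
    answers = [row["Answer"] for row in rows if row["Answer"]]
    return questions + answers


def simple_tokenize(sentence):
    """Regex stand-in for nltk.word_tokenize: words and punctuation become separate tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def surviving_vocab(token_corpus, min_count=5):
    """Words kept by a min_count cut: total frequency must be at least min_count."""
    counts = Counter(token for sentence in token_corpus for token in sentence)
    return {word for word, count in counts.items() if count >= min_count}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar(word, vectors, topn=10):
    """Rank the other words by cosine similarity to `word`, highest first."""
    if word not in vectors:
        raise KeyError(f"{word!r} is not in the vocabulary")
    query = vectors[word]
    scores = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


corpus = load_corpus(SAMPLE_CSV)
token_corpus = [simple_tokenize(sentence) for sentence in corpus]
print(token_corpus[0])
print(surviving_vocab(token_corpus, min_count=2))

# Hypothetical 2-d vectors standing in for a trained model.wv
toy_vectors = {"hi": [1.0, 0.0], "hello": [0.9, 0.1], "bye": [0.0, 1.0]}
print(most_similar("hi", toy_vectors))
```

On these four toy sentences only `the`, `?` and `.` survive `min_count=2`, and nothing survives the default of 5, which is why a small dataset usually needs a lower `min_count` before a word like 'Hi' can be queried.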
For the full implementation and code, please visit:
https://github.com/beingsagir/basic-ai-practices/blob/master/gensim-basics/main.py