Introduction

Word embedding is an NLP technique that maps words or phrases from a vocabulary to vectors of real numbers. Embeddings represent words in a D-dimensional vector space, where D is chosen by you. This vector representation can be used to perform mathematical operations on words, find word analogies, perform sentiment analysis, and so on.
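To make "mathematical operations on words" concrete, here is a minimal sketch of the classic analogy arithmetic. The two-dimensional vectors below are made-up illustrative values, not real embeddings, and the helper `cosine` is defined here only for the example.

```python
import math

# Hypothetical 2-D "embeddings", invented purely for illustration
vectors = {
    "king":  [0.9, 0.8],
    "queen": [0.9, 0.2],
    "man":   [0.1, 0.8],
    "woman": [0.1, 0.2],
    "apple": [0.5, 0.5],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# king - man + woman: subtract and add the vectors component by component
target = [k - m + w for k, m, w in
          zip(vectors["king"], vectors["man"], vectors["woman"])]

# Pick the remaining word whose vector points in the direction closest to the target
candidates = [w for w in vectors if w not in ("king", "man", "woman")]
best = max(candidates, key=lambda w: cosine(vectors[w], target))
print(best)
```

With real embeddings the same arithmetic is applied to vectors with hundreds of dimensions, and the nearest neighbour is searched over the whole vocabulary.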

The most basic embedding in wide use is One-Hot Encoding, which represents categorical features in vector space by dedicating a column to each word. The resulting one-hot encoded matrix is of size N x V, where N is the number of observations and V is the vocabulary size.
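As a small sketch of the N x V shape described above, the following builds a one-hot matrix for a toy list of words using plain Python lists. The word list and variable names are assumptions made for the example.

```python
# Toy data: each observation is a single word, to keep the example small
words = ["good", "bad", "good", "great"]

vocab = sorted(set(words))                    # the V distinct words
index = {w: i for i, w in enumerate(vocab)}   # word -> column number

# One row per observation, one column per vocabulary word
one_hot = [[1 if index[w] == j else 0 for j in range(len(vocab))]
           for w in words]

for w, row in zip(words, one_hot):
    print(w, row)
```

Here N is 4 and V is 3, so the matrix has 4 rows and 3 columns, and every row contains exactly one 1.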

Word embeddings have been shown to boost performance in NLP tasks such as syntactic parsing and sentiment analysis.

There are many techniques to create Word Embeddings. Some of the popular ones are:

  • Binary Encoding
  • TF Encoding
  • TF-IDF Encoding
  • Word2Vec Encoding
    • Skip-Gram
    • CBOW (Continuous Bag of Words)
  • FastText
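The first three encodings in the list can be sketched on a toy corpus with the standard library alone. The documents below are invented for illustration, and the TF-IDF weight uses the plain textbook form tf * log(N / df); library implementations such as scikit-learn use smoothed variants, so their numbers will differ.

```python
import math
from collections import Counter

# Toy tokenized documents, assumed for the example
docs = [["good", "sound", "good"], ["bad", "sound"], ["good", "book"]]
vocab = sorted({w for d in docs for w in d})

# TF encoding: raw count of each vocabulary word in each document
tf = [[Counter(d)[w] for w in vocab] for d in docs]

# Binary encoding: 1 if the word occurs in the document at all, else 0
binary = [[1 if c > 0 else 0 for c in row] for row in tf]

# Document frequency and inverse document frequency per vocabulary word
df = [sum(row[j] for row in binary) for j in range(len(vocab))]
idf = [math.log(len(docs) / n) for n in df]

# TF-IDF encoding: counts down-weighted for words that occur in many documents
tfidf = [[c * w for c, w in zip(row, idf)] for row in tf]

print(vocab)
print(tf[0], binary[0])
```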

Different Ways of Using Word Embeddings:

  1. Learning the Embedding: Embeddings can be learned from the corpus, but a large amount of text is required to ensure that the learned embeddings are useful. They can be trained with a standalone algorithm such as Word2Vec or GloVe, which is more useful when we want to reuse the embeddings in multiple models. Alternatively, they can be trained as part of a task-specific model such as a classifier. The main drawback of the latter approach is that the learned embeddings are specific to the task at hand and generally cannot be reused for other tasks.

  2. Reusing Pretrained Embedding: Most of the word embeddings trained by researchers using the above-mentioned algorithms are available for download and can be used in projects, subject to the license of the embeddings. They can be reused either by keeping them non-trainable (frozen) in your model, if your task is close to the general tasks the embeddings were trained for, or by allowing them to be updated during training, which often gives better results for the task at hand.
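A common first step when reusing pretrained embeddings is to build an embedding matrix whose row i holds the vector for the word with id i. The sketch below assumes a tiny hypothetical `pretrained` dictionary and a hypothetical `word2id` mapping; real vectors would be read from a downloaded file and have far more dimensions.

```python
import random

# Hypothetical pretrained vectors (3 dimensions), invented for the example
pretrained = {"good": [0.9, 0.1, 0.0], "bad": [-0.8, 0.2, 0.1]}

# Hypothetical word -> id mapping, with id 0 reserved for padding
word2id = {"PAD": 0, "good": 1, "bad": 2, "soundtrack": 3}
dim = 3

rng = random.Random(0)  # fixed seed so the sketch is reproducible
matrix = [[0.0] * dim for _ in range(len(word2id))]
for word, idx in word2id.items():
    if word == "PAD":
        continue  # the padding row stays all zeros
    if word in pretrained:
        matrix[idx] = list(pretrained[word])  # copy the pretrained vector
    else:
        # Word missing from the pretrained set: small random initial values
        matrix[idx] = [rng.uniform(-0.05, 0.05) for _ in range(dim)]

print(matrix[1], matrix[0])
```

In Keras, a matrix like this could be used to initialize an `Embedding` layer, for example through `embeddings_initializer=tf.keras.initializers.Constant(matrix)`, with `trainable=False` to keep the vectors frozen or `trainable=True` to let them be updated for the task.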

Let's step into a live demo and create some word embeddings.

Loading the Data

We will work with the Amazon Reviews dataset, downloaded from Kaggle. Each review is labeled with its sentiment, i.e. positive or negative. We will go through the reviews and try creating embeddings for them. We use only 1K observations for this exercise; you can use as many as your machine can handle without crashing.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import spacy
nlp=spacy.load('en_core_web_md')
df=pd.read_csv('data/amazonreviews.tsv',sep='\t')
df.head()

  label                                             review
0   pos  Stuning even for the non-gamer: This sound tra...
1   pos  The best soundtrack ever to anything.: I'm rea...
2   pos  Amazing!: This soundtrack is my favorite music...
3   pos  Excellent Soundtrack: I truly like this soundt...
4   pos  Remember, Pull Your Jaw Off The Floor After He...

from sklearn.preprocessing import LabelEncoder
enc=LabelEncoder()
X=df.review.values[:1000]
y=enc.fit_transform(df.label)[:1000]

Basic Preprocessing

Creating Stopwords Corpus

## Combining Spacy, NLTK, and WordCloud Stopword List
from nltk.corpus import stopwords
from wordcloud import STOPWORDS
stopword_corpus=set(nlp.Defaults.stop_words)
stopword_corpus=stopword_corpus.union(set(STOPWORDS))
stopword_corpus=stopword_corpus.union(set(stopwords.words('english')))

Preprocessing Text

  • Removing Punctuation
  • Lemmatizing
  • Converting to Lower Case
  • Removing Pronouns
  • Removing URLs
  • Removing Stopwords

reviews=[]
for t in X:
    reviews.append([i.lemma_ for i in nlp(t.lower()) if not i.is_punct and 
                    i.pos_!='PRON' and 
                    not i.like_url and 
                    i.text not in stopword_corpus])

Tokenizing the Reviews

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

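# Reviews are already lists of tokens, so no character filtering is needed
tokenizer=Tokenizer(filters='')
tokenizer.fit_on_texts(reviews)

# word_index starts at 1; id 0 is reserved for the padding token
word2id = tokenizer.word_index
word2id['PAD']=0

# Reverse lookup: id -> word
id2word={v:k for k,v in word2id.items()}

# Length of the longest review, used as the padded sequence length
max_len=max(len(s) for s in reviews)

# Map each token to its id, then left-pad every review with 0 up to max_len
tokenized_reviews=[]
for r in reviews:
    tokenized_reviews.append([tokenizer.word_index[w.lower()] for w in r])
tokenized_reviews_idx=pad_sequences(tokenized_reviews,maxlen=max_len)

# Vocabulary size, including the PAD entry
V=len(word2id)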
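To see what the padding step does, here is a plain-Python sketch of pre-padding, which is the default behaviour of `pad_sequences` (zeros are added at the front, and sequences longer than `maxlen` keep their last `maxlen` ids). The function `pad_pre` and the toy ids are hypothetical names used only for illustration.

```python
def pad_pre(seqs, maxlen, value=0):
    """Left-pad (and, if needed, left-truncate) each sequence to maxlen."""
    padded = []
    for s in seqs:
        s = s[max(0, len(s) - maxlen):]                 # keep the last maxlen ids
        padded.append([value] * (maxlen - len(s)) + s)  # fill the front with value
    return padded

toy = [[5, 2], [7], [3, 9, 4]]
print(pad_pre(toy, maxlen=3))
```

Because the padding value is 0, no real word may use id 0, which is why the word index above reserves it for 'PAD'.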

TF (Count Vectorizer)

Word vectors by counting contexts

So how do we turn the insight of the Distributional Hypothesis (words that occur in similar contexts tend to have similar meanings) into a system for creating general-purpose vectors that capture the meaning of words? Maybe you can see where I’m going with this. What if we made a really big spreadsheet that had one column for every context in which any word occurs in a given source text? Let’s use a small source text to begin with, such as this excerpt from Dickens:

It was the best of times, it was the worst of times.

Such a spreadsheet might look something like this:

[Figure: spreadsheet of words and contexts for the Dickens excerpt]

The spreadsheet has one column for every possible context, and one row for every word. The value in each cell is the number of times the word occurs in the given context. The numbers in a word’s row constitute that word’s vector, i.e., the vector for the word of is

[0, 0, 0, 0, 1, 0, 0, 0, 1, 0]

Because there are ten possible contexts, this is a ten-dimensional space! It might be strange to think of it, but you can do vector arithmetic on vectors with ten dimensions just as easily as you can on vectors with two or three dimensions, and you could use a standard distance formula, such as Euclidean distance, to get useful information about which vectors in this space are similar to each other. In particular, the vectors for best and worst are actually the same (a distance of zero), since they occur only in the same context (the ___ of):

[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

Of course, the conventional way of thinking about “best” and “worst” is that they’re antonyms, not synonyms. But they’re also clearly two words of the same kind, with related meanings (through opposition), a fact that is captured by this distributional model.
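The context-counting idea can be sketched in a few lines of Python. This sketch assumes that a context is the pair (previous word, next word), with hypothetical `<START>` and `<END>` markers at the sentence boundaries, and that contexts are numbered in order of first appearance; under those assumptions it reproduces the vectors shown above.

```python
import math
from collections import defaultdict

text = "It was the best of times, it was the worst of times."
tokens = [w.strip(".,").lower() for w in text.split()]
padded = ["<START>"] + tokens + ["<END>"]

contexts = []                                    # distinct contexts, first-seen order
counts = defaultdict(lambda: defaultdict(int))   # word -> context -> count
for prev, word, nxt in zip(padded, padded[1:], padded[2:]):
    ctx = (prev, nxt)
    if ctx not in contexts:
        contexts.append(ctx)
    counts[word][ctx] += 1

def vector(word):
    """Row of the spreadsheet: one count per context."""
    return [counts[word][c] for c in contexts]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(len(contexts))
print(vector("of"))
print(euclidean(vector("best"), vector("worst")))
```

The same loop works on a longer text, but the number of distinct contexts grows quickly, which is one reason practical systems move on to denser representations.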

from sklearn.feature_extraction.text import CountVectorizer

# Reviews are already tokenized, so join the tokens back into strings for the vectorizer
docs = [" ".join(r) for r in reviews]

# Initialize a CountVectorizer: unigram counts, scikit-learn's built-in English stop word list
count_vec = CountVectorizer(stop_words="english", analyzer='word',
                            ngram_range=(1, 1), max_df=1.0, min_df=1, max_features=None)

# Learn the vocabulary, then transform the reviews into a bag-of-words count matrix
count_train = count_vec.fit(docs)
bag_of_words = count_vec.transform(docs)

# Print the vocabulary size and the word -> column index mapping
# (get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out())
# print("Every feature:\n{}".format(count_vec.get_feature_names_out()))
# print("\nEvery 3rd feature:\n{}".format(count_vec.get_feature_names_out()[::3]))
print("Vocabulary size: {}".format(len(count_train.vocabulary_)))
print("Vocabulary content:\n {}".format(count_train.vocabulary_))
Vocabulary size: 6687
Vocabulary content:
 {'stun': 5692, 'non': 4032, 'gamer': 2545, 'sound': 5498, 'track': 6069, 'beautiful': 626, 'paint': 4232, 'senery': 5219, 'mind': 3827, 'recomend': 4811, 'people': 4321, 'hate': 2779, 'vid': 6383, 'game': 2544, 'music': 3943, 'play': 4422, 'chrono': 1086, 'cross': 1443, 'good': 2632, 'away': 546, 'crude': 1449, 'keyboarding': 3358, 'fresh': 2488, 'step': 5617, 'grate': 2670, 'guitar': 2720, 'soulful': 5495, 'orchestra': 4141, 'impress': 3044, 'care': 942, 'listen': 3543, 'soundtrack': 5501, 'read': 4770, 'lot': 3597, 'review': 4972, 'say': 5124, 'figure': 2322, 'write': 6626, 'disagree': 1713, 'bit': 703, 'opinino': 4132, 'yasunori': 6651, 'mitsuda': 3865, 'ultimate': 6180, 'masterpiece': 3722, 'timeless': 6006, 'year': 6654, 'beauty': 629, 'simply': 5354, 'refuse': 4841, 'fade': 2228, 'price': 4570, 'tag': 5829, 'pretty': 4566, 'staggering': 5574, 'buy': 893, 'cd': 983, 'money': 3889, 'feel': 2298, 'worth': 6610, 'penny': 4318, 'amazing': 315, 'favorite': 2279, 'time': 6005, 'hand': 2743, 'intense': 3163, 'sadness': 5077, 'prisoner': 4586, 'fate': 2273, 'mean': 3745, 'hope': 2914, 'distant': 1780, 'promise': 4625, 'girl': 2600, 'steal': 5609, 'star': 5584, 'important': 3038, 'inspiration': 3136, 'personally': 4350, 'teen': 5880, 'high': 2855, 'energy': 2027, 'scar': 5134, 'dreamwatch': 1865, 'chronomantique': 1087, 'indefinably': 3078, 'remeniscent': 4887, 'trigger': 6119, 'absolutely': 155, 'superb': 5745, 'probably': 4593, 'composer': 1254, 'work': 6595, 'hear': 2805, 'xenogears': 6638, 'sure': 5765, 'twice': 6162, 'wish': 6564, 'excellent': 2131, 'truly': 6139, 'enjoy': 2040, 'video': 6384, 'relaxing': 4862, 'peaceful': 4300, 'disk': 1753, 'life': 3513, 'death': 1536, 'forest': 2429, 'illusion': 3013, 'fortress': 2450, 'ancient': 340, 'dragon': 1848, 'lost': 3596, 'fragment': 2466, 'drown': 1877, 'valley': 6335, 'draggon': 1847, 'galdorb': 2538, 'home': 2896, 'gale': 2539, 'girlfriend': 2602, 'like': 3523, 'zelbessdisk': 6677, 'garden': 2550, 'god': 2627, 
'chronopolis': 1088, 'jellyfish': 3283, 'sea': 5185, 'burn': 875, 'orphange': 4156, 'prayer': 4529, 'tower': 6065, 'radical': 4735, 'dreamer': 1863, 'unstealable': 6284, 'jewel': 3294, 'overall': 4183, 'bring': 833, 'xander': 6637, 'remember': 4886, 'pull': 4677, 'jaw': 3273, 'floor': 2385, 'know': 3392, 'divine': 1793, 'single': 5364, 'song': 5477, 'tell': 5887, 'story': 5648, 'great': 2675, 'doubt': 1833, 'magical': 3647, 'wind': 6557, 'unstolen': 6287, 'translation': 6092, 'vary': 6345, 'perfect': 4328, 'ask': 471, 'pour': 4514, 'heart': 2808, 'paper': 4246, 'absolute': 154, 'actually': 200, 'aware': 545, 'contribute': 1331, 'greatly': 2676, 'mood': 3897, 'minute': 3838, 'compose': 1253, 'exact': 2121, 'count': 1381, 'render': 4898, 'impressively': 3050, 'remarkable': 4882, 'assure': 488, 'forget': 2434, 'listener': 3545, 'fast': 2268, 'paced': 4217, 'energetic': 2024, 'dance': 1503, 'tokage': 6030, 'termina': 5901, 'slow': 5420, 'haunting': 2781, 'purely': 4689, 'beautifully': 628, 'fantastic': 2256, 'vocal': 6420, 'dreamers': 1864, 'videogame': 6385, 'surely': 5766, 'buyer': 894, 'beware': 681, 'self': 5211, 'publish': 4669, 'book': 759, 'want': 6462, 'paragraph': 4252, 'ms': 3926, 'haddon': 2732, 'family': 2249, 'friend': 2492, 'imagine': 3021, 'thing': 5956, 'spend': 5533, 'evening': 2107, 'hysteric': 2979, 'piece': 4386, 'definitely': 1572, 'bad': 567, 'enter': 2052, 'kind': 3376, 'contest': 1317, 'believe': 654, 'amazon': 317, 'sell': 5212, 'maybe': 3737, 'offer': 4107, '8th': 128, 'grade': 2649, 'term': 5900, 'kill': 3372, 'mockingbird': 3874, 'send': 5217, 'joke': 3313, 'stay': 5608, 'far': 2258, 'glorious': 2618, 'love': 3604, 'whisper': 6531, 'wicked': 6539, 'saint': 5083, 'pleasantly': 4430, 'surprised': 5776, 'change': 1013, 'normaly': 4043, 'romance': 5028, 'novel': 4056, 'world': 6602, 'rave': 4765, 'brilliant': 831, 'true': 6137, 'wonderful': 6581, 'typical': 6171, 'crime': 1428, 'becuase': 631, 'miss': 3855, 'warm': 6469, 'finish': 2339, 'fall': 
2243, 'caracter': 937, 'expect': 2166, 'average': 534, 'instead': 3144, 'think': 5958, 'predict': 4539, 'outcome': 4167, 'shock': 5303, 'writting': 6631, 'descriptive': 1625, 'break': 814, 'julia': 3325, 'reader': 4772, 'lover': 3609, 'let': 3488, 'cover': 1396, 'fool': 2415, 'spectacular': 5527, 'easy': 1930, 'leave': 3468, 'follow': 2408, 'come': 1198, 'soon': 5479, 'enjoyable': 2041, 'complete': 1246, 'waste': 6482, 'typographical': 6174, 'error': 2087, 'poor': 4474, 'grammar': 2653, 'totally': 6056, 'pathetic': 4283, 'plot': 4437, 'add': 205, 'embarrassed': 1990, 'author': 522, 'disappointed': 1718, 'pay': 4295, 'boy': 796, 'twist': 6166, 'turn': 6154, 'guess': 2711, 'happen': 2758, 'make': 3661, 'heat': 2810, 'angery': 349, 'throu': 5981, 'emotion': 1998, 'quick': 4716, 'end': 2013, 'day': 1524, 'night': 4020, 'realistic': 4778, 'human': 2953, 'fact': 2222, 'writer': 6627, 'loving': 3610, 'revengeful': 4968, 'glass': 2609, 'castle': 967, 'oh': 4113, 'discerning': 1726, 'drivel': 1873, 'trouble': 6130, 'typo': 6173, 'prominently': 4623, 'feature': 2288, 'page': 4227, 'remove': 4895, 'wait': 6446, 'point': 4451, 'beginning': 645, 'clear': 1125, 'intentional': 3166, 'churning': 1092, 'heated': 2811, 'prose': 4644, 'satiric': 5110, 'purpose': 4697, 'phew': 4365, 'glad': 2606, '10': 5, '95': 131, 'awful': 550, 'belief': 651, '7th': 126, 'grader': 2650, 'grammatical': 2654, 'skill': 5392, 'age': 251, 'reviewer': 4973, 'misspelling': 3858, 'chapter': 1016, 'example': 2128, 'mention': 3788, 'lean': 3463, 'house': 2940, 'distract': 1785, 'writing': 6628, 'weak': 6494, 'decide': 1548, 'pencil': 4312, 'mark': 3699, 'horrible': 2922, 'spelling': 5532, 'relative': 4858, 'faith': 2239, 'try': 6142, 'fake': 2242, 'glaringly': 2608, 'obvious': 4089, 'glow': 2623, 'person': 4346, 'sentence': 5228, 'structure': 5678, 'veronica': 6364, 'romantic': 5031, 'zen': 6678, 'baseball': 600, 'comedy': 1202, 'folk': 2406, 'anymore': 381, 'talk': 5833, 'cool': 1348, 'young': 6666, 'cuban': 
1460, 'search': 5190, 'idenity': 2989, 'stumble': 5691, 'coastal': 1168, 'resort': 4936, 'kitchen': 3386, 'gig': 2594, 'motorcycle': 3918, 'maintenance': 3658, 'man': 3672, 'hysterical': 2980, 'italian': 3243, 'chef': 1044, 'latino': 3437, 'fireballing': 2343, 'right': 4998, 'handed': 2744, 'pitcher': 4403, 'team': 5866, 'sponsor': 5552, 'owner': 4211, 'case': 959, 'honest': 2901, 'comical': 1211, 'emotional': 1999, 'interaction': 3171, 'sizzling': 5385, 'roster': 5039, 'player': 4425, 'mix': 3866, 'special': 5520, 'effect': 1956, 'salsa': 5088, 'flashback': 2362, 'big': 690, 'fashionable': 2267, 'compression': 1261, 'stocking': 5635, 'dvt': 1912, 'doctor': 1804, 'require': 4925, 'wear': 6498, 'ugly': 6176, 'white': 6532, 'ted': 5878, 'hose': 2930, 'yucky': 6673, 'brown': 845, 'jobst': 3306, 'ultrasheer': 6182, 'need': 3998, '15': 27, '20': 59, 'look': 3583, 'regular': 4846, 'pantyhose': 4245, 'blood': 729, 'clot': 1153, 'support': 5757, 'leg': 3473, 'nice': 4012, 'note': 4049, 'problem': 4595, 'rubberized': 5050, 'roll': 5025, 'thigh': 5955, 'adhesive': 217, 'skin': 5395, 'inexpensive': 3096, 'garter': 2556, 'belt': 661, 'fine': 2336, 'help': 2827, 'product': 4602, 'difficult': 1683, 'old': 4118, 'workout': 6599, 'begin': 641, 'create': 1419, 'deep': 1556, 'ridge': 4993, 'difficulty': 1684, 'address': 213, 'size': 5381, 'recomende': 4812, 'chart': 1030, 'real': 4777, 'small': 5423, 'sheer': 5286, 'item': 3248, 'internet': 3186, 'store': 5644, 'check': 1036, 'mens': 3785, 'model': 3876, 'ok': 4115, 'sedentary': 5204, 'type': 6170, 'active': 196, 'alot': 301, 'job': 3305, 'consistently': 1302, 'ankle': 358, 'solution': 5469, 'standard': 5582, '30': 89, 'stock': 5634, '114622': 18, 'pair': 4233, 'tear': 5867, 'struggle': 5680, 'riddance': 4991, 'investment': 3210, 'delicious': 1586, 'cookie': 1346, 'funny': 2525, 'header': 2797, 'quickly': 4718, 'package': 4222, 'notice': 4052, 'title': 6022, 'bake': 573, 'convenience': 1336, 'dough': 1834, 'wrap': 6617, 'plastic': 
4419, 'log': 3573, 'surprise': 5775, 'messy': 3798, 'extremely': 2209, 'sticky': 5626, 'flexibility': 2375, 'ratio': 4762, 'ingredient': 3108, 'extra': 2204, 'butter': 887, 'baked': 574, 'chewy': 1052, 'large': 3429, 'chocolate': 1071, 'chip': 1068, 'addition': 210, 'natural': 3979, 'flavor': 2367, 'abysmal': 166, 'digital': 1688, 'copy': 1355, 'scratch': 5174, 'insect': 3125, 'dropping': 1876, 'random': 4747, 'pixelation': 4406, 'combine': 1196, 'muddy': 3930, 'light': 3519, 'vague': 6331, 'image': 3017, 'resolution': 4935, 'cue': 1462, 'packaging': 4223, 'straight': 5651, 'street': 5664, 'corner': 1360, 'bootleg': 769, 'dealer': 1534, 'reasonably': 4787, 'condition': 1279, 'film': 2326, 'define': 1570, 'visual': 6410, 'crystal': 1457, 'lighting': 3520, 'contrast': 1329, 'black': 709, 'surround': 5781, 'countryside': 1386, 'scene': 5147, 'set': 5246, 'early': 1922, 'morning': 3906, 'ground': 2698, 'mist': 3859, 'haze': 2791, 'memory': 3783, 'event': 2109, 'bridge': 826, 'water': 6485, 'bright': 828, 'immediate': 3027, 'dull': 1893, 'dark': 1512, 'clouded': 1157, 'timbre': 6004, 'enunciation': 2066, 'captain': 931, 'command': 1212, 'visuals': 6413, 'hard': 2765, 'award': 544, 'win': 6556, 'critically': 1436, 'acclaim': 173, 'presentation': 4556, 'youtube': 6670, 'dvd': 1905, '16': 29, 'mm': 3870, 'public': 4667, 'library': 3505, 'reel': 4832, 'just': 3334, 'appear': 402, 'fascinating': 2264, 'insight': 3130, 'modern': 3877, 'japanese': 3269, 'thoroughly': 5968, 'rise': 5005, 'son': 5476, 'daughter': 1520, 'society': 5455, 'view': 6387, 'poise': 4454, 'parent': 4257, 'culture': 1465, 'restraint': 4950, 'obedience': 4079, 'community': 1226, 'peer': 4307, 'adulation': 224, 'western': 6519, 'form': 2440, 'new': 4008, 'japan': 3268, 'international': 3185, 'blend': 720, 'ando': 342, 'demonstrate': 1603, 'vignette': 6389, 'private': 4587, 'member': 3780, 'steven': 5624, 'wardell': 6466, 'clearly': 1126, 'talented': 5832, 'adopt': 221, 'schooling': 5159, 'able': 146, 
'inside': 3129, 'album': 279, 'blue': 733, 'angel': 346, 'lanna': 3422, 'mama': 3671, 'hair': 2733, 'neck': 3995, 'roy': 5047, 'trully': 6138, 'singer': 5361, 'talent': 5831, 'charge': 1024, 'aaas': 137, 'charger': 1025, 'aa': 135, 'battery': 609, 'huge': 2950, 'secure': 5203, 'aaa': 136, 'flip': 2381, 'little': 3554, 'button': 890, 'positive': 4490, 'pop': 4476, 'hold': 2884, 'mechanism': 3757, 'loose': 3588, 'horizontal': 2918, 'pressure': 4562, 'push': 4699, 'duct': 1889, 'tape': 5844, 'segment': 5208, 'crayon': 1415, 'apply': 409, 'painful': 4230, 'advertise': 237, 'instruction': 3147, '24': 77, 'hour': 2938, 'charging': 1026, 'return': 4962, 'unit': 6246, 'useless': 6320, 'backup': 563, 'manage': 3673, 'drain': 1851, 'aas': 138, 'purchase': 4685, 'convenient': 1337, 'short': 5312, 'long': 3580, 'kodak': 3398, 'nimh': 4024, 'dear': 1535, 'excited': 2141, 'ostensibly': 4160, 'muslim': 3948, 'feminism': 2303, 'volume': 6425, 'live': 3555, 'expectations': 2168, 'essay': 2093, 'veil': 6356, 'potentially': 4507, 'liberating': 3501, 'explain': 2178, 'woman': 6578, 'cape': 925, 'town': 6066, 'claim': 1107, 'separate': 5229, 'equal': 2073, 'gee': 2565, 'whiz': 6536, 'disappointment': 1720, 'feminist': 2304, 'condemnation': 1276, 'gender': 2568, 'apartheid': 387, 'extoll': 2203, 'virtue': 6403, 'female': 2302, 'genital': 2577, 'mutilation': 3949, 'alyssa': 309, 'lappen': 3426, 'base': 599, 'vcr': 6350, 'christmas': 1083, 'present': 4555, 'join': 3311, 'rest': 4946, 'land': 3419, 'vhs': 6373, 'movie': 3922, 'jvc': 3341, 'tv': 6156, 'choice': 1072, 'agree': 259, 'awkward': 553, 'selection': 5210, 'option': 4140, 'hang': 2752, 'comment': 1215, 'intuitive': 3204, 'complicated': 1250, 'remote': 4893, 'technically': 5872, 'minded': 3828, 'rely': 4879, 'heavily': 2814, 'manual': 3688, 'timer': 6008, 'start': 5593, 'scroll': 5181, 'complaint': 1245, 'incorrect': 3070, 'disc': 1725, 'fan': 2251, 'suspiscious': 5794, 'section': 5201, 'happy': 2763, 'click': 1133, 'receiver': 
4799, 'transition': 6091, 'smooth': 5436, 'pause': 4293, 'fairly': 2236, 'headcleaner': 2796, 'message': 3797, 'nut': 4075, 'television': 5886, 'bookshelf': 763, 'audio': 511, 'car': 936, 'room': 5035, 'combo': 1197, 'longer': 3581, 'things': 5957, 'cable': 900, 'box': 793, 'compatability': 1234, 'control': 1334, 'seperate': 5232, 'input': 3120, 'coax': 1169, 'programming': 4618, 'mono': 3890, 'wife': 6545, 'difference': 1681, 'hollywood': 2893, 'debacle': 1537, 'ridiculous': 4995, 'wonder': 6580, 'script': 5180, 'mountain': 3919, 'lion': 3540, 'trailer': 6081, 'capture': 935, 'jail': 3259, 'cell': 990, 'utterly': 6328, 'completely': 1247, 'stupid': 5694, 'bet': 677, 'hotel': 2936, 'babylon': 558, 'incredible': 3073, 'acting': 193, 'tamzin': 5838, 'outhwaite': 4172, 'eastenders': 1928, 'bbc': 613, 'soap': 5448, 'max': 3736, 'beesley': 637, 'ill': 3006, 'fated': 2274, 'glitter': 2614, 'mariah': 3696, 'carey': 946, 'drama': 1852, 'series': 5238, 'opera': 4130, 'air': 271, 'america': 323, 'episode': 2071, 'season': 5192, 'finale': 2331, 'interesting': 3176, 'watch': 6483, 'remind': 4888, 'abc': 141, '1983': 51, '1988': 53, 'reason': 4785, 'fictional': 2318, 'san': 5095, 'francisco': 2472, 'luxury': 3628, 'england': 2035, 'recommend': 4814, 'willing': 6555, 'casually': 969, 'law': 3448, 'school': 5157, 'seriously': 5239, 'unfortunately': 6231, 'entertaining': 2055, 'order': 4143, 'hip': 2867, 'daddy': 1498, 'vibe': 6374, 'dismay': 1756, 'fourth': 2463, 'class': 1115, 'main': 3654, 'jist': 3302, 'xylaphone': 6645, 'voice': 6422, 'replicate': 4914, 'party': 4272, 'neighborhood': 4003, 'laugh': 3440, 'beach': 616, 'grow': 2701, 'surfer': 5769, 'diego': 1678, 'southern': 5505, 'california': 904, 'brother': 843, 'honestly': 2902, 'kinda': 3377, 'absolutle': 156, 'epitimy': 2072, 'surf': 5767, 'cha': 1005, 'rochelle': 5016, 'hell': 2825, 'moral': 3902, 'aspect': 473, 'american': 324, 'lucid': 3618, 'argue': 429, 'explanation': 2180, 'simple': 5349, 'focused': 2401, 
'individual': 3088, 'ignore': 3001, 'mock': 3873, 'personal': 4347, 'responsibility': 4944, 'final': 2330, 'response': 4943, 'indictment': 3085, 'robert': 5013, 'ringer': 5001, 'seller': 5213, 'disgusted': 1746, 'boorish': 767, 'state': 5601, 'medium': 3766, 'politic': 4463, 'discourse': 1731, 'general': 2570, 'head': 2794, 'substantial': 5709, 'challenge': 1008, 'lie': 3512, 'americans': 326, 'playing': 4426, 'larry': 3431, 'muse': 3941, 'label': 3406, 'late': 3433, '80': 127, '90': 130, 'explore': 2186, 'rich': 4985, 'catalog': 971, 'jazz': 3277, 'musician': 3946, 'relaxed': 4861, 'valentine': 6333, 'stand': 5581, 'chet': 1049, 'baker': 575, 'mile': 3821, 'mac': 3634, 'line': 3534, 'os': 4158, 'window': 6558, 'frustrating': 2506, 'attempt': 498, 'touch': 6057, 'use': 6318, 'mouse': 3920, 'power': 4518, 'arrow': 447, 'keyboard': 3357, 'fun': 2514, 'disapointed': 1715, 'numbing': 4071, 'attention': 500, 'level': 3491, 'rescue': 4927, 'hero': 2838, 'dog': 1810, 'alaskan': 278, 'repeatedly': 4907, 'allman': 294, 'recipe': 4807, 'throw': 5983, 'cup': 1469, 'flour': 2390, 'miscellaneous': 3843, 'nearly': 3990, 'compare': 1231, 'ed': 1943, 'wood': 6586, 'flawlessly': 2372, 'lisa': 3541, 'rayner': 4766, 'wild': 6549, 'bread': 812, 'sourdough': 5503, 'artisan': 457, 'fail': 2231, 'novice': 4060, 'reliably': 4870, 'pancake': 4241, 'place': 4410, 'reliable': 4868, 'concise': 1271, 'information': 3105, 'maintain': 3657, 'fabulous': 2218, 'tome': 6038, 'pass': 4273, 'historical': 2871, 'warrant': 6474, 'cost': 1374, 'alaska': 277, 'visit': 6407, 'starter': 5594, 'collection': 1180, 'advise': 242, 'ruth': 5065, 'picture': 4384, 'past': 4279, '100': 6, 'ago': 258, 'mixer': 3868, 'civilized': 1106, 'stuff': 5687, 'pot': 4504, 'autumn': 527, 'nc': 3987, 'prefect': 4543, 'closer': 1149, 'mp3': 3923, 'download': 1838, 'wax': 6490, 'decade': 1542, '26': 82, 'background': 561, 'gymnastic': 2728, 'feat': 2286, 'russia': 5064, 'hunt': 2962, 'melody': 3776, 'english': 2036, 'lyric': 
3631, 'downloader': 1839, 'happily': 2761, 'intersperse': 3189, 'ipod': 3217, 'experience': 2172, 'fm': 2397, 'rattle': 4764, 'loudly': 3600, 'quality': 4704, 'screen': 5177, 'middle': 3813, 'plus': 4446, 'dead': 1530, 'layout': 3454, 'sense': 5222, 'engineering': 2034, 'imo': 3033, 'refund': 4840, 'emerson': 1994, '400': 99, 'wm': 6574, 'hesitant': 2843, 'sylvania': 5814, '6620ldg': 121, 'flat': 2363, 'panel': 4242, 'lcd': 3458, 'build': 861, 'weight': 6513, 'space': 5507, 'save': 5119, 'attractive': 506, 'design': 1629, 'sharp': 5283, 'playback': 4423, 'negative': 4001, 'function': 2515, 'key': 3356, 'color': 1188, 'placement': 4411, 'illogical': 3011, 'carefully': 944, 'imbecil': 3023, 'terrible': 5905, 'birthday': 702, 'toreturn': 6047, 'palyer': 4240, 'repalcement': 4905, 'junk': 3333, 'capable': 924, 'recall': 4793, 'flash': 2361, 'channel': 1014, 'sony': 5478, 'universal': 6249, 'better': 679, 'learn': 3465, 'factory': 2224, 'capability': 923, 'deal': 1533, 'user': 6321, 'annoying': 366, 'replace': 4911, '13': 23, 'tube': 6145, 'gain': 2537, 'relcaime': 4863, 'counter': 1382, 'internal': 3183, 'bonus': 757, 'defective': 1561, 'electronic': 1973, 'express': 2192, 'contact': 1311, 'rma': 5011, 'number': 4069, 'fedex': 2293, 'pick': 4382, 'arrive': 445, 'week': 6510, 'later': 3435, 'clarity': 1112, 'mode': 3875, 'loud': 3599, 'rv': 5066, 'run': 5058, 'barely': 593, 'solve': 5470, 'jack': 3254, 'stereo': 5620, 'speaker': 5518, 'lightweight': 3522, 'antenna': 371, 'disappoint': 1717, 'romanian': 5030, 'opinion': 4133, 'biased': 685, 'angle': 351, 'europe': 2105, 'clean': 1122, 'proper': 4638, 'shed': 5285, 'understand': 6214, 'tourist': 6063, 'guide': 2714, 'tour': 6062, 'country': 1385, 'reference': 4835, 'travel': 6099, 'outline': 4175, 'exclude': 2146, 'romania': 5029, 'precision': 4538, 'dk': 1799, 'realy': 4782, 'combination': 1195, 'illustration': 3015, 'text': 5920, 'sight': 5332, 'city': 1104, 'european': 2106, 'thank': 5931, 'eyewitness': 2212, 'lonely': 
3579, 'planet': 4417, 'info': 3103, 'indiv': 3087, 'surface': 5768, 'receive': 4798, 'germany': 2586, 'overview': 4203, '457': 106, '480': 107, 'greece': 2678, 'spanish': 5509, 'sort': 5491, 'printing': 4582, 'highlight': 2857, 'memphis': 3784, 'tn': 6024, 'reatard': 4790, 'course': 1390, 'kid': 3367, 'jay': 3275, 'eric': 2082, 'oblivians': 4084, 'goner': 2631, 'tender': 5896, '19': 40, 'rock': 5017, 'offend': 4105, 'sensibility': 5224, 'overdriven': 4188, 'crackle': 1403, 'underlay': 6207, 'croon': 1441, 'howl': 2942, 'energize': 2025, 'depend': 1609, 'foot': 2417, 'od': 4100, 'copper': 1354, 'pipe': 4399, 'accept': 168, '12': 21, '40': 98, '50': 109, 'way': 6491, 'thia': 5953, 'gift': 2593, 'husband': 2969, 'date': 1517, 'plate': 4420, 'markedly': 3700, 'inferior': 3099, 'previous': 4568, 'edition': 1948, 'ahead': 264, 'homer': 2898, 'moses': 3911, 'helpful': 2829, 'gem': 2567, 'complex': 1249, 'subject': 5700, 'second': 5197, 'century': 995, 'religious': 4876, 'authority': 523, 'textual': 5923, 'period': 4337, 'interelation': 3173, 'essential': 2095, 'detailed': 1641, 'yes': 6660, 'cardboard': 939, 'beat': 623, 'cheesy': 1042, 'looking': 3584, 'entire': 2060, 'solid': 5466, 'brass': 807, 'decent': 1545, 'distance': 1779, 'scrape': 5172, 'noticeable': 4053, 'gigantic': 2595, 'plenty': 4435, 'material': 3730, 'loosely': 3589, 'scraping': 5173, 'shipping': 5298, 'exchange': 2139, 'damage': 1500, 'risk': 5006, 'unknown': 6252, 'africa': 247, 'profesionally': 4607, 'produce': 4600, 'mixture': 3869, 'soukous': 5493, 'fado': 2229, 'african': 248, 'feeling': 2299, 'rithem': 5007, 'soft': 5459, 'borred': 779, 'oliver': 4120, 'goma': 2630, 'fit': 2352, 'pefectly': 4308, 'record': 4818, 'france': 2470, 'paris': 4259, 'professional': 4608, 'hot': 2933, 'lazy': 3456, 'greeting': 2682, 'arno': 440, 'amsterdam': 329, 'expectation': 2167, 'shoe': 5306, 'rip': 5003, 'apart': 386, 'sole': 5464, 'clark': 1113, 'sadly': 5076, 'month': 3895, 'similar': 5345, 'sperry': 5534, 
'profound': 4615, 'narrative': 3967, 'style': 5696, 'famous': 2250, 'founder': 2462, 'biographer': 698, 'john': 3310, 'morse': 3909, 'arrogant': 446, 'flippant': 2382, 'frequently': 2487, 'exaggerated': 2123, 'carry': 955, 'tone': 6041, 'amateur': 312, 'yankee': 6648, 'historian': 2870, 'judgment': 3323, 'statesman': 5604, 'boston': 783, 'harvard': 2774, 'graduate': 2651, 'consider': 1297, 'penetrate': 4313, 'represent': 4918, 'today': 6025, 'research': 4928, 'conclude': 1272, 'manuscript': 3690, 'essentially': 2096, 'primary': 4574, 'secondary': 5198, 'source': 5502, '110': 17, 'rate': 4760, 'enjoyed': 2042, 'yr': 6671, 'barbie': 590, 'computer': 1263, 'worry': 6606, 'vibrant': 6375, 'friendly': 2494, 'mommy': 3887, 'rapunzel': 4754, 'ton': 6040, '42': 102, 'decorate': 1551, 'creativity': 1421, 'adventure': 232, 'addict': 206, 'opening': 4127, 'singe': 5360, 'stop': 5640, 'maze': 3739, 'prince': 4577, 'stefan': 5613, 'sing': 5359, 'program': 4616, 'child': 1057, 'currently': 1478, 'sob': 5449, 'bedroom': 633, 'result': 4953, 'rotten': 5042, 'flower': 2392, 'halfway': 2737, 'exit': 2162, 'freeze': 2484, 'spot': 5555, 'disembodied': 1741, 'urge': 6314, 'software': 5460, 'troubleshooting': 6132, 'rebooting': 4792, 'vivendi': 6417, 'games': 2546, 'site': 5374, 'disabled': 1712, 'weeping': 6512, 'fault': 2276, 'incompetent': 3065, 'catch': 972, 'bug': 859, 'buck': 855, 'avoid': 539, 'creative': 1420, 'granddaughter': 2657, 'plan': 4415, 'imagination': 3019, 'fight': 2320, 'choose': 1074, 'theme': 5941, 'allow': 296, 'crown': 1446, 'stephan': 5618, 'stone': 5637, 'animation': 356, 'boring': 778, 'adult': 225, 'variation': 6342, 'scheme': 5151, 'pattern': 4291, 'bored': 776, 'highly': 2858, 'rental': 4903, 'exciting': 2143, 'basically': 603, 'username': 6322, 'sister': 5370, 'classic': 1116, 'multiple': 3936, 'access': 170, 'everytime': 2114, 'task': 5851, 'openning': 4129, 'sequence': 5236, 'file': 2324, 'crash': 1410, 'jumpy': 3328, 'possible': 4496, 'grand': 2655, 
'patience': 4285, 'survive': 5785, 'trial': 6113, 'substitute': 5711, 'importcds': 3040, 'vendor': 6359, 'recieve': 4806, 'feedback': 2295, 'bother': 785, 'orginally': 4150, 'future': 2532, 'company': 1229, 'mistake': 3860, 'businee': 880, 'dollar': 1815, 'careful': 943, 'smell': 5429, 'bottle': 786, 'freesia': 2482, 'fragrance': 2467, 'delicate': 1585, 'summer': 5739, 'perfume': 4335, 'recipient': 4808, 'impressed': 3045, 'creepy': 1426, 'wow': 6616, 'jealousy': 3279, 'revenge': 4967, 'door': 1822, 'open': 4126, 'prepare': 4553, 'existent': 2161, 'thai': 5928, 'menu': 3789, 'exist': 2157, 'matter': 3733, 'exception': 2132, 'dubbing': 1887, 'sub': 5699, 'par': 4248, 'remain': 4880, 'shame': 5275, 'sham': 5274, 'original': 4152, 'offering': 4108, 'tokyo': 6031, 'sooo': 5481, 'jammin': 3263, 'trill': 6120, 'suc': 5715, 'pokey': 4456, 'official': 4110, 'sh': 5266, 'production': 4603, 'screw': 5179, 'downsize': 1842, 'pioneer': 4397, 'sad': 5074, 'tx': 6168, 'rap': 4752, 'kickback': 3366, 'blow': 732, 'jam': 3260, 'ride': 4992, 'underrated': 6212, 'bullsh': 867, 'mtv': 3927, 'houston': 2941, 'pat': 4280, 'hawk': 2784, 'moe': 3879, 'pimp': 4392, 'steve': 5623, 'regardless': 4844, 'alive': 289, 'legacy': 3474, 'dirty': 1710, 'south': 5504, 'southside': 5506, 'guest': 2713, 'fat': 2271, 'botany': 784, 'mr': 3925, 'chris': 1078, 'ward': 6465, 'chronicle': 1085, 'farm': 2259, 'photo': 4371, 'shewas': 5293, 'til': 5998, 'revise': 4974, 'version': 6366, 'maid': 3651, 'israel': 3241, 'harvey': 2775, 'history': 2873, 'awaken': 543, 'possiblility': 4497, 'relationship': 4857, 'faithfulness': 2240, 'loved': 3605, 'treasure': 6106, 'keeper': 3351, 'draw': 1856, 'idea': 2987, 'comfy': 1207, 'lounging': 3602, 'cake': 902, 'topper': 6046, 'june': 3329, '27': 83, '2010': 66, 'estimate': 2098, 'july': 3326, 'apologize': 396, 'mishap': 3848, 'wrong': 6632, 'ariel': 433, 'litte': 3553, 'mermaid': 3795, 'pearl': 4301, 'kit': 3385, 'cute': 1491, 'provide': 4656, 'annoyed': 365, 'include': 
3063, 'trust': 6140, 'additional': 211, 'pledge': 4433, 'sweet': 5806, 'grace': 2648, 'email': 1988, 'supply': 5756, 'list': 3542, 'sheet': 5287, 'ready': 4776, 'figurine': 2323, 'bracelet': 801, 'boot': 768, 'phone': 4370, 'manufacturer': 3689, 'yield': 6662, 'unreliable': 6275, 'soulwax': 5497, 'critic': 1434, 'consumer': 1310, 'alike': 288, 'release': 4864, 'debut': 1541, 'defunct': 1577, 'almo': 299, 'sounds': 5500, 'chock': 1070, 'radio': 4736, 'hit': 2874, 'djs': 1798, 'mash': 3714, 'remix': 4890, 'bootlegs': 770, 'fledged': 2373, 'band': 583, 'critcs': 1433, 'electro': 1972, 'hopefully': 2915, 'nite': 4027, 'cheeky': 1037, 'duran': 1902, 'club': 1159, 'garner': 2554, 'sale': 5085, 'practically': 4522, 'leap': 3464, 'grab': 2647, 'tiga': 5994, 'upcoming': 6304, 'sexor': 5260, 'omg': 4121, 'specially1': 5523, 'talking2': 5835, 'nylipps3': 4076, 'techo': 5877, 'guysmusicchoice': 2726, 'saturday': 5115, 'techno': 5874, 'cause': 976, 'classical': 1117, '1980': 50, 'jason': 3271, 'bateman': 607, 'lead': 3459, 'moment': 3885, 'relief': 4872, 'minimum': 3832, 'action': 194, 'suspense': 5790, 'achieve': 187, 'suggest': 5730, 'skip': 5398, 'finally': 2332, 'wriiten': 6623, 'horribly': 2923, 'lack': 3411, 'mystery': 3955, 'culprit': 1463, 'educational': 1951, 'train': 6082, 'shape': 5277, 'peg': 4309, 'guarantee': 2707, 'puzzle': 4700, 'range': 4748, 'ludicrous': 3624, 'silly': 5343, 'faintly': 2234, 'entertain': 2053, 'odd': 4101, 'theory': 5944, 'artistry': 461, 'deodato': 1607, 'concert': 1269, 'string': 5671, 'appreciate': 411, 'tropea': 6129, 'excelent': 2130, 'rendition': 4900, 'caution': 977, 'remastere': 4883, 'irrelevant': 3225, 'textbook': 5921, 'ship': 5296, 'transaction': 6087, 'engage': 2030, 'tess': 5915, 'wash': 6477, 'shore': 5311, 'isle': 3239, 'caretaker': 945, 'colin': 1176, 'macpherson': 3638, 'uncover': 6201, 'opposite': 4135, 'attract': 504, 'avon': 541, 'trashy': 6097, 'fokr': 2404, 'usually': 6325, 'tie': 5993, 'blond': 727, 'kilt': 3375, 
'enchanting': 2008, 'highlander': 2856, 'heroine': 2839, 'guillible': 2716, 'lindsay': 3533, 'content': 1315, 'occupant': 4095, 'island': 3237, 'ashore': 467, 'solitude': 5467, 'shake': 5271, 'fear': 2284, 'strange': 5654, 'reluctantly': 4878, 'strike': 5670, 'friendship': 2495, 'discover': 1732, 'courage': 1389, 'amusing': 331, 'lose': 3593, 'believability': 652, 'characterization': 1018, 'character': 1017, 'introduce': 3199, 'sided': 5327, 'behavior': 646, 'development': 1651, 'force': 2421, 'improve': 3053, 'addonics': 212, 'portable': 4482, 'drive': 1872, 'performance': 4331, 'underpowered': 6211, 'constantly': 1306, 'half': 2736, 'unsuccessfully': 6291, 'uncomfortable': 6199, 'pant': 4243, 'incredibly': 3075, 'stiff': 5627, 'authentic': 520, 'encounter': 2011, 'yoruba': 6665, 'particular': 4267, 'certain': 998, 'nuance': 4063, 'speak': 5517, 'clue': 1160, 'question': 4714, 'unanswered': 6187, 'textbooks': 5922, 'impossible': 3043, 'pronnounce': 4632, 'native': 3978, 'confusing': 1291, 'spealling': 5519, 'teach': 5863, 'basic': 602, 'vocabulary': 6419, 'pronounciation': 4634, 'dashmat': 1515, 'specific': 5524, 'vehicle': 6355, 'correct': 1366, 'mat': 3727, 'duplicate': 1899, 'sit': 5371, 'grandchild': 2656, 'value': 6337, 'tic': 5991, 'tac': 5827, 'toe': 6028, 'kelly': 3353, 'pet': 4358, 'parade': 4250, 'clap': 1111, 'dress': 1867, 'kids': 3371, 'pony': 4469, 'jump': 3327, 'animal': 353, 'elephant': 1979, 'nail': 3959, 'float': 2384, 'accomplish': 177, 'wonderfully': 6583, 'instal': 3139, 'laptop': 3428, 'inset': 3128, 'tough': 6060, 'polar': 4458, 'bear': 621, 'fish': 2349, 'tricky': 6117, 'finger': 2337, 'pad': 4225, 'load': 3563, 'different': 1682, 'endless': 2019, 'graphic': 2665, 'colorful': 1189, 'winner': 6561, '4yr': 108, 'reinstall': 4852, 'issue': 3242, 'activity': 197, 'slip': 5415, 'heel': 2820, 'satisfied': 5111, 'manner': 3686, 'delivered': 1592, 'awesome': 547, 'advertisement': 238, 'ddr': 1529, 'cooool': 1349, 'verson': 6367, 'claire': 1108, 
'uk': 6179, 'assume': 487, 'euromix': 2104, '2nd': 88, '3rd': 96, 'available': 532, 'abit': 145, 'arcade': 419, 'navey': 3984, 'playstation': 4427, 'guy': 2725, 'compatible': 1235, 'machine': 3635, 'deeply': 1557, 'stage': 5573, 'revolution': 4978, 'equivalent': 2077, 'jungle': 3330, 'unheard': 6235, 'favourite': 2281, 'pedantic': 4304, 'situation': 5378, 'lucky': 3622, 'deprivation': 1619, 'boredom': 777, 'letter': 3490, 'admit': 220, 'michael': 3807, 'montgomery': 3894, 'silk': 5341, 'forward': 2455, 'approach': 413, 'horseman': 2928, 'jmm': 3303, 'comeback': 1199, 'couple': 1388, 'beer': 636, 'devil': 1655, 'spice': 5536, 'deserve': 1627, 'cuz': 1492, 'acctually': 183, 'soldier': 5463, 'stuffi': 5689, 'punk': 4684, 'hrs': 2943, '23': 76, 'ball': 577, 'este': 2097, 'libro': 3506, 'contiene': 1319, 'todo': 6027, 'lo': 3562, 'que': 4708, 'pense': 4319, 'interesaba': 3174, 'saber': 5071, 'sobre': 5450, 'el': 1966, 'sus': 5787, 'campeones': 915, 'pero': 4344, 'cautivo': 978, 'lei': 3480, 'hasta': 2776, 'ahora': 266, 'si': 5323, 'puedo': 4674, 'discutir': 1739, 'mi': 3806, 'esposo': 2091, 'mis': 3842, 'cuñados': 1493, 'suegro': 5727, 'es': 2088, 'ex': 2120, 'boxeador': 795, 'cereal': 997, 'taste': 5853, 'addiction': 208, 'curiousity': 1474, 'hooked': 2910, 'breakfast': 815, 'eat': 1931, 'especially': 2089, 'crave': 1414, 'honey': 2904, 'hungry': 2961, 'bunche': 870, 'oats': 4077, 'cap': 922, 'crunch': 1452, 'berry': 673, 'splendid': 5546, 'tasting': 5854, 'addictive': 209, 'word': 6590, 'warning': 6473, 'saturate': 5114, 'cabinet': 899, 'hmmmm': 2877, 'outrageous': 4177, '99': 132, 'distraught': 1786, 'local': 3567, 'walmart': 6456, 'elated': 1967, 'church': 1090, 'pack': 4220, 'bowl': 792, 'timber': 6002, 'frame': 2468, 'building': 863, 'technical': 5871, 'joint': 3312, 'beam': 619, 'span': 5508, 'hanford': 2751, 'mills': 3826, 'museum': 3942, 'master': 3720, 'carpenter': 952, 'recent': 4800, 'trip': 6124, 'avid': 538, 'incline': 3062, 'buil': 860, 'vaguely': 6332, 
'breeze': 821, 'mortise': 3910, 'tenon': 5897, 'illustrate': 3014, 'practical': 4521, 'builder': 862, 'outstanding': 4181, 'guidebook': 2715, 'timberframe': 6003, 'hybrid': 2973, 'construction': 1308, 'invaluable': 3206, 'process': 4598, 'editing': 1947, 'disservice': 1777, 'publishers': 4671, 'improperly': 3052, 'caption': 932, 'traditional': 6077, 'framing': 2469, 'print': 4580, 'glossy': 2621, 'drawing': 1858, 'bw': 895, 'inform': 3104, 'venture': 6361, 'project': 4620, 'dip': 1703, 'boiling': 748, 'temperature': 5889, 'shrink': 5319, 'egg': 1963, 'transfer': 6089, 'needless': 3999, 'intelligent': 3160, 'immediately': 3028, 'property': 4640, 'boil': 746, 'explicit': 2181, 'war': 6463, '2006': 63, 'coast': 1167, 'pc': 4297, 'outrun': 4178, '1985': 52, 'holidays': 2888, 'sink': 5367, 'addicted': 207, 'follower': 2410, 'macrea': 3639, 'rally': 4741, 'race': 4728, 'ferrari': 2308, 'invader': 3205, 'battlefield': 611, 'landscape': 3420, 'idiotic': 2996, 'slide': 5409, 'total': 6055, 'rubbish': 5052, 'ireland': 3221, 'florida': 2388, 'costly': 1375, 'henry': 2832, '70': 123, 'nose': 4046, 'chew': 1050, 'worn': 6604, 'clone': 1145, 'dad': 1497, 'significance': 5336, 'insecurity': 3126, 'battle': 610, 'terminal': 5902, 'cancer': 918, 'charisma': 1027, 'obsession': 4085, 'guise': 2719, 'incurable': 3077, 'lice': 3507, 'infection': 3098, 'trauma': 6098, 'replacement': 4912, 'safely': 5079, 'closet': 1150, 'shelf': 5289, 'thread': 5974, 'bare': 591, 'online': 4123, 'overwhelmed': 4205, 'cherish': 1047, '14': 25, 'excite': 2140, 'walt': 6457, 'disney': 1757, 'toy': 6068, 'sweeet': 5804, 'contemporary': 1314, 'fairytale': 2238, 'delight': 1587, 'snow': 5446, 'grown': 2702, 'act': 192, 'rewarding': 4981, 'sinclair': 5358, 'sketch': 5390, 'excitement': 2142, 'actual': 199, 'norma': 4040, 'gregory': 2683, 'rendering': 4899, 'beginner': 644, 'seasoned': 5193, 'artisans': 458, 'thrill': 5978, 'instructional': 3148, 'commentswill': 1216, 'match': 3728, 'description': 1624, 
'cushion': 1485, 'deliver': 1591, 'impression': 3046, 'insert': 3127, 'attach': 496, 'strap': 5657, 'definately': 1568, 'luckily': 3621, 'policy': 4462, 'unintentionally': 6242, 'heard': 2806, 'free': 2479, 'sex': 5259, 'defintely': 1575, 'coda': 1172, 'divroced': 1796, 'broadway': 837, 'theatre': 5937, 'archive': 425, 'witty': 6572, 'sparkle': 5513, '18th': 39, 'ruin': 5055, 'costume': 1376, 'intentionally': 3167, 'absurd': 160, 'blanket': 713, 'camp': 913, 'incidental': 3061, 'btw': 854, 'renaissance': 4897, 'director': 1707, 'goal': 2625, 'fusty': 2530, 'scandal': 5131, 'intention': 3165, 'recording': 4819, 'extended': 2195, 'bop': 772, 'drinking': 1871, 'blurry': 737, 'revolt': 4977, 'marketplace': 3703, 'unprofessional': 6270, 'depending': 1610, 'cleverly': 1129, 'audience': 510, 'nudge': 4065, 'charming': 1029, '1700': 32, 'chance': 1012, 'student': 5684, 'compilation': 1242, 'perform': 4330, 'artist': 459, 'poorly': 4475, 'starlite': 5591, 'wave': 6489, 'prior': 4583, 'imply': 3036, 'garbage': 2549, 'prompt': 4630, 'november': 4059, 'cds': 984, 'deceive': 1543, 'group': 2699, 'duh': 1891, 'trash': 6096, 'purist': 4690, 'wedding': 6506, 'reception': 4803, 'session': 5245, 'sounding': 5499, 'tel': 5884, 'cheap': 1033, 'synthesizer': 5823, 'vocalist': 6421, 'schlock': 5153, 'hi8': 2849, 'handycam': 2750, 'die': 1676, 'camera': 910, 'continue': 1321, 'faster': 2270, 'nights': 4022, 'close': 1147, 'packace': 4221, 'tech': 5870, 'hines': 2865, 'curious': 1473, 'remastering': 4884, 'producer': 4601, 'performers': 4333, 'courtroom': 1394, 'suspect': 5788, 'defendant': 1562, 'unasked': 6189, '200': 60, 'realize': 4780, 'reveal': 4965, 'crucial': 1447, 'unbelievable': 6194, 'earlier': 1921, 'ending': 2018, 'detective': 1642, 'attorney': 503, 'unfold': 6228, 'mcbain': 3742, 'abuse': 162, 'abusive': 164, 'childhood': 1058, 'dorie': 1825, 'forgive': 2436, 'abuser': 163, 'successful': 5717, 'spite': 5545, 'horrendous': 2921, 'lord': 3591, 'benchmark': 663, 'refer': 4833, 
'cardinal': 940, 'gm': 2624, 'jocketty': 3308, 'contradict': 1325, 'columnist': 1192, 'peter': 4360, 'gammons': 2547, 'tracy': 6074, 'ringolsby': 5002, 'jayson': 3276, 'stark': 5589, 'major': 3659, 'league': 3462, 'stat': 5600, 'score': 5167, 'extensive': 2197, 'resource': 4937, 'prospect': 4645, 'subscriber': 5704, 'got': 2639, 'tomorrow': 6039, 'till': 5999, 'massive': 3719, 'subscribe': 5703, 'plunge': 4445, 'subscription': 5705, 'coverage': 1397, 'minor': 3836, 'college': 1183, 'uh': 6177, 'brand': 806, 'embarassment': 1989, 'steer': 5612, 'fusion': 2529, 'secret': 5199, 'pointer': 4452, 'joe': 3309, 'satriani': 5113, 'al': 276, 'dimeola': 1694, 'pratice': 4525, 'phil': 4366, 'tarzan': 5850, 'bruford': 847, 'wilding': 6551, 'intriguing': 3198, 'influence': 3101, 'destiny': 1636, 'titan': 6020, '1970': 48, 'apollo': 394, 'apex': 393, 'exploration': 2185, 'solar': 5461, 'mining': 3833, 'colony': 1187, 'jet': 3292, 'craft': 1404, 'oxygen': 4212, 'fuel': 2510, 'methane': 3802, 'atmosphere': 494, 'originally': 4153, 'intrigue': 3197, 'father': 2275, 'nasa': 3970, 'cassini': 963, 'huygens': 2972, 'mission': 3857, 'saturn': 5116, 'sci': 5162, 'fi': 2315, 'reality': 4779, 'fantasy': 2257, 'starr': 5592, 'david': 1523, 'asimov': 470, 'peek': 4305, 'block': 726, 'obviously': 4090, 'jiggle': 3299, 'noise': 4030, 'incorporate': 3069, 'alphabet': 302, 'easily': 1926, 'identify': 2990, 'banana': 582, 'penguin': 4314, 'umbrella': 6183, 'zebra': 6676, 'frustrated': 2505, 'pry': 4660, 'fling': 2380, 'tend': 5894, 'wall': 6452, 'smash': 5427, 'drop': 1875, 'tolerate': 6033, 'treatment': 6109, 'art': 450, 'collector': 1182, 'sey': 5264, 'twin': 6164, 'learning': 3466, 'tool': 6042, 'teething': 5883, 'fisher': 2350, 'incrediblock': 3074, 'fp': 2464, 'pizazz': 4409, 'spin': 5539, 'queen': 4711, 'watermelon': 6486, 'seed': 5205, 'stack': 5572, 'knock': 3391, 'banging': 585, 'plain': 4413, 'wooden': 6587, 'texture': 5925, 'object': 4081, 'opportunity': 4134, 'bang': 584, 'eye': 2211, 
'purchasing': 4686, 'decision': 1550, 'overnight': 4196, 'rocket': 5018, 'scientist': 5166, 'reasons': 4789, 'rub': 5048, 'dissapointing': 1770, 'concept': 1267, 'baby': 557, 'chinsy': 1067, 'mother': 3913, 'imaginative': 3020, 'playtime': 4428, 'blunt': 735, 'edge': 1944, 'safety': 5080, 'corresponding': 1369, 'tot': 6054, 'awesume': 549, 'grandson': 2661, 'goodness': 2633, 'shopping': 5310, 'interested': 3175, 'manipulate': 3681, 'adorable': 222, 'twirl': 6165, 'definate': 1567, 'upgrade': 6306, 'format': 2442, 'leary': 3467, 'storage': 5643, 'zippered': 6681, '79': 125, 'purchse': 4687, 'durable': 1901, 'developmentally': 1652, 'appropriate': 414, '25': 79, 'months': 3896, 'perfectly': 4329, 'hurt': 2967, 'chewers': 1051, 'category': 974, 'factor': 2223, 'fully': 2513, 'comprehend': 1256, 'dig': 1686, 'waterproof': 6488, 'broke': 838, 'loosen': 3590, 'waterproff': 6487, 'exotics': 2164, 'ballsy': 580, 'reading': 4775, 'whale': 6523, 'naturalist': 3980, 'communicator': 1225, 'mary': 3711, 'getten': 2587, 'study': 5686, 'communication': 1224, '1991': 54, 'pod': 4448, 'washington': 6479, 'contention': 1316, 'direct': 1704, 'telepathic': 5885, 'diane': 1664, 'donovancalifornia': 1820, 'bookwatch': 765, 'nifty': 4019, 'halloween': 2739, 'dummy': 1896, 'forgot': 2438, 'depict': 1611, 'setter': 5249, 'unsubstantiated': 6289, 'discrimination': 1734, 'certainly': 999, 'smear': 5428, 'campaign': 914, 'troup': 6133, 'claw': 1121, 'preservation': 4557, 'academic': 167, 'freedom': 2481, 'molly': 3883, 'myers': 3952, 'rachel': 4730, 'corrie': 1370, 'ism': 3240, 'organization': 4148, 'dedicate': 1552, 'violence': 6396, 'left': 3471, 'wing': 6560, 'propaganda': 4637, 'office': 4109, 'regularly': 4847, 'hide': 2851, 'terrorist': 5912, 'weapon': 6497, 'pacifist': 4218, 'idf': 2993, 'sabotage': 5072, 'intend': 3162, 'innocent': 3116, 'palestinians': 4237, 'murder': 3938, 'disregard': 1765, 'extreme': 2208, 'gardening': 2551, 'lawn': 3449, 'formula': 2444, 'fortunately': 2452, 
'hesitate': 2844, 'ost': 4159, 'caveat': 980, 'emptor': 2006, 'chinese': 1066, 'green': 2681, 'red': 4822, 'logo': 3577, 'fx': 2534, 'counterfeit': 1383, 'serial': 5237, 'pscn': 4663, '5021': 111, '5022': 112, '5023': 113, 'handful': 2745, 'missing': 3856, 'memorable': 3781, 'seal': 5186, 'boss': 781, 'lucca': 3617, 'blame': 712, 'sorry': 5490, '35': 93, 'travesty': 6102, 'lavos': 3447, 'mercy': 3792, 'soul': 5494, 'selves': 5214, 'indepth': 3081, 'killer': 3373, 'supernatural': 5751, 'storyline': 5649, 'hole': 2886, 'gory': 2637, 'stephen': 5619, 'colletti': 1184, 'initially': 3110, 'low': 3612, 'budget': 857, 'indie': 3086, 'horror': 2926, 'flick': 2377, 'jennifer': 3284, 'nikki': 4023, 'deloach': 1594, 'weekend': 6511, 'bernie': 671, 'terry': 5913, 'kiser': 3383, 'actor': 198, 'lol': 3578, 'anyways': 383, 'mask': 3715, 'maker': 3663, 'horrifying': 2925, 'ir': 3218, 'wondering': 6584, 'shocking': 5304, 'thhat': 5950, 'display': 1760, 'effort': 1960, 'dialogue': 1662, 'pacing': 4219, 'blonde': 728, 'don': 1816, 'tree': 6110, 'root': 5036, 'yard': 6649, 'knee': 3388, 'familiarity': 2248, 'cinematic': 1098, 'path': 4282, 'typically': 6172, 'predictable': 4541, 'fashion': 2266, 'unstoppable': 6288, 'victim': 6381, 'skinned': 5396, 'face': 2219, 'motivation': 3916, 'elicit': 1982, 'sympathy': 5820, 'entry': 2065, 'familiar': 2247, 'territory': 5909, 'tread': 6105, 'cast': 964, 'competence': 1239, 'effective': 1957, 'guilt': 2717, 'buffoon': 858, 'treat': 6107, 'williams': 6554, 'cameo': 909, 'role': 5024, 'vet': 6371, 'berryman': 674, 'unusually': 6299, 'restrain': 4949, 'jamie': 3262, 'lee': 3470, 'curtis': 1482, 'leading': 3461, 'survivor': 5786, 'resilient': 4934, 'resourceful': 4938, 'doom': 1821, 'fiance': 2316, 'sic': 5324, 'equally': 2074, 'ended': 2017, 'pave': 4294, 'ii': 3004, 'unmasked': 6259, 'filmmaker': 2327, 'shamelessly': 5276, 'borrow': 780, 'stale': 5577, 'elevate': 1980, 'maniac': 3677, 'unimaginative': 6238, 'exactly': 2122, 'fright': 2496, 'safe': 
5078, 'brief': 827, 'appearance': 403, 'icon': 2986, 'dullness': 1894, 'misnomer': 3853, 'maskerade': 3716, 'genre': 2579, 'exhibit': 2156, 'genealogy': 2569, 'sittings': 5377, 'thi': 5952, 'ghetto': 2589, 'inhabitant': 3109, 'artifact': 456, 'ration': 4763, 'card': 938, 'certificate': 1001, 'yellow': 6657, 'kovno': 3399, 'kaunas': 3348, 'researcher': 4929, 'ordinary': 4144, 'holocaust': 2894, 'diary': 1666, 'ilya': 3016, 'gerber': 2584, 'extract': 2205, 'alexandra': 282, 'zapruder': 6675, 'ssalvaged': 5570, 'eastern': 1929, 'medincine': 3765, 'fo': 2398, 'methodology': 3804, 'medicine': 3763, 'healing': 2801, 'dissappointe': 1772, 'unhappy': 6234, 'practice': 4523, 'dept': 1620, 'primer': 4576, 'theoretical': 5943, 'basis': 604, 'application': 408, 'originate': 4154, 'philosophy': 4368, 'emphasize': 2004, 'balance': 576, 'nature': 3982, 'agriculture': 262, 'informative': 3106, 'starcraft': 5585, 'warcraft': 6464, 'redundant': 4829, 'warhammer': 6468, 'combat': 1194, 'pleasure': 4432, 'absentia': 153, 'respect': 4939, 'ambient': 320, 'enthusiast': 2058, 'whatsoever': 6525, 'hillage': 2863, 'forewarned': 2432, 'darn': 1514, 'guitarist': 2721, 'showcase': 5316, 'todd': 6026, 'rundgren': 5059, 'utopia': 6326, 'dissapointed': 1769, 'direction': 1705, 'dry': 1882, 'pillow': 4391, 'temperpedic': 5890, 'post': 4499, 'tiny': 6014, 'foam': 2399, 'preform': 4545, 'worthless': 6611, 'unatural': 6190, 'super': 5744, 'handle': 2747, 'heavy': 2815, 'cushy': 1486, 'styrofoam': 5698, 'squish': 5569, 'dense': 1606, 'prop': 4636, 'whilst': 6528, 'pollicie': 4466, 'ugh': 6175, 'sleep': 5406, 'leverage': 3492, 'position': 4489, 'width': 6544, 'rotate': 5040, 'lay': 3451, 'false': 2246, 'approx': 415, 'thickness': 5954, 'moshelle': 3912, 'deploy': 1613, 'military': 3822, 'disrespectful': 1767, 'courtesy': 1393, 'formulation': 2446, 'artec': 451, 'oreal': 4145, '29': 85, '07': 4, 'pic': 4381, 'professionel': 4610, 'potion': 4508, 'reply': 4915, 'serum': 5240, 'curly': 1476, 'frizzy': 
2501, 'smooths': 5439, 'rid': 4990, 'allof': 295, 'friz': 2500, 'damp': 1502, 'weather': 6501, 'double': 1832, 'fix': 2355, 'slightly': 5411, 'tall': 5837, 'textureline': 5926, 'smoothing': 5437, 'controlgel': 1335, 'barber': 589, 'shop': 5308, 'grocery': 2696, 'dick': 1669, 'curless': 1475, 'provoke': 4658, 'tapping': 5846, 'winter': 6562, 'racquet': 4734, 'suit': 5732, 'advanced': 230, 'noticeably': 4054, 'powerful': 4519, 'forgiving': 2437, 'prefer': 4544, 'graphite': 2666, 'outfit': 4170, 'skater': 5388, 'skate': 5386, 'automatically': 525, 'hello': 2826, 'skating': 5389, 'technique': 5873, 'amazon1': 318, 'stroke': 5674, 'spiral': 5542, 'body': 742, 'layback': 3452, 'split': 5547, 'muscle': 3940, 'kick': 3365, 'pictures': 4385, 'vugame': 6435, 'stroking': 5675, 'crooked': 1440, 'loading': 3564, 'interactive': 3172, 'worthwhile': 6612, 'bore': 775, 'stink': 5630, 'scream': 5175, 'ice': 2983, 'simplistic': 5352, 'strive': 5673, 'enjoyment': 2043, 'rom': 5026, 'skippy': 5400, 'dud': 1890, 'boloney': 750, 'develope': 1649, 'coordination': 1350, 'pete': 4359, 'sake': 5084, 'seven': 5253, 'windows': 6559, 'xp': 6642, 'frustration': 2507, 'crowd': 1445, 'despite': 1634, 'freestyle': 2483, 'trick': 6116, 'yesterday': 6661, 'confidence': 1284, 'smoothly': 5438, '1978': 49, 'lauren': 3445, 'bacall': 560, 'cbs': 981, 'gentleman': 2580, 'ooh': 4124, 'la': 3405, 'rating': 4761, 'pity': 4405, 'delivery': 1593, 'mail': 3652, 'joy': 3319, 'rare': 4755, 'audi': 508, 'focus': 2400, 'quattro': 4707, 'truck': 6135, 'pleased': 4431, 'horse': 2927, '2004': 62, 'alternative': 305, 'etch': 2100, 'somber': 5472, 'imagery': 3018, 'melancholic': 3772, 'quietly': 4719, 'introspective': 3202, 'laid': 3416, 'works': 6600, 'coffee': 1174, 'tea': 5862, 'newspaper': 4011, 'sunday': 5741, 'afternoon': 249, 'eforcity': 1961, 'disingenuous': 1752, 'gizmo': 2605, 'supplier': 5755, 'nasty': 3972, 'feign': 2300, 'mailing': 3653, 'cop': 1351, 'apartment': 388, 'switch': 5811, 'living': 3558, 
'occassionaly': 4093, 'outlet': 4174, 'plug': 4441, 'saver': 5120, 'tab': 5824, 'device': 1654, 'hitch': 2875, 'mystical': 3956, 'thinkman': 5960, 'thhis': 5951, 'luv': 3627, 'deduction': 1553, 'relax': 4860, 'lets': 3489, 'souful': 5492, 'fixx': 2357, 'rupert': 5062, 'mastrpiece': 3725, 'dr': 1844, 'track03': 6070, 'legend': 3476, 'stranger': 5656, 'dream': 1862, 'surreal': 5780, 'imagry': 3022, 'galore': 2541, 'symbolism': 5816, 'soph': 5484, 'auuuuur': 528, 'kabbalah': 3342, '777': 124, 'maby': 3633, 'someday': 5473, 'whe': 6526, 'annal': 360, 'bbbbbuuuuuyyyy': 612, 'peace': 4299, 'frind': 2499, 'shine': 5295, 'prmised': 4590, 'ecstasy': 1942, 'thought': 5970, 'captivating': 934, 'lj': 3559, 'smith': 5431, 'ash': 465, 'teenage': 5881, 'beacuase': 617, 'sitiuation': 5375, 'charm': 1028, 'humor': 2958, 'paranormal': 4255, 'disaster': 1722, 'area': 427, 'darkness': 1513, 'hook': 2909, 'protect': 4649, 'jermey': 3289, '100th': 7, 'questionare': 4715, 'lynnette': 3630, 'happiness': 2762, 'mate': 3729, 'irrestible': 3226, 'vampyre': 6339, '21': 71, 'possibly': 4498, 'instantly': 3143, 'rowan': 5046, 'jade': 3258, 'kestrel': 3355, '6th': 122, 'inspire': 3138, 'poetry': 4450, 'brilliance': 830, 'induce': 3090, 'respond': 4942, 'iread': 3220, 'smithdescribed': 5432, 'especialy': 2090, 'personality': 4348, 'vampire': 6338, 'vision': 6406, 'trilogy': 6121, 'smiths': 5433, 'thehuntress': 5938, 'ihave': 3003, 'wasprobably': 6480, 'mobsessed': 3872, 'lynette': 3629, 'awesomeanybody': 548, 'recognize': 4810, 'dosn': 1829, 'soulmate': 5496, 'befriend': 639, 'rogue': 5023, 'unrealistic': 6274, 'slayer': 5405, 'tune': 6149, 'jeramey': 3286, 'sofar': 5458, 'ihappen': 3002, 'agreement': 260, 'male': 3668, 'attraction': 505, 'characters': 1020, 'witch': 6566, 'writng': 6629, 'instance': 3141, 'speelbound': 5530, 'salem': 5086, 'isis': 3233, 'witchcraft': 6567, 'hecate': 2816, 'wtich': 6633, 'legendary': 3477, 'crature': 1413, 'nonogram': 4035, 'publishing': 4672, 'erasing': 2080, 
'gel': 2566, 'pen': 4311, 'ink': 3112, 'bleed': 718, 'square': 5563, 'finished': 2340, 'pixillated': 4408, 'locate': 3569, 'solving': 5471, 'magazine': 3645, 'publication': 4668, 'mirror': 3841, 'glimpse': 2613, 'nation': 3975, 'shortly': 5313, 'stuffy': 5690, 'hofstader': 2879, 'scholar': 5155, 'compelling': 1237, 'illuminating': 3012, 'precious': 4536, 'examine': 2125, 'evolution': 2119, 'conclusion': 1273, 'spoon': 5554, 'feed': 2294, 'vast': 6346, 'datum': 1518, 'breezy': 822, 'readable': 4771, 'reflection': 4837, 'bravo': 811, 'professor': 4611, 'regret': 4845, 'yout': 6668, 'obligation': 4083, 'wade': 6440, 'quagmire': 4703, 'redundantly': 4830, 'likely': 3525, 'wealth': 6496, 'overly': 4195, 'intellecual': 3158, 'decipher': 1549, 'nealry': 3988, '300': 90, 'rediculously': 4826, 'awake': 542, 'invest': 3208, 'witht': 6570, 'endlessly': 2020, 'stlye': 5633, 'somewhat': 5475, 'useful': 6319, 'concentration': 1266, 'richard': 4986, 'hofstadter': 2880, 'convincing': 1343, 'colonial': 1186, 'monotonously': 3891, 'thouroughly': 5972, '1750': 33, 'fanatic': 2252, 'wide': 6540, 'perspective': 4351, 'kylie': 3404, 'bite': 704, 'subtlety': 5713, 'minogue': 3835, 'subtle': 5712, 'performer': 4332, 'hi': 2848, 'nrg': 4062, 'floorfillers': 2386, 'purr': 4698, 'ballad': 578, 'nightmare': 4021, 'flow': 2391, 'namesake': 3961, 'catchy': 973, 'chorus': 1077, 'melt': 3777, 'remixe': 4891, 'funky': 2523, 'emo': 1997, 'language': 3421, 'usa': 6315, 'wondeful': 6579, 'import': 3037, 'limited': 3531, 'pennsylvania': 4317, 'border': 773, 'edit': 1945, 'cut': 1490, 'arrangement': 443, '102': 9, 'rarity': 4757, '103': 10, 'tom': 6035, 'middleton': 3814, 'cosmos': 1373, 'dub': 1886, '104': 11, 'woderful': 6576, '105': 12, 'fever': 2312, '10i': 14, 'cord': 1357, 'thomas': 5964, 'naturally': 3981, 'stories': 5646, 'compel': 1236, 'related': 4856, 'bunch': 869, 'rambling': 4743, 'pbs': 4296, 'involve': 3213, 'mindless': 3829, 'prattle': 4526, 'footage': 2418, 'meet': 3768, 'promotional': 
4629, 'rent': 4902, 'kay': 3349, 'aside': 469, 'sodor': 5457, 'celebration': 989, 'interview': 3193, 'annoy': 363, 'interviewes': 3194, 'gear': 2563, 'thinking': 5959, 'normal': 4041, 'solely': 5465, 'promo': 4627, 'center': 992, 'commercial': 1217, 'ode': 4103, 'allcroft': 290, 'penn': 4316, 'introduction': 3200, 'pain': 4229, 'advice': 241, 'interspersed': 3190, 'thinly': 5961, 'disguise': 1744, 'awfully': 551, 'jut': 3338, 'invariably': 3207, 'duplications': 1900, 'sigh': 5331, 'asleep': 472, 'unleash': 6253, 'horrific': 2924, 'warrior': 6476, 'brilliantly': 832, 'novell': 4057, 'gore': 2635, 'masterton': 3723, '25k': 81, 'warranty': 6475, 'apc': 391, '000': 1, 'equipment': 2076, 'surge': 5770, 'protector': 4651, 'printer': 4581, 'strip': 5672, 'docker': 1802, 'stretch': 5668, 'waist': 6441, 'tight': 5995, 'inch': 3060, 'earn': 1923, 'comfortable': 1204, 'inseam': 3124, 'dockers': 1803, 'largely': 3430, 'atkinson': 492, 'slack': 5403, 'waistband': 6443, 'ge': 2562, 'stateting': 5605, 'avability': 529, 'usual': 6324, 'normally': 4042, 'shippment': 5299, 'postage': 4500, 'flimsy': 2379, 'dye': 1913, 'fabric': 2217, 'advertising': 239, 'cafe': 901, 'gray': 2673, 'mislead': 3851, 'skinny': 5397, 'hefty': 2821, 'handling': 2748, 'pricy': 4572, 'business': 881, 'chain': 1006, 'khaki': 3363, 'iron': 3223, 'retelling': 4957, 'emphasis': 2003, 'best': 675, 'suited': 5734, 'mature': 3734, 'inquisitive': 3121, 'shy': 5322, 'disturb': 1788, 'arefrightene': 428, 'baffled': 569, 'supposedto': 5762, 'retold': 4961, 'fairy': 2237, 'tale': 5830, 'abstract': 159, 'grim': 2685, 'wwii': 6635, 'berlin': 670, 'kreig': 3401, 'glove': 2622, 'envelop': 2067, 'valuable': 6336, 'schooler': 5158, 'italy': 3244, 'longing': 3582, 'vicarious': 6377, 'interrupt': 3188, 'youth': 6669, 'confusion': 1292, 'cent': 991, 'lapse': 3427, 'credibility': 1423, 'hoppingly': 2917, 'gullible': 2723, 'sufficiently': 5729, 'rome': 5033, 'appreciation': 412, 'breadth': 813, 'depth': 1621, 'dish': 1748, 
'satisfying': 5112, 'flight': 2378, 'tasty': 5855, 'capella': 926, 'food': 2413, 'foodie': 2414, 'hearted': 2809, 'relatively': 4859, 'detract': 1646, 'triad': 6112, 'bruno': 850, 'laura': 3443, 'tomasso': 6036, 'cappella': 930, 'cyrano': 1495, 'locale': 3568, 'enhance': 2039, 'trifle': 6118, 'overdone': 4186, 'pun': 4679, 'playboy': 4424, 'penthouse': 4320, 'occupy': 4096, 'sweep': 5805, 'sensual': 5226, 'glory': 2619, 'devour': 1658, 'preparation': 4552, 'described': 1623, 'mexican': 3805, 'magic': 3646, 'fable': 2216, 'medieval': 3764, 'cook': 1345, 'stary': 5597, 'cooking': 1347, 'uhmmmm': 6178, 'nude': 4064, 'newlywed': 4010, 'booty': 771, 'inter': 3169, 'mandatory': 3675, 'cameriere': 911, 'chick': 1054, 'lit': 3548, 'roman': 5027, 'market': 3701, 'cursing': 1481, 'stud': 5683, 'underdeveloped': 6205, 'latte': 3438, 'porn': 4480, 'desire': 1630, 'substatially': 5710, 'bathe': 608, 'allure': 297, 'tip': 6015, 'lovemaking': 3608, 'insult': 3152, 'vivid': 6418, 'predictible': 4542, 'mafia': 3644, 'recepie': 4802, 'lousy': 3603, 'worst': 6609, 'superlative': 5750, 'embraceable': 1993, 'jerk': 3287, 'portray': 4486, 'innocently': 3117, 'unaware': 6192, 'promiscuous': 4624, 'inappropriate': 3055, 'facet': 2220, 'literature': 3552, 'substance': 5707, 'sexy': 5263, 'eating': 1932, 'metaphor': 3800, 'making': 3665, 'adequate': 214, 'virtual': 6401, 'orgasm': 4149, 'pallid': 4238, 'occasionally': 4092, 'positively': 4491, 'unappetizing': 6188, 'overlook': 4193, 'openly': 4128, 'cribbed': 1427, 'bergerac': 669, 'swiftly': 5808, 'diminish': 1695, 'plane': 4416, 'suppose': 5760, 'generally': 2571, 'management': 3674, 'accounting': 182, 'heap': 2804, 'report': 4916, 'brush': 851, 'financial': 2334, 'analysis': 335, 'ability': 144, 'topic': 6045, 'bedtable': 634, 'pre': 4530, 'mba': 3740, 'mbas': 3741, 'cash': 960, 'limit': 3530, 'tha': 5927, 'thang': 5930, 'hood': 2907, 'number1': 4070, 'existence': 2160, 'permanently': 4341, 'millionaire': 3825, 'repetitive': 4910, 
'repeat': 4906, 'huh': 2952, 'yeah': 6653, 'hotboys': 2935, 'sign': 5333, 'passing': 4275, 'fad': 2227, 'hop': 2913, 'bfi': 683, 'lil': 3526, 'alright': 303, 'rapper': 4753, 'goog': 2634, 'accomidate': 174, 'beats': 624, 'cmr': 1165, 'ghenry187': 2588, 'aol': 384, 'com': 1193, 'tru': 6134, 'million': 3824, 'billboard': 692, 'juvenile': 3339, 'suck': 5719, '17': 31, 'bounce': 787, 'ass': 475, 'overused': 4202, 'unoriginal': 6266, 'turk': 6153, 'da': 1496, 'solo': 5468, 'ya': 6646, 'll': 3560, 'quit': 4720, 'hatin': 2780, 'comedian': 1200, 'wayne': 6492, 'alien': 286, 'skateboard': 5387, 'deffinetly': 1564, 'starz': 5598, 'hea': 2793, 'guerilla': 2709, 'warfare': 6467, 'juvenilez': 3340, 'degreez': 1582, 'sept': 5234, '1999': 57, 'manie': 3680, 'snoop': 5444, 'doggystyle': 1811, 'dre': 1859, 'chronic': 1084, 'dictionary': 1673, 'wack': 6439, 'belive': 656, 'bellbottom': 658, 'wit': 6565, 'crap': 1408, 'walk': 6450, 'untamed': 6294, 'ryhmes': 5068, 'crab': 1401, 'foolishness': 2416, 'guerriula': 2710, 'meanless': 3751, 'pointless': 4453, 'backround': 562, 'hotboy': 2934, 'definetely': 1571, 'strong': 5676, 'tuesday': 6147, 'thursday': 5989, 'ridin': 4997, 'bout': 790, 'continuation': 1320, 'bump': 868, 'iz': 3253, 'aight': 268, 'dissin': 1778, 'diss': 1768, 'dis': 1711, 'talklin': 5836, 'talkin': 5834, 'satin': 5109, 'shit': 5301, 'thta': 5986, 'sayin': 5126, 'bling': 723, 'ha': 2729, 'amd': 322, 'makin': 3664, 'tymer': 6169, 'shot': 5314, 'hoody': 2908, 'hooo': 2911, 'tearin': 5868, 'logic': 3575, 'raise': 4740, 'responsible': 4945, 'worker': 6597, 'restaurant': 4948, 'antoine': 378, 'french': 2486, 'quarter': 4705, 'orleans': 4155, 'themed': 5942, 'mardi': 3694, 'gras': 2667, 'mid': 3812, '20th': 70, 'delightful': 1589, 'frances': 2471, 'parkinson': 4263, 'keyes': 3359, 'beauregard': 625, 'confederate': 1281, 'madame': 3642, 'castel': 965, 'lodger': 3572, 'phyllis': 4375, 'zimbler': 6680, 'miller': 3823, 'gently': 2582, 'unlimited': 6257, 'reserve': 4932, 
'virginian': 6399, 'sun': 5740, 'brave': 809, 'tim': 6000, 'persuasion': 4353, 'stash': 5599, 'dinner': 1700, 'touching': 6058, 'privileged': 4589, 'sunny': 5743, 'certainty': 1000, 'january': 3267, 'february': 2290, '1948': 44, 'beanie': 620, 'stuffed': 5688, 'wheel': 6527, 'lesson': 3486, 'desktop': 1632, 'html': 2945, 'dhtml': 1659, 'relate': 4855, 'powell': 4517, 'xml': 6641, 'thorough': 5967, 'comprehensive': 1258, 'css': 1458, 'web': 6502, 'smart': 5426, 'index': 3082, 'attribute': 507, 'clever': 1128, 'expand': 2165, 'technology': 5876, 'intermediate': 3181, 'experienced': 2173, 'website': 6503, 'creator': 1422, 'comparision': 1232, 'analyst': 336, 'ibm': 2982, 'client': 1134, 'server': 5243, 'immense': 3030, 'htmls': 2946, 'expert': 2176, 'tweek': 6159, 'correctly': 1367, 'plainly': 4414, 'arrival': 444, 'indicate': 3083, 'javascript': 3272, 'healthy': 2803, 'allergic': 292, 'solaray': 5462, 'vitamin': 6415, 'thoughtful': 5971, 'fiction': 2317, 'cultural': 1464, 'generational': 2573, 'borderland': 774, 'identity': 2991, 'marginalise': 3695, 'outcast': 4166, 'forge': 2433, 'bond': 754, 'friendhip': 2493, 'teo': 5899, 'evoke': 2118, 'flavour': 2368, 'provoking': 4659, 'boundary': 788, 'melting': 3778, 'australia': 517, 'complain': 1243, 'ffvi': 2314, 'sne': 5441, 'orchestrate': 4142, 'ff': 2313, 'portrayal': 4487, 'cartridge': 958, 'throat': 5980, 'spray': 5558, 'hurry': 2966, 'relieve': 4873, 'yo': 6663, 'sore': 5488, 'recommendation': 4815, 'mom': 3884, 'numb': 4068, 'elbow': 1968, 'glider': 2610, 'sooner': 5480, 'bench': 662, 'metal': 3799, 'unsure': 6293, 'tighten': 5996, 'indent': 3079, 'bar': 588, 'weld': 6516, 'sturdy': 5695, 'assemble': 478, 'movement': 3921, 'blah': 711, 'compliment': 1251, 'gauge': 2560, 'chair': 1007, 'arm': 437, 'packed': 4224, 'shipment': 5297, 'unwrap': 6303, 'heck': 2817, 'drag': 1846, 'backyard': 565, 'wet': 6520, 'glides': 2611, 'patio': 4287, 'james': 3261, 'moody': 3898, 'known': 3397, 'awsome': 554, 'sample': 5092, 
'consistantly': 1300, 'perk': 4338, 'ear': 1919, 'iv': 3251, 'rush': 5063, 'remixes': 4892, 'mayfield': 3738, 'tunes': 6150, 'eerie': 1954, 'signature': 5335, 'heavenly': 2813, 'tinsel': 6013, 'itune': 3250, 'woburn': 6575, 'untold': 6296, 'ancestor': 339, 'samuel': 5093, 'richardson': 4987, '1630': 30, 'facinating': 2221, 'understanding': 6216, 'generation': 2572, 'united': 6247, 'states': 5603, 'jacob': 3257, 'beth': 678, 'interpret': 3187, 'jewish': 3295, 'worthy': 6613, 'midrash': 3815, 'unshackle': 6281, 'rabbi': 4725, 'lawrence': 3450, 'kushner': 3403, 'element': 1975, 'juggling': 3324, 'momentum': 3886, 'thrust': 5984, 'zoo': 6683, 'biblical': 686, 'truth': 6141, 'gospel': 2638, 'socialism': 5453, 'economic': 1937, 'pseudo': 4664, 'psychology': 4666, 'broad': 836, 'variety': 6344, 'political': 4464, 'romanticism': 5032, 'collective': 1181, 'anarchy': 338, 'commune': 1222, 'foster': 2458, 'individualism': 3089, 'socialisms': 5454, 'centuries': 994, 'meandering': 3746, 'tract': 6073, 'veer': 6352, 'aesthetic': 243, 'synthesize': 5822, 'unsuccessful': 6290, 'quotable': 4723, 'daydream': 1525, 'wilde': 6550, 'shah': 5269, 'rukh': 5056, 'khan': 3364, 'rani': 4749, 'mukerji': 3932, 'marriage': 3707, 'prejudiced': 4550, 'marry': 3709, 'respectful': 4940, 'supportive': 5759, 'welcome': 6515, 'spouse': 5557, 'bollywood': 749, 'shahrukh': 5270, 'consist': 1299, 'woos': 6589, 'bickering': 689, 'excessive': 2137, 'argument': 431, 'misunderstanding': 3863, 'pixilate': 4407, 'watchable': 6484, 'married': 3708, 'cat': 970, 'angry': 352, 'yell': 6656, 'length': 3482, 'grrrrr': 2704, 'arrrrgh': 449, 'endure': 2022, 'praxis': 4527, 'teaching': 5865, 'test': 5917, 'discussion': 1738, 'discuss': 1737, 'outside': 4179, 'chop': 1075, 'answer': 369, 'excuse': 2150, 'mulch': 3933, 'kindle': 3378, 'entomb': 2062, 'wolverine': 6577, 'respectively': 4941, 'maniacs': 3679, 'uprising': 6309, 'cheer': 1038, 'bye': 896, 'critter': 1438, 'bind': 695, 'bass': 605, 'jordan': 3316, 
'attitude': 502, 'singing': 5363, 'anita': 357, 'whith': 6534, 'drum': 1879, 'sheila': 5288, 'unique': 6245, 'scat': 5144, 'singin': 5362, 'suggestion': 5731, 'acoustic': 190, 'accord': 178, 'knowledge': 3395, 'duo': 1898, 'keen': 3350, 'instrumental': 3150, 'danelectro': 1504, 'dj': 1797, '20c': 69, 'rocky': 5019, 'road': 5012, 'mini': 3830, 'pedal': 4303, 'cheaply': 1034, 'terrific': 5907, 'organ': 4146, 'leslie': 3485, 'mess': 3796, 'setting': 5250, 'rubber': 5049, 'highlyrecommend': 2859, 'forwhat': 2456, 'jax': 3274, 'llcwho': 3561, 'service': 5244, 'pace': 4216, 'rivet': 5008, 'carpathian': 951, 'bookstore': 764, 'gothic': 2641, 'scarletti': 5140, 'curse': 1480, 'supposedly': 5761, 'grin': 2686, 'gargoyle': 2552, 'impressive': 3049, 'riveting': 5009, 'fascinatin': 2263, 'entanglement': 2050, 'feehan': 2297, 'warn': 6471, 'christine': 1082, 'intduction': 3154, 'nicotetta': 4017, 'giovanni': 2598, 'bryon': 852, 'lifemate': 3514, 'engine': 2032, 'quicker': 4717, 'darius': 1511, 'nicoletta': 4016, 'village': 6390, 'healer': 2800, 'overlord': 4194, 'exercise': 2153, 'select': 5209, 'bride': 824, 'palazzio': 4235, 'foreboding': 2425, 'mysterious': 3954, 'frightening': 2497, 'villager': 6391, 'communicate': 1223, 'similarity': 5346, 'sensuality': 5227, 'burgeon': 873, 'pounce': 4512, 'independant': 3080, 'sharing': 5282, 'eager': 1916, 'presence': 4554, 'enchant': 2007, 'unlike': 6255, 'temper': 5888, 'tantrum': 5842, 'ignite': 2997, 'passion': 4276, 'empathy': 2002, 'sensitivity': 5225, 'plentiful': 4434, 'intensity': 3164, 'dissappointed': 1773, 'potential': 4506, 'develop': 1648, 'spark': 5512, 'suspenseful': 5791, 'protaganist': 4647, 'believable': 653, 'pound': 4513, 'storie': 5645, 'ick': 2985, 'ack': 189, 'submissive': 5702, 'wifey': 6546, 'paraphrase': 4256, 'eeeeew': 1953, 'stalk': 5578, 'tide': 5992, 'emma': 1996, 'holly': 2892, 'upyr': 6312, 'scarlatti': 5137, 'humdrum': 2957, 'reach': 4767, 'dimensional': 1693, 'enlarge': 2044, 'advantage': 231, 
'success': 5716, 'begining': 643, 'congratulation': 1293, 'anough': 367, 'immotionaly': 3032, 'sitting': 5376, 'climatic': 1137, 'quesse': 4713, 'weird': 6514, 'servant': 5241, 'lively': 3556, 'adore': 223, 'strongly': 5677, 'severely': 5256, 'confused': 1290, 'blech': 717, 'broodiness': 840, 'virgin': 6397, '_____________edited': 134, 'medival': 3767, 'molasse': 3881, 'camcorder': 908, 'usb': 6317, 'streaming': 5663, 'overseas': 4199, 'choppy': 1076, 'avi': 536, 'recently': 4801, 'driver': 1874, 'frankly': 2473, 'analogue': 332, 'protection': 4650, 'accurate': 185, '2002': 61, 'satified': 5108, 'blur': 736, 'barrel': 596, 'distortion': 1784, 'piano': 4380, 'homework': 2899, 'extremly': 2210, 'common': 1220, 'teacher': 5864, 'lid': 3511, 'accidentally': 172, 'alternate': 304, 'giggle': 2596, 'adjust': 219, 'distinguish': 1782, 'intermittent': 3182, '11': 16, 'squeel': 5565, 'cruel': 1451, 'unusual': 6298, 'patricia': 4289, 'cornwell': 1362, 'boyfriend': 797, 'books': 762, 'hornet': 2920, 'nest': 4005, 'ipad': 3216, 'rksbabydoll': 5010, 'jersey': 3291, 'seat': 5195, 'collapse': 1177, 'stick': 5625, 'taut': 5857, 'thriller': 5979, 'evil': 2116, 'republican': 4922, 'holder': 2885, 'nazi': 3986, 'criminal': 1429, 'dump': 1897, 'buildup': 864, 'whodunnit': 6537, 'evildoer': 2117, 'disappointing': 1719, 'scarpetta': 5141, 'cornwall': 1361, 'herring': 2841, 'patrica': 4288, 'summary': 5738, 'impressions': 3048, 'virtually': 6402, 'trudge': 6136, 'excruciating': 2148, 'extraneous': 2206, 'pages': 4228, 'closure': 1152, 'paramount': 4254, 'harris': 2771, 'levincruel': 3493, 'foray': 2420, 'disheartened': 1749, 'stimulating': 5629, 'trite': 6128, 'turner': 6155, 'grip': 2690, 'unable': 6185, 'underlying': 6209, 'vacuous': 6330, 'inauthentic': 3056, 'offence': 4104, 'inproper': 3119, 'cassette': 961, 'librarian': 3504, 'vouch': 6431, 'envelope': 2068, 'disgruntled': 1743, 'customer': 1488, 'prevent': 4567, 'sincerely': 5357, 'sherry': 5292, 'bohmp': 745, 'shopper': 5309, 
'forensic': 2426, 'police': 4461, 'procedural': 4597, 'row': 5045, 'inmate': 3113, 'execute': 2151, 'fingerprint': 2338, 'current': 1477, 'particularly': 4268, 'niece': 4018, 'lucy': 3623, 'amaze': 313, 'installment': 3140, 'potter': 4509, 'field': 2319, 'bog': 743, 'mastery': 3724, 'science': 5163, 'authenticity': 521, 'muddled': 3929, 'disapoint': 1714, 'electric': 1971, 'flesh': 2374, 'commit': 1219, 'redeem': 4823, 'anti': 375, 'social': 5452, 'workaholic': 6596, 'tendency': 5895, 'unresolved': 6276, 'villain': 6392, 'unimportant': 6239, 'flaw': 2369, 'spellbound': 5531, 'agatha': 250, 'christie': 1081, 'dorothy': 1827, 'sayers': 5125, 'displace': 1759, 'wee': 6509, 'bed': 632, 'ends': 2021, 'hardcover': 2766, 'stopping': 5641, 'prime': 4575, 'scattered': 5145, 'shallow': 5273, 'contain': 1313, 'widely': 6541, 'dot': 1831, 'matrix': 3732, 'incomplete': 3066, 'binding': 697, 'professionally': 4609, 'documentation': 1808, 'unclear': 6198, 'usability': 6316, 'testing': 5919, 'soooooooooooo': 5483, 'excruciatingly': 2149, 'adam': 201, 'aristocrat': 435, 'unfunny': 6232, 'partner': 4270, 'earth': 1924, 'misleading': 3852, 'previously': 4569, 'spoilt': 5550, 'crudeness': 1450, 'degrade': 1580, 'zero': 6679, 'overated': 4184, 'nope': 4038, 'studio': 5685, 'dlss': 1800, 'mon': 3888, 'wake': 6447, 'dramedy': 1854, 'sandler': 5096, 'tasked': 5852, 'burden': 871, 'apathetic': 389, 'viewer': 6388, 'climax': 1138, 'lackluster': 3413, '45': 104, 'entirely': 2061, 'flounder': 2389, 'mann': 3685, 'bana': 581, 'did': 1674, 'happend': 2759, 'hill': 2862, 'eminem': 1995, 'snap': 5440, 'engaging': 2031, 'comic': 1209, 'illness': 3010, 'destroy': 1637, 'lifestyle': 3515, 'comesback': 1203, 'beautifull': 627, 'unfocused': 6227, 'necessarily': 3993, 'runtime': 5061, 'sadler': 5075, 'unlikable': 6254, 'george': 2583, 'simmons': 5348, 'aml': 327, 'experimental': 2175, 'drug': 1878, 'wanna': 6461, 'ira': 3219, 'wright': 6622, 'seth': 5248, 'rogen': 5021, 'hire': 2869, 'fer': 2305, 
'convince': 1342, 'patch': 4281, 'suddenly': 5725, 'immature': 3026, 'lame': 3418, 'sexual': 5261, 'nudity': 4066, 'tweet': 6160, 'circle': 1100, 'reputation': 4923, 'dramatic': 1853, 'clunky': 1163, 'almostsit': 300, 'supportingperformances': 5758, 'cold': 1175, 'mainstream': 3656, 'press': 4561, 'downhill': 1837, 'jim': 3300, 'carrey': 954, 'moon': 3899, 'drift': 1869, 'oddly': 4102, 'draft': 1845, 'marketing': 3702, 'judd': 3321, 'apatow': 390, 'theater': 5936, 'chuckle': 1089, 'unnecessary': 6263, 'ensemble': 2048, 'schtick': 5160, 'bee': 635, 'chemistry': 1045, 'conversation': 1338, 'acceptable': 169, 'anger': 348, 'nicholson': 4014, 'hilarious': 2861, 'nicky': 4015, 'virigin': 6400, 'humour': 2960, 'artsy': 462, 'spotlight': 5556, 'gilmore': 2597, 'looooong': 3585, 'booooring': 766, 'skit': 5401, 'masturbate': 3726, 'cock': 1171, 'fart': 2261, 'suddenlink': 5724, 'demand': 1597, 'humorless': 2959, 'loser': 3594, 'vat': 6348, 'alzheimers': 310, 'zonke': 6682, 'bird': 699, 'titanic': 6021, 'profanity': 4606, 'degenerate': 1579, 'beneath': 665, 'sewer': 5258, 'comprise': 1262, 'charactor': 1021, 'manly': 3684, 'quentin': 4712, 'tarantino': 5847, 'fair': 2235, 'share': 5281, 'justify': 3337, 'appeal': 400, 'vulgar': 6436, 'offensive': 4106, 'unnessisary': 6264, 'stoop': 5639, 'likeable': 3524, 'swear': 5797, 'lower': 3613, 'adams': 203, 'blantant': 714, 'sexuality': 5262, '22': 74, 'grass': 2669, 'altogether': 307, 'tragic': 6079, 'scott': 5169, 'macneil': 3637, 'vand': 6340, 'collect': 1179, 'depress': 1614, 'wayyyyyyyy': 6493, 'aimless': 270, 'tackle': 5828, 'grandma': 2658, 'flix': 2383, 'moron': 3907, 'smile': 5430, 'performs': 4334, 'moronic': 3908, 'lowly': 3614, 'demeaning': 1598, 'elder': 1969, 'mel': 3771, 'brooks': 842, 'boys': 798, 'spartan': 5514, 'spartans': 5515, 'sidgzlnh_w4': 5328, '2hours': 87, '40yr': 101, 'ppl': 4520, 'hearing': 2807, 'jonah': 3314, 'pf': 4363, 'thankful': 5932, 'pros': 4643, 'liner': 3535, 'comedianscon': 1201, 'excess': 2136, 
'foul': 2459, 'adultry': 228, 'inspirational': 3137, 'sfar': 5265, 'delightfully': 1590, 'sephardic': 5233, 'algeria': 284, 'algerian': 285, 'endearing': 2014, 'joann': 3304, 'meow': 3790, 'jorma': 3317, 'npr': 4061, 'lovingly': 3611, 'behold': 648, 'anecdote': 345, 'jews': 3296, 'north': 4044, 'thirty': 5962, 'hysterically': 2981, 'sardonic': 5104, 'phrase': 4373, 'stitch': 5632, 'comic_strip': 1210, 'musing': 3947, 'judaism': 3320, 'vital': 6414, 'artwork': 463, 'feast': 2285, 'dynamic': 1914, 'accompany': 176, 'wane': 6460, 'artistically': 460, 'delt': 1595, 'exotic': 2163, 'whimsical': 6529, 'lovely': 3607, 'host': 2931, 'protagonist': 4648, 'narrator': 3968, 'greek': 2680, 'sardonically': 5105, 'lengthy': 3483, 'detour': 1645, 'richly': 4988, 'diverse': 1790, 'seek': 5206, 'preserve': 4558, 'tradition': 6076, 'interact': 3170, 'multilayered': 3934, 'environment': 2069, 'serve': 5242, 'kindly': 3380, 'promising': 4626, 'delve': 1596, 'bitterness': 706, 'glorification': 2617, 'sin': 5356, 'ungodly': 6233, 'habit': 2730, 'judge': 3322, 'forever': 2431, 'hypocrite': 2976, 'account': 179, 'repentance': 4908, 'avoidance': 540, 'humanist': 2954, 'secular': 5202, 'atheist': 491, 'pagan': 4226, 'lifestyles': 3516, 'appealing': 401, 'swearing': 5798, 'denigrate': 1605, 'religion': 4874, 'stereotypical': 5621, 'suitable': 5733, 'disrespect': 1766, 'vulgarity': 6437, 'dissent': 1776, 'witness': 6571, 'sloan': 5416, 'jean': 3280, 'assassination': 477, 'president': 4559, 'kennedy': 3354, 'fed': 2292, 'article': 454, 'lady': 3415, 'jfk': 3297, '1998': 56, 'clint': 1142, 'bradford': 803, 'location': 3570, 'limelight': 3528, 'conspiracy': 1304, 'laughable': 3441, 'jackie': 3256, 'interviews': 3195, 'bs': 853, 'stem': 5616, 'stern': 5622, 'wallet': 6453, '1963': 46, 'devolve': 1656, 'government': 2645, 'statement': 5602, 'unsupported': 6292, 'lauds': 3439, 'garrison': 2555, 'persecution': 4345, '65533': 120, 'tapes': 5845, 'toole': 6043, 'uneducated': 6222, 'nonsense': 4036, 
'physics': 4378, 'department': 1608, 'mad': 3641, 'sum': 5735, 'incoherant': 3064, 'physically': 4377, 'defy': 1578, 'debunk': 1540, 'furthermore': 2528, 'insane': 3122, 'babbling': 556, 'evidence': 2115, 'education': 1950, 'sword': 5812, 'december': 1544, '9th': 133, 'wednesday': 6508, 'absolutly': 157, 'blaze': 716, 'tracking': 6071, 'confident': 1285, 'demon': 1601, 'deffiently': 1563, 'unsharpened': 6282, 'stainless': 5575, 'steel': 5611, 'sharpen': 5284, 'wallhanger': 6455, 'katana': 3346, 'buddy': 856, 'symbol': 5815, 'blade': 710, 'fraud': 2475, 'entile': 2059, 'seventh': 5254, 'cliburn': 1130, 'competion': 1240, 'misrepresentation': 3854, 'adaptation': 204, 'mishmash': 3849, 'gibberish': 2592, 'convert': 1340, 'laundry': 3442, 'curve': 1483, 'cologne': 1185, 'latest': 3436, '22nd': 75, 'worried': 6605, 'update': 6305, 'amazons': 319, 'gap': 2548, 'heaven': 2812, 'scent': 5149, 'crush': 1454, 'unsatisfying': 6279, 'nicely': 4013, 'abruptly': 150, 'contractual': 1324, 'slap': 5404, 'superhuman': 5748, 'villian': 6393, 'terminate': 5903, 'method': 3803, 'adequately': 215, 'entriquing': 2064, 'possibility': 4495, 'genetic': 2576, 'gault': 2561, 'uncle': 6197, 'investigation': 3209, 'possibilite': 4494, 'formulaic': 2445, 'york': 6664, 'scarpetti': 5142, 'lucie': 3619, 'affair': 244, 'det': 1639, 'marino': 3698, 'honesty': 2903, 'displeasure': 1761, 'acquaint': 191, 'adjective': 218, 'apparent': 398, 'suffer': 5728, 'protracted': 4654, 'tripe': 6125, 'implausible': 3035, 'insufferable': 3151, 'snob': 5443, 'understandable': 6215, 'astonished': 489, 'bestseller': 676, 'brain': 804, 'numbingly': 4072, 'inept': 3093, 'sixth': 5380, 'labor': 3409, 'grit': 2695, 'bizarre': 707, 'gut': 2724, 'suspicious': 5793, 'factual': 2226, 'integrity': 3156, 'murderer': 3939, 'pale': 4236, 'malamute': 3667, 'agency': 253, 'ignorant': 3000, 'profile': 4612, 'twiddle': 6163, 'thumb': 5987, 'complaining': 1244, 'amused': 330, 'pit': 4401, 'contrived': 1333, 'boiler': 747, 
'meaningless': 3749, 'thatdefy': 5934, 'sympathetic': 5818, 'suspend': 5789, 'disbelief': 1724, 'andsnarled': 344, 'bureacratic': 872, 'conflict': 1287, 'enforcement': 2029, 'haste': 2777, 'imitation': 3025, 'deserves': 1628, 'pile': 4388, 'remainder': 4881, 'table': 5825, 'achievment': 188, 'disturbing': 1789, 'heros': 2840, 'inconsistent': 3068, 'potters': 4510, 'temple': 5891, 'scary': 5143, 'threatening': 5976, 'partly': 4269, 'coarse': 1166, 'intelligently': 3161, 'gruff': 2705, 'demeanor': 1599, 'emotionally': 2000, 'grisham': 2693, 'amazed': 314, 'wildly': 6552, 'spoil': 5548, 'satan': 5107, 'incarnate': 3057, 'hannibal': 2753, 'lecter': 3469, 'insulting': 3153, 'intelligence': 3159, 'bury': 876, 'notoriously': 4055, 'chief': 1055, 'medical': 3761, 'examiner': 2127, 'virginia': 6398, 'consult': 1309, 'pathologist': 4284, 'fbi': 2283, 'cunning': 1468, 'spree': 5559, 'richmond': 4989, 'resurface': 4955, 'transient': 6090, 'frozen': 2503, 'central': 993, 'park': 4260, 'homeless': 2897, 'priority': 4584, 'terrifying': 5908, 'fetid': 2310, 'tunnel': 6151, 'grisly': 2694, 'accuracy': 184, 'voltage': 6424, 'suspensful': 5792, 'lacking': 3412, 'storyteller': 5650, 'possess': 4492, 'inevitable': 3095, 'showdown': 5317, 'crony': 1439, 'characterize': 1019, 'tracks': 6072, 'knowledgable': 3394, 'sensantional': 5221, 'fireplace': 2344, 'yank': 6647, 'skipper': 5399, 'doc': 1801, 'resturante': 4952, 'using': 6323, 'subway': 5714, 'plausible': 4421, 'vicariously': 6378, 'involved': 3214, 'spontaneously': 5553, 'visa': 6404, 'genius': 2578, 'engineer': 2033, 'passable': 4274, 'stilte': 5628, 'hokey': 2882, 'tin': 6011, 'wizard': 6573, 'oz': 4213, 'oil': 4114, 'jerky': 3288, 'clumsy': 1162, 'hollow': 2891, 'ring': 5000, 'churn': 1091, 'profit': 4613, 'terror': 5910, 'goult': 2644, 'killing': 3374, 'thompson': 5965, 'bluegrass': 734, 'prove': 4655, 'dynamite': 1915, 'coroner': 1363, 'progress': 4619, 'rekindle': 4854, 'undo': 6219, 'rectification': 4820, 'ross': 5038, 
'tessa': 5916, 'diagnose': 1660, 'fatal': 2272, 'hostility': 2932, 'gloss': 2620, 'thousand': 5973, 'shinchan': 5294, 'funimation': 2522, 'seasons': 5194, 'americanize': 325, 'brow': 844, 'grant': 2662, 'retain': 4956, 'innocence': 3115, 'mischief': 3844, 'manga': 3676, 'funimaiton': 2521, 'noisy': 4031, 'vibration': 6376, 'massager': 3717, 'objective': 4082, 'finding': 2335, 'reasoning': 4788, 'grasp': 2668, 'advance': 229, 'exposition': 2190, 'principle': 4579, 'accountant': 181, 'desert': 1626, '211': 72, 'osu': 4163, 'sarcasm': 5102, '31': 91, 'x1': 6636, 'afraid': 246, 'hey': 2847, 'bitter': 705, '0072316373': 3, '0070412901': 2, 'beac': 615, 'archaeology': 421, 'friday': 2490, 'knowledgeable': 3396, 'traveller': 6100, 'bank': 586, 'arab': 416, 'sabbath': 5070, 'quack': 4701, 'technologically': 5875, 'civilization': 1105, 'harp': 2769, 'hancock': 2742, 'buval': 892, 'arabia': 417, 'childress': 1062, 'ramble': 4742, 'fossil': 2457, 'hunter': 2963, 'port': 4481, 'arabian': 418, 'ark': 436, 'covenant': 1395, 'giant': 2591, 'megalith': 3769, 'kalahari': 3343, 'map': 3691, 'browse': 846, 'travelogue': 6101, 'teaser': 5869, 'archeological': 424, 'disappear': 1716, 'embarrassingly': 1992, 'chit': 1069, 'chat': 1032, 'superficial': 5746, 'scientific': 5164, 'levitation': 3494, 'lincoln': 3532, 'mpost': 3924, 'donald': 1817, 'perception': 4325, 'belong': 660, '19th': 58, 'hazlitt': 2792, 'tinkering': 6012, 'myopic': 3953, 'focusse': 2402, 'intervention': 3192, 'intentioned': 3168, 'champion': 1011, 'forgotten': 2439, 'governmet': 2646, 'working': 6598, 'articulated': 455, 'vote': 6429, 'erroneous': 2086, 'economy': 1941, 'capitalism': 927, 'merit': 3793, 'fallacy': 2245, 'significantly': 5338, 'ideal': 2988, 'ayn': 555, 'rand': 4746, 'graph': 2664, 'naked': 3960, 'undress': 6221, 'dismal': 1755, 'swing': 5809, 'conservative': 1296, 'rant': 4751, 'liberalism': 3498, 'layperson': 3455, 'relevant': 4867, 'crisis': 1431, 'politician': 4465, 'classroom': 1119, 'continued': 
1322, 'destructive': 1638, 'supplementary': 5754, 'division': 1794, 'microeconomic': 3809, 'undergraduate': 6206, 'neglect': 4002, 'spillover': 5538, 'externalities': 2200, 'depletable': 1612, 'undepletable': 6204, 'failure': 2232, 'adverse': 234, 'hazard': 2790, 'principal': 4578, 'agent': 255, 'nations': 3976, '1776': 34, 'scam': 5129, 'deceptive': 1547, 'economics': 1938, 'simplify': 5351, 'definition': 1573, 'polemic': 4459, 'outdated': 4169, 'enormously': 2047, 'reflect': 4836, 'settle': 5251, 'presumably': 4563, 'choir': 1073, 'simplistically': 5353, 'calle': 905, 'keynesian': 3360, 'treatise': 6108, 'bastiat': 606, '101': 8, 'pe': 4298, 'libertarian': 3502, 'broken': 839, 'mental': 3786, 'manipulation': 3682, 'pro': 4591, 'damn': 1501, 'facts': 2225, 'outdate': 4168, 'apology': 397, 'fallacious': 2244, 'benefit': 667, 'unnoticed': 6265, 'partially': 4265, 'cave': 979, 'union': 6244, 'tantamount': 5841, 'coersion': 1173, 'indication': 3084, 'succinct': 5718, 'screed': 5176, 'piss': 4400, 'rabid': 4727, 'amazingly': 316, 'university': 6251, 'chicago': 1053, 'productive': 4604, 'profitable': 4614, 'endeavor': 2015, 'lunatic': 3626, 'crazy': 1417, 'dork': 1826, 'march': 3693, 'wto': 6634, 'elucidate': 1985, 'mise': 3845, 'subjects': 5701, 'fundamental': 2518, 'considere': 1298, 'rampant': 4745, 'congress': 1294, 'legislature': 3479, 'county': 1387, 'commission': 1218, 'council': 1380, 'overwhelming': 4206, 'majority': 3660, 'leader': 3460, 'earful': 1920, 'cite': 1103, '60': 118, 'valid': 6334, 'voting': 6430, 'bombard': 752, 'gullibility': 2722, 'pervasive': 4355, 'resultant': 4954, 'cancerous': 919, 'growth': 2703, 'meddling': 3759, 'increased': 3072, 'revolutionary': 4979, 'elect': 1970, 'directly': 1706, 'mises': 3847, 'institute': 3145, 'plague': 4412, 'foundation': 2461, 'fred': 2478, 'dope': 1823, 'austrian': 519, 'schiff': 5152, 'crashproof': 1411, 'followup': 2411, 'von': 6428, 'credit': 1424, 'rothbard': 5041, 'depression': 1617, 'reqde': 4924, 
'dispute': 1764, 'nationwide': 3977, 'elementary': 1976, 'years': 6655, 'fellow': 2301, 'couldn': 1379, 'timagine': 6001, 'loss': 3595, 'senator': 5216, 'reps': 4921, 'heed': 2819, 'gobblygook': 2626, 'arcane': 420, 'fancy': 2255, 'lables': 3408, 'unintended': 6241, 'consequence': 1295, 'experiential': 2174, 'tremendous': 6111, 'debt': 1539, 'subsidize': 5706, 'taxis': 5860, 'expense': 2170, 'rhetoric': 4983, 'muddle': 3928, 'micro': 3808, 'macro': 3640, 'necessary': 3994, 'taxpayer': 5861, 'measure': 3753, 'aka': 275, 'enemy': 2023, 'narration': 3966, 'conversational': 1339, 'inflection': 3100, 'phrasing': 4374, 'tedious': 5879, 'written': 6630, 'restate': 4947, 'donate': 1818, 'unsatisfactory': 6278, 'remedy': 4885, 'perceive': 4323, 'heilbroner': 2823, 'worldly': 6603, 'philosopher': 4367, 'audible': 509, 'excessively': 2138, 'polemical': 4460, 'rigid': 4999, '250': 80, 'retitle': 4960, '1940': 42, '1950': 45, 'critique': 1437, 'capitalist': 928, 'doctrine': 1806, 'hypocritical': 2977, 'popular': 4478, 'agenda': 254, 'rarely': 4756, 'arguement': 430, 'omit': 4122, 'contrary': 1328, 'veiws': 6357, 'grain': 2652, 'salt': 5089, 'unpalatable': 6267, 'dose': 1828, 'taxation': 5859, 'reinforce': 4851, 'straightforward': 5652, 'blinkered': 724, 'regulation': 4848, 'west': 6518, 'health': 2802, 'programme': 4617, 'organise': 4147, 'tax': 5858, 'spuriously': 5561, 'dangerous': 1507, 'naomi': 3963, 'klein': 3387, 'unfettered': 6225, 'ideology': 2992, 'explode': 2183, 'liberal': 3497, 'myth': 3957, 'context': 1318, 'unchecked': 6196, 'corporate': 1365, 'liberate': 3500, 'humanity': 2955, 'apologist': 395, 'scrutinize': 5183, 'liberalized': 3499, 'trade': 6075, 'relinquish': 4877, 'astounding': 490, 'bearing': 622, 'blatantly': 715, 'hopelessly': 2916, 'preach': 4531, 'limbaugh': 3527, 'helper': 2828, 'avaiable': 530, 'limewire': 3529, 'net': 4006, 'et': 2099, 'relentlessly': 4865, 'hammer': 2741, 'amall': 311, 'fanatically': 2253, 'enslave': 2049, 'blind': 721, 'utopian': 
6327, 'functioning': 2517, 'anytime': 382, 'beneficiary': 666, 'espouse': 2092, 'reactionary': 4769, 'rehash': 4849, 'greed': 2679, 'plunder': 4444, 'pillage': 4390, 'plutocracy': 4447, 'worship': 6608, 'tout': 6064, '1946': 43, 'fundamentally': 2519, 'economist': 1940, 'enlightening': 2045, 'global': 2615, 'sanity': 5099, 'intead': 3155, 'cumulatively': 1467, 'logically': 3576, 'casual': 968, 'silliness': 5342, 'spring': 5560, 'premise': 4551, 'blindness': 722, 'lifetime': 3517, 'libertarianism': 3503, 'jargon': 3270, 'oversimplify': 4200, 'informed': 3107, 'troubled': 6131, 'frighteningly': 2498, 'prophetic': 4641, 'surprisingly': 5778, 'mediatic': 3760, 'unseen': 6280, 'curb': 1470, 'enthusiasm': 2057, 'diatribe': 1667, 'ron': 5034, 'paul': 4292, 'groups': 2700, 'economincs': 1939, 'recession': 4805, 'disprove': 1763, 'patter': 4290, 'skim': 5394, 'faulty': 2277, 'apparently': 399, 'spew': 5535, 'promote': 4628, 'flawed': 2370, 'tariff': 5849, 'impose': 3041, 'sweater': 5800, 'duty': 1904, 'equation': 2075, 'analyze': 337, 'revenue': 4969, 'export': 2188, 'narrow': 3969, 'introductory': 3201, 'importantly': 3039, 'fundamentals': 2520, 'samuelson': 5094, 'mankiw': 3683, 'malkiel': 3669, 'recycle': 4821, 'obsolete': 4087, 'pepper': 4322, 'concerned': 1268, 'hiphop': 2868, 'slug': 5422, 'latelly': 3434, 'euro': 2103, 'bmw': 738, 'goldfrapp': 2629, 'secretly': 5200, 'zune': 6685, 'overtime': 4201, 'instrument': 3149, 'allison': 293, 'gripe': 2691, 'velvet': 6358, 'wonderous': 6585, 'electronica': 1974, 'airwave': 273, 'tired': 6018, 'hos': 2929, 'dumb': 1895, 'idiocy': 2994, 'barney': 595, 'wiggle': 6547, 'carlin': 948, 'narrate': 3965, 'hottie': 2937, 'thereit': 5948, 'fanaticdelcious': 2254, 'stellar': 5615, 'nancy': 3962, 'cassidy': 962, 'alto': 306, 'invite': 3212, 'musically': 3945, 'varied': 6343, 'musical': 3944, 'animated': 355, 'kiddie': 3369, 'andrap': 343, 'tmnt': 6023, 'punch': 4680, 'egbert': 1962, 'searching': 5191, 'photocopy': 4372, 'vegetation': 
6354, 'okinawa': 4117, 'ryukyu': 5069, 'islands': 3238, 'torii': 6048, 'station': 5606, 'edith': 1946, 'frost': 2502, 'kinship': 3382, 'transport': 6094, 'repose': 4917, 'hauntingly': 2782, 'linger': 3537, 'misty': 3862, 'thanks': 5933, 'candid': 920, 'webtv': 6505, 'stan': 5580, 'definitly': 1574, 'mantra': 3687, 'ergonomic': 2081, 'tripod': 6127, 'bag': 570, 'circuit': 1101, 'erna': 2084, 'anderson': 341, 'sixteen': 5379, 'alfreda': 283, 'poverty': 4515, 'sweden': 5803, 'finaly': 2333, 'karl': 3345, 'journey': 3318, 'sorrow': 5489, 'iceberg': 2984, 'daniel': 1508, 'insomniac': 3133, 'melodic': 3774, 'deflaboxe': 1576, 'agressive': 261, 'honnest': 2905, 'bélanger': 897, 'hidden': 2850, 'depressive': 1618, 'reflexion': 4838, 'rver': 5067, 'mieux': 3818, 'wich': 6538, 'exclusive': 2147, 'lyrical': 3632, 'concieve': 1270, 'constant': 1305, 'analogy': 333, 'boxe': 794, 'listner': 3547, 'quebec': 4710, 'nineteen': 4025, 'folktale': 2407, '1925': 41, 'newbery': 4009, 'medal': 3758, 'contribution': 1332, 'oviously': 4208, 'minority': 3837, 'incomprehensible': 3067, 'anthology': 373, 'possession': 4493, 'christian': 1080, 'touchy': 6059, 'fudge': 2509, 'maple': 3692, 'torso': 6051, 'bustier': 884, 'corset': 1371, 'cincher': 1095, 'boning': 756, 'bridal': 823, 'corsette': 1372, 'chest': 1048, 'tire': 6017, 'wire': 6563, 'bra': 800, 'custom': 1487, 'cloth': 1154, 'squeem': 5567, 'shapewear': 5279, 'lining': 3539, 'absorb': 158, 'wetness': 6521, 'flatten': 2364, 'blouse': 731, 'tshirt': 6143, 'bone': 755, 'definaitely': 1566, 'dtop': 1884, 'clerk': 1127, 'butt': 886, 'eureka': 2102, 'hardly': 2767, 'xl': 6639, 'vest': 6370, 'firm': 2345, 'vintage': 6394, 'chincher': 1065, 'favour': 2280, 'pinup': 4396, 'expensive': 2171, 'annother': 362, 'clingy': 1141, 'slinky': 5414, 'clothe': 1155, 'ripple': 5004, 'irritation': 3230, 'training': 6084, 'specialists': 5521, 'ft': 2508, 'cinching': 1096, 'comfortably': 1205, 'dior': 1702, 'permenant': 4342, 'rubbery': 5051, 'layer': 3453, 
'perspires': 4352, 'measurement': 3754, 'fasten': 2269, '00': 0, 'privilege': 4588, 'posture': 4503, 'slimmer': 5413, 'associate': 485, 'slim': 5412, 'waistline': 6445, 'petite': 4361, 'reduce': 4828, 'clothing': 1156, 'sweating': 5801, 'rash': 4758, 'goth': 2640, 'fetish': 2311, 'opt': 4139, 'bust': 882, 'def': 1559, 'sweaty': 5802, 'boob': 758, '36': 94, 'dd': 1528, 'ladie': 3414, '135lbs': 24, 'busty': 885, 'coursette': 1391, 'beige': 649, 'shaper': 5278, 'straps': 5659, 'everyday': 2112, 'belly': 659, 'ince': 3058, 'deffinitly': 1565, 'pregnancy': 4547, 'crease': 1418, 'sick': 5325, 'boutique': 791, 'literally': 3549, 'rib': 4984, 'commute': 1227, 'desk': 1631, 'pleasant': 4429, 'bruise': 848, 'flexible': 2376, 'reshape': 4933, 'xlarge': 6640, 'waaay': 6438, 'unbearable': 6193, 'redistribute': 4827, 'bulge': 865, 'proportioned': 4642, 'arthropod': 452, 'curvy': 1484, 'unnatural': 6260, 'advertize': 240, 'streak': 5661, 'inspect': 3135, 'spare': 5511, 'hourglass': 2939, 'grayish': 2674, 'fold': 2405, 'trimmer': 6122, 'camisole': 912, 'incentive': 3059, 'pesky': 4357, 'sized': 5382, 'shirt': 5300, 'juniors': 3332, 'lift': 3518, 'upset': 6310, 'clincher': 1140, 'sizer': 5383, 'miracle': 3840, 'girldle': 2601, 'squeeze': 5568, 'welt': 6517, 'prong': 4631, 'fraction': 2465, 'ahi': 265, '116': 19, 'lb': 3457, 'tummy': 6148, 'downfall': 1836, 'thou': 5969, 'incrediby': 3076, 'closed': 1148, 'exhausted': 2155, 'tank': 5839, 'irritate': 3227, 'sweat': 5799, 'tolerant': 6032, 'tightness': 5997, 'recommed': 4813, 'unwanted': 6300, 'inner': 3114, 'cotton': 1378, 'moisture': 3880, 'pure': 4688, 'strapless': 5658, 'cher': 1046, 'occasion': 4091, 'desperation': 1633, 'cinch': 1094, 'muffin': 3931, 'worse': 6607, '140lbs': 26, 'bruising': 849, 'birth': 701, 'specifically': 5525, 'sleeveless': 5408, 'bulky': 866, 'hideous': 2852, 'racerback': 4729, 'sizing': 5384, '3x': 97, '5x': 116, 'speed': 5529, 'mislabele': 3850, 'surgery': 5771, 'unwearable': 6302, 'hog': 2881, 'stomach': 
5636, 'discreet': 1733, 'underneath': 6210, 'irritated': 3228, 'breathing': 819, 'squeele': 5566, 'bell': 657, 'tuck': 6146, 'exclaim': 2144, 'upright': 6308, 'snug': 5447, 'concentrate': 1265, 'parking': 4262, 'breast': 816, 'firmness': 2346, 'outwards': 4182, 'vertical': 6369, 'compressed': 1260, 'pregancy': 4546, 'believer': 655, 'pregnant': 4548, 'pinch': 4393, 'unhook': 6236, 'pooch': 4470, 'fajas': 2241, 'ur': 6313, 'postpartum': 4502, 'downside': 1841, 'portion': 4485, 'sew': 5257, 'lingerie': 3538, 'sock': 5456, 'core': 1358, 'poke': 4455, 'specially': 5522, 'bend': 664, 'pronounced': 4633, 'aroun': 442, 'smaller': 5424, 'buttt': 891, 'upper': 6307, 'eewww': 1955, 'defeat': 1560, 'increase': 3071, 'interior': 3180, 'repair': 4904, 'bent': 668, 'unreal': 6273, 'gym': 2727, 'abdominal': 143, 'nip': 4026, 'overlapsed': 4191, 'faint': 2233, 'dryer': 1883, 'charcoal': 1023, 'exterior': 2199, 'abdomen': 142, 'wrinkle': 6624, 'days': 1526, 'scared': 5135, 'triangle': 6114, 'visible': 6405, 'armpit': 439, '125': 22, 'pink': 4394, 'dint': 1701, 'therapeutically': 5946, 'spine': 5540, 'severe': 5255, 'injury': 3111, 'vanity': 6341, 'surpass': 5774, 'compensate': 1238, 'suprise': 5764, 'merchandise': 3791, 'fitting': 2353, 'bridesmaid': 825, 'stunk': 5693, 'washing': 6478, 'clammy': 1109, 'abandon': 139, 'torture': 6052, 'returned': 4963, 'asap': 464, 'comply': 1252, 'waisted': 6444, 'instant': 3142, 'makeover': 3662, 'ease': 1925, 'foresee': 2427, 'hmmm': 2876, 'loop': 3587, 'running': 5060, 'flabby': 2358, 'baggy': 572, 'wearing': 6500, 'conform': 1288, 'stretchy': 5669, 'panty': 4244, 'daytime': 1527, 'withoutfeele': 6569, '55': 114, 'seam': 5187, 'brooke': 841, 'burke': 874, 'sucks': 5721, 'compress': 1259, 'butthis': 889, 'garment': 2553, 'eventually': 2110, 'waist2': 6442, 'hurts3': 2968, 'sausage': 5117, 'visually': 6412, 'divide': 1791, 'sitas': 5372, 'ouch': 4164, 'easiness': 1927, 'daily': 1499, 'binder': 696, 'loveit': 3606, '40d': 100, 'clinch': 1139, 
'hug': 2949, 'rod': 5020, 'painfully': 4231, 'ultimately': 6181, 'wound': 6615, 'grandmother': 2660, 'encourage': 2012, 'regain': 4842, 'trainer': 6083, 'itchy': 3246, 'trap': 6095, 'ow': 4209, 'diet': 1680, '150': 28, 'midsection': 3816, 'anticipate': 376, '28': 84, 'xs': 6643, 'unstitched': 6286, 'fray': 2476, 'unstitche': 6285, 'contacting': 1312, 'futility': 2531, 'near': 3989, 'accommodate': 175, 'girth': 2603, 'xxxxx': 6644, 'discomfort': 1729, 'fly': 2396, 'steam': 5610, 'diarrhea': 1665, 'badly': 568, 'stab': 5571, 'itit': 3249, 'cleanly': 1124, 'cardio': 941, 'permanantley': 4339, 'vthe': 6434, 'oppossed': 4137, 'spanx': 5510, 'exsactly': 2193, 'wider': 6542, 'jacket': 3255, 'squeeeeeze': 5564, 'partum': 4271, 'separated': 5230, 'reminder': 4889, 'abs': 151, 'strengthen': 5666, 'inward': 3215, 'seamless': 5188, 'tanktop': 5840, 'occur': 4097, 'ardyss': 426, 'wearer': 6499, 'slight': 5410, 'hugging': 2951, 'advert': 236, '2h': 86, 'sender': 5218, 'stamp': 5579, 'torne': 6050, 'itch': 3245, 'swell': 5807, 'bleeding': 719, 'disease': 1740, '31my': 92, 'smallit': 5425, 'clasp': 1114, 'terribly': 5906, 'gross': 2697, 'wrapping': 6618, 'wal': 6448, 'mart': 3710, 'disgusting': 1747, 'surgical': 5772, '44': 103, 'girdle': 2599, 'eaight': 1918, 'rear': 4784, 'latch': 3432, 'pudge': 4673, 'stuck': 5682, 'jan': 3264, '2012': 67, '107': 13, 'birmingham': 700, 'janes': 3266, 'aircraft': 272, '1996': 55, 'marie': 3697, 'antoinette': 379, 'doll': 1814, 'covet': 1398, 'necklace': 3996, 'owe': 4210, 'crate': 1412, 'barrell': 597, 'seemingly': 5207, 'silverware': 5344, 'permanent': 4340, 'diawasher': 1668, 'microwave': 3811, 'noticable': 4051, 'china': 1064, 'porcelain': 4479, 'german': 2585, 'pinscher': 4395, 'resent': 4931, 'amost': 328, 'upsetting': 6311, 'dragoon': 1850, 'logan': 3574, 'jimmy': 3301, 'ponder': 4467, 'horn': 2919, 'crosse': 1444, 'bacteria': 566, 'jiggel': 3298, 'shakeing': 5272, 'haloween': 2740, 'leftovers': 3472, 'memorize': 3782, 'lines': 3536, 
'casting': 966, 'apears': 392, 'cinimatography': 1099, 'someones': 5474, 'cam': 906, 'anoyingly': 368, 'unwatchable': 6301, 'hokie': 2883, 'cheese': 1041, 'meh': 3770, 'demonic': 1602, 'interestingly': 3178, 'saying': 5127, 'scholarly': 5156, 'impossibility': 3042, 'mold': 3882, 'islam': 3234, 'islamic': 3235, 'arise': 434, 'adherent': 216, 'dictate': 1671, 'unavoidably': 6191, 'islamists': 3236, 'distinct': 1781, 'banking': 587, 'prof': 4605, 'kuran': 3402, 'bravely': 810, 'prosperity': 4646, 'masse': 3718, 'formal': 2441, 'surley': 5773, 'apporpriate': 410, 'intersted': 3191, 'gold': 2628, 'gorilla': 2636, 'hurrah': 2965, 'handicap': 2746, 'cheerleader': 1039, 'pray': 4528, 'overcome': 4185, 'vinyl': 6395, 'strut': 5681, 'rhapsody': 4982, 'caan': 898, 'cti': 1459, 'higgins': 2854, 'transparent': 6093, 'ghosts': 2590, 'laurel': 3444, 'danger': 1506, 'publisher': 4670, 'absence': 152, 'dispose': 1762, 'disastrous': 1723, 'senses': 5223, 'words': 6593, 'carswell': 956, 'silencer': 5339, 'sean': 5189, 'dillion': 1691, 'license': 3508, 'justice': 3335, 'pickup': 4383, 'terrorism': 5911, 'competition': 1241, 'egoism': 1964, 'fulfillment': 2512, 'british': 835, 'maniacal': 3678, 'threaten': 5975, 'masculinity': 3713, 'legal': 3475, 'entertainment': 2056, 'ridicule': 4994, 'slog': 5417, 'motions': 3915, 'dillon': 1692, 'holster': 2895, 'bushmills': 879, 'overexplain': 4189, 'dozen': 1843, 'prod': 4599, 'para': 4249, 'reward': 4980, 'minimalism': 3831, 'expose': 2189, 'reliance': 4871, 'arid': 432, 'novels': 4058, 'capitalize': 929, 'marlboro': 3705, 'smoke': 5434, 'champagne': 1010, 'drink': 1870, 'midst': 3817, 'adventures': 233, 'condense': 1278, 'developed': 1650, 'proabably': 4592, 'ludlum': 3625, 'theyear': 5949, 'higgin': 2853, 'generous': 2575, '5th': 115, 'foretell': 2430, 'sequel': 5235, 'autopilot': 526, 'guilty': 2718, 'status': 5607, 'oops': 4125, 'ths': 5985, 'aspire': 474, 'strangely': 5655, 'waltz': 6458, 'editor': 1949, 'potboiler': 4505, 'sophomoric': 
5486, 'harbor': 2764, 'dialog': 1661, 'unbelievably': 6195, 'kate': 3347, 'hazar': 2789, 'scout': 5171, 'puhleeze': 4676, 'exclamation': 2145, 'reaction': 4768, 'clancy': 1110, 'rough': 5044, 'reall': 4781, 'bilionaires': 691, 'interestng': 3179, 'compaint': 1228, 'curiosity': 1472, 'tension': 5898, 'executive': 2152, 'cliche': 1131, 'obsessive': 4086, 'colorless': 1190, 'guard': 2708, 'assassin': 476, 'whack': 6522, 'target': 5848, 'shoot': 5307, 'timing': 6010, 'justification': 3336, 'challenging': 1009, 'wrestle': 6621, 'unrewarding': 6277, 'fyi': 2535, 'silent': 5340, 'jefferson': 3282, 'parker': 4261, 'fascination': 2265, 'dining': 1699, 'glamorous': 2607, 'savage': 5118, 'dirk': 1708, 'pitt': 4404, 'piker': 4387, 'extension': 2196, 'gentlemanly': 2581, 'crazed': 1416, 'savoy': 5121, 'strain': 5653, 'credulity': 1425, 'spy': 5562, 'fulfill': 2511, 'contract': 1323, 'samey': 5091, 'ferguson': 2306, 'yarn': 6650, 'board': 739, 'invincibility': 3211, 'sickening': 5326, 'libary': 3496, 'retire': 4958, 'cartoon': 957, 'scottish': 5170, 'heritage': 2837, 'aged': 252, 'pose': 4488, 'bomber': 753, 'aviator': 537, 'dust': 1903, 'contradictory': 1326, 'visualization': 6411, 'does': 1809, 'harry': 2772, 'condensation': 1277, 'screenplay': 5178, 'syllable': 5813, 'twaddle': 6157, 'shred': 5318, 'puerile': 4675, 'uninspired': 6240, 'deadline': 1531, 'critical': 1435, 'eagle': 1917, 'rashid': 4759, 'dillin': 1690, 'sudden': 5723, 'billion': 693, 'overkill': 4190, 'lives': 3557, 'dora': 1824, 'explorer': 2187, 'scenes': 5148, 'reliablity': 4869, 'saga': 5081, 'bernstein': 672, 'precede': 4534, 'disenchanted': 1742, 'terse': 5914, 'nerve': 4004, 'childishly': 1060, 'redeeming': 4824, 'irish': 3222, 'gambit': 2542, 'endeavour': 2016, 'retired': 4959, 'abridge': 148, 'singleminde': 5365, 'behaviour': 647, 'hack': 2731, 'macnee': 3636, 'tranparent': 6086, 'quaff': 4702, 'enormous': 2046, 'whisky': 6530, 'stinker': 5631, 'schwartz': 5161, 'los': 3592, 'angeles': 347, 'mob': 3871, 
'times': 6009, 'energizer': 2026, 'rabbit': 4726, 'muzik': 3951, 'lounge': 3601, 'ballroom': 579, 'vegas': 6353, 'ellington': 1983, 'survey': 5783, 'duke': 1892, 'ivie': 3252, 'destine': 1635, 'greatness': 2677, 'cigarette': 1093, 'hoot': 2912, 'herb': 2833, 'herbal': 2834, 'magick': 3648, 'beyerl': 682, 'magickal': 3649, 'nedde': 3997, 'herbalism': 2835, 'bibliography': 687, 'precise': 4537, 'association': 486, 'explaint': 2179, 'herbs': 2836, 'wonderfull': 6582, 'occult': 4094, 'medicinal': 3762, 'plant': 4418, 'legged': 3478, 'beagle': 618, 'daugher': 1519, 'rottie': 5043, 'personalize': 4349, 'clipping': 1144, 'marmaduke': 3706, 'mutt': 3950, 'obdii': 4078, 'data': 1516, 'stream': 5662, 'scanner': 5133, 'vista': 6409, 'ez': 2213, 'scan': 5130, 'microsoft': 3810, '2014': 68, 'firmware': 2347, '2007': 64, '2008': 65, 'jane': 3265, 'hypatia': 2974, 'hypothesis': 2978, 'heliocentric': 2824, 'universe': 6250, 'copernicus': 1353, 'scarlett': 5139, 'ridiculously': 4996, 'nook': 4037, 'scarlet': 5138, 'thorn': 5966, 'hawthorne': 2787, 'undeniable': 6203, 'alcoholism': 280, 'ambiguity': 321, 'atlas': 493, 'shrug': 5320, 'anthem': 372, 'aldous': 281, 'huxley': 2971, 'vacation': 6329, 'unconventional': 6200, 'captivate': 933, 'intricacy': 3196, 'puritan': 4692, 'morality': 3903, '11th': 20, 'unspeakable': 6283, 'punish': 4681, 'confess': 1282, 'physical': 4376, 'wrath': 6619, 'vicious': 6380, 'reccomend': 4795, 'auto': 524, 'dolby': 1813, 'surroundsound': 5782, 'mundane': 3937, 'allusion': 298, 'infamous': 3097, 'hester': 2845, 'prynne': 4662, 'literaraly': 3550, 'savvy': 5122, 'symbolize': 5817, 'morbid': 3904, 'unnecesary': 6261, 'convey': 1341, 'overrated': 4198, 'bloated': 725, 'indulgent': 3091, 'blabber': 708, 'melodramatic': 3775, 'dickens': 1670, 'ostracize': 4162, 'prat': 4524, 'unforgive': 6229, 'confuse': 1289, 'dialouge': 1663, 'provkitive': 4657, 'throughly': 5982, 'fahrenheit': 2230, '451': 105, 'huckleberry': 2948, 'finn': 2342, 'footnote': 2419, 'coral': 
1356, 'nonetheless': 4033, 'grante': 2663, 'simile': 5347, 'deem': 1555, 'neat': 3992, 'gripping': 2692, 'digest': 1687, 'essence': 2094, 'candy': 921, 'clutch': 1164, 'breathtaking': 820, 'navigate': 3985, 'ebook': 1934, 'nathaniel': 3974, 'analyse': 334, 'strength': 5665, 'weakness': 6495, 'prynn': 4661, 'vengeful': 6360, 'hapless': 2757, 'minister': 3834, 'vivacious': 6416, 'elf': 1981, 'cope': 1352, 'puritanical': 4693, 'psychological': 4665, 'ponderous': 4468, 'divorce': 1795, 'adultery': 227, 'punishment': 4683, 'dover': 1835, 'reprint': 4919, 'beginer': 642, 'eon': 2070, 'forwaed': 2454, 'irritating': 3229, 'prolix': 4621, 'confirm': 1286, 'hanry': 2755, 'theraou': 5945, 'walden': 6449, 'tre': 6104, 'pretentious': 4565, 'era': 2079, 'mildly': 3820, 'applaud': 406, 'tenacity': 5893, 'thy': 5990, 'f2f': 2214, 'hertzberg': 2842, 'toughen': 6061, 'bc': 614, 'cobb': 1170, 'cc': 982, 'mcg': 3743, 'pls': 4440, 'highschool': 2860, 'assignment': 483, 'dread': 1860, 'ilegitamate': 3005, 'hillariously': 2864, 'melancholy': 3773, 'comprehension': 1257, 'commas': 1213, '240': 78, 'aunt': 515, 'verbose': 6362, 'honor': 2906, 'wordy': 6594, 'reverand': 4970, 'dimmesdale': 1696, 'dirt': 1709, 'thisto': 5963, 'interestingelements': 3177, 'nosuspense': 4048, 'sympathize': 5819, 'empathize': 2001, 'evenless': 2108, 'samestory': 5090, 'labelledthis': 3407, 'tragedy': 6078, 'thatit': 5935, 'tragically': 6080, 'unnecessarily': 6262, 'flown': 2393, 'effectively': 1958, 'reccommend': 4796, 'dificult': 1685, 'boaring': 740, 'mainly': 3655, 'historically': 2872, 'settler': 5252, 'pill': 4389, 'exceed': 2129, 'timely': 6007, 'pertinent': 4354, 'extinct': 2202, 'reviving': 4976, 'cliff': 1135, 'ostracism': 4161, 'robot': 5015, 'encompass': 2010, 'unpleasant': 6269, 'steinbeck': 5614, 'twain': 6158, 'austen': 516, 'chillingworth': 1063, 'stray': 5660, 'beg': 640, 'completey': 1248, 'cease': 985, 'taudry': 5856, 'unfamiliar': 6224, 'eloquent': 1984, 'religiosity': 4875, 'trajectory': 
6085, 'puritans': 4695, 'cryptic': 1456, 'appendix': 405, 'norton': 4045, 'definatly': 1569, 'antiquated': 377, 'insomniacs': 3134, 'zzzzzzzzzzzz': 6686, 'stongly': 5638, 'dislike': 1754, 'meaninglessness': 3750, 'webster': 6504, 'wander': 6459, 'extravagant': 2207, 'flamboyant': 2360, 'summarily': 5736, 'prejudice': 4549, 'void': 6423, 'hallelujah': 2738, 'scandalous': 5132, 'piousness': 4398, 'celebrated': 988, 'receptive': 4804, 'gable': 2536, 'forcefed': 2423, 'customs': 1489, 'filler': 2325, 'lest': 3487, 'cynical': 1494, 'vice': 6379, 'versa': 6365, 'idiom': 2995, 'transcription': 6088, 'textually': 5924, 'freebie': 2480, 'awhile': 552, 'specify': 5526, 'assign': 481, 'conceal': 1264, 'cliched': 1132, 'olde': 4119, 'marking': 3704, 'lavish': 3446, 'diction': 1672, 'syntax': 5821, 'percieving': 4326, 'masterful': 3721, 'literary': 3551, 'revival': 4975, '1800': 37, 'fore': 2424, 'discriptive': 1736, 'unlying': 6258, 'rule': 5057, 'gotton': 2642, 'attenion': 499, 'lobster': 3566, 'delecacy': 1584, 'shell': 5290, 'meat': 3755, 'dissect': 1775, 'concusion': 1275, 'wording': 6592, 'formatting': 2443, '18': 36, '604': 119, 'bookmaark': 761, 'requirement': 4926, 'inerpret': 3094, 'cheat': 1035, 'preacher': 4532, 'testicle': 5918, 'calendar': 903, 'column': 1191, 'taboo': 5826, 'exploit': 2184, 'unthinkable': 6295, 'spirit': 5543, 'delighted': 1588, 'mas': 3712, 'paperback': 4247, 'outlive': 4176, 'engrossing': 2038, 'flawless': 2371, 'overload': 4192, 'superfluous': 5747, 'sift': 5330, '17th': 35, 'enforce': 2028, 'imprison': 3051, 'scaffold': 5128, 'irony': 3224, 'abrupt': 149, 'illiteracy': 3008, 'illiterate': 3009, 'negaive': 4000, 'readers': 4773, 'cumbersome': 1466, 'exceptional': 2133, 'developmet': 1653, 'voyeur': 6432, 'yawn': 6652, 'childish': 1059, 'classify': 1118, 'existance': 2158, 'dribble': 1868, 'contrasting': 1330, 'shadow': 5268, 'purity': 4696, 'belabor': 650, 'headache': 2795, 'jeff': 3281, 'scorn': 5168, 'preachy': 4533, 'puritanism': 4694, 
'ap': 385, 'summarize': 5737, 'modernize': 3878, 'dost': 1830, 'armour': 438, 'sl': 5402, 'reclassify': 4809, 'nullify': 4067, 'avg': 535, 'underline': 6208, 'hint': 2866, 'quote': 4724, 'thematic': 5940, 'opposition': 4136, 'rebellious': 4791, 'alienation': 287, 'furnace': 2526, 'isbn': 3231, 'appearence': 404, '500': 110, 'significant': 5337, 'frustrate': 2504, 'inability': 3054, 'revelation': 4966, 'reproduce': 4920, 'maury': 3735, 'hawthorn': 2786, 'fist': 2351, 'comi': 1208, 'court': 1392, 'lantern': 3423, 'crimson': 1430, 'crusader': 1453, 'unlikely': 6256, 'radioactive': 4737, 'spider': 5537, 'existed': 2159, 'pow': 4516, 'zort': 6684, 'roger': 5022, 'physiognomy': 4379, 'exceptionally': 2134, 'guessand': 2712, 'surprising': 5777, 'strucutre': 5679, 'uninteresting': 6243, 'puritain': 4691, 'foreshadowing': 2428, 'foil': 2403, 'figuare': 2321, 'sucker': 5720, 'forth': 2447, 'outsider': 4180, 'townspeople': 6067, 'hestor': 2846, 'ren': 4896, 'discription': 1735, 'tribulation': 6115, 'parenthood': 4258, 'forced': 2422, 'curriculum': 1479, 'sophomore': 5485, 'turgid': 6152, 'laborious': 3410, 'degree': 1581, 'widow': 6543, 'grief': 2684, 'bogge': 744, 'hurl': 2964, 'agitation': 257, 'disgust': 1745, 'auido': 514, 'exam': 2124, 'consitutional': 1303, 'slop': 5418, 'freakin': 2477, 'insanity': 3123, 'fest': 2309, 'stake': 5576, '1850': 38, 'hs': 2944, 'homicide': 2900, 'assigning': 482, 'overrate': 4197, 'regard': 4843, 'torturous': 6053, 'ofthe': 4112, 'vastly': 6347, 'reject': 4853, 'permissive': 4343, 'accustom': 186, 'entrench': 2063, 'gloom': 2616, 'exposure': 2191, 'versus': 6368, 'hypocrisy': 2975, 'passions': 4278, 'refreshing': 4839, 'passionately': 4277, 'skillfully': 5393, 'construct': 1307, 'sinner': 5369, 'noble': 4029, 'punisher': 4682, 'applicable': 407, 'clinton': 1143, 'peel': 4306, 'rose': 5037, 'nurture': 4074, 'bared': 592, 'examined': 2126, 'forthright': 2448, 'lambrynth': 3417, 'neart': 3991, 'redemption': 4825, 'confession': 1283, 
'comforting': 1206, 'teenager': 5882, 'palatable': 4234, 'font': 2412, 'worddi': 6591, 'obey': 4080, 'socety': 5451, 'mistakes': 3861, 'overwritten': 4207, 'sappy': 5101, 'imitate': 3024, 'victorian': 6382, 'underscore': 6213, 'sue': 5726, 'nathanial': 3973, 'assertion': 480, 'concrete': 1274, 'malaise': 3666, 'erika': 2083, 'vause': 6349, '10th': 15, 'magnificence': 3650, 'wrenching': 6620, 'ignorance': 2999, 'disscusse': 1774, 'smokescreen': 5435, 'meaty': 3756, 'plod': 4436, 'unravle': 6272, 'okay': 4116, 'hawthornes': 2788, 'rev': 4964, 'dimmsdale': 1697, 'voluntarily': 6426, 'reconsider': 4817, 'narcotic': 3964, 'sux': 5795, 'happenes': 2760, 'quiz': 4722, 'slowly': 5421, 'lightly': 3521, 'filth': 2328, 'lock': 3571, 'attic': 501, 'mentally': 3787, 'nora': 4039, 'roberts': 5014, 'sandra': 5098, 'paralyzing': 4253, 'archaicly': 423, 'peice': 4310, 'kindling': 3379, 'tempt': 5892, 'gouge': 2643, 'sacred': 5073, 'cliffnote': 1136, 'therapy': 5947, 'torment': 6049, 'reconciliation': 4816, 'prologue': 4622, 'abomination': 147, 'spawn': 5516, 'cess': 1004, 'twisted': 6167, 'untolerably': 6297, 'cure': 1471, 'insomnia': 3132, 'entangling': 2051, 'ignominiously': 2998, 'bloody': 730, 'ah': 263, 'junior': 3331, 'origanal': 4151, 'dang': 1505, 'gatsby': 2559, 'surprisngly': 5779, 'wast': 6481, 'flattering': 2365, 'patient': 4286, 'rank': 4750, 'eighteeth': 1965, 'centuryher': 996, 'commte': 1221, 'themarriage': 5939, 'bossom': 782, 'tipical': 6016, 'looooooooooooooooong': 3586, 'readily': 4774, 'oct': 4098, 'october': 4099, 'luck': 3620, 'hat': 2778, 'setences': 5247, 'brillaint': 829, 'salinger': 5087, 'twian': 6161, 'meaningful': 3748, 'ashamed': 466, 'witchunt': 6568, 'stare': 5587, 'peculiarly': 4302, 'harsh': 2773, 'lap': 3425, 'stress': 5667, 'gamble': 2543, 'anxious': 380, 'plotline': 4438, 'commend': 1214, 'annotate': 361, 'archaic': 422, 'convoluted': 1344, 'detach': 1640, 'orwell': 4157, 'fitzgerald': 2354, 'bradbury': 802, 'hemmingway': 2831, 'barrier': 598, 
'loathe': 3565, 'efficiently': 1959, 'priceless': 4571, 'intuition': 3203, 'attack': 497, 'dare': 1509, 'portait': 4483, 'portayal': 4484, 'superstition': 5753, 'institution': 3146, 'oppression': 4138, 'comparable': 1230, 'superior': 5749, 'arthur': 453, 'crucible': 1448, 'bibliophile': 688, 'huck': 2947, 'bargian': 594, 've': 6351, 'assist': 484, 'uniteresting': 6248, 'notes': 4050, 'aid': 267, 'demi': 1600, 'moore': 3900, 'monte': 3893, 'cristo': 1432, 'backward': 564, 'abbreviate': 140, 'deification': 1583, 'fixation': 2356, 'bodily': 741, 'overdramatization': 4187, 'simplification': 5350, 'predictability': 4540, 'annoyance': 364, 'tolstoy': 6034, 'anna': 359, 'karenina': 3344, 'grave': 2672, 'aghhhhhhhhhhhhhhhhhhh': 256, 'lterature': 3616, 'poetic': 4449, 'hawrthorne': 2785, 'cram': 1406, 'extend': 2194, 'distribute': 1787, 'determination': 1643, 'adversity': 235, 'renew': 4901, 'pride': 4573, 'dignity': 1689, 'gt': 2706, 'reverend': 4971, 'whatesoever': 6524, 'relevancy': 4866, 'antagonize': 370, 'demoralize': 1604, 'snooze': 5445, 'favor': 2278, 'intellectual': 3157, 'melville': 3779, 'cowardice': 1399, 'undone': 6220, 'unfinishable': 6226, 'ruffle': 5054, 'feather': 2287, 'dreary': 1866, 'dreadful': 1861, 'aim': 269, 'miserably': 3846, 'understood': 6218, 'numerous': 4073, 'composition': 1255, 'frantically': 2474, 'wedlock': 6507, 'adulterer': 226, 'determine': 1644, 'motive': 3917, 'bush': 878, 'prison': 4585, 'jealous': 3278, 'plumeria': 4442, 'hawaii': 2783, 'plumerias': 4443, 'abundant': 161, 'leis': 3481, 'lotion': 3598, 'consistency': 1301, 'based': 601, 'seattle': 5196, 'uneffected': 6223, 'rain': 4739, 'proof': 4635, 'arnold': 441, 'senior': 5220, 'dual': 1885, 'quixote': 4721, 'meaning': 3747, 'immediatly': 3029, 'tomato': 6037, 'anthony': 374, 'starke': 5590, 'lt': 3615, 'wilbur': 6548, 'finletter': 2341, 'triolgy': 6123, 'fricken': 2489, 'campy': 916, 'cheezy': 1043, 'clooney': 1146, 'mop': 3901, 'monsturds': 3892, 'deception': 1546, 'disciple': 
1727, 'listing': 3546, 'substandard': 5708, 'dishs': 1750, 'sooooo': 5482, 'sassy': 5106, 'warming': 6470, 'babys': 559, 'grandmas': 2659, 'suction': 5722, 'cleaning': 1123, 'feeding': 2296, 'fridge': 2491, 'freindly': 2485, 'dishwith': 1751, 'tap': 5843, 'tray': 6103, 'compartment': 1233, 'abutter': 165, 'knife': 3389, 'problems': 4596, 'perhap': 4336, 'harried': 2770, 'bpa': 799, 'disassemble': 1721, 'minutes': 3839, 'vs': 6433, 'divider': 1792, 'everyman': 2113, 'rackham': 4733, 'vietnam': 6386, 'cambodia': 907, 'angkor': 350, 'phnomh': 4369, 'penh': 4315, 'siem': 5329, 'reap': 4783, 'pol': 4457, 'se': 5184, 'asia': 468, 'laos': 3424, 'drastically': 1855, 'ethnographic': 2101, 'august': 513, 'australian': 518, 'nong': 4034, 'khai': 3362, 'thailand': 5929, 'availability': 531, 'bus': 877, 'mixed': 3867, 'sloppy': 5419, 'functional': 2516, 'generator': 2574, 'miswire': 3864, 'internally': 3184, 'yep': 6658, 'meter': 3801, 'finagle': 2329, 'probe': 4594, 'externally': 2201, 'atrocious': 495, 'didnt': 1675, 'iteam': 3247, 'daring': 1510, 'corperate': 1364, 'hectic': 2818, 'kiss': 3384, 'jerry': 3290, 'lewis': 3495, 'officially': 4111, 'brained': 805, 'scattological': 5146, 'nauseating': 3983, 'adamant': 202, 'holliday': 2890, 'depressing': 1616, 'sarcastic': 5103, 'depressed': 1615, 'disorder': 1758, 'holiday': 2887, 'immorality': 3031, 'exceptions': 2135, 'carol': 950, 'grinch': 2687, 'potty': 4511, 'beavis': 630, 'butthead': 888, 'meanness': 3752, 'cancel': 917, 'glimmer': 2612, 'eclipse': 1935, 'nastiness': 3971, 'dissapointment': 1771, 'receipt': 4797, 'rude': 5053, 'postive': 4501, 'funniest': 2524, 'farth': 2262, 'reindeer': 4850, 'pooping': 4472, 'poop': 4471, 'grating': 2671, 'embarrassing': 1991, 'deed': 1554, 'chanukah': 1015, 'jew': 3293, 'celebrate': 987, 'whitey': 6533, 'repentence': 4909, 'hairdresser': 2734, 'entertaing': 2054, 'discount': 1730, 'rack': 4732, 'pretend': 4564, 'perverted': 4356, 'innuendo': 3118, 'restriction': 4951, 'fortitude': 2449, 
'bomb': 751, 'thrid': 5977, 'flop': 2387, 'followe': 2409, 'toiler': 6029, 'carrer': 953, 'simpson': 5355, 'fould': 2460, 'snl': 5442, 'vomit': 6427, 'hanukkah': 2756, 'christams': 1079, 'warne': 6472, 'spolier': 5551, 'davey': 1522, 'drunk': 1881, 'migiet': 3819, 'whitney': 6535, 'woody': 6588, 'allen': 291, 'crack': 1402, 'fernal': 2307, 'elenaor': 1977, 'deer': 1558, 'supposted': 5763, 'santa': 5100, 'claus': 1120, 'cheery': 1040, 'spirited': 5544, 'tsicle': 6144, 'lick': 3509, 'tooth': 6044, 'yup': 6674, 'extent': 2198, 'evericould': 2111, 'startit': 5596, 'distort': 1783, 'youngster': 6667, 'heads': 2799, 'crying': 1455, 'triple': 6126, 'breasted': 817, 'shiznit': 5302, 'referee': 4834, 'elenore': 1978, 'poopsicle': 4473, 'children': 1061, 'billy': 694, 'madison': 3643, 'understatement': 6217, 'remotely': 4894, 'hairy': 2735, 'jock': 3307, 'licking': 3510, 'fece': 2291, 'diehard': 1679, 'animate': 354, 'hannukah': 2754, 'downright': 1840, 'humbug': 2956, 'ishtar': 3232, 'insist': 3131, 'protest': 4652, 'stargate': 5588, 'bias': 684, 'cloying': 1158, 'schmaltz': 5154, 'sinks': 5368, 'signal': 5334, 'cr': 1400, 'flatulence': 2366, 'gas': 2557, 'parode': 4264, 'mall': 3670, 'ploy': 4439, 'scrooge': 5182, 'sandlers': 5097, 'singular': 5366, 'terribe': 5904, 'proto': 4653, 'resemblance': 4930, 'ecm': 1936, 'corea': 1359, 'farrell': 2260, 'pulsing': 4678, 'flute': 2395, 'sax': 5123, 'storming': 5647, 'soprano': 5487, 'elvin': 1986, 'jones': 3315, 'quartet': 4706, 'buster': 883, 'william': 6553, 'mclaughlin': 3744, 'dave': 1521, 'holland': 2889, 'unparreled': 6268, 'outback': 4165, 'disco': 1728, 'shut': 5321, 'duck': 1888, 'impressionable': 3047, 'percy': 4327, 'dragonout': 1849, 'augest06storyteller': 512, 'carlinstories': 949, 'stops': 5642, 'doglass': 1812, 'deputation': 1622, 'scarf': 5136, 'dieasel': 1677, 'edward': 1952, 'expliotsdvd': 2182, 'features': 2289, 'charater': 1022, 'galleryand': 2540, 'morealso': 3905, 'dvdsthomas': 1910, 'dvdbetter': 1908, 
'dvdon': 1909, 'dvdandcome': 1907, 'rail': 4738, 'dvdand': 1906, 'forgetthomas': 2435, 'dvdto': 1911, 'countinued': 1384, 'thumbs': 5988, 'ho': 2878, 'arrr': 448, 'matey': 3731, 'yer': 6659, 'sail': 5082, 'flag': 2359, 'swashbuckling': 5796, 'illegible': 3007, 'fax': 2282, 'reasonable': 4786, 'elvis': 1987, 'deborah': 1538, 'walley': 6454, 'drummer': 1880, 'deusenberg': 1647, 'closing': 1151, 'walking': 6451, 'gearstick': 2564, 'craptastic': 1409, 'les': 3484, 'crop': 1442, 'shove': 5315, 'chikadees': 1056, 'spinout': 5541, 'king': 3381, 'aisle': 274, 'shelley': 5291, 'fabares': 2215, 'veteran': 6372, 'height': 2822, 'stardom': 5586, 'cecil': 986, 'kellaway': 3352, 'una': 6184, 'merkel': 3794, 'carl': 947, 'betz': 680, 'donna': 1819, 'reed': 4831, 'swinging': 5810, 'presley': 4560, '1966': 47, 'nadir': 3958, 'cinema': 1097, 'sitcom': 5373, 'chase': 1031, 'spoiled': 5549, 'brat': 808, 'erotic': 2085, 'listenable': 3544, 'sleepwalk': 5407, 'overweight': 4204, 'outing': 4173, 'motion': 3914, 'racing': 4731, 'contraption': 1327, 'impetus': 3034, 'handspre': 2749, 'visor': 6408, 'stylus': 5697, 'properly': 4639, 'headpiece': 2798, 'enclose': 2009, 'nitrite': 4028, 'expire': 2177, 'undamaged': 6202, 'assembly': 479, 'furniture': 2527, 'hutch': 2970, 'ebay': 1933, 'separately': 5231, 'particle': 4266, 'bookcase': 760, 'shade': 5267, 'beeswax': 638, 'drawer': 1857, 'fortune': 2453, 'kidkraft': 3370, 'kidcraft': 3368, 'avalon': 533, 'outgrow': 4171, 'kraft': 3400, 'knowing': 3393, 'cubby': 1461, 'ding': 1698, 'hardware': 2768, 'baggie': 571, 'knob': 3390, 'supersaver': 5752, 'expedite': 2169, 'kg': 3361, 'er': 2078, 'crafts': 1405, 'rambunctious': 4744, '5yr': 117, 'yrs': 6672, 'catered': 975, '8yrs': 129, 'popeye': 4477, 'palooka': 4239, 'wot': 6614, 'devotee': 1657, 'sceptical': 5150, 'brisk': 834, 'conducting': 1280, 'sung': 5742, 'alw': 308, 'shard': 5280, 'unicorn': 6237, 'peugeot': 4362, 'firth': 2348, 'bounds': 789, 'grind': 2688, 'crank': 1407, 'grinder': 2689, 
'operate': 4131, 'wrist': 6625, 'exerciser': 2154, 'multilevel': 3935, 'sketchy': 5391, 'shoddy': 5305, 'empirically': 2005, 'jensen': 3285, 'doctorate': 1805, 'citation': 1102, 'pitch': 4402, 'colleague': 1178, 'certify': 1003, 'unproven': 6271, 'gathering': 2558, 'accessible': 171, 'deadwood': 1532, 'veritable': 6363, 'cottage': 1377, 'industry': 3092, 'replete': 4913, 'workshop': 6601, 'seminar': 5215, 'certification': 1002, 'scientifically': 5165, 'starting': 5595, 'influential': 3102, 'accountability': 180, 'fortunate': 2451, 'clueless': 1161, 'obstacle': 4088, 'survivalist': 5784, '21st': 73, 'phd': 4364, 'document': 1807, 'standpoint': 5583, 'fuzzy': 2533, 'speculation': 5528, 'queasy': 4709, 'given': 2604, 'activation': 195, 'precent': 4535, '39': 95, 'percent': 4324, 'paradigm': 4251, 'correspond': 1368, 'unacceptable': 6186, 'recapture': 4794, 'unfortunate': 6230, 'neurologist': 4007, 'fluff': 2394, 'breathe': 818, 'nostril': 4047, 'tissue': 6019, 'engorge': 2037, 'affect': 245, 'hemisphere': 2830, 'p25': 4215, 'p20': 4214}

TF-IDF

The goal of using tf-idf is to scale down the impact of tokens that occur very frequently in a given corpus and that are hence empirically less informative than features that occur in a small fraction of the training corpus.

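tf-idf(t, d) = tf(t, d) * idf(t)
  • tf(t, d) = the term frequency, i.e. the number of times the term ‘t’ appears in the document ‘d’
  • idf(t) = the inverse document frequency, log(n / df(t)) + 1, where n is the total number of documents and df(t), the document frequency, is the number of documents that contain the term ‘t’ (this is the form scikit-learn uses when smooth_idf=False)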
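
To make the weighting concrete, here is a minimal sketch that recomputes one tf-idf value by hand and compares it with scikit-learn's result. The toy corpus and the names `toy_docs`, `toy_tf` and `manual_tfidf` are made up for illustration and are not part of the review data. It assumes the same settings as this article (smooth_idf=False, norm=None), for which scikit-learn computes idf(t) = ln(n / df(t)) + 1.

```python
import math
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (hypothetical, not the review data used in this article)
toy_docs = ["good game good music", "good book", "bad book"]

# Same settings as in the article: no idf smoothing, no row normalisation
toy_tf = TfidfVectorizer(smooth_idf=False, sublinear_tf=False, norm=None, analyzer='word')
toy_matrix = toy_tf.fit_transform(toy_docs).toarray()

def manual_tfidf(term, doc, docs):
    """tf-idf by hand: raw count of `term` in `doc` times (ln(n / df) + 1)."""
    n = len(docs)                                  # total number of documents
    df = sum(term in d.split() for d in docs)      # documents containing the term
    tf_count = doc.split().count(term)             # occurrences in this document
    return tf_count * (math.log(n / df) + 1)

col = toy_tf.vocabulary_["good"]                   # column index of the token 'good'
sklearn_value = toy_matrix[0, col]
manual_value = manual_tfidf("good", toy_docs[0], toy_docs)
print(sklearn_value, manual_value)

# A token seen in one document ('game') gets a larger idf than one seen in two ('good')
print(toy_tf.idf_[toy_tf.vocabulary_["game"]], toy_tf.idf_[toy_tf.vocabulary_["good"]])
```

The two printed tf-idf values should agree up to floating-point error. Note that 'good' appears twice in the first document, so its tf-idf there is twice its idf.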
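
from sklearn.feature_extraction.text import TfidfVectorizer

# Each review is assumed to be a list of tokens; join them back into one string per document
docs = [" ".join(r) for r in reviews]
# smooth_idf=False and norm=None keep the raw, unnormalised tf-idf weights
tf = TfidfVectorizer(smooth_idf=False, sublinear_tf=False, norm=None, analyzer='word')
txt_fitted = tf.fit(docs)
txt_transformed = txt_fitted.transform(docs)
# Mapping from each token to its column index in the tf-idf matrix
tf.vocabulary_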
{'stun': 5796,
 'non': 4106,
 'gamer': 2589,
 'sound': 5601,
 'track': 6193,
 'beautiful': 639,
 'paint': 4320,
 'senery': 5313,
 'mind': 3894,
 'recomend': 4904,
 'people': 4410,
 'hate': 2827,
 'vid': 6511,
 'game': 2588,
 'music': 4014,
 'play': 4512,
 'chrono': 1108,
 'cross': 1468,
 'good': 2679,
 'back': 572,
 'away': 557,
 'crude': 1474,
 'keyboarding': 3420,
 'take': 5936,
 'fresh': 2531,
 'step': 5720,
 'grate': 2717,
 'guitar': 2767,
 'soulful': 5598,
 'orchestra': 4224,
 'impress': 3099,
 'care': 964,
 'listen': 3608,
 'soundtrack': 5604,
 'read': 4863,
 'lot': 3662,
 'review': 5065,
 'say': 5217,
 'figure': 2359,
 'write': 6764,
 'disagree': 1742,
 'bit': 721,
 'opinino': 4214,
 'yasunori': 6789,
 'mitsuda': 3932,
 'ultimate': 6305,
 'masterpiece': 3787,
 'timeless': 6126,
 'year': 6792,
 'beauty': 642,
 'simply': 5452,
 'refuse': 4934,
 'fade': 2265,
 'the': 6044,
 'price': 4661,
 'tag': 5935,
 'pretty': 4657,
 'staggering': 5677,
 'go': 2671,
 'buy': 914,
 'cd': 1005,
 'money': 3956,
 'feel': 2335,
 'worth': 6748,
 'penny': 4407,
 'amazing': 322,
 'favorite': 2316,
 'time': 6125,
 'hand': 2791,
 'intense': 3219,
 'sadness': 5170,
 'prisoner': 4677,
 'fate': 2310,
 'mean': 3811,
 'hope': 2966,
 'distant': 1809,
 'promise': 4716,
 'girl': 2645,
 'steal': 5712,
 'star': 5687,
 'important': 3093,
 'inspiration': 3192,
 'personally': 4440,
 'teen': 5987,
 'high': 2906,
 'energy': 2060,
 'scar': 5227,
 'dreamwatch': 1895,
 'chronomantique': 1109,
 'indefinably': 3134,
 'remeniscent': 4980,
 'trigger': 6243,
 'absolutely': 155,
 'superb': 5850,
 'well': 6647,
 'this': 6080,
 'probably': 4684,
 'composer': 1277,
 'work': 6733,
 'hear': 2855,
 'xenogears': 6776,
 'sure': 5870,
 'twice': 6286,
 'it': 3301,
 'wish': 6701,
 'excellent': 2167,
 'truly': 6263,
 'enjoy': 2073,
 'video': 6512,
 'relaxing': 4955,
 'peaceful': 4389,
 'on': 4202,
 'disk': 1782,
 'life': 3578,
 'death': 1563,
 'forest': 2471,
 'illusion': 3068,
 'fortress': 2492,
 'ancient': 348,
 'dragon': 1878,
 'lost': 3661,
 'fragment': 2509,
 'drown': 1907,
 'valley': 6462,
 'draggon': 1877,
 'galdorb': 2582,
 'home': 2948,
 'gale': 2583,
 'girlfriend': 2647,
 'like': 3588,
 'zelbessdisk': 6817,
 'garden': 2594,
 'god': 2674,
 'chronopolis': 1110,
 'jellyfish': 3344,
 'sea': 5278,
 'burn': 895,
 'orphange': 4239,
 'prayer': 4620,
 'tower': 6189,
 'radical': 4827,
 'dreamer': 1893,
 'unstealable': 6409,
 'jewel': 3355,
 'overall': 4270,
 'bring': 853,
 'xander': 6775,
 'remember': 4979,
 'pull': 4768,
 'jaw': 3334,
 'floor': 2426,
 'know': 3454,
 'divine': 1822,
 'single': 5463,
 'song': 5580,
 'tell': 5994,
 'story': 5752,
 'great': 2722,
 'doubt': 1863,
 'magical': 3712,
 'wind': 6694,
 'unstolen': 6412,
 'translation': 6216,
 'vary': 6472,
 'perfect': 4417,
 'ask': 482,
 'pour': 4605,
 'heart': 2858,
 'paper': 4334,
 'absolute': 154,
 'actually': 200,
 'aware': 556,
 'contribute': 1355,
 'greatly': 2723,
 'mood': 3964,
 'minute': 3905,
 'compose': 1276,
 'exact': 2157,
 'count': 1406,
 'render': 4991,
 'impressively': 3105,
 'remarkable': 4975,
 'assure': 499,
 'forget': 2476,
 'listener': 3610,
 'fast': 2305,
 'paced': 4305,
 'energetic': 2057,
 'dance': 1529,
 'tokage': 6152,
 'termina': 6008,
 'slow': 5519,
 'haunting': 2829,
 'purely': 4780,
 'beautifully': 641,
 'fantastic': 2293,
 'vocal': 6548,
 'dreamers': 1894,
 'videogame': 6513,
 'surely': 5871,
 'buyer': 915,
 'beware': 698,
 'self': 5305,
 'publish': 4760,
 'book': 777,
 'want': 6590,
 'paragraph': 4340,
 'ms': 3996,
 'haddon': 2780,
 'family': 2286,
 'friend': 2535,
 'imagine': 3076,
 'thing': 6073,
 'spend': 5636,
 'evening': 2142,
 'hysteric': 3033,
 'piece': 4476,
 'definitely': 1599,
 'bad': 579,
 'enter': 2085,
 'kind': 3438,
 'contest': 1341,
 'believe': 670,
 'amazon': 324,
 'sell': 5306,
 'maybe': 3802,
 'offer': 4186,
 '8th': 128,
 'grade': 2696,
 'term': 6007,
 'kill': 3434,
 'mockingbird': 3941,
 'send': 5311,
 'joke': 3374,
 'stay': 5711,
 'far': 2295,
 'glorious': 2664,
 'love': 3669,
 'whisper': 6665,
 'wicked': 6675,
 'saint': 5176,
 'pleasantly': 4520,
 'surprised': 5881,
 'change': 1035,
 'normaly': 4118,
 'romance': 5121,
 'novel': 4133,
 'world': 6740,
 'rave': 4857,
 'brilliant': 851,
 'true': 6261,
 'wonderful': 6719,
 'typical': 6296,
 'crime': 1453,
 'becuase': 645,
 'miss': 3922,
 'warm': 6597,
 'finish': 2378,
 'fall': 2280,
 'caracter': 959,
 'expect': 2203,
 'average': 545,
 'instead': 3200,
 'find': 2373,
 'think': 6075,
 'predict': 4630,
 'outcome': 4254,
 'shock': 5398,
 'writting': 6769,
 'descriptive': 1653,
 'break': 834,
 'julia': 3386,
 'reader': 4865,
 'lover': 3674,
 'let': 3553,
 'cover': 1421,
 'fool': 2456,
 'spectacular': 5630,
 'easy': 1961,
 'leave': 3532,
 'follow': 2449,
 'come': 1221,
 'soon': 5582,
 'get': 2631,
 'enjoyable': 2074,
 'complete': 1269,
 'waste': 6611,
 'typographical': 6299,
 'error': 2120,
 'poor': 4565,
 'grammar': 2700,
 'totally': 6180,
 'pathetic': 4372,
 'plot': 4528,
 'add': 205,
 'embarrassed': 2023,
 'author': 533,
 'disappointed': 1747,
 'pay': 4384,
 'boy': 816,
 'twist': 6290,
 'turn': 6278,
 'keep': 3412,
 'guess': 2758,
 'happen': 2806,
 'make': 3726,
 'heat': 2860,
 'angery': 358,
 'throu': 6100,
 'emotion': 2031,
 'quick': 4808,
 'end': 2046,
 'day': 1550,
 'night': 4093,
 'realistic': 4871,
 'show': 5412,
 'human': 3006,
 'fact': 2259,
 'writer': 6765,
 'loving': 3675,
 'revengeful': 5061,
 'glass': 2655,
 'castle': 989,
 'oh': 4192,
 'discerning': 1755,
 'drivel': 1903,
 'trouble': 6254,
 'typo': 6298,
 'prominently': 4714,
 'feature': 2325,
 'page': 4315,
 'remove': 4988,
 'wait': 6574,
 'point': 4542,
 'beginning': 660,
 'clear': 1147,
 'intentional': 3222,
 'churning': 1114,
 'heated': 2861,
 'prose': 4735,
 'satiric': 5203,
 'purpose': 4788,
 'phew': 4455,
 'glad': 2652,
 '10': 5,
 '95': 131,
 'awful': 561,
 'belief': 667,
 '7th': 126,
 'grader': 2697,
 'grammatical': 2701,
 'skill': 5491,
 'age': 253,
 'reviewer': 5066,
 'misspelling': 3925,
 'chapter': 1038,
 'example': 2164,
 'mention': 3854,
 'lean': 3526,
 'house': 2992,
 'distract': 1814,
 'writing': 6766,
 'weak': 6624,
 'decide': 1575,
 'pencil': 4401,
 'mark': 3764,
 'horrible': 2974,
 'spelling': 5635,
 'relative': 4951,
 'faith': 2276,
 'try': 6266,
 'fake': 2279,
 'glaringly': 2654,
 'obvious': 4167,
 'glow': 2669,
 'person': 4436,
 'sentence': 5322,
 'structure': 5782,
 'veronica': 6491,
 'romantic': 5124,
 'zen': 6818,
 'baseball': 612,
 'comedy': 1225,
 'folk': 2447,
 'anymore': 391,
 'talk': 5940,
 'cool': 1372,
 'young': 6806,
 'cuban': 1486,
 'search': 5283,
 'idenity': 3043,
 'stumble': 5795,
 'coastal': 1191,
 'resort': 5029,
 'kitchen': 3448,
 'gig': 2639,
 'motorcycle': 3987,
 'maintenance': 3723,
 'man': 3737,
 'hysterical': 3034,
 'italian': 3302,
 'chef': 1066,
 'latino': 3500,
 'fireballing': 2383,
 'right': 5091,
 'handed': 2792,
 'pitcher': 4493,
 'team': 5973,
 'sponsor': 5655,
 'owner': 4299,
 'case': 981,
 'honest': 2953,
 'comical': 1234,
 'emotional': 2032,
 'interaction': 3227,
 'sizzling': 5484,
 'roster': 5132,
 'player': 4515,
 'mix': 3933,
 'special': 5623,
 'effect': 1987,
 'salsa': 5181,
 'flashback': 2403,
 'big': 707,
 'fashionable': 2304,
 'compression': 1284,
 'stocking': 5739,
 'dvt': 1942,
 'doctor': 1834,
 'require': 5018,
 'wear': 6628,
 'ugly': 6301,
 'white': 6666,
 'ted': 5985,
 'hose': 2982,
 'yucky': 6813,
 'thick': 6069,
 'brown': 865,
 'jobst': 3367,
 'ultrasheer': 6307,
 'give': 2649,
 'need': 4070,
 '15': 27,
 '20': 59,
 'look': 3648,
 'regular': 4939,
 'pantyhose': 4333,
 'blood': 747,
 'clot': 1175,
 'support': 5862,
 'leg': 3537,
 'nice': 4085,
 'note': 4125,
 'problem': 4686,
 'rubberized': 5143,
 'top': 6168,
 'roll': 5118,
 'thigh': 6071,
 'adhesive': 217,
 'skin': 5494,
 'inexpensive': 3152,
 'garter': 2600,
 'belt': 677,
 'fine': 2375,
 'help': 2877,
 'product': 4693,
 'difficult': 1712,
 'old': 4197,
 'workout': 6737,
 'begin': 656,
 'create': 1444,
 'deep': 1583,
 'ridge': 5086,
 'difficulty': 1713,
 'address': 213,
 'size': 5480,
 'recomende': 4905,
 'chart': 1052,
 'real': 4870,
 'small': 5522,
 'sheer': 5381,
 'item': 3307,
 'internet': 3243,
 'store': 5748,
 'check': 1058,
 'mens': 3851,
 'model': 3943,
 'ok': 4194,
 'sedentary': 5297,
 'type': 6295,
 'active': 196,
 'alot': 305,
 'job': 3366,
 'consistently': 1326,
 'ankle': 367,
 'solution': 5569,
 'standard': 5685,
 '30': 89,
 'stock': 5738,
 '114622': 18,
 'pair': 4321,
 'tear': 5974,
 'struggle': 5784,
 'riddance': 5084,
 'investment': 3267,
 'delicious': 1613,
 'cookie': 1370,
 'funny': 2569,
 'header': 2847,
 'quickly': 4810,
 'package': 4310,
 'notice': 4129,
 'title': 6142,
 'bake': 585,
 'convenience': 1360,
 'dough': 1864,
 'wrap': 6755,
 'plastic': 4509,
 'log': 3638,
 'surprise': 5880,
 'messy': 3864,
 'extremely': 2246,
 'sticky': 5729,
 'flexibility': 2416,
 'ratio': 4854,
 'ingredient': 3164,
 'extra': 2241,
 'butter': 908,
 'baked': 586,
 'chewy': 1074,
 'large': 3491,
 'chocolate': 1093,
 'chip': 1090,
 'addition': 210,
 'natural': 4051,
 'flavor': 2408,
 'abysmal': 166,
 'digital': 1717,
 'copy': 1379,
 'scratch': 5267,
 'insect': 3181,
 'dropping': 1906,
 'random': 4839,
 'pixelation': 4496,
 'combine': 1219,
 'muddy': 4001,
 'light': 3584,
 'vague': 6458,
 'image': 3072,
 'resolution': 5028,
 'cue': 1488,
 'packaging': 4311,
 'straight': 5755,
 'street': 5768,
 'corner': 1384,
 'bootleg': 787,
 'dealer': 1561,
 'if': 3051,
 'see': 5298,
 'reasonably': 4880,
 'condition': 1303,
 'film': 2364,
 'define': 1597,
 'visual': 6538,
 'crystal': 1483,
 'lighting': 3585,
 'contrast': 1353,
 'black': 727,
 'surround': 5886,
 'countryside': 1411,
 'scene': 5240,
 'set': 5340,
 'early': 1953,
 'morning': 3974,
 'ground': 2745,
 'mist': 3926,
 'haze': 2840,
 'memory': 3849,
 'event': 2144,
 'bridge': 846,
 'water': 6614,
 'bright': 848,
 'immediate': 3082,
 'here': 2887,
 'dull': 1923,
 'dark': 1538,
 'clouded': 1179,
 'timbre': 6124,
 'enunciation': 2099,
 'captain': 953,
 'command': 1235,
 'visuals': 6541,
 'after': 249,
 'hard': 2813,
 'award': 555,
 'win': 6693,
 'critically': 1461,
 'acclaim': 173,
 'presentation': 4647,
 'youtube': 6810,
 'dvd': 1935,
 '16': 29,
 'mm': 3937,
 'public': 4758,
 'library': 3570,
 'reel': 4925,
 'just': 3395,
 'appear': 413,
 'fascinating': 2301,
 'insight': 3186,
 'modern': 3944,
 'japanese': 3330,
 'thoroughly': 6086,
 'rise': 5098,
 'son': 5579,
 'daughter': 1546,
 'society': 5555,
 'view': 6515,
 'poise': 4545,
 'parent': 4345,
 'culture': 1491,
 'restraint': 5043,
 'obedience': 4157,
 'community': 1249,
 'peer': 4396,
 'adulation': 224,
 'western': 6650,
 'form': 2482,
 'new': 4081,
 'japan': 3329,
 'international': 3242,
 'blend': 738,
 'ando': 351,
 'demonstrate': 1630,
 'vignette': 6517,
 'private': 4678,
 'member': 3846,
 'steven': 5727,
 'wardell': 6594,
 'clearly': 1148,
 'talented': 5939,
 'adopt': 221,
 'schooling': 5252,
 'able': 146,
 'inside': 3185,
 'album': 281,
 'blue': 751,
 'angel': 355,
 'lanna': 3484,
 'mama': 3736,
 'hair': 2781,
 'neck': 4067,
 'roy': 5140,
 'trully': 6262,
 'singer': 5460,
 'talent': 5938,
 'charge': 1046,
 'aaas': 137,
 'charger': 1047,
 'aa': 135,
 'battery': 621,
 'huge': 3003,
 'secure': 5296,
 'aaa': 136,
 'flip': 2422,
 'little': 3619,
 'button': 911,
 'positive': 4581,
 'pop': 4567,
 'will': 6689,
 'hold': 2936,
 'mechanism': 3823,
 'loose': 3653,
 'horizontal': 2970,
 'pressure': 4653,
 'push': 4790,
 'duct': 1919,
 'tape': 5951,
 'segment': 5302,
 'crayon': 1440,
 'apply': 420,
 'painful': 4318,
 'advertise': 237,
 'instruction': 3203,
 'not': 4124,
 '24': 77,
 'hour': 2990,
 'charging': 1048,
 'return': 5055,
 'unit': 6371,
 'useless': 6447,
 'backup': 575,
 'manage': 3738,
 'drain': 1881,
 'aas': 138,
 'purchase': 4776,
 'convenient': 1361,
 'last': 3494,
 'short': 5407,
 'long': 3645,
 'kodak': 3460,
 'nimh': 4097,
 'dear': 1562,
 'excited': 2178,
 'ostensibly': 4243,
 'muslim': 4019,
 'feminism': 2340,
 'volume': 6553,
 'live': 3620,
 'expectations': 2205,
 'one': 4204,
 'essay': 2126,
 'describe': 1650,
 'veil': 6483,
 'potentially': 4598,
 'liberating': 3566,
 'explain': 2215,
 'why': 6673,
 'another': 376,
 'woman': 6716,
 'cape': 947,
 'town': 6190,
 'claim': 1129,
 'separate': 5323,
 'equal': 2106,
 'gee': 2609,
 'whiz': 6670,
 'disappointment': 1749,
 'feminist': 2341,
 'condemnation': 1300,
 'gender': 2612,
 'apartheid': 398,
 'extoll': 2240,
 'virtue': 6531,
 'female': 2339,
 'genital': 2621,
 'mutilation': 4020,
 'alyssa': 315,
 'lappen': 3488,
 'base': 611,
 'vcr': 6477,
 'christmas': 1105,
 'present': 4646,
 'join': 3372,
 'rest': 5039,
 'land': 3481,
 'vhs': 6501,
 'movie': 3992,
 'jvc': 3402,
 'tv': 6280,
 'choice': 1094,
 'agree': 261,
 'awkward': 564,
 'selection': 5304,
 'option': 4222,
 'hang': 2800,
 'two': 6292,
 'comment': 1238,
 'intuitive': 3261,
 'complicated': 1273,
 'remote': 4986,
 'technically': 5979,
 'minded': 3895,
 'rely': 4972,
 'heavily': 2864,
 'manual': 3753,
 'timer': 6128,
 'start': 5696,
 'scroll': 5274,
 'something': 5577,
 'complaint': 1268,
 'incorrect': 3126,
 'disc': 1754,
 'fan': 2288,
 'suspiscious': 5899,
 'section': 5294,
 'happy': 2811,
 'click': 1155,
 'receiver': 4892,
 'transition': 6215,
 'smooth': 5535,
 'pause': 4382,
 'fairly': 2273,
 'headcleaner': 2846,
 'message': 3863,
 'nut': 4153,
 'television': 5993,
 'bookshelf': 781,
 'audio': 522,
 'system': 5929,
 'car': 958,
 'room': 5128,
 'combo': 1220,
 'longer': 3646,
 'things': 6074,
 'no': 4102,
 'cable': 921,
 'box': 813,
 'compatability': 1257,
 'control': 1358,
 'seperate': 5326,
 'input': 3176,
 'coax': 1192,
 'programming': 4709,
 'mono': 3957,
 'wife': 6681,
 'difference': 1710,
 'hollywood': 2945,
 'debacle': 1564,
 'ridiculous': 5088,
 'wonder': 6718,
 'script': 5273,
 'mountain': 3988,
 'lion': 3605,
 'trailer': 6205,
 'capture': 957,
 'jail': 3320,
 'cell': 1012,
 'utterly': 6455,
 'completely': 1270,
 'stupid': 5798,
 'bet': 694,
 'hotel': 2988,
 'babylon': 569,
 'incredible': 3129,
 'acting': 193,
 'tamzin': 5945,
 'outhwaite': 4259,
 'eastenders': 1959,
 'bbc': 625,
 'soap': 5548,
 'max': 3801,
 'beesley': 651,
 'ill': 3061,
 'fated': 2311,
 'glitter': 2660,
 'mariah': 3761,
 'carey': 968,
 'drama': 1882,
 'series': 5332,
 'opera': 4212,
 'air': 273,
 'america': 330,
 'episode': 2104,
 'season': 5285,
 'finale': 2369,
 'interesting': 3233,
 'watch': 6612,
 'remind': 4981,
 'abc': 141,
 '1983': 51,
 '1988': 53,
 'reason': 4878,
 'fictional': 2355,
 'san': 5188,
 'francisco': 2515,
 'luxury': 3693,
 'england': 2068,
 'recommend': 4907,
 'willing': 6692,
 'casually': 991,
 'law': 3511,
 'school': 5250,
 'seriously': 5333,
 'unfortunately': 6356,
 'entertaining': 2088,
 'order': 4226,
 'hip': 2919,
 'daddy': 1524,
 'vibe': 6502,
 'dismay': 1785,
 'fourth': 2506,
 'class': 1137,
 'main': 3719,
 'jist': 3363,
 'xylaphone': 6783,
 'voice': 6550,
 'replicate': 5007,
 'party': 4361,
 'neighborhood': 4075,
 'laugh': 3503,
 'beach': 629,
 'grow': 2748,
 'surfer': 5874,
 'diego': 1707,
 'southern': 5608,
 'california': 925,
 'brother': 863,
 'honestly': 2954,
 'kinda': 3439,
 'absolutle': 156,
 'epitimy': 2105,
 'surf': 5872,
 'cha': 1027,
 'rochelle': 5109,
 'hell': 2875,
 'moral': 3969,
 'aspect': 484,
 'american': 331,
 'lucid': 3683,
 'argue': 440,
 'explanation': 2217,
 'simple': 5447,
 'focused': 2442,
 'individual': 3144,
 'ignore': 3056,
 'mock': 3940,
 'personal': 4437,
 'responsibility': 5037,
 'final': 2368,
 'response': 5036,
 'indictment': 3141,
 'robert': 5106,
 'ringer': 5094,
 'seller': 5307,
 'disgusted': 1775,
 'boorish': 785,
 'state': 5704,
 'medium': 3832,
 'politic': 4554,
 'discourse': 1760,
 'general': 2614,
 'head': 2844,
 'substantial': 5813,
 'challenge': 1030,
 'lie': 3577,
 'americans': 333,
 'being': 665,
 'playing': 4516,
 'larry': 3493,
 'muse': 4012,
 'label': 3468,
 'late': 3496,
 '80': 127,
 '90': 130,
 'explore': 2223,
 'rich': 5078,
 'catalog': 993,
 'jazz': 3338,
 'musician': 4017,
 'relaxed': 4954,
 'valentine': 6460,
 'stand': 5684,
 'chet': 1071,
 'baker': 587,
 'mile': 3887,
 'mac': 3699,
 'line': 3599,
 'os': 4241,
 'window': 6695,
 'frustrating': 2550,
 'attempt': 509,
 'touch': 6181,
 'use': 6445,
 'mouse': 3989,
 'power': 4609,
 'arrow': 458,
 'keyboard': 3419,
 'fun': 2558,
 'disapointed': 1744,
 'numbing': 4149,
 'attention': 511,
 'level': 3556,
 'rescue': 5020,
 'hero': 2889,
 'dog': 1840,
 'alaskan': 280,
 'repeatedly': 5000,
 'allman': 297,
 'recipe': 4900,
 'throw': 6103,
 'cup': 1495,
 'flour': 2431,
 'miscellaneous': 3910,
 'nearly': 4062,
 'compare': 1254,
 'ed': 1974,
 'wood': 6724,
 'flawlessly': 2413,
 'lisa': 3606,
 'rayner': 4858,
 'wild': 6685,
 'bread': 832,
 'sourdough': 5606,
 'artisan': 468,
 'fail': 2268,
 'novice': 4137,
 'reliably': 4963,
 'pancake': 4329,
 'place': 4500,
 'reliable': 4961,
 'concise': 1295,
 'information': 3161,
 'maintain': 3722,
 'fabulous': 2255,
 'tome': 6160,
 'pass': 4362,
 'while': 6661,
 'historical': 2923,
 'warrant': 6602,
 ...}
idf = tf.idf_
rr = dict(zip(txt_fitted.get_feature_names(), idf))
token_weight = pd.DataFrame.from_dict(rr, orient='index').reset_index()
token_weight.columns=('token','weight')
token_weight = token_weight.sort_values(by='weight', ascending=False)[:10]
import seaborn as sns
sns.barplot(x='token', y='weight', data=token_weight)            
plt.title("Inverse Document Frequency(idf) per token")
fig=plt.gcf()
fig.set_size_inches(10,5)
plt.show()

# get feature names
feature_names = np.array(tf.get_feature_names())
sorted_by_idf = np.argsort(tf.idf_)
print("Features with lowest idf:\n{}".format(
       feature_names[sorted_by_idf[:3]]))
print("\nFeatures with highest idf:\n{}".format(
       feature_names[sorted_by_idf[-3:]]))
Features with lowest idf:
['book' 'read' 'good']

Features with highest idf:
['nc' 'nathanial' 'zzzzzzzzzzzz']
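
To see why very common words such as 'book' end up with the lowest IDF, it can help to compute the weight by hand. The sketch below assumes that `tf` above is a scikit-learn TfidfVectorizer with its default `smooth_idf=True`, in which case idf(t) = ln((1 + n) / (1 + df(t))) + 1. The toy corpus and the helper name `smooth_idf` are made up for illustration and are not part of the review dataset.

```python
import math

# Hypothetical toy corpus (already tokenized); not the review dataset.
docs = [
    ["good", "book"],
    ["good", "read"],
    ["good", "sword"],
]

def smooth_idf(term, docs):
    """IDF with add-one smoothing: ln((1 + n) / (1 + df)) + 1."""
    n_docs = len(docs)
    doc_freq = sum(term in doc for doc in docs)
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1

idf_good = smooth_idf("good", docs)    # term present in every document
idf_sword = smooth_idf("sword", docs)  # term present in a single document
print(idf_good, idf_sword)
```

A term that occurs in every document gets the minimum weight of 1, and the weight grows as the term becomes rarer.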
TF-IDF - Maximum token value throughout the whole dataset
# find maximum value for each of the features over all of dataset:
max_val = txt_transformed.max(axis=0).toarray().ravel()

#sort weights from smallest to biggest and extract their indices 
sort_by_tfidf = max_val.argsort()

print("Features with lowest tfidf:\n{}".format(
      feature_names[sort_by_tfidf[:3]]))

print("\nFeatures with highest tfidf: \n{}".format(
      feature_names[sort_by_tfidf[-3:]]))
Features with lowest tfidf:
['second' 'finish' 'course']

Features with highest tfidf: 
['sword' 'profanity' 'cookie']

Custom Trained Embeddings

Word2Vec

Word2Vec is a more recent model that embeds words in a lower-dimensional vector space using a shallow neural network. The result is a set of word vectors in which words that appear in similar contexts lie close together, while words that appear in different contexts lie far apart. For example, strong and powerful would be close together, whereas strong and Paris would be relatively far apart. The model comes in two variants, skip-gram (SG) and continuous bag-of-words (CBOW), both implemented by the gensim Word2Vec class.

Word2Vec uses a trick you may have seen elsewhere in machine learning. We’re going to train a simple neural network with a single hidden layer to perform a certain task, but then we’re not going to use that network for the task we trained it on! Instead, the goal is just to learn the weights of the hidden layer; we’ll see that these weights are the “word vectors” that we’re trying to learn.

The network is going to learn the statistics from the number of times each pairing shows up. So, for example, the network is probably going to get many more training samples of (“Soviet”, “Union”) than it is of (“Soviet”, “Sasquatch”). When the training is finished, if you give it the word “Soviet” as input, then it will output a much higher probability for “Union” or “Russia” than it will for “Sasquatch”.
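
The "number of times each pairing shows up" can be made concrete with a small counting sketch. The three sentences and the helper `count_pairs` below are hypothetical and only illustrate how a sliding window turns text into (input word, nearby word) training samples.

```python
from collections import Counter

# Hypothetical toy corpus; the tokens are illustrative only.
sentences = [
    ["the", "soviet", "union", "collapsed"],
    ["soviet", "union", "history"],
    ["a", "sasquatch", "sighting"],
]

def count_pairs(sentences, window_size=1):
    """Count (center, neighbour) pairs that fall inside the window."""
    counts = Counter()
    for words in sentences:
        for i, center in enumerate(words):
            lo = max(0, i - window_size)
            hi = min(len(words), i + window_size + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(center, words[j])] += 1
    return counts

pair_counts = count_pairs(sentences)
print(pair_counts[("soviet", "union")])      # pair seen in two sentences
print(pair_counts[("soviet", "sasquatch")])  # pair never seen
```

The network never sees these counts directly, but pairs that occur more often contribute more training samples and therefore pull the predicted probabilities towards them.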

  • Word2Vec - Skip-gram Model The skip-gram word2vec model takes in pairs (word1, word2) generated by moving a window across text data, and trains a 1-hidden-layer neural network on a synthetic task: given an input word, predict a probability distribution over the words that appear near it. A virtual one-hot encoding of words goes through a ‘projection layer’ to the hidden layer; these projection weights are later interpreted as the word embeddings. So if the hidden layer has 300 neurons, this network will give us 300-dimensional word embeddings.

  • Word2Vec - Continuous-bag-of-words Model Continuous-bag-of-words Word2vec is very similar to the skip-gram model. It is also a 1-hidden-layer neural network. The synthetic training task now uses the average of multiple input context words, rather than a single word as in skip-gram, to predict the center word. Again, the projection weights that turn one-hot words into averageable vectors, of the same width as the hidden layer, are interpreted as the word embeddings.
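
The "virtual one-hot encoding" in both variants can be checked numerically: multiplying a one-hot vector by the projection matrix just selects one row of it, which is why the rows can be read as word vectors. The sizes and the random matrix `W` below are assumptions for illustration, not the tutorial's trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 6, 3                  # illustrative sizes
W = rng.normal(size=(vocab_size, embed_dim))  # stand-in projection weights

def one_hot(index, size):
    vec = np.zeros(size)
    vec[index] = 1.0
    return vec

# Skip-gram: the hidden activation for one input word is its row of W.
center = 4
hidden_sg = one_hot(center, vocab_size) @ W

# CBOW: the hidden activation is the average of the context words' rows.
context = [1, 2, 4, 5]
hidden_cbow = np.mean([one_hot(i, vocab_size) @ W for i in context], axis=0)

print(np.allclose(hidden_sg, W[center]))
print(np.allclose(hidden_cbow, W[context].mean(axis=0)))
```

In practice the one-hot vector is never materialized; an embedding layer performs the row lookup directly.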

CBOW

Defining Context Word Pairs

from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

def generate_context_word_pairs(corpus, window_size, vocab_size):
    # each target word has up to window_size context words on either side
    context_length = window_size*2
    for words in corpus:
        sentence_length = len(words)
        for index, word in enumerate(words):
            context_words = []
            label_word   = []            
            start = index - window_size
            end = index + window_size + 1
            
            # collect the in-range neighbours of the target, skipping the target itself
            context_words.append([words[i] 
                                 for i in range(start, end) 
                                 if 0 <= i < sentence_length 
                                 and i != index])
            label_word.append(word)

            # left-pad short contexts with 0 and one-hot encode the target id
            x = pad_sequences(context_words, maxlen=context_length)
            y = to_categorical(label_word, vocab_size)
            yield (x, y)

Sample Inputs and Outputs

context_size=2
i = 0
for x, y in generate_context_word_pairs(corpus=tokenized_reviews, window_size=context_size, vocab_size=V):
    if 0 not in x[0]:
        print('Context (X):', [id2word[w] for w in x[0]], '-> Target (Y):', id2word[np.argwhere(y[0])[0][0]])
        if i == 10:
            break
        i += 1
Context (X): ['stun', 'non', 'sound', 'track'] -> Target (Y): gamer
Context (X): ['non', 'gamer', 'track', 'beautiful'] -> Target (Y): sound
Context (X): ['gamer', 'sound', 'beautiful', 'paint'] -> Target (Y): track
Context (X): ['sound', 'track', 'paint', 'senery'] -> Target (Y): beautiful
Context (X): ['track', 'beautiful', 'senery', 'mind'] -> Target (Y): paint
Context (X): ['beautiful', 'paint', 'mind', 'recomend'] -> Target (Y): senery
Context (X): ['paint', 'senery', 'recomend', 'people'] -> Target (Y): mind
Context (X): ['senery', 'mind', 'people', 'hate'] -> Target (Y): recomend
Context (X): ['mind', 'recomend', 'hate', 'vid'] -> Target (Y): people
Context (X): ['recomend', 'people', 'vid', 'game'] -> Target (Y): hate
Context (X): ['people', 'hate', 'game', 'music'] -> Target (Y): vid

Defining CBOW Architecture

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, Lambda
from tensorflow import keras

# build CBOW architecture
cbow = Sequential()
cbow.add(Embedding(input_dim=V, output_dim=10, input_length=2*2))
cbow.add(Lambda(lambda x: keras.backend.mean(x, axis=1), output_shape=(10,)))
cbow.add(Dense(V, activation='softmax'))
cbow.compile(loss='categorical_crossentropy', optimizer='adam')

# view model summary
print(cbow.summary())
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_1 (Embedding)      (None, 4, 10)             72020     
_________________________________________________________________
lambda_1 (Lambda)            (None, 10)                0         
_________________________________________________________________
dense (Dense)                (None, 7202)              79222     
=================================================================
Total params: 151,242
Trainable params: 151,242
Non-trainable params: 0
_________________________________________________________________
None

Training the CBOW Model

for epoch in range(1, 6):
    loss = 0.
    i = 0
    for x, y in generate_context_word_pairs(corpus=tokenized_reviews, window_size=context_size, vocab_size=V):
        i += 1
        loss += cbow.train_on_batch(x, y)
        if i % 10000 == 0:
            print('Processed {} (context, word) pairs'.format(i))
    print('Epoch:', epoch, '\tLoss:', loss/i)
Processed 10000 (context, word) pairs
Processed 20000 (context, word) pairs
Processed 30000 (context, word) pairs
Epoch: 1 	Loss: 9.09887750598119
Processed 10000 (context, word) pairs
Processed 20000 (context, word) pairs
Processed 30000 (context, word) pairs
Epoch: 2 	Loss: 9.529425923458813
Processed 10000 (context, word) pairs
Processed 20000 (context, word) pairs
Processed 30000 (context, word) pairs
Epoch: 3 	Loss: 9.413009126756606
Processed 10000 (context, word) pairs
Processed 20000 (context, word) pairs
Processed 30000 (context, word) pairs
Epoch: 4 	Loss: 9.300529336367639
Processed 10000 (context, word) pairs
Processed 20000 (context, word) pairs
Processed 30000 (context, word) pairs
Epoch: 5 	Loss: 9.207261824633273
cbow.get_weights()[0].shape
(7202, 10)

Skipgram Model

Defining Skipgram Pairs

from tensorflow.keras.preprocessing.sequence import skipgrams
skip_grams=[skipgrams(r,V,window_size=4) for r in tokenized_reviews_idx]

Sample Inputs and Outputs

# view sample skip-grams
pairs, labels = skip_grams[0][0], skip_grams[0][1]
for i in range(10):
    print("({:s} ({:d}), {:s} ({:d})) -> {:d}".format(
          id2word[pairs[i][0]], pairs[i][0], 
          id2word[pairs[i][1]], pairs[i][1], 
          labels[i]))

(recomend (1127), slinky (5656)) -> 0
(keyboarding (3168), december (5055)) -> 0
(fresh (1344), take (50)) -> 1
(step (776), crude (705)) -> 1
(grate (3169), orchestra (3171)) -> 1
(mind (236), load (1457)) -> 0
(beautiful (349), sound (152)) -> 1
(music (49), fun*charater (7028)) -> 0
(^_^ (2158), style (161)) -> 0
(game (32), dave (7003)) -> 0

embed_size=10

Defining Skipgram Architecture

from tensorflow.keras.models import Model,Sequential
from tensorflow.keras.layers import Input, Dense, Embedding, Lambda,Reshape, Dot

# build skip-gram architecture
word_model = Sequential()
word_model.add(Embedding(V, embed_size,
                         embeddings_initializer="glorot_uniform",
                         input_length=1))
word_model.add(Reshape((embed_size, )))

context_model = Sequential()
context_model.add(Embedding(V, embed_size,
                  embeddings_initializer="glorot_uniform",
                  input_length=1))
context_model.add(Reshape((embed_size,)))
input_sequence_1 = Input((None,))
input_sequence_2 = Input((None,))
dot=Dot(1)([word_model(input_sequence_1), context_model(input_sequence_2)])
out=Dense(1, kernel_initializer="glorot_uniform", activation="sigmoid")(dot)
skip_gram=Model(inputs=[input_sequence_1, input_sequence_2], outputs=out)
skip_gram.compile(loss="mean_squared_error", optimizer="adam")
print(skip_gram.summary())

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, None)]       0                                            
__________________________________________________________________________________________________
input_2 (InputLayer)            [(None, None)]       0                                            
__________________________________________________________________________________________________
sequential_2 (Sequential)       (None, 10)           72020       input_1[0][0]                    
__________________________________________________________________________________________________
sequential_3 (Sequential)       (None, 10)           72020       input_2[0][0]                    
__________________________________________________________________________________________________
dot (Dot)                       (None, 1)            0           sequential_2[1][0]               
                                                                 sequential_3[1][0]               
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 1)            2           dot[0][0]                        
==================================================================================================
Total params: 144,042
Trainable params: 144,042
Non-trainable params: 0
__________________________________________________________________________________________________
None
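
To make the summary above easier to read, here is a rough NumPy sketch of what this architecture computes for a batch of (word, context) index pairs: a dot product of the two embeddings, followed by a one-input dense layer with a sigmoid. The sizes, the random matrices, and the `kernel`/`bias` values are placeholders, not the trained Keras weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
vocab_size, embed_size = 8, 4    # illustrative sizes, not the tutorial's V
word_emb = rng.normal(scale=0.1, size=(vocab_size, embed_size))
context_emb = rng.normal(scale=0.1, size=(vocab_size, embed_size))
kernel, bias = 1.0, 0.0          # stand-ins for the Dense(1) layer's 2 parameters

def score_pairs(word_ids, context_ids):
    """Score in (0, 1) for each (word, context) pair being a true pair."""
    dots = np.sum(word_emb[word_ids] * context_emb[context_ids], axis=1)
    return sigmoid(kernel * dots + bias)

scores = score_pairs(np.array([0, 3, 5]), np.array([1, 2, 7]))
print(scores.shape)
```

Training pushes the score towards 1 for pairs labelled 1 (words that really co-occur) and towards 0 for the randomly sampled pairs labelled 0.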

Training Skipgram Model

losses=[]
for epoch in range(1, 21):
    loss = 0
    for i, elem in enumerate(skip_grams):
        pair_first_elem = np.array(list(zip(*elem[0]))[0], dtype='int32')
        pair_second_elem = np.array(list(zip(*elem[0]))[1], dtype='int32')
        labels = np.array(elem[1], dtype='int32')
        X = [pair_first_elem, pair_second_elem]
        Y = labels
        if i % 1000 == 0:
            print('Processed {} (skip_first, skip_second, relevance) pairs'.format(i))
        loss += skip_gram.train_on_batch(X,Y)  
    losses.append(loss)
    print('Epoch:', epoch, 'Loss:', loss)
Processed 0 (skip_first, skip_second, relevance) pairs
Epoch: 1 Loss: 235.61248167604208
Processed 0 (skip_first, skip_second, relevance) pairs
Epoch: 2 Loss: 183.11513358727098
Processed 0 (skip_first, skip_second, relevance) pairs
Epoch: 3 Loss: 161.4085715673864
Processed 0 (skip_first, skip_second, relevance) pairs
Epoch: 4 Loss: 152.8154918886721
Processed 0 (skip_first, skip_second, relevance) pairs
Epoch: 5 Loss: 148.4701641947031
Processed 0 (skip_first, skip_second, relevance) pairs
Epoch: 6 Loss: 145.83660160005093
Processed 0 (skip_first, skip_second, relevance) pairs
Epoch: 7 Loss: 143.74265605583787
Processed 0 (skip_first, skip_second, relevance) pairs
Epoch: 8 Loss: 141.48228558525443
Processed 0 (skip_first, skip_second, relevance) pairs
Epoch: 9 Loss: 138.557249289006
Processed 0 (skip_first, skip_second, relevance) pairs
Epoch: 10 Loss: 134.69301289878786
Processed 0 (skip_first, skip_second, relevance) pairs
Epoch: 11 Loss: 129.8125508632511
Processed 0 (skip_first, skip_second, relevance) pairs
Epoch: 12 Loss: 124.07405409216881
Processed 0 (skip_first, skip_second, relevance) pairs
Epoch: 13 Loss: 117.83104456402361
Processed 0 (skip_first, skip_second, relevance) pairs
Epoch: 14 Loss: 111.5576415732503
Processed 0 (skip_first, skip_second, relevance) pairs
Epoch: 15 Loss: 105.66755012259819
Processed 0 (skip_first, skip_second, relevance) pairs
Epoch: 16 Loss: 100.3857261503581
Processed 0 (skip_first, skip_second, relevance) pairs
Epoch: 17 Loss: 95.76235720119439
Processed 0 (skip_first, skip_second, relevance) pairs
Epoch: 18 Loss: 91.74112020782195
Processed 0 (skip_first, skip_second, relevance) pairs
Epoch: 19 Loss: 88.24516682536341
Processed 0 (skip_first, skip_second, relevance) pairs
Epoch: 20 Loss: 85.17682879325002
skip_gram.get_weights()[2].shape
(1, 1)
skip_gram.layers[2]
<tensorflow.python.keras.engine.sequential.Sequential at 0x19ed61136c8>
skip_gram.layers[2].get_weights()[0].shape
(7202, 10)

Comparing Skipgram Embeddings with CBOW Embeddings

word_emb_cbow=cbow.layers[0].get_weights()[0]
word_emb_skipgram=skip_gram.layers[2].get_weights()[0]
pd.DataFrame(word_emb_cbow, index=id2word.values()).head()
              0         1         2         3         4         5         6         7         8         9
book   0.708490 -0.374733 -1.156976  0.061611  1.323896 -0.167964 -0.163473 -0.119780 -0.237391 -0.220100
read   0.735406 -0.177951 -1.701628 -0.180904  0.908241  0.057287 -0.259185  0.507574 -0.836217  0.124057
good   0.382017 -0.094007 -1.040212 -0.299303  0.120877  0.473118  0.280422 -0.570508 -0.499639  0.430844
great  0.134586  0.407227 -1.682064  0.282897  1.348614  0.793174 -0.404906 -1.019498  0.586437 -0.836739
love   0.195157  0.107005 -1.384699  0.248910  1.094821  0.434969 -0.049532 -0.878325  0.349899 -0.685128
pd.DataFrame(word_emb_skipgram, index=id2word.values()).head()
              0         1         2         3         4         5         6         7         8         9
book  -0.006523 -0.023810  0.018569 -0.005135 -0.009594 -0.027225  0.002897 -0.026608 -0.013788 -0.000316
read   0.696219 -0.371762 -0.280955 -0.019985 -0.103077 -1.282474 -0.550851 -0.432109 -0.228415  0.262486
good  -0.351847  0.394407 -0.998008  0.393601  0.488250 -0.620138 -0.151128 -0.424484 -0.238454  0.016535
great  0.260107 -0.087505  0.411976 -0.294271  0.393042 -0.302891 -0.348807  1.168035 -0.540051  0.437487
love  -0.080335 -0.385654 -0.596333  0.183089  0.144461 -0.085787 -0.689355  0.973601  0.321709 -0.142176
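
Raw coordinates like these are hard to compare by eye. One common way to inspect an embedding matrix is to rank words by cosine similarity. The helper names and the hand-picked 2-d vectors below are hypothetical and only show the mechanics.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest_words(word, embeddings, vocab, top_k=3):
    """Rank the other words in `vocab` by cosine similarity to `word`."""
    idx = vocab.index(word)
    sims = [(other, cosine_similarity(embeddings[idx], embeddings[j]))
            for j, other in enumerate(vocab) if j != idx]
    return sorted(sims, key=lambda pair: pair[1], reverse=True)[:top_k]

# Hypothetical 2-d vectors chosen by hand for illustration.
toy_vocab = ["great", "love", "book"]
toy_emb = np.array([[1.0, 0.9],
                    [0.9, 1.0],
                    [-1.0, 0.2]])

neighbours = nearest_words("great", toy_emb, toy_vocab, top_k=2)
print(neighbours)
```

With the matrices above, one could pass `word_emb_cbow` or `word_emb_skipgram` together with `list(id2word.values())`, assuming the row order matches the id order, as the DataFrame calls above already assume.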

Visualizing learnt Embeddings

from sklearn.manifold import TSNE
tsne = TSNE()
Z = tsne.fit_transform(word_emb_cbow[:1000])
%matplotlib notebook
words=list(tokenizer.word_index.keys())[:1000]
plt.scatter(Z[:,0], Z[:,1])
for i in range(len(words)):
    plt.annotate(words[i], xy=(Z[i,0], Z[i,1]))
plt.show()

from sklearn.manifold import TSNE
tsne = TSNE()
Z = tsne.fit_transform(word_emb_skipgram[:1000])
%matplotlib notebook
words=list(tokenizer.word_index.keys())[:1000]
plt.scatter(Z[:,0], Z[:,1])
for i in range(len(words)):
    plt.annotate(words[i], xy=(Z[i,0], Z[i,1]))
plt.show()

GloVe

The GloVe algorithm uses a context-counting approach: it builds a word co-occurrence matrix and trains the word vectors to predict co-occurrence ratios from their differences. Before Word2Vec, matrix factorization techniques such as Latent Semantic Analysis (LSA) were used to generate word embeddings. In LSA, the matrices are of the “term-document” type, i.e., the rows correspond to words or terms, and the columns correspond to different documents in the corpus. Word vectors were generated by decomposing the term-document matrix using Singular Value Decomposition. Unlike Word2Vec, the resulting embeddings could not express word analogies as simple arithmetic operations. GloVe, on the other hand, computes its co-occurrence matrix from local context using a fixed window size (words are deemed to co-occur when they appear together within that window), and then trains the word vectors to predict the co-occurrence ratios. GloVe might produce better embeddings faster than Word2Vec because it uses both global co-occurrence statistics and local context.
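
For comparison, the LSA step mentioned above (decomposing a term-document matrix with SVD) can be sketched in a few lines of NumPy. The count matrix, the term list, and the choice of k = 2 are hypothetical values picked for illustration.

```python
import numpy as np

# Hypothetical term-document count matrix: rows = terms, columns = documents.
terms = ["book", "read", "movie", "watch"]
term_doc = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

# Truncated SVD: keep only the k largest singular values.
k = 2
U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
term_vectors = U[:, :k] * s[:k]   # one k-dimensional vector per term

for term, vec in zip(terms, term_vectors):
    print(term, np.round(vec, 3))
```

Terms that occur in the same documents ('book' and 'read' here) end up with matching vectors, while terms that never share a document end up orthogonal.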

Co-occurrence Matrix Creation using Probability Ratios

# co-occurrence matrix
X = np.zeros((V, V))
N = len(tokenized_reviews_idx)
context_size=5
for s in tokenized_reviews_idx:
    for i in range(len(s)):
        wi=s[i] # select current word
        start= max(0,i-context_size) # define start index of the left context
        if i - context_size < 0:
            # the window reaches past the sentence start: credit index 0,
            # weighted by the word's distance from the start
            points = 1.0/(i+1)
            X[wi,0]+=points
            X[0,wi]+=points

        # weight each left-context word by inverse distance; keep X symmetric
        for j in range(start,i):
            wj = s[j]
            points = 1.0 / (i - j) # always positive since j < i
            X[wi,wj] += points
            X[wj,wi] += points
# weighting f(X): (X/100)^0.75 below the cap of 100, 1 at or above it
fX=np.zeros((V,V))
fX[X<100]=(X[X<100]/float(100))**0.75
fX[X>=100]=1
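
The last three lines above apply a capped power-law weighting to the co-occurrence values. Written as a standalone function it looks roughly like this. The name `glove_weight` and the sample counts are made up for illustration; the cap of 100 and the exponent 0.75 are taken from the code above.

```python
import numpy as np

def glove_weight(counts, x_max=100.0, alpha=0.75):
    """f(x) = (x / x_max) ** alpha below x_max, and 1 at or above it."""
    counts = np.asarray(counts, dtype=float)
    return np.where(counts < x_max, (counts / x_max) ** alpha, 1.0)

sample_counts = np.array([0.0, 1.0, 50.0, 100.0, 500.0])
weights = glove_weight(sample_counts)
print(np.round(weights, 4))
```

Pairs that never co-occur get weight 0 and so drop out of the loss, rare pairs are down-weighted, and very frequent pairs are capped at 1 so that they cannot dominate training.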

We take the log of the co-occurrence values so that a ratio of probabilities becomes a difference of log-probabilities.

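# target
logX=np.log(X+1)
import tensorflow as tf  # note: rebinds `tf`, which referred to the TF-IDF vectorizer above
# Define the loss: squared error between target and prediction,
# weighted element-wise by `inputs` (the fX matrix)
def get_loss(model, inputs, targets):
    predictions = model(inputs)
    delta = targets - predictions
    return tf.reduce_sum(inputs * delta * delta)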

# Gradient function
def get_grad(model, inputs, targets):
    with tf.GradientTape() as tape:
        # calculate the loss
        loss_value = get_loss(model, inputs, targets)
        # return gradient
        return tape.gradient(loss_value, model.params)
    
class Glove(tf.keras.Model):
    def __init__(self, num_dims, vocab_size, mu):
        super(Glove, self).__init__()
        # initialize word vectors (W), context vectors (U) and their biases (b, c)
        W = np.random.randn(vocab_size, num_dims) / np.sqrt(vocab_size + num_dims)
        b = np.zeros(vocab_size)
        U = np.random.randn(vocab_size, num_dims) / np.sqrt(vocab_size + num_dims)
        c = np.zeros(vocab_size)
        self.mu = mu
        # wrap the parameters as trainable variables
        self.W = tf.Variable(W.astype(np.float32))
        self.b = tf.Variable(b.reshape(vocab_size, 1).astype(np.float32))
        self.U = tf.Variable(U.astype(np.float32))
        self.c = tf.Variable(c.reshape(1, vocab_size).astype(np.float32))
        self.params = [self.W, self.b,self.U,self.c]

    def call(self,inputs):
        # the prediction does not depend on `inputs`: it is the full V x V score matrix
        return tf.matmul(self.W, tf.transpose(self.U)) + self.b + self.c + self.mu
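
Putting the pieces above together, the quantity this code minimizes can be written out as follows. It is read off from `get_loss` and `Glove.call`, with $w_i$ and $u_j$ denoting rows of `W` and `U`, $b_i$ and $c_j$ the bias entries, $f(X_{ij})$ the entries of `fX`, and $\mu$ the scalar `mu`. The notation is an assumption made here for illustration.

```latex
J = \sum_{i=1}^{V} \sum_{j=1}^{V} f(X_{ij})
    \left( \log(1 + X_{ij}) - \left( w_i^{\top} u_j + b_i + c_j + \mu \right) \right)^{2}
```

The log target is what links this to ratios: since $\log(P_{ik} / P_{jk}) = \log P_{ik} - \log P_{jk}$, a ratio of co-occurrence probabilities shows up as a difference between two log terms, which a model that is linear in the word vectors can fit.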

Training GloVe

mu = logX.mean()
glove_model=Glove(10,V,mu)

# Store the losses here
losses = []

# Create an optimizer
optimizer = tf.keras.optimizers.SGD(learning_rate=0.0001)

# Run the training loop
for i in range(200):
    # Get gradients
    grads = get_grad(glove_model, fX, logX)

    # Do one step of gradient descent: param <- param - learning_rate * grad
    optimizer.apply_gradients(zip(grads, glove_model.params))

    # Store the loss
    loss = get_loss(glove_model, fX, logX)
    losses.append(loss)
    print(i," ",loss)
WARNING:tensorflow:Layer glove is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2.  The layer has dtype float32 because it's dtype defaults to floatx.

If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.

To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

0   tf.Tensor(4179.916, shape=(), dtype=float32)
1   tf.Tensor(4120.3193, shape=(), dtype=float32)
2   tf.Tensor(4062.668, shape=(), dtype=float32)
3   tf.Tensor(4006.8877, shape=(), dtype=float32)
4   tf.Tensor(3952.9092, shape=(), dtype=float32)
5   tf.Tensor(3900.6646, shape=(), dtype=float32)
6   tf.Tensor(3850.0898, shape=(), dtype=float32)
7   tf.Tensor(3801.1228, shape=(), dtype=float32)
8   tf.Tensor(3753.7034, shape=(), dtype=float32)
9   tf.Tensor(3707.775, shape=(), dtype=float32)
10   tf.Tensor(3663.2812, shape=(), dtype=float32)
11   tf.Tensor(3620.17, shape=(), dtype=float32)
12   tf.Tensor(3578.3901, shape=(), dtype=float32)
13   tf.Tensor(3537.8926, shape=(), dtype=float32)
14   tf.Tensor(3498.6304, shape=(), dtype=float32)
15   tf.Tensor(3460.559, shape=(), dtype=float32)
16   tf.Tensor(3423.6335, shape=(), dtype=float32)
17   tf.Tensor(3387.813, shape=(), dtype=float32)
18   tf.Tensor(3353.0571, shape=(), dtype=float32)
19   tf.Tensor(3319.3274, shape=(), dtype=float32)
20   tf.Tensor(3286.5862, shape=(), dtype=float32)
21   tf.Tensor(3254.7983, shape=(), dtype=float32)
22   tf.Tensor(3223.929, shape=(), dtype=float32)
23   tf.Tensor(3193.9453, shape=(), dtype=float32)
24   tf.Tensor(3164.8157, shape=(), dtype=float32)
25   tf.Tensor(3136.5093, shape=(), dtype=float32)
26   tf.Tensor(3108.997, shape=(), dtype=float32)
27   tf.Tensor(3082.2507, shape=(), dtype=float32)
28   tf.Tensor(3056.2427, shape=(), dtype=float32)
29   tf.Tensor(3030.9475, shape=(), dtype=float32)
30   tf.Tensor(3006.339, shape=(), dtype=float32)
31   tf.Tensor(2982.3945, shape=(), dtype=float32)
32   tf.Tensor(2959.0896, shape=(), dtype=float32)
33   tf.Tensor(2936.4023, shape=(), dtype=float32)
34   tf.Tensor(2914.311, shape=(), dtype=float32)
35   tf.Tensor(2892.7957, shape=(), dtype=float32)
36   tf.Tensor(2871.8354, shape=(), dtype=float32)
37   tf.Tensor(2851.4114, shape=(), dtype=float32)
38   tf.Tensor(2831.5054, shape=(), dtype=float32)
39   tf.Tensor(2812.0996, shape=(), dtype=float32)
40   tf.Tensor(2793.1765, shape=(), dtype=float32)
41   tf.Tensor(2774.7207, shape=(), dtype=float32)
42   tf.Tensor(2756.715, shape=(), dtype=float32)
43   tf.Tensor(2739.146, shape=(), dtype=float32)
44   tf.Tensor(2721.9973, shape=(), dtype=float32)
45   tf.Tensor(2705.2559, shape=(), dtype=float32)
46   tf.Tensor(2688.9077, shape=(), dtype=float32)
47   tf.Tensor(2672.94, shape=(), dtype=float32)
48   tf.Tensor(2657.3396, shape=(), dtype=float32)
49   tf.Tensor(2642.0952, shape=(), dtype=float32)
50   tf.Tensor(2627.1948, shape=(), dtype=float32)
51   tf.Tensor(2612.6272, shape=(), dtype=float32)
52   tf.Tensor(2598.3816, shape=(), dtype=float32)
53   tf.Tensor(2584.4473, shape=(), dtype=float32)
54   tf.Tensor(2570.8147, shape=(), dtype=float32)
55   tf.Tensor(2557.474, shape=(), dtype=float32)
56   tf.Tensor(2544.416, shape=(), dtype=float32)
57   tf.Tensor(2531.6318, shape=(), dtype=float32)
58   tf.Tensor(2519.1125, shape=(), dtype=float32)
59   tf.Tensor(2506.8499, shape=(), dtype=float32)
60   tf.Tensor(2494.836, shape=(), dtype=float32)
61   tf.Tensor(2483.0632, shape=(), dtype=float32)
62   tf.Tensor(2471.5242, shape=(), dtype=float32)
63   tf.Tensor(2460.2112, shape=(), dtype=float32)
64   tf.Tensor(2449.1177, shape=(), dtype=float32)
65   tf.Tensor(2438.2373, shape=(), dtype=float32)
66   tf.Tensor(2427.5635, shape=(), dtype=float32)
67   tf.Tensor(2417.0894, shape=(), dtype=float32)
68   tf.Tensor(2406.8098, shape=(), dtype=float32)
69   tf.Tensor(2396.7188, shape=(), dtype=float32)
70   tf.Tensor(2386.8108, shape=(), dtype=float32)
71   tf.Tensor(2377.0803, shape=(), dtype=float32)
72   tf.Tensor(2367.5227, shape=(), dtype=float32)
73   tf.Tensor(2358.1318, shape=(), dtype=float32)
74   tf.Tensor(2348.9043, shape=(), dtype=float32)
75   tf.Tensor(2339.8345, shape=(), dtype=float32)
76   tf.Tensor(2330.9187, shape=(), dtype=float32)
77   tf.Tensor(2322.1519, shape=(), dtype=float32)
78   tf.Tensor(2313.5303, shape=(), dtype=float32)
79   tf.Tensor(2305.0496, shape=(), dtype=float32)
80   tf.Tensor(2296.7065, shape=(), dtype=float32)
81   tf.Tensor(2288.4963, shape=(), dtype=float32)
82   tf.Tensor(2280.4165, shape=(), dtype=float32)
83   tf.Tensor(2272.463, shape=(), dtype=float32)
84   tf.Tensor(2264.6323, shape=(), dtype=float32)
85   tf.Tensor(2256.9214, shape=(), dtype=float32)
86   tf.Tensor(2249.3271, shape=(), dtype=float32)
87   tf.Tensor(2241.8467, shape=(), dtype=float32)
88   tf.Tensor(2234.4766, shape=(), dtype=float32)
89   tf.Tensor(2227.2144, shape=(), dtype=float32)
90   tf.Tensor(2220.0571, shape=(), dtype=float32)
91   tf.Tensor(2213.0024, shape=(), dtype=float32)
92   tf.Tensor(2206.0474, shape=(), dtype=float32)
93   tf.Tensor(2199.19, shape=(), dtype=float32)
94   tf.Tensor(2192.4272, shape=(), dtype=float32)
95   tf.Tensor(2185.7573, shape=(), dtype=float32)
96   tf.Tensor(2179.1777, shape=(), dtype=float32)
97   tf.Tensor(2172.6863, shape=(), dtype=float32)
98   tf.Tensor(2166.2812, shape=(), dtype=float32)
99   tf.Tensor(2159.9602, shape=(), dtype=float32)
100   tf.Tensor(2153.7212, shape=(), dtype=float32)
101   tf.Tensor(2147.5625, shape=(), dtype=float32)
102   tf.Tensor(2141.4822, shape=(), dtype=float32)
103   tf.Tensor(2135.4785, shape=(), dtype=float32)
104   tf.Tensor(2129.5498, shape=(), dtype=float32)
105   tf.Tensor(2123.6943, shape=(), dtype=float32)
106   tf.Tensor(2117.9102, shape=(), dtype=float32)
107   tf.Tensor(2112.196, shape=(), dtype=float32)
108   tf.Tensor(2106.5508, shape=(), dtype=float32)
109   tf.Tensor(2100.9724, shape=(), dtype=float32)
110   tf.Tensor(2095.4597, shape=(), dtype=float32)
111   tf.Tensor(2090.0107, shape=(), dtype=float32)
112   tf.Tensor(2084.625, shape=(), dtype=float32)
113   tf.Tensor(2079.301, shape=(), dtype=float32)
114   tf.Tensor(2074.0374, shape=(), dtype=float32)
115   tf.Tensor(2068.8323, shape=(), dtype=float32)
116   tf.Tensor(2063.6855, shape=(), dtype=float32)
117   tf.Tensor(2058.5952, shape=(), dtype=float32)
118   tf.Tensor(2053.5608, shape=(), dtype=float32)
119   tf.Tensor(2048.5808, shape=(), dtype=float32)
120   tf.Tensor(2043.654, shape=(), dtype=float32)
121   tf.Tensor(2038.78, shape=(), dtype=float32)
122   tf.Tensor(2033.9573, shape=(), dtype=float32)
123   tf.Tensor(2029.185, shape=(), dtype=float32)
124   tf.Tensor(2024.4622, shape=(), dtype=float32)
125   tf.Tensor(2019.7882, shape=(), dtype=float32)
126   tf.Tensor(2015.1617, shape=(), dtype=float32)
127   tf.Tensor(2010.582, shape=(), dtype=float32)
128   tf.Tensor(2006.0488, shape=(), dtype=float32)
129   tf.Tensor(2001.5603, shape=(), dtype=float32)
130   tf.Tensor(1997.1166, shape=(), dtype=float32)
131   tf.Tensor(1992.7163, shape=(), dtype=float32)
132   tf.Tensor(1988.3589, shape=(), dtype=float32)
133   tf.Tensor(1984.0437, shape=(), dtype=float32)
134   tf.Tensor(1979.7698, shape=(), dtype=float32)
135   tf.Tensor(1975.5366, shape=(), dtype=float32)
136   tf.Tensor(1971.3433, shape=(), dtype=float32)
137   tf.Tensor(1967.1897, shape=(), dtype=float32)
138   tf.Tensor(1963.0747, shape=(), dtype=float32)
139   tf.Tensor(1958.998, shape=(), dtype=float32)
140   tf.Tensor(1954.9586, shape=(), dtype=float32)
141   tf.Tensor(1950.9562, shape=(), dtype=float32)
142   tf.Tensor(1946.9901, shape=(), dtype=float32)
143   tf.Tensor(1943.0599, shape=(), dtype=float32)
144   tf.Tensor(1939.1648, shape=(), dtype=float32)
145   tf.Tensor(1935.3042, shape=(), dtype=float32)
146   tf.Tensor(1931.4779, shape=(), dtype=float32)
147   tf.Tensor(1927.6852, shape=(), dtype=float32)
148   tf.Tensor(1923.9257, shape=(), dtype=float32)
149   tf.Tensor(1920.1985, shape=(), dtype=float32)
150   tf.Tensor(1916.504, shape=(), dtype=float32)
151   tf.Tensor(1912.8408, shape=(), dtype=float32)
152   tf.Tensor(1909.209, shape=(), dtype=float32)
153   tf.Tensor(1905.6079, shape=(), dtype=float32)
154   tf.Tensor(1902.0371, shape=(), dtype=float32)
155   tf.Tensor(1898.4965, shape=(), dtype=float32)
156   tf.Tensor(1894.985, shape=(), dtype=float32)
157   tf.Tensor(1891.5029, shape=(), dtype=float32)
158   tf.Tensor(1888.0494, shape=(), dtype=float32)
159   tf.Tensor(1884.6243, shape=(), dtype=float32)
160   tf.Tensor(1881.2272, shape=(), dtype=float32)
161   tf.Tensor(1877.8574, shape=(), dtype=float32)
162   tf.Tensor(1874.5149, shape=(), dtype=float32)
163   tf.Tensor(1871.1992, shape=(), dtype=float32)
164   tf.Tensor(1867.91, shape=(), dtype=float32)
165   tf.Tensor(1864.6469, shape=(), dtype=float32)
166   tf.Tensor(1861.4098, shape=(), dtype=float32)
167   tf.Tensor(1858.1978, shape=(), dtype=float32)
168   tf.Tensor(1855.0112, shape=(), dtype=float32)
169   tf.Tensor(1851.8494, shape=(), dtype=float32)
170   tf.Tensor(1848.712, shape=(), dtype=float32)
171   tf.Tensor(1845.599, shape=(), dtype=float32)
172   tf.Tensor(1842.5099, shape=(), dtype=float32)
173   tf.Tensor(1839.4443, shape=(), dtype=float32)
174   tf.Tensor(1836.402, shape=(), dtype=float32)
175   tf.Tensor(1833.3832, shape=(), dtype=float32)
176   tf.Tensor(1830.3866, shape=(), dtype=float32)
177   tf.Tensor(1827.4126, shape=(), dtype=float32)
178   tf.Tensor(1824.4612, shape=(), dtype=float32)
179   tf.Tensor(1821.5316, shape=(), dtype=float32)
180   tf.Tensor(1818.6235, shape=(), dtype=float32)
181   tf.Tensor(1815.7375, shape=(), dtype=float32)
182   tf.Tensor(1812.8721, shape=(), dtype=float32)
183   tf.Tensor(1810.028, shape=(), dtype=float32)
184   tf.Tensor(1807.2045, shape=(), dtype=float32)
185   tf.Tensor(1804.4016, shape=(), dtype=float32)
186   tf.Tensor(1801.619, shape=(), dtype=float32)
187   tf.Tensor(1798.8562, shape=(), dtype=float32)
188   tf.Tensor(1796.1135, shape=(), dtype=float32)
189   tf.Tensor(1793.3904, shape=(), dtype=float32)
190   tf.Tensor(1790.6868, shape=(), dtype=float32)
191   tf.Tensor(1788.002, shape=(), dtype=float32)
192   tf.Tensor(1785.3367, shape=(), dtype=float32)
193   tf.Tensor(1782.6897, shape=(), dtype=float32)
194   tf.Tensor(1780.0615, shape=(), dtype=float32)
195   tf.Tensor(1777.4515, shape=(), dtype=float32)
196   tf.Tensor(1774.8599, shape=(), dtype=float32)
197   tf.Tensor(1772.2863, shape=(), dtype=float32)
198   tf.Tensor(1769.7305, shape=(), dtype=float32)
199   tf.Tensor(1767.192, shape=(), dtype=float32)
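# Keep the two embedding matrices from the trained model; the other two parameters are not needed here
W_f, _, U_f, _ = glove_model.params
W1, W2 = W_f.numpy(), U_f.numpy().T
# Option 1: concatenate the two embeddings of each word side by side
We = np.hstack([W1, W2.T])
# Option 2: average the two embeddings of each word
We_avg = (W1 + W2.T) / 2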

Visualizing GloVe

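%matplotlib notebook
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
# Project the first 1000 averaged embeddings down to 2-D with t-SNE
tsne = TSNE()
Z = tsne.fit_transform(We_avg[:1000])
# Labels for the plotted points; this assumes row i of We_avg corresponds to words[i]
words = list(tokenizer.word_index.keys())[:1000]
plt.scatter(Z[:, 0], Z[:, 1])
for i in range(len(words)):
    # Pass the label positionally: the `s` keyword was renamed to `text` in newer Matplotlib releases
    plt.annotate(words[i], xy=(Z[i, 0], Z[i, 1]))
plt.show()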

FastText with Gensim

FastText breaks words into character n-grams. Unlike models that learn word representations by assigning a distinct vector to each word, FastText builds on the skip-gram model and represents each word as a bag of character n-grams. A vector is associated with each character n-gram, and a word is represented as the sum of these vectors. This gives FastText two advantages over word2vec and GloVe:

  • The ability to infer vectors for out-of-vocabulary words. For example, ‘England’ is related to ‘Netherlands’ because both contain ‘land’, which yields the shared n-grams ‘lan’ and ‘and’.
  • The robustness to spelling mistakes and typos.
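
The n-gram decomposition described above can be sketched in a few lines. This is a simplified illustration and not FastText's actual implementation: the helper name char_ngrams is hypothetical, and the boundary markers < and > and the 3 to 6 n-gram range are assumed here to follow the library's usual defaults.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the set of character n-grams of `word` (hypothetical helper).

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    produce n-grams distinct from those in the middle of a word.
    """
    padded = f"<{word}>"
    grams = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.add(padded[i:i + n])
    return grams


# Compare only the 3-grams of the two words from the example above.
shared = char_ngrams("england", 3, 3) & char_ngrams("netherlands", 3, 3)
print(sorted(shared))  # ['and', 'lan']
```

Because an unseen word still decomposes into n-grams that were seen during training, a vector can be assembled for it by summing the vectors of those n-grams.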
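from gensim.models import FastText
# Train 20-dimensional FastText vectors on `reviews`.
# These argument names follow gensim 3.x; gensim 4.x renamed `size` to `vector_size` and `iter` to `epochs`.
model_ft = FastText(reviews, size=20, window=5, min_count=1, iter=10, sorted_vocab=1)
# Look up the learned vector for a word
model_ft.wv['book']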
array([-0.14593095,  3.0355837 ,  1.114025  , -0.5476816 , -1.5547047 ,
        0.5005566 ,  1.5643926 , -3.8117805 ,  0.4766591 , -0.39367712,
       -1.6596947 ,  0.40210724, -1.9078114 ,  1.2122376 ,  1.9669913 ,
       -0.11507382, -0.6998857 , -0.90702647,  0.27300328,  0.16849636],
      dtype=float32)

Pretrained Embeddings

Loading Pretrained GloVe

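import numpy as np

def loadGloveModel(File):
    """Read a GloVe text file into a dict mapping each word to its vector."""
    print("Loading Glove Model")
    gloveModel = {}
    # Use a context manager so the file is closed once reading finishes
    with open(File, 'r', encoding='utf8') as f:
        for line in f:
            splitLines = line.split()
            # Skip blank or malformed lines that carry no vector values
            if len(splitLines) > 1:
                word = splitLines[0]
                wordEmbedding = np.array([float(value) for value in splitLines[1:]])
                gloveModel[word] = wordEmbedding
    print(len(gloveModel), " words loaded!")
    return gloveModel

# Path to the 50-dimensional GloVe 6B vectors
filename = 'glove.6B/glove.6B.50d.txt'
glove_pretrained_embeddings = loadGloveModel(filename)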
Loading Glove Model
400000  words loaded!

spaCy Word Vectors

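# `nlp` is assumed to be a spaCy pipeline that ships with word vectors, loaded beforehand with spacy.load(...)
nlp('abc').vector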
array([ 0.015544 ,  0.57639  , -0.22369  ,  0.058487 ,  0.22128  ,
        0.0017315,  0.18017  ,  0.43484  ,  0.25918  , -0.15956  ,
       -0.57859  , -0.65652  , -0.5211   ,  0.18434  , -0.30634  ,
       -0.16944  , -0.040835 ,  0.85893  ,  0.20587  , -0.09456  ,
       -0.19005  ,  0.52936  ,  0.32827  ,  0.26956  ,  0.46428  ,
       -0.16203  , -0.86777  , -0.32718  ,  0.073993 , -0.14707  ,
       -0.16171  ,  0.14518  ,  0.52346  ,  0.28895  ,  0.10567  ,
        0.69243  , -0.21235  ,  0.78953  , -0.21818  ,  0.099286 ,
        0.19207  ,  0.076926 , -0.14017  ,  0.075001 , -0.49847  ,
       -0.045887 ,  0.10955  ,  0.02723  ,  0.21489  , -0.00508  ,
       -0.28442  , -0.10378  ,  0.42908  , -0.45975  , -0.18157  ,
       -0.075874 ,  0.032206 ,  0.39589  ,  0.034597 , -0.069252 ,
       -0.44504  ,  0.10295  ,  0.15058  , -0.42316  ,  0.013444 ,
        0.057608 ,  0.040736 , -0.089118 ,  0.19307  , -0.35986  ,
       -0.17654  , -0.27218  ,  0.27631  , -0.53668  ,  0.70755  ,
        0.42229  , -0.029253 ,  0.46846  ,  0.29093  ,  0.32838  ,
        0.074374 ,  0.41237  ,  0.05459  , -0.17672  ,  0.26883  ,
        0.066757 ,  0.90166  , -0.0079164,  0.090882 , -0.25042  ,
        0.15665  ,  0.17509  ,  0.099696 , -1.0406   ,  0.24512  ,
       -0.062083 , -0.28337  ,  0.15386  ,  0.12701  , -0.29336  ,
        0.11664  , -0.034885 ,  0.13529  ,  0.33636  , -0.38144  ,
       -0.8884   ,  0.45016  , -0.31225  ,  0.39693  , -0.087841 ,
       -0.39182  ,  0.38523  , -0.094282 ,  0.10318  ,  0.065112 ,
        0.18586  ,  0.21608  ,  0.73846  ,  0.36731  , -0.076651 ,
       -0.19152  ,  0.030808 ,  0.47692  , -0.074411 , -0.4125   ,
        0.16013  ,  0.46763  ,  0.60557  ,  0.2435   , -0.36579  ,
        0.49325  , -0.094911 ,  0.36638  , -0.26636  ,  0.24527  ,
        0.24341  , -0.66663  , -0.057877 , -0.071488 , -0.60937  ,
       -1.5547   , -0.31746  ,  0.42523  , -0.52602  ,  0.13782  ,
        0.27928  ,  0.40399  , -0.31062  , -0.18477  , -0.19317  ,
        0.31799  , -0.14417  , -0.38945  ,  0.031182 ,  0.55192  ,
       -0.39039  , -0.27579  , -0.13704  , -0.14254  ,  0.51599  ,
        0.27136  , -0.46131  ,  0.062238 ,  0.10632  , -0.15741  ,
        0.15737  , -0.42776  , -0.36731  ,  0.35005  , -0.30077  ,
        0.1453   ,  0.36576  , -0.50639  ,  0.25336  ,  0.45208  ,
       -0.33711  ,  0.028834 ,  0.49913  , -0.15675  ,  0.36223  ,
       -0.094679 , -0.25908  ,  0.36918  ,  0.63663  ,  0.2776   ,
       -0.062007 , -0.036352 , -0.014424 , -0.12582  , -0.5095   ,
       -0.64888  ,  0.13352  , -0.22433  , -0.12029  ,  0.118    ,
        0.37882  ,  0.28747  ,  0.28767  ,  0.58259  , -0.10626  ,
       -0.31157  , -0.50093  ,  0.38666  ,  0.30466  , -0.13183  ,
       -0.71681  ,  0.22892  ,  0.14153  , -0.079041 , -0.071159 ,
       -0.19885  ,  0.14945  ,  0.73209  ,  0.21448  , -0.32961  ,
       -0.05504  , -0.10333  , -0.33585  , -0.32386  ,  0.0092675,
        0.44894  ,  0.17116  , -0.12099  ,  0.2547   , -0.39983  ,
        0.69215  , -0.28655  , -0.094899 , -0.009452 ,  0.078809 ,
       -0.062489 ,  0.03548  ,  0.34206  ,  0.0078516, -0.7749   ,
       -0.4704   ,  0.076937 ,  0.31344  ,  0.17565  ,  0.3112   ,
       -0.087879 , -0.39894  ,  0.72429  , -0.31425  ,  0.01968  ,
       -0.31396  ,  0.34231  ,  0.042835 , -0.38605  , -0.14219  ,
        0.69995  ,  0.35879  ,  0.15834  , -0.52758  ,  0.26671  ,
       -0.15803  , -0.17879  ,  0.040895 ,  0.023194 , -0.0087685,
       -0.038725 ,  0.18178  , -0.24259  ,  0.033652 ,  0.61268  ,
       -0.59673  , -0.098315 ,  0.3811   ,  0.14771  ,  0.27156  ,
       -1.1951   ,  0.16972  , -0.18541  ,  0.54005  , -0.42205  ,
       -0.42933  , -0.24131  , -0.24106  , -0.18523  , -0.16413  ,
       -0.40613  , -0.44179  ,  0.011941 ,  0.63581  ,  0.097763 ,
        0.28391  , -0.08993  ,  0.47394  , -0.18371  ,  0.2378   ,
        0.72607  , -0.12237  , -0.54113  ,  0.56455  ,  0.39907  ,
       -0.47367  , -0.18671  , -0.54799  , -0.14384  ,  0.072209 ],
      dtype=float32)

Evaluating Embeddings

  • Finding Similar Words using Word Vectors
    • Cosine Similarity
  • Satisfying Word Analogies
    • King - Man + Woman = Queen
import numpy as np
from numpy import dot
from numpy.linalg import norm

# cosine similarity
def cosine(v1, v2):
    if norm(v1) > 0 and norm(v2) > 0:
        return dot(v1, v2) / (norm(v1) * norm(v2))
    else:
        return 0.0

Word Analogy Example:

The similarity between ‘dog’ and ‘puppy’ should be higher than the similarity between ‘trousers’ and ‘octopus’.

If we compute King - Man + Woman, the resulting vector should have a high similarity to Queen.

v1=glove_pretrained_embeddings['king']-glove_pretrained_embeddings['man']+glove_pretrained_embeddings['woman']
v2=glove_pretrained_embeddings['queen']
cosine(v1, v2)
0.8609581258578943
cosine(glove_pretrained_embeddings['dog'], glove_pretrained_embeddings['puppy']) > cosine(
    glove_pretrained_embeddings['trousers'], glove_pretrained_embeddings['octopus'])
True
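
The similar-word search listed at the start of this section can be sketched as a ranking by cosine similarity. The function most_similar and the two-dimensional toy vectors below are hypothetical and invented purely for illustration; they are not taken from the GloVe file.

```python
import numpy as np


def most_similar(query_vec, embeddings, topn=3, exclude=()):
    """Rank the words in `embeddings` (dict: word -> 1-D array) by cosine similarity to `query_vec`."""
    q_norm = np.linalg.norm(query_vec)
    scores = []
    for word, vec in embeddings.items():
        if word in exclude:
            continue
        denom = q_norm * np.linalg.norm(vec)
        # Treat zero vectors as having no similarity to anything
        score = float(np.dot(query_vec, vec) / denom) if denom > 0 else 0.0
        scores.append((word, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Toy vectors, made up for this sketch (axis 0 ~ royalty, axis 1 ~ gender).
toy = {
    "king": np.array([1.0, 1.0]),
    "queen": np.array([1.0, -1.0]),
    "man": np.array([0.0, 1.0]),
    "woman": np.array([0.0, -1.0]),
    "apple": np.array([-1.0, 0.1]),
}

target = toy["king"] - toy["man"] + toy["woman"]
result = most_similar(target, toy, topn=1, exclude=("king", "man", "woman"))
print(result[0][0])  # queen
```

The query words are excluded from the ranking because the input vectors themselves usually sit very close to the result of the arithmetic. The same function should work on the loaded dictionary, for example most_similar(v1, glove_pretrained_embeddings, exclude=('king', 'man', 'woman')), although a plain Python loop over 400,000 words will be slow.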

Applications of Word Embeddings

Word embeddings are used across a wide range of NLP tasks. They can help improve:

  • Text Classification tasks
  • Quality of language translations, by aligning single-language word embeddings using a transformation matrix.
  • Document search and information retrieval applications, where queries no longer need exact keyword matches and can tolerate spelling variations.
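
The transformation-matrix alignment mentioned in the list above can be illustrated with ordinary least squares. This is a minimal sketch under strong assumptions: the "source" and "target" embeddings are random synthetic data rather than real bilingual vectors, the names X_src, Y_tgt and W_est are hypothetical, and the target space is constructed to be an exact linear map of the source space.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4

# Synthetic stand-ins: 50 "source language" vectors and their "target language"
# counterparts, generated by a hidden linear map W_true.
X_src = rng.normal(size=(50, dim))
W_true = rng.normal(size=(dim, dim))
Y_tgt = X_src @ W_true

# Learn the matrix that minimizes ||X_src @ W - Y_tgt||^2 over the paired words.
W_est, *_ = np.linalg.lstsq(X_src, Y_tgt, rcond=None)

# Map the source vectors into the target space and check the fit.
mapped = X_src @ W_est
print(np.allclose(mapped, Y_tgt))  # True
```

With real embeddings the fit would only be approximate. The rows of X_src and Y_tgt would come from a seed dictionary of known translation pairs, and a new source word would be translated by mapping its vector and looking up the nearest target-language vector.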

More to explore

  • Doc2Vec
  • Combining Word Embeddings with TFIDF
  • Transformer Models
    • BERT
    • RoBERTa
    • DistilBERT
    • OpenAI GPT (1 & 2)