USE - Universal Sentence Encoder

pip install vectorhub[encoders-text-tfhub]

Details

Example

#pip install vectorhub[encoders-text-tfhub]
#FOR WINDOWS: pip install vectorhub[encoders-text-tfhub-windows]
from vectorhub.encoders.text.tfhub import USE2Vec
model = USE2Vec()
model.encode("I enjoy taking long walks along the beach with my dog.")

Index and search vectors

Index and search your vectors easily on the cloud using 1 line of code!

username = '<your username>'
email = '<your email>'
# You can request an api_key using - type in your username and email.
api_key = model.request_api_key(username, email)

# Index in 1 line of code
items = ['chicken', 'toilet', 'paper', 'enjoy walking']
model.add_documents(user, api_key, items)

# Search in 1 line of code and get the most similar results.
model.search('basin')

# Add metadata to your search
metadata = [{'num_of_letters': 7, 'type': 'animal'}, {'num_of_letters': 6, 'type': 'household_items'}, {'num_of_letters': 5, 'type': 'household_items'}, {'num_of_letters': 12, 'type': 'emotion'}]
model.add_documents(user, api_key, items, metadata=metadata)

Description

We present models for encoding sentences into embedding vectors that specifically target transfer learning to other NLP tasks. The models are efficient and result in accurate performance on diverse transfer tasks. Two variants of the encoding models allow for trade-offs between accuracy and compute resources. For both variants, we investigate and report the relationship between model complexity, resource consumption, the availability of transfer task training data, and task performance. Comparisons are made with baselines that use word level transfer learning via pretrained word embeddings as well as baselines do not use any transfer learning. We find that transfer learning using sentence embeddings tends to outperform word level transfer. With transfer learning via sentence embeddings, we observe surprisingly good performance with minimal amounts of supervised training data for a transfer task. We obtain encouraging results on Word Embedding Association Tests (WEAT) targeted at detecting model bias. Our pre-trained sentence encoding models are made freely available for download and on TF Hub.",

USE Image

Image from Google.

Training Corpora

Wikipedia, Web News, web question-answering pages and discussion forums.

Working in Colab

If you are using this in colab and want to save this so you don't have to reload, use:

import os 
os.environ['TFHUB_CACHE_DIR'] = "drive/MyDrive/"
os.environ["TFHUB_MODEL_LOAD_FORMAT"] = "COMPRESSED"