BERT - Bidirectional Encoder Representations from Transformers

pip install vectorhub[encoders-text-tfhub]

Details

Release date: 2018-10-11

Vector length: 1024 (BERT Large)

Repo: https://tfhub.dev/google/collections/bert/1

Paper: https://arxiv.org/abs/1810.04805v2

Example

#pip install vectorhub[encoders-text-tfhub]
#FOR WINDOWS: pip install vectorhub[encoders-text-tfhub-windows]
from vectorhub.encoders.text.tfhub import Bert2Vec
model = Bert2Vec()
model.encode("I enjoy taking long walks along the beach with my dog.")
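
The call returns a single embedding whose length matches the vector length listed above (1024 for BERT Large). Below is a minimal sketch of checking that, and of encoding several sentences at once; bulk_encode is assumed here from the library's usual batch-encoding method name, so check the class if it is missing.

vector = model.encode("I enjoy taking long walks along the beach with my dog.")
print(len(vector))  # 1024 for the default BERT Large encoder

# Batch encoding (bulk_encode is an assumption, not confirmed by this page)
vectors = model.bulk_encode([
    "I enjoy taking long walks along the beach with my dog.",
    "BERT produces one fixed-length vector per sentence.",
])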

Index and search vectors

Index and search your vectors easily on the cloud using 1 line of code!

username = '<your username>'
email = '<your email>'
# Request an api_key by passing in your username and email.
api_key = model.request_api_key(username, email)

# Index in 1 line of code
items = ['chicken', 'toilet', 'paper', 'enjoy walking']
model.add_documents(username, api_key, items)

# Search in 1 line of code and get the most similar results.
model.search('basin')

# Add metadata to your search
metadata = [
    {'num_of_letters': 7, 'type': 'animal'},
    {'num_of_letters': 6, 'type': 'household_items'},
    {'num_of_letters': 5, 'type': 'household_items'},
    {'num_of_letters': 12, 'type': 'emotion'},
]
model.add_documents(username, api_key, items, metadata=metadata)

Description

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
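
As a rough illustration of the "one additional output layer" idea from the abstract, here is a minimal Keras sketch that loads a BERT encoder from TF Hub and puts a single dense classification layer on top. The preprocessing and encoder handles, and the 2-class task, are assumptions for the example rather than part of VectorHub.

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # registers the ops the preprocessing model needs

# Assumed TF Hub handles; any matching BERT preprocessor/encoder pair works.
preprocess = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4", trainable=True)

text_input = tf.keras.layers.Input(shape=(), dtype=tf.string)
outputs = encoder(preprocess(text_input))
pooled = outputs["pooled_output"]                # [batch, 768] sentence representation
logits = tf.keras.layers.Dense(2)(pooled)        # the single additional output layer
classifier = tf.keras.Model(text_input, logits)  # ready to fine-tune on a 2-class task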

Working in Colab

If you are using this in Colab and want to cache the downloaded model so you don't have to reload it each session, set:

import os 
os.environ['TFHUB_CACHE_DIR'] = "drive/MyDrive/"
os.environ["TFHUB_MODEL_LOAD_FORMAT"] = "COMPRESSED"

Model Versions

Model                                      Vector Length
google/bert_cased_L-12_H-768_A-12          768
google/bert_cased_L-24_H-1024_A-16         1024
google/bert_chinese_L-12_H-768_A-12        768
google/bert_multi_cased_L-12_H-768_A-12    768
google/bert_uncased_L-12_H-768_A-12        768
google/bert_uncased_L-24_H-1024_A-16       1024
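
If the Bert2Vec constructor exposes a model URL argument (the model_url keyword below is an assumption, as is the exact TF Hub handle; check the class signature), you could point it at one of the 768-dimensional checkpoints instead of the default BERT Large:

# Hypothetical: swap in a smaller checkpoint for 768-dim vectors.
small_model = Bert2Vec(model_url="https://tfhub.dev/tensorflow/bert_en_cased_L-12_H-768_A-12/4")
small_model.encode("A shorter, 768-dimensional embedding.")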

Limitations

  • NA

Training Corpora

  • BooksCorpus (800M words)
  • Wikipedia (2,500M words)

Other Notes

  • BERT was trained for 1M steps with a batch size of 256 sequences of 512 tokens (about 128,000 tokens per batch).