pip install vectorhub[encoders-audio-tfhub]


Release date: 2020-03-11

Vector length: 128 (default)




#pip install vectorhub[encoders-audio-tfhub]
from vectorhub.encoders.audio.tfhub import Vggish2Vec
model = Vggish2Vec()
sample = model.read('')
# Encode the audio sample into a 128-D vector.
vector = model.encode(sample)

Index and search vectors

Index and search your vectors easily on the cloud using 1 line of code!

username = '<your username>'
email = '<your email>'
# Request an api_key by supplying your username and email.
api_key = model.request_api_key(username, email)

# Index in 1 line of code
items = ['', '', '', '']
model.add_documents(username, api_key, items)

# Search in 1 line of code and get the most similar results.
model.search('')

# Add metadata to your search
metadata = None
model.add_documents(username, api_key, items, metadata=metadata)


An audio event embedding model trained on the YouTube-8M dataset. VGGish can be used:

  • as a high-level feature extractor: the 128-D embedding output of VGGish can be used as the input features of another shallow model which can then be trained on a small amount of data for a particular task. This allows quickly creating specialized audio classifiers without requiring a lot of labeled data and without having to train a large model end-to-end.
  • as a warm start: the VGGish model parameters can be used to initialize part of a larger model which allows faster fine-tuning and model exploration.
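As a rough illustration of the feature-extractor use case, here is a minimal sketch of a shallow logistic-regression head trained on 128-D vectors. The embeddings below are random stand-ins for real `model.encode(...)` outputs, and the labels are synthetic; only the overall shape of the procedure is meant to carry over.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend we extracted 128-D VGGish embeddings for 200 labelled clips.
# In practice these rows would come from model.encode(sample).
X = rng.normal(size=(200, 128))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)  # synthetic binary labels

# A shallow logistic-regression head trained with plain gradient descent.
w = np.zeros(128)
b = 0.0
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
    w -= lr * (X.T @ (p - y)) / len(y)      # gradient step on weights
    b -= lr * np.mean(p - y)                # gradient step on bias

preds = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(float)
accuracy = np.mean(preds == y)
print(f"train accuracy: {accuracy:.2f}")
```

Because the head is only 129 parameters, it can be trained on a few hundred labelled clips, which is the point of reusing the frozen embedding.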

Working in Colab

If you are using this in Colab and want to cache the downloaded model so you don't have to reload it each session, point the TF Hub cache at your mounted Drive:

import os
os.environ['TFHUB_CACHE_DIR'] = "drive/MyDrive/"


VGGish has been trained on millions of YouTube videos and although these are very diverse, there can still be a domain mismatch between the average YouTube video and the audio inputs expected for any given task. You should expect to do some amount of fine-tuning and calibration to make VGGish usable in any system that you build.
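The calibration step mentioned above can be as simple as temperature scaling: fit a single temperature on held-out data so that a downstream classifier's confidences match its accuracy. The sketch below uses entirely synthetic logits and labels (nothing here is VectorHub or VGGish API); it only shows the shape of the procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic held-out data for a 3-class task: true labels, plus
# deliberately overconfident logits (scaled up by 5x).
labels = rng.integers(0, 3, size=500)
logits = rng.normal(size=(500, 3))
logits[np.arange(500), labels] += 1.0  # weak signal toward the true class
logits *= 5.0                          # overconfidence

def nll(temperature):
    """Mean negative log-likelihood of the labels at a given temperature."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return -np.mean(np.log(p[np.arange(len(labels)), labels]))

# Grid-search a single temperature on the held-out set.
temps = np.linspace(0.1, 5.0, 50)
best_t = min(temps, key=nll)
print(f"chosen temperature: {best_t:.2f}")
```

A temperature above 1 softens the overconfident probabilities without changing which class is predicted, which is why this is a common first calibration step after reusing embeddings in a new domain.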