VGGish

pip install vectorhub[encoders-audio-tfhub]

Details

Release date: 2020-03-11

Vector length: 128 (default)

Repo: https://tfhub.dev/google/vggish/1

Paper: https://arxiv.org/abs/1609.09430 (Hershey et al., "CNN Architectures for Large-Scale Audio Classification", 2017)

Example

# pip install vectorhub[encoders-audio-tfhub]
from vectorhub.encoders.audio.tfhub import Vggish2Vec

model = Vggish2Vec()
# Read the audio file (from a URL or a local path) into a waveform.
sample = model.read('https://vecsearch-bucket.s3.us-east-2.amazonaws.com/voices/common_voice_en_2.wav')
# Encode the waveform into a 128-dimensional embedding.
vector = model.encode(sample)
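
The output of encode is a 128-dimensional vector (see Details above), so clips can be compared with ordinary vector math. A minimal sketch, assuming NumPy is installed and reusing the sample clips from this page:

import numpy as np

# Encode two of the sample clips used elsewhere on this page.
a = np.array(model.encode(model.read('https://vecsearch-bucket.s3.us-east-2.amazonaws.com/voices/common_voice_en_69.wav')))
b = np.array(model.encode(model.read('https://vecsearch-bucket.s3.us-east-2.amazonaws.com/voices/common_voice_en_99.wav')))

# Cosine similarity: values closer to 1 mean VGGish considers the clips more alike.
similarity = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(similarity)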

Index and search vectors

Index and search your vectors easily on the cloud using 1 line of code!

username = '<your username>'
email = '<your email>'
# Request an api_key by supplying your username and email.
api_key = model.request_api_key(username, email)

# Index in 1 line of code
items = [
    'https://vecsearch-bucket.s3.us-east-2.amazonaws.com/voices/common_voice_en_69.wav',
    'https://vecsearch-bucket.s3.us-east-2.amazonaws.com/voices/common_voice_en_99.wav',
    'https://vecsearch-bucket.s3.us-east-2.amazonaws.com/voices/common_voice_en_10.wav',
    'https://vecsearch-bucket.s3.us-east-2.amazonaws.com/voices/common_voice_en_5.wav'
]
model.add_documents(username, api_key, items)

# Search in 1 line of code and get the most similar results.
model.search('https://vecsearch-bucket.s3.us-east-2.amazonaws.com/voices/common_voice_en_69.wav')

# Add metadata to your search
metadata = None
model.add_documents(username, api_key, items, metadata=metadata)
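
The page leaves metadata as None; one plausible shape (an assumption, not confirmed by this page) is a list with one metadata dict per item, aligned with the items list above:

# Hypothetical metadata format: one dict per indexed item, in the same order as items.
metadata = [
    {'clip': 'common_voice_en_69'},
    {'clip': 'common_voice_en_99'},
    {'clip': 'common_voice_en_10'},
    {'clip': 'common_voice_en_5'}
]
model.add_documents(username, api_key, items, metadata=metadata)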

Description

An audio event embedding model trained on the YouTube-8M dataset. VGGish can be used in two ways:

  • as a high-level feature extractor: the 128-D embedding output of VGGish can be used as the input features of another shallow model, which can then be trained on a small amount of data for a particular task. This makes it possible to quickly create specialized audio classifiers without requiring a lot of labeled data and without training a large model end-to-end (see the sketch after this list).
  • as a warm start: the VGGish model parameters can be used to initialize part of a larger model, which allows faster fine-tuning and model exploration.
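
A minimal sketch of the feature-extractor pattern, assuming scikit-learn is installed; the file names and labels below are placeholders for your own labeled audio data:

# pip install scikit-learn
from sklearn.linear_model import LogisticRegression
from vectorhub.encoders.audio.tfhub import Vggish2Vec

model = Vggish2Vec()

# Placeholder training data: replace with your own labeled audio files.
train_files = ['dog_bark_1.wav', 'dog_bark_2.wav', 'siren_1.wav', 'siren_2.wav']
train_labels = ['dog_bark', 'dog_bark', 'siren', 'siren']

# Use the frozen 128-D VGGish embeddings as fixed input features.
features = [model.encode(model.read(path)) for path in train_files]

# Train a shallow classifier on top of the embeddings.
clf = LogisticRegression(max_iter=1000)
clf.fit(features, train_labels)

# Classify a new clip by encoding it the same way.
print(clf.predict([model.encode(model.read('unknown_clip.wav'))]))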

Working in Colab

If you are using this in Colab and want to cache the downloaded model so you don't have to re-download it every session, set the following environment variables (with Google Drive mounted) before importing the encoder:

import os
# Cache TF Hub downloads on your mounted Google Drive so they persist across sessions.
os.environ['TFHUB_CACHE_DIR'] = "drive/MyDrive/"
os.environ["TFHUB_MODEL_LOAD_FORMAT"] = "COMPRESSED"

Limitations

VGGish was trained on millions of YouTube videos, and although these are very diverse, there can still be a domain mismatch between the average YouTube video and the audio inputs expected for any given task. You should expect to do some amount of fine-tuning and calibration to make VGGish usable in any system that you build.