Release date: 2020-03-11
Vector length: 128 (default)
```python
# pip install vectorhub[encoders-audio-tfhub]
from vectorhub.encoders.audio.tfhub import Vggish2Vec

model = Vggish2Vec()
sample = model.read('https://vecsearch-bucket.s3.us-east-2.amazonaws.com/voices/common_voice_en_2.wav')
model.encode(sample)
```
Index and search your vectors easily on the cloud using 1 line of code!
```python
username = '<your username>'
email = '<your email>'
# Request an api_key by supplying your username and email.
api_key = model.request_api_key(username, email)

# Index in 1 line of code
items = ['https://vecsearch-bucket.s3.us-east-2.amazonaws.com/voices/common_voice_en_69.wav',
         'https://vecsearch-bucket.s3.us-east-2.amazonaws.com/voices/common_voice_en_99.wav',
         'https://vecsearch-bucket.s3.us-east-2.amazonaws.com/voices/common_voice_en_10.wav',
         'https://vecsearch-bucket.s3.us-east-2.amazonaws.com/voices/common_voice_en_5.wav']
model.add_documents(username, api_key, items)

# Search in 1 line of code and get the most similar results.
model.search('https://vecsearch-bucket.s3.us-east-2.amazonaws.com/voices/common_voice_en_69.wav')

# Add metadata to your search
metadata = None
model.add_documents(username, api_key, items, metadata=metadata)
```
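Under the hood, searching an index of embeddings amounts to ranking stored vectors by similarity to the query vector. As a rough illustration of the idea (not the hosted service's actual implementation), here is a minimal cosine-similarity search over a toy in-memory index; the clip names and 4-dimensional vectors are made up stand-ins for 128-dimensional VGGish embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query_vec, index):
    """Return (key, similarity) pairs, most similar first."""
    scored = [(key, cosine_similarity(query_vec, vec)) for key, vec in index.items()]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)

# Toy 4-dimensional "embeddings" standing in for VGGish vectors.
index = {
    "clip_a": [1.0, 0.0, 0.2, 0.1],
    "clip_b": [0.0, 1.0, 0.9, 0.0],
    "clip_c": [0.9, 0.1, 0.3, 0.1],
}
results = search([1.0, 0.0, 0.25, 0.1], index)
```

Here `results[0]` is the clip whose embedding points in nearly the same direction as the query, which is why encoding a clip that is already in the index returns that clip as the top hit.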
An audio event embedding model trained on the YouTube-8M dataset. VGGish should be used as a high-level feature extractor: its 128-dimensional embeddings can be fed as input to a downstream model, such as an audio event classifier.
If you are using this in Colab and want to cache the model so you don't have to reload it, use:
```python
import os

os.environ['TFHUB_CACHE_DIR'] = "drive/MyDrive/"
os.environ["TFHUB_MODEL_LOAD_FORMAT"] = "COMPRESSED"
```
VGGish has been trained on millions of YouTube videos and although these are very diverse, there can still be a domain mismatch between the average YouTube video and the audio inputs expected for any given task. You should expect to do some amount of fine-tuning and calibration to make VGGish usable in any system that you build.
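A common lightweight form of the calibration mentioned above is to keep VGGish frozen and train a small classification head on top of its embeddings. The sketch below illustrates that pattern with plain Python: the synthetic 8-dimensional vectors and the two-class labels are invented stand-ins for real 128-dimensional VGGish embeddings and task labels, and the head is ordinary logistic regression trained by SGD:

```python
import math
import random

random.seed(0)
DIM = 8  # stand-in for VGGish's 128 embedding dimensions

def make_example(label):
    # Synthetic "embedding": class 1 is shifted positively in the first half.
    shift = 0.8 if label == 1 else -0.8
    return [random.gauss(shift if i < DIM // 2 else 0.0, 1.0) for i in range(DIM)], label

data = [make_example(i % 2) for i in range(200)]

# Logistic-regression head trained with plain SGD on frozen embeddings.
w = [0.0] * DIM
b = 0.0
lr = 0.1
for _ in range(50):
    for x, y in data:
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        z = max(min(z, 30.0), -30.0)      # clamp to keep exp() stable
        p = 1.0 / (1.0 + math.exp(-z))    # sigmoid
        g = p - y                         # gradient of log-loss w.r.t. z
        w = [wi - lr * g * xi for wi, xi in zip(w, x)]
        b -= lr * g

correct = sum(
    (1 if (sum(wi * xi for wi, xi in zip(w, x)) + b) > 0 else 0) == y
    for x, y in data
)
accuracy = correct / len(data)
```

In a real system the same head would be trained on `model.encode(...)` outputs for your own labelled audio; if that still underperforms, deeper fine-tuning of the VGGish weights themselves may be needed.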