Release date: 2020-03-11
Vector length: 1024 (default)
```python
# pip install vectorhub[encoders-audio-tfhub]
from vectorhub.encoders.audio.tfhub import Yamnet2Vec

model = Yamnet2Vec()
sample = model.read('https://vecsearch-bucket.s3.us-east-2.amazonaws.com/voices/common_voice_en_2.wav')
model.encode(sample)
```
Index and search your vectors easily on the cloud using 1 line of code!
```python
username = '<your username>'
email = '<your email>'
# You can request an api_key by passing in your username and email.
api_key = model.request_api_key(username, email)

# Index in 1 line of code
items = ['https://vecsearch-bucket.s3.us-east-2.amazonaws.com/voices/common_voice_en_69.wav',
         'https://vecsearch-bucket.s3.us-east-2.amazonaws.com/voices/common_voice_en_99.wav',
         'https://vecsearch-bucket.s3.us-east-2.amazonaws.com/voices/common_voice_en_10.wav',
         'https://vecsearch-bucket.s3.us-east-2.amazonaws.com/voices/common_voice_en_5.wav']
model.add_documents(username, api_key, items)

# Search in 1 line of code and get the most similar results.
model.search('https://vecsearch-bucket.s3.us-east-2.amazonaws.com/voices/common_voice_en_69.wav')

# Add metadata to your search
metadata = None
model.add_documents(username, api_key, items, metadata=metadata)
```
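Conceptually, the hosted search above reduces to a nearest-neighbour lookup over the encoded vectors by cosine similarity. A minimal illustrative sketch in plain NumPy (the 1024-dimensional vectors here are random stand-ins for real YAMNet embeddings, not output of the service):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for YAMNet embeddings of four indexed audio clips (1024-dim each).
index = rng.normal(size=(4, 1024))
# Normalise rows so a plain dot product equals cosine similarity.
index /= np.linalg.norm(index, axis=1, keepdims=True)

def search(query, index, top_k=2):
    """Return indices of the top_k most similar indexed vectors."""
    q = query / np.linalg.norm(query)
    scores = index @ q  # cosine similarity against every indexed vector
    return np.argsort(scores)[::-1][:top_k]

# Querying with an already-indexed vector returns that vector first.
print(search(index[2], index))  # → [2, ...]
```

The real service adds persistence, metadata filtering, and approximate search at scale, but the ranking principle is the same.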
YAMNet is an audio event classifier that takes an audio waveform as input and makes independent predictions for each of 521 audio events from the AudioSet ontology. The model uses the MobileNet v1 architecture and was trained on the AudioSet corpus. It was originally released in the TensorFlow Model Garden, which contains the model source code, the original model checkpoint, and more detailed documentation. The model can be used as a stand-alone audio event classifier, as a feature extractor (the 1024-dimensional embedding that Yamnet2Vec returns), or as a warm start for training classifiers on new audio tasks.
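"Independent predictions" means the 521 class scores come from one sigmoid per class (multi-label) rather than a softmax over all classes, so the scores need not sum to 1 and several events can be detected at once. A small sketch with hypothetical logits for five classes:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical logits for 5 of the 521 AudioSet classes on one audio frame.
logits = np.array([2.0, 1.5, -3.0, 0.2, -1.0])

# Independent (multi-label) predictions: one sigmoid per class.
scores = sigmoid(logits)

# Unlike a softmax distribution, the scores do not sum to 1,
# and more than one class can exceed a detection threshold.
print(scores.sum())
print((scores > 0.5).sum())
```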
If you are using this in Colab and want to cache the model so you don't have to reload it each session, use:
```python
import os
os.environ['TFHUB_CACHE_DIR'] = "drive/MyDrive/"
os.environ["TFHUB_MODEL_LOAD_FORMAT"] = "COMPRESSED"
```
YAMNet's classifier outputs have not been calibrated across classes, so you cannot directly treat the outputs as probabilities. For any given task, you will very likely need to perform a calibration with task-specific data which lets you assign proper per-class score thresholds and scaling. YAMNet has been trained on millions of YouTube videos and although these are very diverse, there can still be a domain mismatch between the average YouTube video and the audio inputs expected for any given task. You should expect to do some amount of fine-tuning and calibration to make YAMNet usable in any system that you build.
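One simple form of the per-class calibration described above is to sweep a score threshold for each class on labelled validation data and keep whichever threshold maximises F1. A hedged sketch, with synthetic scores and labels standing in for real task-specific validation data:

```python
import numpy as np

def best_threshold(scores, labels, candidates=np.linspace(0.05, 0.95, 19)):
    """Pick the threshold maximising F1 for one class on validation data."""
    best_t, best_f1 = 0.5, -1.0
    for t in candidates:
        pred = scores >= t
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Synthetic validation set for one class: positives score higher on average.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
scores = np.clip(0.3 * labels + rng.normal(0.35, 0.15, size=200), 0, 1)

t, f1 = best_threshold(scores, labels)
print(t, f1)
```

In practice you would repeat this per class, and may also want a monotone scaling (e.g. Platt scaling) if you need the scores themselves to behave like probabilities.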