Using audio recognition to detect copyrighted content
For a while now, Hubhopper has seen an influx of creators who, knowingly or unknowingly, upload copyrighted content in their episodes. As we scaled our operations, identifying and flagging such content manually became tedious. The process had to be automated and built for scale.
We have all used apps like Shazam and Google Assistant's "What's this song?" feature. It was time to build something similar into our own platform.
Our needs called for not only a robust algorithm to recognise content but also an extensive database of content to check against. After evaluating multiple integration options, we chose ACRCloud, which met all our requirements and is well regarded in the industry.
They provide a simple API that lets you upload audio and check it for copyrighted content within seconds.
Setting up the process was relatively straightforward. We already had an audio processing pipeline that ran after a creator uploaded a file; all we needed was to fit ACRCloud into it.
Before calling their API, we first had to split the audio file into chunks, since ACRCloud recommends a chunk length of 10 to 15 seconds.
We did this with ffmpeg's segment muxer.
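import os
import subprocess

CHUNK_DURATION = 10  # seconds; ACRCloud recommends 10-15 s per query
FFMPEG_BIN = "/opt/ffmpeglib/ffmpeg"


def make_audio_chunks(file_path, folder):
    """Split an MP3 file into CHUNK_DURATION-second chunks under /tmp/<folder>."""
    out_dir = os.path.join("/tmp", folder)
    try:
        os.makedirs(out_dir, exist_ok=True)
    except OSError:
        return False
    # An argument list (instead of os.system) avoids shell-quoting problems with file names.
    # "-c copy" cuts without re-encoding, so the input is assumed to be MP3 already.
    result = subprocess.run([
        FFMPEG_BIN, "-loglevel", "quiet", "-i", file_path, "-map", "0",
        "-f", "segment", "-segment_time", str(CHUNK_DURATION),
        "-c", "copy", os.path.join(out_dir, "output_%09d.mp3"),
    ])
    return result.returncode == 0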
The function takes the path of the audio file and a folder name as arguments, and writes the audio chunks (output_000000000.mp3, output_000000001.mp3 and so on) into the temporary folder /tmp/<folder>.
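To pick chunks from that folder in playback order, a small helper along these lines would work. This is a sketch: `list_chunks` and its `base_dir` parameter are hypothetical names, and it assumes the `output_%09d.mp3` naming used in the ffmpeg command above.

```python
import glob
import os


def list_chunks(folder, base_dir="/tmp"):
    """Hypothetical helper: return chunk paths in playback order."""
    pattern = os.path.join(base_dir, folder, "output_*.mp3")
    # The %09d zero-padding makes lexical order match numeric order,
    # so a plain sort returns the chunks in the order they were cut.
    return sorted(glob.glob(pattern))
```

Files that do not match the `output_*.mp3` pattern are ignored, and a missing folder simply yields an empty list.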
We then pick the audio chunk we need from the segmented folder and send it in a POST request to ACRCloud's /v1/identify endpoint. The selected chunk should be under 1MB, so a lower bitrate is recommended. The raw audio data must be encoded as a base64 string before it is added to the request body.
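The link between bitrate and the 1MB limit is simple arithmetic for a constant-bitrate MP3: bytes are roughly seconds × bitrate / 8. The sketch below assumes CBR audio and ignores tag and frame-header overhead; it also shows that base64 encoding grows the payload by about a third (4 characters per 3 bytes).

```python
import base64
import math


def estimated_chunk_bytes(duration_s, bitrate_kbps):
    # CBR estimate: seconds * kilobits per second * 1000 / 8 bits per byte.
    return duration_s * bitrate_kbps * 1000 // 8


def base64_length(n_bytes):
    # Base64 emits 4 characters for every 3 input bytes, padded to a full group.
    return 4 * math.ceil(n_bytes / 3)


size = estimated_chunk_bytes(10, 128)
print(size)                              # 160000
print(base64_length(size))               # 213336
print(len(base64.b64encode(bytes(size))))  # 213336, matches the formula
```

Even a 15-second chunk at 320 kbps comes to about 600,000 bytes, so chunk length and bitrate together decide how much headroom remains.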
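Since the endpoint authenticates every request, we also build a signature: the HTTP method, URI, access key, data type, signature version and timestamp are joined with newlines, signed with HMAC-SHA1 using the access secret, and the resulting digest is base64-encoded.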
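Pulling the signing step into its own functions makes the exact string being signed easy to inspect and unit-test. This is a sketch mirroring the field order used in `check_acr_cloud`; `build_string_to_sign` and `sign` are hypothetical names, and the key, secret and timestamp values are placeholders.

```python
import base64
import hashlib
import hmac


def build_string_to_sign(access_key, timestamp, http_method="POST",
                         http_uri="/v1/identify", data_type="audio",
                         signature_version="1"):
    # Fields are joined with "\n" in this fixed order.
    return "\n".join([http_method, http_uri, access_key, data_type,
                      signature_version, timestamp])


def sign(string_to_sign, access_secret):
    # HMAC-SHA1 keyed with the access secret, then base64-encoded for transport.
    digest = hmac.new(access_secret.encode("ascii"),
                      string_to_sign.encode("ascii"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


s = build_string_to_sign("my_key", "1700000000.0")
print(repr(s))  # 'POST\n/v1/identify\nmy_key\naudio\n1\n1700000000.0'
print(sign(s, "my_secret"))
```

The timestamp sent in the request body must be the same string that was signed, otherwise the signature will not verify.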
The signature and the base64 audio string are added, along with the other fields, to a dictionary that is URL-encoded and sent as the request body.
import base64
import hashlib
import hmac
import json
import time
import urllib.parse
import urllib.request

# Placeholders: replace with the credentials and host from your ACRCloud project.
ACCESS_KEY = "acr_access_key"
ACCESS_SECRET = "acr_access_secret"
REQ_URL = "acr_host_url"  # full URL of the identify endpoint, e.g. https://<host>/v1/identify


def check_acr_cloud(audio_segment):
    """Return True if ACRCloud matches the audio chunk against its catalogue."""
    with open(audio_segment, "rb") as f:
        content = f.read()
    sample_bytes = len(content)

    http_method = "POST"
    http_uri = "/v1/identify"
    data_type = "audio"
    signature_version = "1"
    timestamp = str(time.time())

    # Sign method, URI, key, data type, version and timestamp with HMAC-SHA1.
    string_to_sign = "\n".join(
        [http_method, http_uri, ACCESS_KEY, data_type, signature_version, timestamp])
    sign = base64.b64encode(
        hmac.new(ACCESS_SECRET.encode("ascii"), string_to_sign.encode("ascii"),
                 digestmod=hashlib.sha1).digest()).decode("ascii")

    data = {
        "access_key": ACCESS_KEY,
        "sample_bytes": sample_bytes,
        "sample": base64.b64encode(content),  # raw audio as base64
        "timestamp": timestamp,
        "signature": sign,
        "data_type": data_type,
        "signature_version": signature_version,
    }
    body = urllib.parse.urlencode(data).encode("utf-8")
    req = urllib.request.Request(url=REQ_URL, data=body)
    with urllib.request.urlopen(req, timeout=30) as resp:
        res = json.loads(resp.read().decode("utf-8"))

    result_type = res.get("result_type")
    if result_type == 0 and res.get("metadata"):
        print("metadata", res["metadata"])
        return True
    if result_type == 3003:
        print("ACR LIMIT EXCEEDED. UPGRADE ACRCLOUD ACCOUNT")
    # 1001 (no result), a match without metadata, or any other code: no match.
    return False
When a chunk matches a known track, the ACRCloud API returns the track's information in its response, which we decode as JSON; the match details sit under the metadata key.
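To turn that metadata into something a moderator can read, a helper like the one below could summarise each match. It is a sketch under an assumption about the response shape: that music matches sit under `metadata` → `music`, each with a `title` and an `artists` list of `{"name": ...}` objects. `summarise_matches` is a hypothetical name, and the sample response is made up for illustration.

```python
import json


def summarise_matches(res):
    """Hypothetical helper: 'title - artists' for each music match."""
    # Assumed schema: metadata -> music -> [{"title": ..., "artists": [{"name": ...}]}]
    matches = []
    for track in res.get("metadata", {}).get("music", []):
        artists = ", ".join(a.get("name", "") for a in track.get("artists", []))
        matches.append(f"{track.get('title', '?')} - {artists}")
    return matches


# Made-up response, trimmed to the fields used above.
sample = json.loads('{"result_type": 0, "metadata": {"music": '
                    '[{"title": "Song A", "artists": [{"name": "Artist X"}]}]}}')
print(summarise_matches(sample))  # ['Song A - Artist X']
```

Using `.get` with defaults means a response without music matches yields an empty list instead of raising.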