{"id":2292,"date":"2022-01-06T12:52:59","date_gmt":"2022-01-06T07:22:59","guid":{"rendered":"https:\/\/hubhopper.com\/blog\/?p=2292"},"modified":"2025-05-01T16:37:43","modified_gmt":"2025-05-01T11:07:43","slug":"using-acrcloud-to-identify-copyrighted-content-in-episodes-hubhopper","status":"publish","type":"post","link":"https:\/\/hubhopper.com\/blog\/using-acrcloud-to-identify-copyrighted-content-in-episodes-hubhopper\/","title":{"rendered":"Using ACRCloud to identify copyrighted content in episodes"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">Using ACRCloud to solve copyrighted content<\/span><\/h2>\n\n\n\n<p><span style=\"font-weight: 400;\">For a while, we have had an influx of creators at Hubhopper who unknowingly or knowingly uploaded copyrighted content in their episodes. As we scaled our operations, identifying and flagging such content manually became tedious. The process had to be automated and built for scale. <\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">We had all used applications like Shazam and Google Assistant\u2019s &#8220;What&#8217;s this song?&#8221; feature. It was time for us to incorporate such a feature into our own platform. <\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">We needed not only a robust algorithm to recognise content but also an extensive database of content to check against. After looking through multiple integration options, we decided to go with <a style=\"color: #ff8933;\" href=\"https:\/\/www.acrcloud.com\/music-recognition\/\" target=\"_blank\" rel=\"noopener\">ACRCloud<\/a>, which met all our requirements and came highly recommended in the industry. <\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">They provide a simple API that lets you upload audio and check it for copyrighted content within seconds. 
<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Setting up the process was relatively straightforward. We already had an audio-processing step that ran after a creator uploaded a file. All that was needed was to fit ACRCloud into that step.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">The first problem to solve before calling their API was segmenting the audio file into chunks. ACRCloud recommends a chunk length of 10-15 seconds.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">We solved this by using ffmpeg.<\/span><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import os\nimport subprocess\n\nCHUNK_DURATION = 10  # seconds per chunk\n\n\ndef make_audio_chunks(file_path, folder):\n    # Create the output folder under \/tmp if it does not exist yet.\n    out_dir = f&#039;\/tmp\/{folder}&#039;\n    try:\n        os.makedirs(out_dir, exist_ok=True)\n    except OSError:\n        return False\n\n    # Split without re-encoding. An argument list (no shell) keeps paths\n    # with spaces or quotes from breaking the command.\n    result = subprocess.run(&#091;\n        &#039;\/opt\/ffmpeglib\/ffmpeg&#039;, &#039;-loglevel&#039;, &#039;quiet&#039;, &#039;-i&#039;, file_path,\n        &#039;-map&#039;, &#039;0&#039;, &#039;-f&#039;, &#039;segment&#039;, &#039;-segment_time&#039;, str(CHUNK_DURATION),\n        &#039;-c&#039;, &#039;copy&#039;, f&#039;{out_dir}\/output_%09d.mp3&#039;])\n\n    # Report failure if ffmpeg exits with a non-zero status.\n    return result.returncode == 0<\/code><\/pre>\n\n\n<p><span style=\"font-weight: 400;\">The function takes the path to the audio file and a folder name as arguments, and writes the audio chunks into a temporary folder under \/tmp.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We then pick the audio chunk we need from that folder and send it in a POST request to ACRCloud\u2019s \u2018\/v1\/identify\u2019 endpoint. The selected chunk should be under 1MB, so a lower bitrate is recommended. 
The raw audio data must be base64-encoded before it is added to the request body.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Since the endpoint requires each request to be signed, the HTTP method, URI, access key, data type, signature version and timestamp are joined with newlines, signed with HMAC-SHA1 using the access secret, and base64-encoded to create the signature.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The signature and the base64 audio string are added, along with the other fields, to a dictionary that is URL-encoded and sent as the request data. <\/span><\/p>\n\n\n<pre class=\"wp-block-code\"><code>import base64\nimport hashlib\nimport hmac\nimport json\nimport os\nimport time\nimport urllib.parse\nimport urllib.request\n\n\ndef check_acr_cloud(audio_segment):\n    # Placeholders: replace with your ACRCloud project credentials and host URL.\n    access_key = &#039;acr_access_key&#039;\n    access_secret = &#039;acr_access_secret&#039;\n    requrl = &#039;acr_host_url&#039;\n\n    # Read the raw bytes of the chunk.\n    with open(audio_segment, &quot;rb&quot;) as f:\n        content = f.read()\n    sample_bytes = os.path.getsize(audio_segment)\n\n    http_method = &quot;POST&quot;\n    http_uri = &quot;\/v1\/identify&quot;\n    data_type = &quot;audio&quot;\n    signature_version = &quot;1&quot;\n    timestamp = time.time()\n\n    # Fields are joined with newlines in this fixed order before signing.\n    string_to_sign = http_method+&quot;\\n&quot;+http_uri+&quot;\\n&quot;+access_key + \\\n        &quot;\\n&quot;+data_type+&quot;\\n&quot;+signature_version+&quot;\\n&quot;+str(timestamp)\n\n    # HMAC-SHA1 over the string, keyed with the access secret, then base64.\n    sign = base64.b64encode(hmac.new(access_secret.encode(&#039;ascii&#039;), string_to_sign.encode(&#039;ascii&#039;),\n                                     digestmod=hashlib.sha1).digest()).decode(&#039;ascii&#039;)\n\n    test_data = {&#039;access_key&#039;: access_key,\n                 &#039;sample_bytes&#039;: sample_bytes,\n                 &#039;sample&#039;: base64.b64encode(content),\n                 &#039;timestamp&#039;: str(timestamp),\n                 &#039;signature&#039;: sign,\n                 &#039;data_type&#039;: data_type,\n                 &quot;signature_version&quot;: signature_version}\n\n    test_data_urlencode = urllib.parse.urlencode(test_data).encode(&quot;utf-8&quot;)\n\n    req = urllib.request.Request(url=requrl, 
data=test_data_urlencode)\n\n    res_data = urllib.request.urlopen(req, timeout=30)\n    res = res_data.read().decode(&#039;utf-8&#039;)\n    res = json.loads(res)\n\n    if &#039;result_type&#039; in res:\n        if res&#091;&#039;result_type&#039;] == 0:\n            # A match was found; print the track metadata if present.\n            if res.get(&#039;metadata&#039;):\n                print(&quot;metadata&quot;, res&#091;&#039;metadata&#039;])\n            return True\n        elif res&#091;&#039;result_type&#039;] == 1001:\n            return False\n        elif res&#091;&#039;result_type&#039;] == 3003:\n            print(&#039;ACR LIMIT EXCEEDED. UPGRADE ACRCLOUD ACCOUNT&#039;)\n            return False\n\n    # Missing or unrecognised result type: treat as no match.\n    return False<\/code><\/pre>\n\n\n<p><span style=\"font-weight: 400;\">When a chunk matches a copyrighted track, the ACRCloud API response includes the track\u2019s metadata, which can be read from the decoded JSON.<\/span><\/p>","protected":false},"excerpt":{"rendered":"<p>Using ACRCloud to solve copyrighted content For a while, we have had an influx of creators at Hubhopper who unknowingly or knowingly uploaded copyrighted content &hellip; 
<\/p>\n","protected":false},"author":6,"featured_media":2330,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"om_disable_all_campaigns":false,"footnotes":""},"categories":[255],"tags":[],"class_list":["post-2292","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-hubhopper-news-updates"],"acf":[],"_dp_original":null,"_oembed_3a87f2a5394252799bc1f55f1d66e857":null,"_oembed_44547d1fedb63de41203bcbda4b31c85":null,"_oembed_9fdfb869226e5e3d4523cafec891236e":null,"_oembed_c2633143ea8c6fb09c1951fb80526f73":null,"_oembed_cec5e7e844cda99b4f3d55f48552a2a3":null,"_oembed_f969b907ddea83f6b812b18531e275be":null,"_oembed_time_3a87f2a5394252799bc1f55f1d66e857":null,"_oembed_time_9fdfb869226e5e3d4523cafec891236e":null,"_oembed_time_cec5e7e844cda99b4f3d55f48552a2a3":null,"_oembed_time_f969b907ddea83f6b812b18531e275be":null,"_thumbnail_id":"2330","_wp_desired_post_slug":null,"_wp_old_date":null,"_wp_trash_meta_status":null,"_wp_trash_meta_time":null,"_yoast_wpseo_primary_category":"255","_yoast_wpseo_title":"Using ACRCloud to identify copyrighted content in episodes - 
Hubhopper","enclosure":null,"medium_post":{"author_image_url":null,"author_url":null,"byline_name":null,"byline_email":null,"cross_link":null,"id":null,"follower_notification":null,"license":null,"publication_id":null,"status":null,"url":null},"nb_of_words":null,"_edit_last":"10","BS_author_type":"BS_author_is_user","BS_guest_author_name":"","BS_guest_author_url":"","_links":{"self":[{"href":"https:\/\/hubhopper.com\/blog\/wp-json\/wp\/v2\/posts\/2292","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hubhopper.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hubhopper.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hubhopper.com\/blog\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/hubhopper.com\/blog\/wp-json\/wp\/v2\/comments?post=2292"}],"version-history":[{"count":22,"href":"https:\/\/hubhopper.com\/blog\/wp-json\/wp\/v2\/posts\/2292\/revisions"}],"predecessor-version":[{"id":3854,"href":"https:\/\/hubhopper.com\/blog\/wp-json\/wp\/v2\/posts\/2292\/revisions\/3854"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/hubhopper.com\/blog\/wp-json\/wp\/v2\/media\/2330"}],"wp:attachment":[{"href":"https:\/\/hubhopper.com\/blog\/wp-json\/wp\/v2\/media?parent=2292"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hubhopper.com\/blog\/wp-json\/wp\/v2\/categories?post=2292"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hubhopper.com\/blog\/wp-json\/wp\/v2\/tags?post=2292"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}