Saturday, June 15, 2013

Caching YouTube in Squid

It feels great to finally be able to cache YouTube (and similar sites) :D.

After sitting idle for months, this finally works, although so far only on a VirtualBox machine.
This howto is not for people who hate YouTube and Google Maps.

It is for YouTube and Google Maps lovers.

Reference material I read:

http://www.mail-archive.com/squid-users@squid-cache.org/msg54605.html
http://www.mail-archive.com/squid-users@squid-cache.org/msg51076.html
http://wiki.squid-cache.org/Features/StoreUrlRewrite
http://wiki.squid-cache.org/Features/StoreUrlRewrite/RewriteScript

The version I use is squid-2.7.STABLE3; I do not know whether other versions support this.

1. Create a script that rewrites YouTube URLs into a stable store key (the same video is requested with a different token each time).

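#!/usr/bin/perl
# Store-URL rewrite helper: Squid writes one request per line on stdin
# and reads the rewritten store URL from stdout.
$|=1;   # unbuffered output, so Squid gets each answer immediately
while (<>) {
@X = split;
$url = $X[0];
# YouTube: keep only video_id (try the "&..." form first)
$url =~s@^http://(.*?)/get_video\?(.*)video_id=(.*?)&.*@squid://videos.youtube.INTERNAL/ID=$3@;
$url =~s@^http://(.*?)/get_video\?(.*)video_id=(.*?)$@squid://videos.youtube.INTERNAL/ID=$3@;
# Google Video: keep only docid; the "&..." form must come first,
# otherwise the "$" form swallows the trailing parameters into the ID
$url =~s@^http://(.*?)/videodownload\?(.*)docid=(.*?)&.*@squid://videos.google.INTERNAL/ID=$3@;
$url =~s@^http://(.*?)/videodownload\?(.*)docid=(.*?)$@squid://videos.google.INTERNAL/ID=$3@;
print "$url\n"; }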
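
To see why this leads to cache hits, here is a minimal Python sketch of the same idea (Python is used only for illustration; the real helper is the Perl script above). The function name `store_key` is my own, and the regexes are a simplified equivalent of the Perl substitutions, not a copy of them.

```python
import re

def store_key(url):
    """Map a video URL to a stable store key; other URLs pass through unchanged."""
    # YouTube: everything except video_id is dropped from the key.
    m = re.match(r"^http://.*?/get_video\?.*video_id=([^&]*)", url)
    if m:
        return "squid://videos.youtube.INTERNAL/ID=" + m.group(1)
    # Google Video: everything except docid is dropped from the key.
    m = re.match(r"^http://.*?/videodownload\?.*docid=([^&]*)", url)
    if m:
        return "squid://videos.google.INTERNAL/ID=" + m.group(1)
    return url

# Two requests for the same video with different "t" tokens.
first = store_key("http://youtube.com/get_video?video_id=2d55B-SiJdM&t=OEgsToPDskKrwAAE_vVIhOqMhPqmPDUQ")
second = store_key("http://youtube.com/get_video?video_id=2d55B-SiJdM&t=OEgsToPDskLGVqEnxKjLEN4DGA3HYGse")
print(first)
print(first == second)
```

Both requests end up under one key, so the second one can be answered from the cache even though its URL differs.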

2. Then edit squid.conf as shown below:

acl store_rewrite_list url_regex ^http://(.*?)/get_video\?
acl store_rewrite_list url_regex ^http://(.*?)/videodownload\?
cache allow store_rewrite_list

# Had to uncomment this again, because I couldn't log in to Google Mail using IE6 (Firefox had no trouble):
acl QUERY urlpath_regex cgi-bin \?
cache deny QUERY

refresh_pattern ^http://(.*?)/get_video\? 10080 90% 999999 override-expire ignore-no-cache ignore-private
refresh_pattern ^http://(.*?)/videodownload\? 10080 90% 999999 override-expire ignore-no-cache ignore-private

storeurl_access allow store_rewrite_list
storeurl_access deny all

storeurl_rewrite_program /usr/local/bin/store_url_rewrite
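
The numbers in the refresh_pattern lines are easier to read once converted: the fields after the regex are the minimum age in minutes, a percentage, and the maximum age in minutes. The small Python sketch below splits one of the lines above and converts the ages to days; `parse_refresh_pattern` is a hypothetical helper of mine, not part of Squid.

```python
def parse_refresh_pattern(line):
    """Split a refresh_pattern line into its fields (ages converted to days)."""
    fields = line.split()
    if not fields or fields[0] != "refresh_pattern":
        raise ValueError("not a refresh_pattern line")
    rest = fields[1:]
    case_insensitive = rest[0] == "-i"   # optional flag before the regex
    if case_insensitive:
        rest = rest[1:]
    regex, min_minutes, percent, max_minutes = rest[:4]
    return {
        "regex": regex,
        "case_insensitive": case_insensitive,
        "min_days": int(min_minutes) / 1440,    # 1440 minutes per day
        "percent": int(percent.rstrip("%")),
        "max_days": int(max_minutes) / 1440,
        "options": rest[4:],
    }

rule = parse_refresh_pattern(
    r"refresh_pattern ^http://(.*?)/get_video\? 10080 90% 999999 "
    "override-expire ignore-no-cache ignore-private"
)
print(rule["min_days"], round(rule["max_days"], 1), rule["options"])
```

So the video rules keep an object fresh for at least 7 days and at most roughly 694 days.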

The result shows up in access.log: when the same video is requested again, it is served straight away as a hit.

# grep youtube access.log | grep TCP_HIT

1214834411.379 735 192.168.1.89 TCP_HIT/200 1604459 GET http://youtube.com/get_video?video_id=2d55B-SiJdM&t=OEgsToPDskKrwAAE_vVIhOqMhPqmPDUQ - NONE/- video/flv
1214834487.090 818 192.168.1.94 TCP_HIT/200 1604459 GET http://youtube.com/get_video?video_id=2d55B-SiJdM&t=OEgsToPDskLGVqEnxKjLEN4DGA3HYGse - NONE/- video/flv
1214836269.353 4383 192.168.1.91 TCP_HIT/200 9533167 GET http://youtube.com/get_video?video_id=i6cKRT12jgw&t=OEgsToPDskKeQxYVvYZ7fgEIW4UNC_U- - NONE/- video/flv
1214836514.802 3757 192.168.1.91 TCP_HIT/200 9533167 GET http://youtube.com/get_video?video_id=i6cKRT12jgw&t=OEgsToPDskIEwsTb26LiGFc96hBUUa9Z - NONE/- video/flv
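
If you want more than a grep, the log lines above can be summed per video. This is a minimal Python sketch that assumes Squid's native access.log layout as shown above (result code in the fourth field, size in the fifth, URL in the seventh); the function name `cached_bytes_per_video` is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

SAMPLE = """\
1214834411.379 735 192.168.1.89 TCP_HIT/200 1604459 GET http://youtube.com/get_video?video_id=2d55B-SiJdM&t=OEgsToPDskKrwAAE_vVIhOqMhPqmPDUQ - NONE/- video/flv
1214834487.090 818 192.168.1.94 TCP_HIT/200 1604459 GET http://youtube.com/get_video?video_id=2d55B-SiJdM&t=OEgsToPDskLGVqEnxKjLEN4DGA3HYGse - NONE/- video/flv
1214836269.353 4383 192.168.1.91 TCP_HIT/200 9533167 GET http://youtube.com/get_video?video_id=i6cKRT12jgw&t=OEgsToPDskKeQxYVvYZ7fgEIW4UNC_U- - NONE/- video/flv
1214836514.802 3757 192.168.1.91 TCP_HIT/200 9533167 GET http://youtube.com/get_video?video_id=i6cKRT12jgw&t=OEgsToPDskIEwsTb26LiGFc96hBUUa9Z - NONE/- video/flv
"""

def cached_bytes_per_video(lines):
    """Return {video_id: (hit_count, bytes_served)} for TCP_HIT lines only."""
    totals = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 7 or not fields[3].startswith("TCP_HIT/"):
            continue  # skip short lines and anything that is not a disk hit
        query = parse_qs(urlsplit(fields[6]).query)
        video_id = query.get("video_id", ["?"])[0]
        hits, size = totals.get(video_id, (0, 0))
        totals[video_id] = (hits + 1, size + int(fields[4]))
    return totals

totals = cached_bytes_per_video(SAMPLE.splitlines())
for video_id, (hits, size) in totals.items():
    print(video_id, hits, size)
```

On a real server you would pass the open access.log file instead of `SAMPLE.splitlines()`.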
 
 
A note from Horacio Herrera Gonzalez: because the basic script is not specific to particular URLs:

Warning! This code may match other sites not related to YT or GV.

He he he he, watch your bandwidth.
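
The warning is easy to reproduce. The Python sketch below tests the ACL pattern from squid.conf against an unrelated host, next to a tightened pattern that only accepts youtube.com and its subdomains. The tightened pattern is my own assumption, not something from the references, and it would miss videos served from other host names. Python's `re` syntax also differs slightly from the regex library Squid uses, so treat this as a quick check only.

```python
import re

# Pattern used in the store_rewrite_list ACL: any host is accepted.
LOOSE = re.compile(r"^http://(.*?)/get_video\?")
# Hypothetical tightened pattern: youtube.com and its subdomains only.
STRICT = re.compile(r"^http://([^/]+\.)?youtube\.com/get_video\?")

urls = [
    "http://youtube.com/get_video?video_id=2d55B-SiJdM",
    "http://www.youtube.com/get_video?video_id=2d55B-SiJdM",
    "http://example.org/get_video?file=1",   # unrelated site, made up for the test
]

for url in urls:
    print(url, bool(LOOSE.search(url)), bool(STRICT.search(url)))
```

The loose pattern accepts all three URLs; the tightened one rejects the unrelated host.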

Because several users had trouble applying YouTube caching, the steps below show the order I followed on my server.
  1. I use the TSL 3.05 distro with squid-2.7.STABLE3.
  2. ./configure \
    --sysconfdir=/etc/squid \
    --prefix=/usr \
    --enable-async-io \
    --enable-removal-policies=lru,heap \
    --disable-delay-pools \
    --disable-wccp \
    --disable-wccp2 \
    --enable-kill-parent-hack \
    --enable-snmp \
    --enable-default-err-languages=English --enable-err-languages=English \
    --enable-linux-netfilter \
    --disable-auth
     
  3. My squid.conf with the comment lines (those matching ^#) filtered out:
    acl all src all
    acl manager proto cache_object
    acl localhost src 127.0.0.1/32
    acl to_localhost dst 127.0.0.0/8
    acl localnet src 10.0.0.0/8 # RFC1918 possible internal network
    acl localnet src 172.16.0.0/12 # RFC1918 possible internal network
    acl localnet src 192.168.0.0/16 # RFC1918 possible internal network
    acl SSL_ports port 443
    acl Safe_ports port 80 # http
    acl Safe_ports port 21 # ftp
    acl Safe_ports port 443 # https
    acl Safe_ports port 70 # gopher
    acl Safe_ports port 210 # wais
    acl Safe_ports port 1025-65535 # unregistered ports
    acl Safe_ports port 280 # http-mgmt
    acl Safe_ports port 488 # gss-http
    acl Safe_ports port 591 # filemaker
    acl Safe_ports port 777 # multiling http
    acl CONNECT method CONNECT
    http_access allow manager localhost
    http_access deny manager
    http_access deny !Safe_ports
    http_access deny CONNECT !SSL_ports
    http_access allow localnet
    http_access deny all
    icp_access allow localnet
    icp_access deny all
    http_port 3128 transparent
    hierarchy_stoplist cgi-bin ?
    cache_mem 6 MB
    maximum_object_size_in_memory 32 KB
    memory_replacement_policy heap GDSF
    cache_replacement_policy heap LFUDA
    cache_dir aufs /nfs/cache 20000 16 256
    maximum_object_size 64 MB
    cache_swap_low 98
    cache_swap_high 99
    access_log /var/log/squid/access.log squid
    cache_log /var/log/squid/cache.log
    cache_store_log none
    log_fqdn off
    storeurl_rewrite_program /etc/squid/store_url_rewrite
    acl store_rewrite_list url_regex ^http://(.*?)/get_video\?
    acl store_rewrite_list url_regex ^http://(.*?)/videodownload\?
    storeurl_access allow store_rewrite_list
    storeurl_access deny all
    cache allow store_rewrite_list
    acl QUERY urlpath_regex cgi-bin \?
    cache deny QUERY
    refresh_pattern ^http://(.*?)/get_video\? 10080 90% 999999 override-expire ignore-no-cache ignore-private
    refresh_pattern ^http://(.*?)/videodownload\? 10080 90% 999999 override-expire ignore-no-cache ignore-private
    refresh_pattern ^ftp: 1440 20% 10080
    refresh_pattern ^gopher: 1440 0% 1440
    refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
    refresh_pattern . 0 20% 4320
    quick_abort_min 0
    quick_abort_max 0
    quick_abort_pct 98
    acl apache rep_header Server ^Apache
    broken_vary_encoding allow apache
    vary_ignore_expire on
    cache_effective_user squid
    cache_effective_group squid
    log_icp_queries off
    ipcache_size 2048
    ipcache_low 98
    ipcache_high 99
    memory_pools off
    reload_into_ims on
    coredump_dir /usr/var/cache
    pipeline_prefetch on

Caching Photobucket

Contributed by apit (YM ID relative_04): caching for Photobucket, which is widely used on Friendster.

In store_url_rewrite:

# Assumed fix: the contributed rules used $3 with only one capture group and
# matched only a literal "/albums?"; these follow the squid.conf ACLs below.
$url =~s@^http://i(.*?)\.photobucket\.com/albums/(.*?)\?.*@squid://images.photobucket.INTERNAL/ID=$2@;
$url =~s@^http://vid(.*?)\.photobucket\.com/albums/(.*?)\?.*@squid://videos.photobucket.INTERNAL/ID=$2@;

In squid.conf:

acl store_rewrite_list url_regex ^http://i(.*?).photobucket.com/albums/(.*?)/(.*?)/(.*?)\?
acl store_rewrite_list url_regex ^http://vid(.*?).photobucket.com/albums/(.*?)/(.*?)\?

refresh_pattern ^http://i(.*?).photobucket.com/albums/(.*?)/(.*?)/(.*?)\? 43200 90% 999999 override-expire ignore-no-cache ignore-private
refresh_pattern ^http://vid(.*?).photobucket.com/albums/(.*?)/(.*?)\? 43200 90% 999999 override-expire ignore-no-cache ignore-private

Result:

TCP_HIT/200 5474813 GET http://vid264.photobucket.com/albums/ii163/shannonwiseman12/DSCN0212.flv - NONE/- text/plain



Update script

YouTube apparently changed its system around the first quarter of 2009.
As a result the script above no longer works; to fix this, the script and parts of the configuration have to be changed.

Fortunately, there is already a guide at http://wiki.squid-cache.org/ConfigExamples/DynamicContent/YouTube/Discussion

I tested the configuration below on a VMware machine running CentOS 5.2, in July 2009.

To make things easier, I include the modified squid.conf and the URL rewriter script.

acl all src all
acl manager proto cache_object
acl localhost src 127.0.0.1/32
acl to_localhost dst 127.0.0.0/8
acl localnet src 10.0.0.0/8
acl localnet src 172.16.0.0/12
acl localnet src 192.168.0.0/16
acl SSL_ports port 443
acl Safe_ports port 80
acl Safe_ports port 21
acl Safe_ports port 443
acl Safe_ports port 70
acl Safe_ports port 210
acl Safe_ports port 1025-65535
acl Safe_ports port 280
acl Safe_ports port 488
acl Safe_ports port 591
acl Safe_ports port 777
acl CONNECT method CONNECT
http_access allow manager localhost
http_access deny manager
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access allow localnet
http_access deny all
icp_access allow localnet
icp_access deny all
http_port 3128 transparent
hierarchy_stoplist cgi-bin ?
cache_mem 6 MB
maximum_object_size_in_memory 32 KB
memory_replacement_policy heap GDSF
cache_replacement_policy heap LFUDA
cache_dir aufs /cache 20000 16 256
maximum_object_size 64 MB
cache_swap_low 98
cache_swap_high 99
access_log /var/log/squid/access.log squid
cache_log /var/log/squid/cache.log
cache_store_log none
log_fqdn off

#storeurl_rewrite_program /etc/squid/store_url_rewrite
#acl store_rewrite_list url_regex ^http://(.*?)/get_video\?
#acl store_rewrite_list url_regex ^http://(.*?)/videoplayback\?

acl store_rewrite_list urlpath_regex \/(get_video\?|videodownload\?|videoplayback.*id) \.(jp(e?g|e|2)|gif|png|tiff?|bmp|ico|flv)\? \/ads\?
acl store_rewrite_list_web url_regex ^http:\/\/([A-Za-z-]+[0-9]+)*\.[A-Za-z]*\.[A-Za-z]*
acl store_rewrite_list_path urlpath_regex \.(jp(e?g|e|2)|gif|png|tiff?|bmp|ico|flv)$

acl store_rewrite_list_web_CDN url_regex ^http:\/\/[a-z]+[0-9]\.google\.com doubleclick\.net
acl QUERY2 urlpath_regex get_video\? videoplayback\? \.(jp(e?g|e|2)|gif|png|tiff?|bmp|ico|flv)\?
cache allow QUERY2
cache allow store_rewrite_list_web_CDN

acl QUERY urlpath_regex cgi-bin \?
cache deny QUERY

storeurl_access allow store_rewrite_list
# this is not related to YouTube video; it is only for CDN pictures
storeurl_access allow store_rewrite_list_web_CDN
storeurl_access allow store_rewrite_list_web store_rewrite_list_path
storeurl_access deny all
# the rewrite program path in the original example was a Windows path, so use your own path
storeurl_rewrite_program /etc/squid/cacheyoutube2.pl
storeurl_rewrite_children 1
storeurl_rewrite_concurrency 10

refresh_pattern ^http://(.*?)/get_video\? 10080 90% 999999 override-expire ignore-no-cache ignore-private
refresh_pattern ^http://(.*?)/videoplayback\? 10080 90% 999999 override-expire ignore-no-cache ignore-private

refresh_pattern -i (get_video\?|videoplayback\?id|videoplayback.*id) 161280 50000% 525948 override-expire ignore-reload
#and for pictures
refresh_pattern -i \.(jp(e?g|e|2)|gif|png|tiff?|bmp|ico|flv)(\?|$) 161280 3000% 525948 override-expire reload-into-ims

refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern . 0 20% 4320
quick_abort_min 0
quick_abort_max 0
quick_abort_pct 98
acl apache rep_header Server ^Apache
broken_vary_encoding allow apache
vary_ignore_expire on
cache_effective_user squid
cache_effective_group squid
log_icp_queries off
ipcache_size 2048
ipcache_low 98
ipcache_high 99
memory_pools off
reload_into_ims on
coredump_dir /usr/var/cache
pipeline_prefetch on

The storeurl program is as follows.

Contents of cacheyoutube2.pl:

#!/usr/bin/perl
$|=1;
while (<>) {
@X = split;
$x = $X[0] . " "; # channel ID plus a space: with storeurl_rewrite_concurrency set, the reply must be "ID URL"
$_ = $X[1];
$u = $X[1];

if (m/^http:\/\/([0-9.]{4}|www\.youtube\.com|.*\.googlevideo\.com|.*\.video\.google\.com).*?(videoplayback\?id=.*?|video_id=.*?)\&(.*?)/) {
$z = $2; $z =~ s/video_id=/get_video?video_id=/; # compatible to old cached get_video?video_id
print $x . "http://video-srv.youtube.com.SQUIDINTERNAL/" . $z . "\n";
# new youtube

} elsif (m/^http:\/\/([0-9.]{4}|www\.youtube\.com|.*\.googlevideo\.com|.*\.video\.google\.com).*?\&(id=[a-zA-Z0-9]*)/) {
print $x . "http://video-srv.youtube.com.SQUIDINTERNAL/" . $2 . "\n";

} elsif (m/^http:\/\/www\.google-analytics\.com\/__utm\.gif\?.*/) {
print $x . "http://www.google-analytics.com/__utm.gif\n";
#cache high latency ads
} elsif (m/^http:\/\/(.*?)\/(ads)\?(.*?)/) {
print $x . "http://" . $1 . "/" . $2 . "\n";

# specific servers start here....
} elsif (m/^http:\/\/(www\.ziddu\.com.*\.[^\/]{3,4})\/(.*?)/) {
print $x . "http://" . $1 . "\n";
#rapidshare
} elsif ( ($u =~ /rapidshare/) && (m/^http:\/\/(([A-Za-z]+[0-9-.]+)*?)([a-z]*\.[^\/]{3}\/[a-z]*\/[0-9]*)\/(.*?)\/([^\/\?\&]{4,})$/)) {
print $x . "http://cdn." . $3 . "/SQUIDINTERNAL/" . $5 . "\n";

} elsif ( ($u =~ /maxporn/) && (m/^http:\/\/([^\/]*?)\/(.*?)\/([^\/]*?)(\?.*)?$/)) {
# $z = $1; $z =~ s/[A-Za-z]+[0-9-.]+/cdn/;
print $x . "http://" . $1 . "/SQUIDINTERNAL/" . $3 . "\n";

# sites like pornhub: variable host and middle part of the path; filename extension of 3 or 4 characters, with or without ? at the end
} elsif ( ($u =~ /tube8|pornhub/) && (m/^http:\/\/(([A-Za-z]+[0-9-.]+)*?)\.([a-z]*[0-9]?\.[^\/]{3}\/[a-z]*)(.*?)((\/[a-z]*)?(\/[^\/]*){4}\.[^\/\?]{3,4})(\?.*)?$/)) {
print $x . "http://cdn." . $3 . $5 . "\n";
#...specific servers end here.
# general purpose for CDN servers; add your specific servers above.
} elsif (m/^http:\/\/([0-9.]*?)\/\/(.*?)\.(.*)\?(.*?)/) {
print $x . "http://squid-cdn-url//" . $2 . "." . $3 . "\n";
#for yimg.com
} elsif (m/^http:\/\/(.*?)\.yimg\.com\/(.*?)\.yimg\.com\/(.*?)\?(.*?)/) {
print $x . "http://cdn.yimg.com/" . $3 . "\n";
#generic http://variable.domain.com/path/filename."ext" or "exte" with or without "?"
} elsif (m/^http:\/\/(([A-Za-z]+[0-9-.]+)*?)\.(.*?)\.(.*?)\/(.*?)\.([^\/\?\&]{3,4})(\?.*)?$/) {
print $x . "http://cdn." . $3 . "." . $4 . "/" . $5 . "." . $6 . "\n";
# generic http://variable.domain.com/...
} elsif (m/^http:\/\/(([A-Za-z]+[0-9-.]+)*?)\.(.*?)\.(.*?)\/(.*)$/) {
print $x . "http://cdn." . $3 . "." . $4 . "/" . $5 . "\n";
# specific extensions followed by ?
} elsif (m/^http:\/\/(.*?)\/(.*?)\.(jp(e?g|e|2)|gif|png|tiff?|bmp|ico|flv|on2)\?(.*)/) {
print $x . "http://" . $1 . "/" . $2 . "." . $3 . "\n";
# all that ends with ;
} elsif (m/^http:\/\/(.*?)\/(.*?)\;(.*)/) {
print $x . "http://" . $1 . "/" . $2 . "\n";

} else {
print $x . $_ . "\n";
}
}
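
Because squid.conf sets storeurl_rewrite_concurrency, every request line starts with a channel ID, and the helper has to send that ID back in front of its answer. The Python sketch below shows only that request and reply shape. Its single rewrite rule is a simplified stand-in for the id= branch of the Perl script, and the names `answer` and `serve`, as well as the sample host, are hypothetical.

```python
import re

def answer(line):
    """Build the reply for one 'ID URL extras...' request line."""
    fields = line.split()
    channel_id, url = fields[0], fields[1]
    # Simplified stand-in for the id= rule: key the video on its id only.
    m = re.search(r"[?&](id=[a-zA-Z0-9]*)", url)
    if "videoplayback" in url and m:
        url = "http://video-srv.youtube.com.SQUIDINTERNAL/" + m.group(1)
    # The channel ID and the URL are separated by a single space.
    return channel_id + " " + url

def serve(requests, replies):
    """Answer every request line, flushing so the proxy never waits on a buffer."""
    for line in requests:
        if line.strip():
            replies.write(answer(line) + "\n")
            replies.flush()

print(answer("3 http://v1.cache.googlevideo.com/videoplayback?ip=0.0.0.0&id=abc123&itag=5 192.168.1.89/- - GET"))
print(answer("7 http://example.com/ 10.0.0.1/- - GET"))
```

Run as a real helper, it would be driven with `serve(sys.stdin, sys.stdout)` after `import sys`.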

Don't forget to chmod +x the Perl file so that it can be executed.
