A little introduction
Everything started from something unplanned done on #opsyria. To give you some
context, we have a bot there, named ii, that helps us with information
management.
Birth and death of a bot
ii's birth dates back to the second phase of opsyria, the phase where we went
wild and tried to get in contact with Syrians. It was first a greeting bot,
telling newcomers some safety tips in Arabic (because we still do not speak
it).
Then we fired up a Twitter account, so we added Twitter functions to ii, and
status.net ones too (for our status.net platform). After that, we gave it the
ability to repeat interesting things ii saw on those platforms (publishing on
IRC what it saw in its following list on both platforms).
Then we had some problems with the microblogging thing: 140 characters is short,
especially when you use Arabic and weird Unicode characters. So we built a news
feature, which led to our news website, where we still publish real-time news
from the ground thanks to our contacts' help.
After that, things went crazy. Lots of videos were posted online and we
started indexing them. Here came the videos feature (and later on the pictures
one: same thing, but with pictures), and we started building an index of all
videos related to Syrian events.
So this is how, over 6 months, we built our database of information, with
dates, places and comments for each video, picture or news item we could find.
We built different websites using it and, one day, we realized that, for the
preservation of the data, it would be nice to extract it from the websites
hosting it, to be sure it will always be online.
We feared that Syrian officials (or Assad's supporters) could manage to get
YouTube or Facebook accounts closed, making the videos unavailable and lost for
everyone.
The archiving idea
At 28C3, we already had a somewhat big database, and a script that could
download each video and store it on a website as a static file, with an
unfriendly user interface (an Apache directory listing) located here:
http://syria-videos.ceops.eu/
Some journalists told us that it was nice but not really usable (no way to
easily parse stuff, or to find events related to one particular date, and so
on). So we started thinking about how we could fix that.
Parsing it by hand was out of the question: there were more than 600 videos,
that is more than 4GB of files to watch, and some of them are harsh and crude.
Besides, we're still unable to understand the Arabic in the text, so the only
data we could use was what's in the flat files provided by ii.
Let's compile html
At the time, I was playing a lot with ikiwiki, a wiki compiler that turns
markdown files into static HTML pages. So I started looking at that. After all,
it can generate HTML5, so it should be easy to add a <video> tag inside a
template; generating the pages from flat text is easy to do in bash, and then I
just have to use git to push it and let the ikiwiki magic work.
We would have a pure HTML website with smart URLs, easily mirrorable (hey, no
?static=yes&wtf=ya&unknownparam&yetanotherfrckingstuff URLs, just 2012/02/11 for
the page of the events of 11 February 2012), with a tagging system and full
HTML5.
That was the concept. And since ikiwiki provides a local.css system, we could
even ask gently (and harass) some designers for a logo and some design around
it (I can live with pure HTML, but a lot of people do like fancy, rounded
stuff...)
Enough talk, do it
So, first, install what we need. I'm on Debian squeeze with an OpenVZ kernel,
and I'm going to use nginx to serve the site. I needed to add the unstable
version of ffmpeg to support .ogv.
aptitude install ikiwiki nginx ffmpeg
The setup of ikiwiki is pretty easy to do; here are all the uncommented lines of
TelecomixBroadcastSystem.setup.
So, let's start with some naming stuff: the name of the wiki, the email of the
admin and the username of the admin.
wikiname => 'Telecomix Broadcast System',
adminemail => 'okhin@bloum.net',
adminuser => [qw{a_user_admin}],
Since there's no user function available, this should be empty.
banned_users => [],
Where I'll put the markdown files:
srcdir => '/var/ikiwiki/TelecomixBroadcastSystem',
Where ikiwiki will put the generated HTML pages:
destdir => '/var/www/tbs',
The URL of the website:
url => 'http://broadcast.telecomix.org',
The plugins I want to add. goodstuff is a bundle of a lot of useful plugins for
ikiwiki; the goodstuff plugin page on the ikiwiki website will give you more
details.
I wanted a sidebar (to host the navigation), a calendar (to enable calendar
generation) and a favicon (because they are nice). As I do not want the site to
be editable, I disabled the recentchanges plugin.
add_plugins => [qw{goodstuff sidebar calendar favicon}],
disable_plugins => [qw{recentchanges}],
Some system directories and defaults that I've kept:
templatedir => '/usr/share/ikiwiki/templates',
underlaydir => '/usr/share/ikiwiki/basewiki',
indexpages => 0,
discussionpage => 'Discussion',
default_pageext => 'mdwn',
timeformat => '%c',
numbacklinks => 10,
hardlink => 0,
wiki_file_chars => '-[:alnum:]+/.:_',
allow_symlinks_before_srcdir => 0,
HTML5 is nice and fun to play with; we should use it more:
html5 => 1,
The path of the post-update git hook wrapper (that is, once the repo receives an
update, the new wiki is generated automatically):
git_wrapper => '/var/git/TelecomixBroadcastSystem.git/hooks/post-update',
atom => 1,
I want a sidebar for all the pages
global_sidebars => 1,
I want to autogenerate tag pages and store them in the tag/ directory.
tagbase => 'tag',
tag_autocreate => 1,
There are a lot more things you can change; have a look at the ikiwiki
documentation.
Now we have to create the directories /var/ikiwiki/TelecomixBroadcastSystem and
/var/www/tbs, make them writable by and owned by the user you're going to build
the wiki with, and give the nginx user read permission on /var/www/tbs.
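A minimal sketch of that step as a bash function. Assumptions not in the original: the owner is an optional third argument (defaulting to the current user), ownership is only changed when running as root, and nginx gets read access through plain world-readable permissions on the destination rather than through a shared group.

```shell
#!/bin/bash
# Hypothetical helper: prepare the ikiwiki source and destination directories.
# Assumption: nginx reads /var/www/tbs through world-readable permissions.
prepare_wiki_dirs() {
    local srcdir=$1 destdir=$2 owner=${3:-$(id -un)}
    mkdir -p "$srcdir" "$destdir" || return
    # Handing the directories over to another account is only possible as root.
    if [[ $(id -u) -eq 0 ]]; then
        chown -R "$owner": "$srcdir" "$destdir" || return
    fi
    # 0755: the owner writes, everyone (nginx included) reads and traverses.
    chmod 0755 "$destdir"
}
```

For example, as root: `prepare_wiki_dirs /var/ikiwiki/TelecomixBroadcastSystem /var/www/tbs wikiuser`, where `wikiuser` is a placeholder for the account that builds the wiki.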
And let's set up the wiki:
ikiwiki --setup /path/to/your/Wiki.setup
Let's tweak some templates
So now I need some templates to work with the videos repo: one for videos, one
for pictures (to add a specific CSS class around them), and one for the
'regular' page, because I wanted a logo on top of all of them.
Video template
I added a templates directory to the wiki root (so,
/var/ikiwiki/TelecomixBroadcastSystem/templates) and created the video.tmpl
file.
ikiwiki templates use the
HTML::Template
system, and the ones I needed were relatively simple. I think comments are not
needed:
<article class="video">
<video controls="controls" width="480" poster="/pics/SVGs/tbs_V1.svg"><source src="/videos/<TMPL_VAR file>" type="video/ogg" /><TMPL_VAR alt></video>
<p><TMPL_VAR alt></p>
<p><a href="/videos/<TMPL_VAR file>">Direct Link to the file</a> ||
<a href="<TMPL_VAR original>">Original link</a></p>
</article>
So: a fixed-width HTML5 video, the files must be in a /videos/ web directory,
and a poster with a nice logo is displayed on the video before it plays. Add
some more links for context, and we're set up.
Notice the MIME type used here, video/ogg: I want to use a really free web
format, which will need transcoding (but that's a problem for later). The same
goes for the pictures template.
Page template
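The page template is a huge (and complex) one, so here is just a diff against
the stock one. Note that it was generated in reverse: the lines marked - are
the ones I added to my copy.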
--- templates/page.tmpl 2012-03-07 15:35:45.000000000 +0000
+++ /usr/share/ikiwiki/templates/page.tmpl 2011-03-28 23:46:08.000000000 +0000
@@ -30,7 +30,6 @@
</head>
<body>
-<div id="logo"><a href="/" title="Dirty Bytes of Revolutions Since 1337"><img src="http://about.okhin.fr/pics/PNGs/tbs_V2.png" alt="Dirty Bytes of Revolutions Since 1337" /></a></div>
<TMPL_IF HTML5><article class="page"><TMPL_ELSE><div class="page"></TMPL_IF>
<TMPL_IF HTML5><section class="pageheader"><TMPL_ELSE><div class="pageheader"></TMPL_IF>
@@ -134,7 +133,6 @@
</TMPL_UNLESS>
</div>
-<div class="clearfix"></div>
<TMPL_IF HTML5><footer id="footer" class="pagefooter"><TMPL_ELSE><div id="footer" class="pagefooter"></TMPL_IF>
<TMPL_UNLESS DYNAMIC>
The clearfix div is there for the goddamn IE browser (at least, that's what the
CSS integrator guy told me). The first hunk adds the logo picture at the top of
every page.
Let's build special pages
Sidebar.mdwn
The sidebar plugin lets me use a sidebar.mdwn file in the root folder of the
wiki.
First, some useful links (back to home, the pure text news and our webchat)
# Quick Links
* [Back to Home](/index.html)
* [News from the ground](http://syria.telecomix.org)
* [Webchat](https://new.punkbob.com/chat)
What happened this month:
# This month events
[[!calendar type="month" pages="2011/* or 2012/*"]]
And all the pages since the beginning, one calendar per year:
# Events month by month
[[!calendar type="year" year="2011" pages="2011/*"]]
[[!calendar type="year" year="2012" pages="2012/*"]]
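Hardcoding 2011 and 2012 means editing the sidebar every January. A hypothetical helper (`sidebar_calendars` is my name, not part of ikiwiki) that prints the same headings and calendar directives for any range of years could look like this:

```shell
#!/bin/bash
# Hypothetical generator for the calendar part of sidebar.mdwn.
# Builds the "YYYY/* or YYYY/*" pagespec, then one yearly calendar per year.
sidebar_calendars() {
    local first=$1 last=$2 year spec=""
    for (( year = first; year <= last; year++ )); do
        # Prefix " or " only once spec already holds a year.
        spec+="${spec:+ or }${year}/*"
    done
    printf '# This month events\n'
    printf '[[!calendar type="month" pages="%s"]]\n' "$spec"
    printf '# Events month by month\n'
    for (( year = first; year <= last; year++ )); do
        printf '[[!calendar type="year" year="%s" pages="%s/*"]]\n' "$year" "$year"
    done
}
```

Running `sidebar_calendars 2011 2013` would add a third yearly calendar; the pagespec given to ikiwiki-calendar in the cron job would need the same update.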
Index.mdwn
The next step is to build a nice index.mdwn page with some text, the tag cloud
and a global map of everything. I'll skip to the interesting parts (the map and
the tag cloud).
The page list uses the map directive to find all the pages under the 2011 and
2012 directories (one per year), which gives a list of all the daily pages:
# Page list
[[!map pages="2011/* or 2012/*"]]
The pagestats directive goes through all the tag pages, counts how many pages
link to each of them, and sizes each tag accordingly to build a nice cloud:
# Tag cloud
[[!pagestats pages="tag/*"]]
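To make "counts how many pages link to each tag" concrete, here is a rough approximation computed directly from the flat video list rather than by ikiwiki. Assumptions: every date becomes one day page, and each slash-separated part of the location becomes a tag, so a tag's weight is the number of distinct days it appears on. It is only an illustration; the real pagestats also counts the picture and news pages.

```shell
#!/bin/bash
# Hypothetical approximation of the tag cloud weights from videos.txt lines
# ("date location url comment..."): distinct days per location tag.
tag_counts() {
    awk '{
        n = split($2, parts, "/")
        for (i = 1; i <= n; i++)
            # Count each (tag, day) pair once, like one link per day page.
            if (!seen[parts[i], $1]++) count[parts[i]]++
    }
    END { for (t in count) print count[t], t }' "$@" | sort -k1,1nr -k2,2
}
```

For instance, `tag_counts videos.lst | head` prints the ten most used locations.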
Fanciness
I then added a favicon.ico file along with a local.css to the repository;
local.css needs to be copied manually into the /var/www/tbs directory. And now
the basic setup is done.
Committing
Now use git to add all those files, commit and push them. Easy to do, and that
will generate some files in /var/www/tbs/.
Yippee! Now we need to populate it.
Bashing across videos
So, I have a list of videos (published by ii) in this format:
2011-12-04 homs/al-meedan http://www.youtube.com/watch?v=-qjNo0uqSM8 Random gunfires during the night
(And yes, sometimes with Arabic characters all over the place.) So I have the
date, the location (which will be used for tags), the URL and some comments to
add, thanks to ii's magic (and the huge work done over months). We already had
some Python scripts for downloading the videos, but for this kind of thing I
wanted to use something I know: bash. The work is split in two. One half parses
YouTube's hellish pages and downloads the .webm; this part is still in Python,
works well, and I was too lazy to rewrite it. The second half gets the video
info and adds the necessary information to the wiki.
And then I'll need to transcode the videos.
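Before the script itself, a small sketch of how one line of that format splits into its four fields. It relies on a property of bash's `read`: given several variable names, the last one receives the rest of the line, so the comment (Arabic included) stays intact, and nothing goes through glob expansion. The function name and its key=value output are illustrative only; it expects a raw line from the list, without any diff marker.

```shell
#!/bin/bash
# Hypothetical parser for one videos.txt line: date, location, URL, comment.
parse_video_line() {
    local date location link comments
    # The last name (comments) swallows everything after the third field.
    read -r date location link comments <<< "$1"
    # Locations like homs/al-meedan become space-separated tags.
    printf 'date=%s\ntags=%s\nlink=%s\ncomments=%s\n' \
        "$date" "${location//\// }" "$link" "$comments"
}
```

With the sample line above, this prints `date=2011-12-04`, `tags=homs al-meedan`, the URL, and `comments=Random gunfires during the night`.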
So, the script. Let's start with some variables; we will need them later:
#!/bin/bash
# We want to download everything.
export VIDEOS_LINK='https://telecomix.ceops.eu/material/ii/videos.txt'
export VIDEOS_RAW_DIR='/var/tbs/tbs/raw/'
export VIDEOS_OGV_DIR='/var/tbs/tbs/videos/'
export VIDEOS_WIKI_ROOT='/var/ikiwiki/TelecomixBroadcastSystem'
export VIDEOS_LIST=${VIDEOS_WIKI_ROOT}/videos.lst
export VIDEOS_NEW=${VIDEOS_WIKI_ROOT}/new_videos.lst
Let's do some cleaning and a backup, needed to know what's new:
[[ -e ${VIDEOS_LIST}.old ]] && rm -rf ${VIDEOS_LIST}.old
[[ -e $VIDEOS_LIST ]] && mv $VIDEOS_LIST ${VIDEOS_LIST}.old
Get the new version of the file list
cd $VIDEOS_WIKI_ROOT
wget $VIDEOS_LINK --no-check-certificate -O $VIDEOS_LIST
Update the git repository (we probably added tags since last time, hence new
pages) and find the new videos (a dirty diff, keeping only the added lines;
diff prefixes them with "< ", which is why the fields below start at index 1).
git pull > /dev/null 2>&1
diff -N $VIDEOS_LIST ${VIDEOS_LIST}.old | grep -e '^<' > $VIDEOS_NEW
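An alternative to the dirty diff, offered as a sketch rather than what the script does: `grep -Fvxf` prints the lines of the new list that do not appear verbatim in the old one. It keeps working if upstream reorders the file (diff would report moved lines as new), and a missing old list falls back to /dev/null, much like `diff -N`. One caveat: its output has no "< " marker, so with it the loop's indices would shift down by one (the date would be VIDEO[0]).

```shell
#!/bin/bash
# Hypothetical helper: print lines of NEW that are absent from OLD.
# -F fixed strings, -x whole-line match, -v invert, -f read patterns from OLD.
new_entries() {
    local new=$1 old=$2
    [[ -e $old ]] || old=/dev/null   # first run: every line is new
    grep -Fvxf "$old" "$new"
}
```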
Loop over all the new videos to add them to the wiki:
while read -r LINE
do
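This splits the line into a bash array, if you did not know how they work
(read -a also avoids the glob expansion an unquoted $LINE would get):
read -r -a VIDEO <<< "$LINE"
DATE=${VIDEO[1]}
TTAGS=${VIDEO[2]}
Let's split the tags into words separated by spaces instead of slashes:
TAGS=$(echo $TTAGS | tr '/' ' ')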
LINK=${VIDEO[3]}
This is how I get the equivalent of Python's [4:] (from index 4 to the end of
the array):
COMMENTS=${VIDEO[*]:4}
The date is YYYY-MM-DD in the file; I want it as YYYY/MM/DD to create my file in
the right place (YYYY/MM/DD.mdwn). That way I get an automagic hierarchy, plus
you can reach a /2012/02/14 URL quite easily.
The filename is the video link stripped down to its alphanumeric characters,
which is good enough for me:
VIDEO_PATH=$(echo ${DATE}.mdwn | tr '-' '/')
VIDEO_FILENAME=$(echo $LINK | tr -dc '[:alnum:]')
So, if the directory (YYYY/MM) does not exist, let's create it. If the file
does not exist, this is the first time we see something for that day: we must
create the page and add some things (notably, the page date must be set to the
day of the events, and we add a nice title). Once the file is created, git add
it to the repo:
# We only have updates, which is nice: no need to check whether the videos already exist
[[ ! -d $(dirname ${VIDEOS_WIKI_ROOT}/${VIDEO_PATH}) ]] && mkdir -p $(dirname ${VIDEOS_WIKI_ROOT}/${VIDEO_PATH})
if [ ! -e ${VIDEOS_WIKI_ROOT}/${VIDEO_PATH} ]
then
echo "[[!meta title=\"TBS - $DATE\" ]]" > ${VIDEOS_WIKI_ROOT}/${VIDEO_PATH}
echo "[[!meta date=\"$DATE\" ]]" >> ${VIDEOS_WIKI_ROOT}/${VIDEO_PATH}
git add ${VIDEOS_WIKI_ROOT}/${VIDEO_PATH}
fi
Add some tags to the page, along with the video template (one line, really
fun); note the .ogv extension added to the filename:
echo "[[!tag $TAGS]]" >> ${VIDEOS_WIKI_ROOT}/${VIDEO_PATH}
echo "[[!template id=\"video\" file=\"/${VIDEO_FILENAME}.ogv\" alt=\"$COMMENTS\" original=\"$LINK\"]]" >> ${VIDEOS_WIKI_ROOT}/${VIDEO_PATH}
And now, download the file. I need to add a dot at the end of the name because
the download script appends the extension (without the dot). I download it into
a raw directory, from which I'll later transcode all the videos into the proper
format and directory:
# And now, download it
python ${VIDEOS_WIKI_ROOT}/scripts/multiproc_videos_dl.py ${VIDEOS_RAW_DIR} "${VIDEOS_RAW_DIR}/${VIDEO_FILENAME}." "$LINK" > /dev/null 2>&1 &
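done < $VIDEOS_NEW
# Wait for the background downloads, so the transcoding job never picks up
# half-written files
wait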
Commit all the changes at once, and push them:
# While we're at it, just publish the file
git commit -a -m "VIDEO updated" > /dev/null 2>&1
git push > /dev/null 2>&1
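The loop body above can also be factored into two testable functions. This is a hypothetical refactoring, not the original script: `ensure_day_page` creates YYYY/MM/DD.mdwn with its header the first time a day is seen and prints the page path (the caller still runs `git add` and appends the tags), and `video_directive` builds the template line while turning double quotes in the comment into `&quot;`, so a quote in a comment cannot close the `alt="..."` parameter early. The escaping is an assumption on my part: it relies on the template printing the value into HTML, where the entity displays as a plain quote.

```shell
#!/bin/bash
# Hypothetical helpers factored out of the download loop (not the original code).

# Create YYYY/MM/DD.mdwn with its title/date header on first sight; print its path.
ensure_day_page() {
    local root=$1 date=$2
    local page="$root/${date//-//}.mdwn"
    mkdir -p "$(dirname "$page")"
    if [[ ! -e $page ]]; then
        printf '[[!meta title="TBS - %s"]]\n[[!meta date="%s"]]\n' "$date" "$date" > "$page"
    fi
    printf '%s\n' "$page"
}

# Emit the video template directive; " in the comment becomes &quot;
# (assumption: the entity renders as a quote in the generated HTML).
video_directive() {
    local file=$1 comments=$2 link=$3
    comments=$(printf '%s' "$comments" | sed 's/"/\&quot;/g')
    printf '[[!template id="video" file="/%s.ogv" alt="%s" original="%s"]]\n' \
        "$file" "$comments" "$link"
}
```

In the loop, something like `video_directive "$VIDEO_FILENAME" "$COMMENTS" "$LINK" >> "$PAGE"` would replace the template echo, with `PAGE` taken from `ensure_day_page "$VIDEOS_WIKI_ROOT" "$DATE"`.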
We're done; only the transcoding remains, which is pretty easy and done in
another script. Nothing special here: loop over all the files in the raw
directory and transcode them into the videos directory.
#!/bin/bash
# Transcoding a video into ogv
export ORIG='/var/tbs/tbs/raw'
export DEST='/var/tbs/tbs/videos'
for RAW_PATH in "$ORIG"/*
do
# With an empty raw directory the glob stays unexpanded: nothing to do
[[ -e $RAW_PATH ]] || continue
RAW=$(basename "$RAW_PATH")
NAME=${RAW%.*}
echo "transcoding $NAME"
# Skip already transcoded videos; only delete the raw file when the .ogv
# already exists or ffmpeg reports success
if [[ -e $DEST/${NAME}.ogv ]] || ffmpeg -i "$RAW_PATH" -acodec libvorbis -ac 2 -ab 96k -b 345k -s 640x360 "$DEST/${NAME}.ogv"
then
rm "$RAW_PATH"
fi
done
Bashing across pictures
Same format as the videos, so almost the same script. I won't detail it: just
sed VIDEO into PICTURE and you're almost done. Also, the download is done with
wget --no-check-certificate.
Bashing the news
Same kind of thing, except that I add the timestamp to each item; besides that,
it's the same.
Cronjobs everywhere
Now I just need to run the 3 jobs above automatically, plus the transcoding and
ikiwiki's ikiwiki-calendar command to update the calendars. I've got 2 cronjobs
for that, each executed every 6 hours (the second one an hour after the first):
0 */6 * * * /var/ikiwiki/TelecomixBroadcastSystem/scripts/dl_news.bash > /dev/null 2>&1 && /var/ikiwiki/TelecomixBroadcastSystem/scripts/dl_pictures.bash > /dev/null 2>&1 && /var/ikiwiki/TelecomixBroadcastSystem/scripts/dl_video.bash > /dev/null 2>&1 && /var/tbs/transcode.sh > /dev/null 2>/dev/null
0 1-23/6 * * * ikiwiki-calendar /var/ikiwiki/TelecomixBroadcastSystem.setup "2011/* or 2012/*" 2012
This is the end
Now the wiki builds itself. I then just needed to tweak nginx to suit my needs,
but that was really easy to do. I just had to keep in mind that I need two
aliases (one for /videos, one for /pictures), because I did not want to commit
all the videos into the git repository (they eat a lot of space), and to tell
nginx that .ogv files are indeed videos.
server {
listen 80; ## listen for ipv4
listen [::]:80 default ipv6only=on; ## listen for ipv6
server_name broadcast.telecomix.org;
access_log off;
location / {
root /var/www/tbs;
index index.html index.htm;
}
location /pictures {
alias /var/tbs/pictures;
autoindex off;
}
location /videos {
alias /var/tbs/videos;
autoindex off;
}
}
And I just needed to add these lines to nginx's mime.types file, at the end of
its types { ... } block:
video/ogg ogm;
video/ogg ogv;
video/ogg ogg;
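Once nginx is reloaded, a quick way to confirm the mapping is to look at the Content-Type header of one video. A small sketch (the function name is mine): `is_ogg_video` reads raw HTTP response headers on stdin and succeeds if they declare video/ogg; header names are case-insensitive and lines end in CRLF, hence `tr` and `grep -i`.

```shell
#!/bin/bash
# Hypothetical check: succeed if the HTTP headers on stdin declare video/ogg.
is_ogg_video() {
    # Strip the CR of CRLF line endings, then match the header case-insensitively.
    tr -d '\r' | grep -qi '^content-type:[[:space:]]*video/ogg'
}
```

For example: `curl -sI http://broadcast.telecomix.org/videos/SOMEFILE.ogv | is_ogg_video && echo OK`, where `SOMEFILE` is a placeholder for a transcoded video.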
That's it, everything worked fine. One final thing was needed to spread it
easily (and that's why I wanted static pages): make mirroring easy. The simplest
way I found is to run rsync in daemon mode with three read-only modules.
Installing rsync is a piece of cake:
aptitude install rsync
You then need to enable it in Debian; editing /etc/default/rsync is the way to
go. I wanted to throttle it and keep it nice on CPU and I/O (because I already
have too many processes eating my CPU, like the transcoding), so I've enabled
these options in the same file:
RSYNC_ENABLE=true
RSYNC_OPTS='--bwlimit 200'
RSYNC_NICE='10'
RSYNC_IONICE='-c3'
And then, in /etc/rsyncd.conf, I've added these modules:
max connections = 10
log file = /dev/null
timeout = 200
[tbs]
comment = Telecomix Broadcast System
path = /var/www/tbs
read only = yes
list = yes
uid = nobody
gid = nogroup
[videos]
comment = Telecomix Broadcast System - videos
path = /var/tbs/videos
read only = yes
list = yes
uid = nobody
gid = nogroup
[pictures]
comment = Telecomix Broadcast System - pictures
path = /var/tbs/pictures
read only = yes
list = yes
uid = nobody
gid = nogroup
And that's it: people can now duplicate the whole thing on a simple web server
(they just need disk space), with nothing else on it but a web server serving
pages.
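On the mirror side, pulling the three modules is one rsync call each. A hypothetical sketch (`mirror_tbs` and the local layout are assumptions): each module goes into its own subdirectory, because syncing tbs with `--delete` into a root that also holds videos/ and pictures/ would wipe them; the mirror's web server then needs the same two aliases as the nginx config above.

```shell
#!/bin/bash
# Hypothetical mirror-side script: pull the read-only modules tbs, videos and
# pictures from the rsync daemon into separate local directories.
mirror_tbs() {
    local host=$1 dest=$2 module
    for module in tbs videos pictures; do
        mkdir -p "$dest/$module"
        # -a keeps times and permissions; --delete drops files removed upstream.
        rsync -a --delete "rsync://$host/$module/" "$dest/$module/" || return
    done
}
```

For example, `mirror_tbs broadcast.telecomix.org /srv/tbs-mirror` from a cron job on the mirror (the destination path is a placeholder).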