PROJET AUTOBLOG


Shaarli - Les discussions de Shaarli

Archived

Original site: Shaarli - Les discussions de Shaarli, as of 23/07/2013

Python-Goose - Article Extractor

Thursday 6 November 2014 at 18:29
CAFAI Liens en Vrac 06/11/2014
Goose was originally an article extractor written in Java that was most recently (August 2011) converted to a Scala project. Python-Goose is a complete rewrite in Python. The software takes any news article or article-type web page and extracts not only the main body of the article but also its metadata and the most probable image candidate.
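
To make the three outputs mentioned above concrete (main body, metadata, image candidate), here is a minimal sketch using only the Python standard library. It is not Goose's algorithm and does not use its API: the `ArticleSketch` class, the `extract` function, and the `min_words` threshold are hypothetical names chosen for illustration, and the "main body" heuristic (keep paragraphs with at least a few words) is a deliberately crude assumption.

```python
from html.parser import HTMLParser


class ArticleSketch(HTMLParser):
    """Toy extractor (hypothetical): collects <title>, <meta> tags and <p> text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}        # meta name/property -> content
        self.paragraphs = []  # whitespace-normalised text of each <p>
        self._in_title = False
        self._in_p = False
        self._buf = []

    def _flush_paragraph(self):
        # Store the buffered paragraph text, collapsing runs of whitespace.
        text = " ".join("".join(self._buf).split())
        if text:
            self.paragraphs.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content"):
                self.meta[key.lower()] = attrs["content"]
        elif tag == "p":
            if self._in_p:  # previous <p> was never closed explicitly
                self._flush_paragraph()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            self._flush_paragraph()

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            self._buf.append(data)


def extract(html, min_words=5):
    """Return title, description, image candidate and body text from an HTML string."""
    parser = ArticleSketch()
    parser.feed(html)
    parser.close()
    # Assumption: very short paragraphs are navigation or captions, not body text.
    body = [p for p in parser.paragraphs if len(p.split()) >= min_words]
    return {
        "title": parser.title.strip(),
        "description": parser.meta.get("description", ""),
        "top_image": parser.meta.get("og:image", ""),  # Open Graph image, if declared
        "text": "\n\n".join(body),
    }
```

Calling `extract(page_html)` on a downloaded page returns a dictionary with the keys `title`, `description`, `top_image` and `text`. A real extractor such as Goose has to handle pages without Open Graph tags and pages whose body is not marked up as plain paragraphs, which this sketch ignores.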