AUTOBLOG PROJECT


Shaarli - Shaarli discussions

Archived

Original site: Shaarli - Shaarli discussions


Python-Goose - Article Extractor

Thursday, November 6, 2014 at 18:29
CAFAI Liens en Vrac 06/11/2014
Goose was originally an article extractor written in Java that was more recently (August 2011) converted into a Scala project. This is a complete rewrite in Python. The aim of the software is to take any news article or article-type web page and extract not only the main body of the article but also all of its metadata and the most probable image candidate.
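To make the three outputs described above (main body, metadata, lead image) concrete, here is a minimal sketch using only the Python 3 standard library. It is not python-goose's API and not its actual algorithm; the names `ArticleSketch`, `stopword_score` and `extract`, the tiny stopword list and the `min_score` threshold are all hypothetical. It keeps `<p>` blocks that contain enough common English stopwords (a simple cue that a block is running prose rather than navigation), reads the `<title>` and `<meta name="description">`, and takes an `og:image` meta tag as the image candidate, falling back to the first `<img>`.

```python
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stopword list; real extractors use far larger ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
             "that", "for", "on", "with", "as", "was"}


class ArticleSketch(HTMLParser):
    """Collects <title>, <meta> tags, <img> sources and the text of each <p>."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}        # lowercased name/property -> content
        self.images = []      # img src values in document order
        self.paragraphs = []  # whitespace-normalized text of each <p>
        self._in_title = False
        self._in_p = False
        self._buf = []

    def _flush(self):
        # Close the current paragraph and store its text if non-empty.
        text = " ".join("".join(self._buf).split())
        if text:
            self.paragraphs.append(text)
        self._buf = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            if key and "content" in attrs:
                self.meta[key.lower()] = attrs["content"]
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag == "p":
            if self._in_p:  # tolerate an unclosed previous <p>
                self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._flush()

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._in_p:
            self._buf.append(data)


def stopword_score(text):
    """Count stopwords: prose scores high, menus and captions score low."""
    return sum(1 for w in text.lower().split()
               if w.strip(".,;:!?\"'") in STOPWORDS)


def extract(html, min_score=3):
    parser = ArticleSketch()
    parser.feed(html)
    parser.close()
    body = [p for p in parser.paragraphs if stopword_score(p) >= min_score]
    return {
        "title": parser.title.strip(),
        "description": parser.meta.get("description", ""),
        # Prefer an explicit Open Graph image, else the first <img> on the page.
        "image": parser.meta.get("og:image")
                 or (parser.images[0] if parser.images else None),
        "text": "\n\n".join(body),
    }


# Usage on a small hand-written page.
page = """<html><head><title>Goose test</title>
<meta name="description" content="A short summary.">
<meta property="og:image" content="http://example.com/lead.jpg"></head>
<body><p>Home | News | About</p>
<p>The aim of the software is to take a news article and extract the main body of the text.</p>
<img src="http://example.com/ad.gif"></body></html>"""
result = extract(page)
print(result["title"], "|", result["image"])
print(result["text"])
```

The navigation line contains no stopwords and is dropped, while the prose paragraph is kept. A real extractor would also weigh link density, score parent containers rather than isolated paragraphs, and handle non-English pages.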
