froscon2008 - 1.1

Free and Open Source Software Conference

Isabel Drost
Day: Day 1 (2008-08-23)
Room: HS3
Start time: 16:30
Duration: 01:00
ID: 194
Event type: lecture
Track: Java
Language used for presentation: en

Apache Mahout

Industrial strength machine learning

Mahout is a new Apache project under the umbrella of Lucene. Mahout's goal is to build scalable, Apache-licensed machine learning libraries. The talk gives an overview of the project: it covers the project's still young history, the algorithms available so far, and the plans for future development.

In recent years, it has become very easy for people to create and publish new information in digital form, and the amount of unstructured digital data has increased tremendously. Separating relevant information from spam, learning from users' behaviour, grouping texts in a meaningful way, and similar tasks have become more and more important.

In recent years, a rather large community of researchers has worked on the problem of learning from text, be it classifying texts into categories, clustering texts into groups that make sense to users, or proposing new items to customers based on previous purchases. The goal of Mahout is to bring together a strong community of developers to build a well-documented, scalable, commercially friendly suite of machine learning libraries based on Hadoop.

This talk starts with an overview of the history of the project. It presents the developments and achievements of the first few months, detailing the algorithms available so far and those that are still under development or planned. Finally, a brief overview of topically related projects is given.