Now that so much data is generated by huge numbers of users, data is everywhere: blogs, Facebook, YouTube, Twitter, ...
We have to deal with it every day. Your physical brain has to process a lot of news, information, and work at the same time in order to filter out the useful information, then the knowledge you should capture, and finally the wisdom (http://www.systems-thinking.org/dikw/dikw.htm)
=> Stress, overload, ... in other words, the limit of the biological brain.
This is where my idea comes in, the "My Second Brain" project: http://code.google.com/p/my-second-brain/
To change the world, I should at least change my own life first, and then share the results with everyone.
First, how to extract the content of local news and rank the best keywords ==> http://code.google.com/p/boilerpipe/
The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.
The library already provides specific strategies for common tasks (for example: news article extraction) and may also be easily extended for individual problem settings.
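As a rough illustration of the "rank the best keywords" half of this step: once the main text of an article has been extracted, ranking could start with plain term frequency. The sketch below uses only the Java standard library. The class name KeywordRanker, the method topKeywords, and the tiny stop-word list are hypothetical names made up for this example, not part of boilerpipe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: ranks words of an already-extracted article by frequency.
class KeywordRanker {
    // Tiny illustrative stop-word list (an assumption; real lists are much longer).
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "the", "and", "for", "with", "that", "this", "are", "was", "from"));

    // Returns up to `limit` keywords, most frequent first; ties are broken alphabetically.
    static List<String> topKeywords(String text, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on anything that is not a letter or a digit.
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            // Skip very short tokens and stop words.
            if (token.length() < 3 || STOP_WORDS.contains(token)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

Raw frequency is only a starting point. A weighting such as TF-IDF over all collected articles would likely give better keywords, but needs a corpus to compare against.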
Second, natural language processing ==> OpenNLP
OpenNLP is an organizational center for open source projects related to natural language processing. Its primary role is to encourage and facilitate the collaboration of researchers and developers on such projects.
OpenNLP also hosts a variety of Java-based NLP tools which perform sentence detection, tokenization, POS tagging, chunking and parsing, named-entity detection, and coreference resolution using the OpenNLP Maxent machine learning package.
Third, full-text search ==> Apache Lucene
Apache Lucene(TM) is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
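To make "full-text search" a bit more concrete: the core data structure behind engines like Lucene is an inverted index, a map from each term to the documents that contain it. The toy below is a minimal sketch of that idea in plain Java. It is not Lucene's API, and the class TinyInvertedIndex with its add and search methods is hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index: maps each term to the sorted set of document ids containing it.
class TinyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Tokenizes the text and records docId under every term found.
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // AND query: returns the ids of documents that contain every term of the query.
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : query.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            if (term.isEmpty()) {
                continue;
            }
            Set<Integer> docs = postings.getOrDefault(term, Collections.<Integer>emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);   // first term: start from its posting list
            } else {
                result.retainAll(docs);         // later terms: intersect
            }
        }
        return result == null ? new TreeSet<Integer>() : result;
    }
}
```

A real engine adds analysis (stemming, stop words), relevance scoring, and on-disk storage on top of this, which is the reason to use a library instead of the toy.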
Fourth, machine learning ==> Apache Mahout
The goal of the Apache Mahout™ project is to build scalable machine learning libraries.
Fifth, the Google cloud and some tools
Hooking into everyday browsing with a Chrome extension: http://code.google.com/chrome/extensions/overview.html. Private cloud storage, cheap and cool: Gmail, https://mail.google.com/
Sixth, Jetty, for running your personal service: http://jetty.codehaus.org/jetty/ , http://code.google.com/p/i-jetty/
Seventh, the mobile way, how information is collected and consumed: http://www.phonegap.com/about
Eighth and finally, visualizing your personal information: http://mbostock.github.com/protovis/ , http://thejit.org/ , https://github.com/mbostock/d3
The big picture in one photo