Saturday, December 17, 2011

Active Functor and Active Information Graph

I created two new blogs to keep things clearer.
My top-priority project will be the active functor, a new way to think about the future of computing: data and function unified in one concept, the active functor (information with a soul inside).

http://activefunctor.blogspot.com/
http://activeinfograph.blogspot.com/

Saturday, November 19, 2011

Resources for the next generation of scientists and engineers


There are many good materials, free ebooks, lectures, ..., but here are a few things that have stayed with me for many years, ever since I started learning "Hello World" in Computer Science.
Thanks to Google University (I have been using Google since 2004).

Foundations:

Foundations of Computer Science

Introduction to Programming in Java

Algorithms, 4th Edition

Advanced level:

Mining of Massive Datasets

Agile Patterns: The Technical Cluster (a good engineer should understand both theory and practice)

Networks, Crowds, and Markets: Reasoning About a Highly Connected World
For self-study students like me: http://www.cs.cornell.edu/Courses/cs6850/2011sp/

Video lectures worth checking out:

InfoQ, for enterprise software engineering

Wednesday, November 16, 2011

Think and Make it happen

I dedicate these words to someone who is reading my blog :)




Navigate the waters of your inner self with emotion. 

Turn the simple things into a spectacle for the eyes. 

Never give up on life or on those you love.

Never grow old in the territory of emotion!
Disappointment, frustration, and loss will always occur,
But work through each pain as an opportunity to grow.
Find an oasis in each of your deserts!
Contemplate beauty. Unleash creativity.
Manage your thoughts. Protect your emotions.
Live an enterprising lifestyle.
Train your amazing mind to be brilliant. You deserve it!


Playing with Akka, a better and more scalable concurrency model for Java

You have a cool algorithm that computes relationships between your friends and your hobbies and interests, and you want to scale it to 10 million users (scalability), blah... blah... Here are the ideas: functional programming (Haskell, Erlang), map & reduce (Hadoop, Google), and the Actor pattern (Akka framework). See "The Actor Model - Towards Better Concurrency".
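To make the Actor idea concrete: Akka itself is a Scala/Java library, so the following is not Akka's API but a minimal Ruby sketch of the same model. Each actor owns its state and a mailbox (here a thread-safe `Queue`) that a single thread drains, so messages are handled one at a time and the state needs no lock. The class name `CounterActor` and its messages are made up for illustration.

```ruby
require 'thread'

# Hypothetical minimal actor (not Akka's API): private state plus a mailbox
# drained by one thread, so messages are processed one at a time and the
# state never needs a lock.
class CounterActor
  def initialize
    @mailbox = Queue.new
    @count = 0
    @worker = Thread.new { run }
  end

  # Fire-and-forget send, like Akka's "tell": enqueue and return immediately.
  def tell(message)
    @mailbox << message
  end

  # Request/reply: send a reply queue along with the message and wait on it.
  def ask_count
    reply = Queue.new
    @mailbox << [:get, reply]
    reply.pop
  end

  def stop
    @mailbox << :stop
    @worker.join
  end

  private

  # The actor's message loop; only this thread ever touches @count.
  def run
    loop do
      message = @mailbox.pop
      case message
      when :increment then @count += 1
      when :stop then break
      when Array then message[1] << @count # [:get, reply_queue]
      end
    end
  end
end

counter = CounterActor.new
senders = Array.new(4) { Thread.new { 1_000.times { counter.tell(:increment) } } }
senders.each(&:join)
puts counter.ask_count # prints 4000
counter.stop
```

Four threads send 1,000 messages each without any mutex, yet the count comes out exact, because only the actor's own thread ever reads or writes `@count`.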

Tuesday, October 25, 2011

Active Info Graph

The graph is a beautiful data structure in both Computer Science and Maths; I fell in love with it when I first learned the basic ideas in school.
Is one of the best ways to innovate to forget everything you learned in school, instead of trapping yourself in the thought that someone has already made it happen?

Here are some academic writings (and maybe Google or other big players have already implemented this???)
http://videolectures.net/slsfs05_hofmann_lsvm/
http://en.wikipedia.org/wiki/Tf%E2%80%93idf (in Lucene http://lucene.apache.org/java/docs/index.html)
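As a rough illustration of tf-idf, here is a small Ruby sketch (Ruby 2.7+ for `Enumerable#tally`) using the textbook weighting, term count times log(N / df). The function name `tf_idf` and the sample documents are illustrative; Lucene's own scoring formula differs in its details.

```ruby
# Minimal TF-IDF sketch: tf = how often a term occurs in a document,
# idf = log(N / df), with N documents and df = documents containing the term.
def tf_idf(documents)
  tokenized = documents.map { |doc| doc.downcase.scan(/[[:alnum:]]+/) }
  n = tokenized.size.to_f

  # Document frequency: count each term at most once per document.
  df = Hash.new(0)
  tokenized.each { |terms| terms.uniq.each { |term| df[term] += 1 } }

  # One hash of term => weight per document.
  tokenized.map do |terms|
    terms.tally.to_h { |term, count| [term, count * Math.log(n / df[term])] }
  end
end

docs = ['graph search graph', 'search engine', 'graph theory']
scores = tf_idf(docs)
p scores[1].max_by { |_term, score| score }.first # prints "engine"
```

A term that appears in every document gets weight 0, which is why tf-idf pushes common words down and distinctive ones, like "engine" here, up.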

What I can dream of:
Small World Phenomenon (http://introcs.cs.princeton.edu/java/45graph/)
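The "small world" question is usually answered with breadth-first search: the number of BFS hops between two people is their degree of separation. A minimal Ruby sketch, with a made-up friendship list and hypothetical helper names:

```ruby
# Breadth-first search over an undirected "friend" graph: returns how many
# hops separate the source person from everyone reachable.
def degrees_of_separation(graph, source)
  distance = { source => 0 }
  queue = [source]
  until queue.empty?
    person = queue.shift
    graph.fetch(person, []).each do |friend|
      next if distance.key?(friend) # already reached by a shorter path

      distance[friend] = distance[person] + 1
      queue << friend
    end
  end
  distance
end

# Friendships are mutual, so each pair is added in both directions.
def build_graph(pairs)
  graph = Hash.new { |hash, key| hash[key] = [] }
  pairs.each do |a, b|
    graph[a] << b
    graph[b] << a
  end
  graph
end

graph = build_graph([%w[An Binh], %w[Binh Chi], %w[Chi Dung], %w[An Em]])
p degrees_of_separation(graph, 'An')['Dung'] # prints 3
```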
http://www.amazon.com/Computer-Brain-John-Von-Neumann/dp/0300007930

Case study
In the passive info world, I have to google for useful things.
E.g.: http://www.vietnamworks.com/jobseekers/searchresults.php?search=true&industry=35&gclid=CM_Y3MSz7JICFQgaewodSE5zwg&lang=2
Captured from VietnamWorks

Can this info actively come to me?
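One way to read "info actively comes to me" is to flip the search around: instead of the user re-running a query, a saved search subscribes to a feed and gets notified when a matching post arrives (publish/subscribe). The sketch below assumes a hypothetical in-memory `JobFeed`; a real job site would need to offer its own feed or API.

```ruby
# Hypothetical push-style "active info": saved searches subscribe to a feed
# and are called back when a matching post is published.
class JobFeed
  def initialize
    @subscribers = []
  end

  # keywords: words that must all appear in a post; the block handles a match.
  def subscribe(keywords, &callback)
    @subscribers << [keywords.map(&:downcase), callback]
  end

  def publish(post)
    words = post.downcase.scan(/[[:alnum:]]+/)
    @subscribers.each do |keywords, callback|
      callback.call(post) if keywords.all? { |k| words.include?(k) }
    end
  end
end

feed = JobFeed.new
inbox = []
feed.subscribe(%w[ruby engineer]) { |post| inbox << post }
feed.publish('Senior Ruby Engineer, Ho Chi Minh City')
feed.publish('Java Developer, Hanoi')
p inbox # prints ["Senior Ruby Engineer, Ho Chi Minh City"]
```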

Sunday, October 16, 2011

Data Scientist - Books, Links, Papers, Tools, Projects

On the way to preparing and studying for a new job and the new trends of the post-Web 2.0 era, I am still thinking about what I should do, study, research, blah... blah... to become a Data Scientist, yes, a truly scientific job.


With data now generated by massive numbers of users, tons of data are everywhere: blogs, Facebook, YouTube, Twitter, ...
We have to deal with it every day. Your physical brain has to process a lot of news, information, work, ... at the same time, filtering out what is useful information, then the knowledge you should capture, and then the wisdom (http://www.systems-thinking.org/dikw/dikw.htm)
=> Stress, overload, ... the limits of the biological brain.


On the way to implementing my "My Second Brain" project idea: http://code.google.com/p/my-second-brain/


http://www.infogineering.net/data-information-knowledge.htm
As the name suggests, it should help me process tons of email, blogs, RSS feeds, and local news to find the keywords and the trends. That saves me the time of manually reading, classifying, and tagging the key information, so I can focus all my energy on doing cool things and making decisions to improve my skills and my career.
To change the world, I should at least change my life first, and then share it with everyone.




First: how to extract the content of local news articles and rank the best keywords ==> http://code.google.com/p/boilerpipe/



The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.
The library already provides specific strategies for common tasks (for example: news article extraction) and may also be easily extended for individual problem settings.
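To give a feel for the idea, here is a deliberately crude Ruby sketch, in the spirit of detecting boilerplate from shallow text features, that keeps blocks with many words and few links. The `TextBlock` struct and both thresholds are assumptions for illustration; boilerpipe's actual extractors are more elaborate.

```ruby
# Crude boilerplate filter using two shallow text features, word count and
# link density. Thresholds are illustrative guesses, not boilerpipe's values.
TextBlock = Struct.new(:text, :linked_words) do
  def word_count
    text.split.size
  end

  # Share of this block's words that appear inside <a> links (0.0 to 1.0).
  def link_density
    word_count.zero? ? 0.0 : linked_words.to_f / word_count
  end
end

# Keep long blocks that are mostly plain text; drop menus, link lists, footers.
def main_content(blocks, min_words: 10, max_link_density: 0.33)
  blocks.select { |b| b.word_count >= min_words && b.link_density <= max_link_density }
        .map(&:text)
end

page = [
  TextBlock.new('Home News Sports Contact', 4),
  TextBlock.new('The city opened a new metro line today, cutting the trip ' \
                'from the airport to downtown to about twenty minutes.', 0),
  TextBlock.new('Copyright 2011 Example News. All rights reserved.', 0)
]
p main_content(page).size # prints 1
```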


Second: http://incubator.apache.org/opennlp/

OpenNLP is an organizational center for open source projects related to natural language processing. Its primary role is to encourage and facilitate the collaboration of researchers and developers on such projects.
OpenNLP also hosts a variety of java-based NLP tools which perform sentence detection, tokenization, pos-tagging, chunking and parsing, named-entity detection, and coreference using the OpenNLP Maxent machine learning package.

Third: http://lucene.apache.org/java/docs/index.html
Apache Lucene(TM) is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
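At the heart of a full-text search library like Lucene is an inverted index, a map from each term to the documents that contain it. The toy Ruby class below (its name and methods are hypothetical, not Lucene's API) shows the idea with an AND query:

```ruby
require 'set'

# Toy inverted index: term -> set of document ids.
class InvertedIndex
  def initialize
    @postings = Hash.new { |hash, term| hash[term] = Set.new }
  end

  def add(id, text)
    tokenize(text).each { |term| @postings[term] << id }
  end

  # AND query: ids of documents containing every query term, sorted.
  def search(query)
    terms = tokenize(query)
    return [] if terms.empty?

    # fetch avoids creating empty postings for unknown terms.
    terms.map { |t| @postings.fetch(t, Set.new) }.reduce(:&).to_a.sort
  end

  private

  def tokenize(text)
    text.downcase.scan(/[[:alnum:]]+/).uniq
  end
end

index = InvertedIndex.new
index.add(1, 'Dinner at a pho place in Saigon')
index.add(2, 'Email from my boss about the project')
index.add(3, 'Photo of pho with friends')
p index.search('pho')        # prints [1, 3]
p index.search('pho saigon') # prints [1]
```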


Fourth: http://mahout.apache.org/
The Apache Mahout™ machine learning library's goal is to build scalable machine learning libraries.

Fifth: the Google cloud and some tools. Hook into browsing with a Chrome extension (http://code.google.com/chrome/extensions/overview.html), and use Gmail as cheap and cool private cloud storage (https://mail.google.com/).

Sixth: Jetty, for running your personal service: http://jetty.codehaus.org/jetty/ , http://code.google.com/p/i-jetty/

Seventh: mobile, the way information is collected and consumed: http://www.phonegap.com/about
http://www.livestream.com/facebookeducation/video?clipId=pla_e86b0c30-8796-4b54-8c52-d43440f84068

Eighth and finally: visualizing your personal information: http://mbostock.github.com/protovis/ , http://thejit.org/ , https://github.com/mbostock/d3

The big picture in one photo
http://en.wikipedia.org/wiki/DIKW

Tuesday, September 20, 2011

Big Data & My second brain project

Some trend forecasts worth a look:
http://www.readwriteweb.com/archives/life_in_the_future_with_data_livestreaming_oreilly.php
http://www.mckinsey.com/mgi/publications/big_data/index.asp

In this post-Web 2.0 era, the data that anyone, users like me, produce (Facebook, Google, Banbe, Gmail, Picasa Photo, YouTube, ...) is far too much to process and analyze, and it is extremely fragmented and scattered everywhere. The fact is:

Have you ever had to hunt for a link to private data (data Google cannot index because it sits behind a password) about work, a place to eat on the Internet, an email in Gmail, a feed item you shared on Facebook, a photo a friend tagged you in, ...?

This highly private data is "hidden away", stored somewhere in the Web services you use. Because it is private, nobody wants Google to index it and expose it to the public. But the need to search it is real: you can use Facebook's search to find friends or photos, ... but only within Facebook.

=> What we need is a highly private search system that returns results across all of your personal data in this digital world.

Have you ever imagined a search like Google's, with a small box and two buttons, that finds your most personal (private) data, the data Google can never find?


I have had this idea for quite a while. The daily work of an "@ generation" worker (knowledge worker), in an age when all of our personal data is being digitized, makes me feel I need to do something: change myself and reorganize the personal data that all these Web services on the Internet are storing.



Saturday, August 6, 2011

Info Marker Extension for Google Chrome

Sometimes while surfing the web you come across interesting information that is useful to you, so you mark it (bookmark it), and of course sharing it with someone afterwards makes us happier.

What I wish for is that all we need is a browser plus one button to mark it on a personal profile page on some social network, in a blog post, or in some storage place that is easy to revisit when needed.

Info Marker was born out of my own personal need: just one small button, embedded into any web page, so that "sharing" or "liking" a piece of information becomes as simple as one or two mouse clicks.

The first version, alpha 1, integrates the social plugins of the banbe.net social network and the Google Plus G+ button.
You can download, install, and try it here:
http://dl.dropbox.com/u/4074962/chrome-ext-plugins.crx

Info Marker's direction is to go from the desktop web browser to the mobile web, where smartphones become the tools for capturing and sharing memorable moments, photos, and places, and for tagging everything.



Thursday, July 7, 2011

Friend-Relationship Ranking Algorithm

Friend-Relationship Ranking Algorithm: an algorithm for ranking relationships between friends
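The post gives no details of the algorithm itself. As one common baseline, and not necessarily the author's method, friends can be ranked by the Jaccard similarity of their interests, |A ∩ B| / |A ∪ B|. A Ruby sketch with made-up names and interests:

```ruby
require 'set'

# Jaccard similarity of two interest sets: shared / combined (0.0 to 1.0).
def jaccard(a, b)
  union = (a | b).size
  union.zero? ? 0.0 : (a & b).size.to_f / union
end

# Hypothetical baseline ranking: most similar friends first, ties by name.
def rank_friends(me, friends)
  friends.map { |name, interests| [name, jaccard(me, interests)] }
         .sort_by { |name, score| [-score, name] }
end

me = Set['music', 'scala', 'football']
friends = {
  'Lan'  => Set['music', 'scala'],
  'Minh' => Set['football', 'cooking', 'travel'],
  'Hoa'  => Set['painting']
}
p rank_friends(me, friends).map(&:first) # prints ["Lan", "Minh", "Hoa"]
```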

Saturday, May 21, 2011

Imagine and keep it simple



First, let me share a small experience.
As we all know, Prof. Einstein was a respected scientist who changed how we understand the universe, energy, and matter. But how was Prof. Einstein able to solve so many complex problems?
Simply put, it comes down to his way of thinking, reasoning, and solving problems.


Sketch out problems wherever you can

Learn from the way John Nash and Mark Zuckerberg take notes

Here is a design for a search engine system cooler than Google

The book "Khảo lược Adam Smith" (an overview of Adam Smith) is a pretty good read on economics
The wisdom of crowds explains the secret behind Google's PageRank algorithm

The blackboard is a good friend
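For the PageRank caption, here is the textbook power-iteration formulation as a Ruby sketch, a simplification and not Google's production system. Each link acts as a vote, split evenly among a page's out-links; the sample link graph is made up.

```ruby
# PageRank by power iteration. Each page keeps (1 - d)/N of "random jump" rank
# and receives d times an equal share of the rank of every page linking to it.
# Pages with no out-links spread their rank evenly over all pages, so the
# ranks always sum to 1. d = 0.85 is the damping value suggested in the
# original PageRank paper.
def pagerank(links, damping: 0.85, iterations: 100)
  pages = (links.keys + links.values.flatten).uniq
  n = pages.size.to_f
  rank = pages.to_h { |page| [page, 1.0 / n] }

  iterations.times do
    dangling = pages.select { |page| links.fetch(page, []).empty? }.sum { |page| rank[page] }
    base = (1 - damping) / n + damping * dangling / n
    next_rank = pages.to_h { |page| [page, base] }

    links.each do |page, targets|
      next if targets.empty?

      share = damping * rank[page] / targets.size
      targets.each { |target| next_rank[target] += share }
    end
    rank = next_rank
  end
  rank
end

links = { 'A' => %w[B C], 'B' => %w[C], 'C' => %w[A] }
p pagerank(links).max_by { |_page, score| score }.first # prints "C"
```

C wins because it collects votes from both A and B; the crowd of links, not any single page, decides the ranking.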

Monday, May 2, 2011

What we can see after the Web 2.0 era

(This blog post is for people like me who want to make post-Web 2.0 computing services friendlier and more helpful to end users.)


While "2.0" has been a buzzword for years, the technologies have evolved rapidly, and I think we can adopt them in a small enterprise.
These technologies, cloud computing platforms (e.g., Google Docs and the Google Data API) and open source frameworks (e.g., Apache projects, PHP CodeIgniter, ...), could be a cheap solution for an Enterprise Information System.
As an engineer at a start-up company, I must keep improving my skills and be more agile in how I work, think, and deal with an uncertain world. Yes, everything can change day by day.

In 2010, the open source community saw Oracle's acquisition of Sun.
One was the symbol of IT innovation in the 1990s, Sun Microsystems; the other is the "proprietary company", Oracle.
MySQL, the database solution for cheap storage in the Web 2.0 era, has become the "proprietary software" of Oracle.
"Has become": yes, I think Oracle can use its power to make MySQL "a little less open" day after day, with "less freedom for me".

What is on my mind now is how to find an alternative to MySQL as the database engine for web applications. Maybe not now, but in 2 or 3 years.

The problem is how to scale further at lower cost.
The relational database was invented in the 1970s. Now, in 2011, massive data is everywhere, and Facebook and others have created very big platforms on which users, developers, and third parties create the content.
What we see now is that our data is stored in a highly distributed way. We use FB, Twitter, blogs, Picasa, Gmail, Flickr, ...; the Web 2.0 era has made the Internet bigger in data, but not in knowledge.
I am thinking about a personal database that unifies all my data in the cloud and adds simple structure, taxonomy, and semantics to it, so I can index all of it and find things more easily.


In my mind, the vision of this system could solve the following problems:

  • Data is everywhere, but it is not easy to find useful information, relationships, or hidden knowledge. A search engine like Google only helps you search public data on the Internet by keywords; with personal data, such as Facebook, blogs, or daily activity, I imagine we can do more by letting the computer "know" it.
  • User privacy matters more. Paradoxically, we want computers to help us search all information, both public and private. "Don't be evil!": give them time, and they will be.
  • Personal suggestions of useful info when we need it, just like a real assistant or secretary.



Still thinking and searching for the answer.


Info flow in a simple jobs DB in the cloud (Blogger works well)

Wednesday, March 16, 2011

Monday, March 14, 2011

Ensuring Product Quality at Google

InfoQ: Ensuring Product Quality at Google

"...we rarely attempt to ship a large set of features at once. In fact, the exact opposite is often the goal: build the core of a product and release it the moment it is useful to as large a crowd as feasible, then get their feedback and iterate."

Friday, February 25, 2011

Estimation Toolkit for Agile Developer

Reading the InfoQ article "Estimation Toolkit", I had some thoughts:
each user story can be seen as a function (mathematical view) or a service (SOA view),
with some input methods (user forms, event triggers, links, ...).

Tuesday, February 22, 2011

How to scan ISBN and post your book to blogger

When I need to find a book in my library, searching is tedious, especially when the book isn't there. So I decided to make my life easier with my Droid.


Every book has an ISBN, which is encoded in a barcode like this one. For example, my book "Dealing with Darwin" has ISBN 9781591842149.
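The ISBN-13 printed under the barcode carries its own check digit, so an app can reject a misread before querying anything. A Ruby sketch of the standard check (the method name is hypothetical):

```ruby
# ISBN-13 check digit: weight the first 12 digits alternately by 1 and 3;
# the 13th digit is whatever brings the weighted sum up to a multiple of 10.
def valid_isbn13?(isbn)
  digits = isbn.delete('-')
  return false unless digits.match?(/\A\d{13}\z/)

  nums = digits.chars.map(&:to_i)
  weighted = nums.first(12).each_with_index.map { |d, i| i.even? ? d : d * 3 }.sum
  (10 - weighted % 10) % 10 == nums.last
end

p valid_isbn13?('978-1591842149') # prints true
p valid_isbn13?('971591842149')   # prints false (only 12 digits)
```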



Step 1: I wrote a small Android app, "OrganizeMyStuff", that calls the Barcode Scanner app on my phone.




Step 2: Once it has the ISBN, my app looks the book up in Google Books.



Step 3: Finally, my app posts the book data to Blogger.
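The post does not include the app's code, and the app itself ran on Android. As a hedged Ruby sketch of step 2 only: the public Google Books API v1 `volumes` endpoint accepts an `isbn:` query and returns JSON whose `items` carry a `volumeInfo` with fields such as `title` and `authors`. The helper names below are hypothetical, and heavier use of the API may require an API key.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build the Google Books API v1 search URL for one ISBN.
def books_query_uri(isbn)
  URI("https://www.googleapis.com/books/v1/volumes?q=isbn:#{isbn}")
end

# Extract title and authors from the first result; nil if nothing matched.
def parse_first_volume(json)
  item = (JSON.parse(json)['items'] || []).first
  return nil unless item

  info = item.fetch('volumeInfo', {})
  { title: info['title'], authors: info.fetch('authors', []) }
end

# Network call: needs Internet access; returns nil on any non-2xx response.
def lookup_isbn(isbn)
  response = Net::HTTP.get_response(books_query_uri(isbn))
  response.is_a?(Net::HTTPSuccess) ? parse_first_volume(response.body) : nil
end
```

Keeping the parsing separate from the HTTP call makes it easy to test against a saved response before wiring it to the scanner.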








You can see the result at http://mybooks-database.blogspot.com/

That's it.

Monday, February 14, 2011

VietnameseAnalyzer.rb


require 'ferret'
require 'unicode'

# Normalizes token text to lower case with the unicode gem, since
# String#downcase before Ruby 2.4 only handles ASCII letters.
class UnicodeLowerCaseFilter
  def initialize(token_stream)
    @input = token_stream
  end

  def text=(text)
    @input.text = text
  end

  # Returns the next token with its text lowercased, or nil at end of stream.
  def next
    t = @input.next
    return nil if t.nil?

    t.text = Unicode.downcase(t.text)
    t
  end
end

class VietnameseAnalyzer < Ferret::Analysis::Analyzer
  include Ferret::Analysis

  # Standard character mappings that strip Vietnamese diacritics,
  # so only plain ASCII characters get indexed.
  CHARACTER_MAPPINGS = {
    ['á','à','ạ','ả','ã','ă','ắ','ằ','ặ','ẳ','ẵ','â','ấ','ầ','ậ','ẩ','ẫ'] => 'a',
    ['đ'] => 'd',
    ['é','è','ẹ','ẻ','ẽ','ê','ế','ề','ệ','ể','ễ'] => 'e',
    ['í','ì','ị','ỉ','ĩ'] => 'i',
    ['ó','ò','ọ','ỏ','õ','ơ','ớ','ờ','ợ','ở','ỡ','ô','ố','ồ','ộ','ổ','ỗ'] => 'o',
    ['ú','ù','ụ','ủ','ũ','ư','ứ','ừ','ự','ử','ữ'] => 'u',
    ['ý','ỳ','ỵ','ỷ','ỹ'] => 'y'
  } unless defined?(CHARACTER_MAPPINGS)

  # Tokenize, lowercase, then fold diacritics to ASCII.
  def token_stream(field, str)
    ts = StandardTokenizer.new(str)
    ts = UnicodeLowerCaseFilter.new(ts)
    MappingFilter.new(ts, CHARACTER_MAPPINGS)
  end
end


How do you implement a search engine for the Vietnamese language?
Cool, I found a solution!
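To check what the analyzer's lowercase and mapping steps do to a token without installing Ferret, here is a plain-Ruby sketch of the same folding (Ruby 2.4+, where `String#downcase` is Unicode-aware). It also NFC-normalizes first, an addition not in the original analyzer, so decomposed input (base letter plus combining marks) folds too. The constant and method names are hypothetical.

```ruby
# Accented Vietnamese lowercase letters mapped to their ASCII base letter.
VIETNAMESE_FOLDING = {
  'a' => 'áàạảãăắằặẳẵâấầậẩẫ',
  'd' => 'đ',
  'e' => 'éèẹẻẽêếềệểễ',
  'i' => 'íìịỉĩ',
  'o' => 'óòọỏõơớờợởỡôốồộổỗ',
  'u' => 'úùụủũưứừựửữ',
  'y' => 'ýỳỵỷỹ'
}.flat_map { |ascii, accented| accented.chars.map { |c| [c, ascii] } }.to_h

# Normalize, lowercase, then replace each accented character.
def fold_vietnamese(text)
  text.unicode_normalize(:nfc).downcase.each_char
      .map { |c| VIETNAMESE_FOLDING.fetch(c, c) }.join
end

p fold_vietnamese('Tiếng Việt') # prints "tieng viet"
p fold_vietnamese('Đà Nẵng')    # prints "da nang"
```

Indexing and querying through the same folding is what lets a search for "viet" match "Việt".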

Tuesday, January 18, 2011

Play with Mac OS X on my old PC


The thing is, although I'm not fond of Mr. Jobs or the iFans, Mac OS and iOS are good operating systems, and I have to admit that an app written for the iPhone makes it easier to get the iFans to buy.
With Android, we should use the free-with-ads model.

Each platform has its pros and cons. But if the "Drive Safe" idea comes to life, it's best to support both.
HTML5 is the key. However, frameworks for these two platforms are still lacking, and I'll build one.

Wednesday, January 5, 2011

I'm studying Scala - a COOL hybrid Object Oriented and Functional programming language


My idea is "Using Blogger as Storage for semantic and self-descriptive documents".
It's just a small module in my own open source project http://code.google.com/p/job-online-engine/
I find Scala a cooler language, with true FP for my needs.
Well, I'm learning it now.

Rethinking the Mobile Web

Make the Internet more accessible for everyone