Sunday, May 27, 2007

I knew I was going to miss a day...

Well, at least I'll make it less than two days. Here is the current forecast for next Saturday's weather:

- Partly cloudy, high 82°F, low 54°F, chance of precipitation 10%
- Partly sunny, high 74°F, low 54°F
- (only partial): Mostly sunny, high 75°F

University of Washington: Not really sure where their forecast comes from, and I'm also not sure how to interpret it, so I'll paste it in and think about the interpretation some other time:

SATURDAY...MOSTLY SUNNY. HIGHS IN THE MID 70S TO LOWER 80S.
&&
TEMPERATURE / PRECIPITATION
PUYALLUP   60 43 70 / 50 40 10
TACOMA     60 42 69 / 50 40 10
SEATTLE    59 48 66 / 50 40 10
BREMERTON  59 42 68 / 50 40 10
EDMONDS    58 47 66 / 50 50 10
EVERETT    57 47 65 / 50 40 10
$$

(also partial): mostly sunny. Highs in the mid 70s to lower 80s.

KOMO TV (also partial): mostly sunny. High 77°F

I guess that's it... Now, writing this takes a lot of extra time, but I'll keep trying. One difference now is that none of them forecasts rain. There was no rain forecast for today either, but I woke up and it's raining...

Anyway, yesterday was a busy day. I even got a phone call from a friend that I haven't talked to in a LONG time. But I missed the call! I'll try to call him later today, and will write about it some other time (some odd coincidences that I've decided to discuss when I know more details).

Friday, May 25, 2007

Continuing the weather countdown

Not many changes today (as of 9:20 PM PST):

- Mostly cloudy, high 70°F, low 51°F, chance of precipitation 10%
- Cooler with rain, high 62°F, low 47°F

Countdown for the barbecue

So next Saturday Amy and I are hosting a barbecue here at home. Barbecues are fun, but they tend to be very susceptible to weather variations. So, just because I'm a scientist, I decided to start a countdown, tracking the weather forecast from multiple sources to see how they vary as we get closer to the date.

Today, May 25, 12:40 AM:

- Mostly cloudy, low 55°F, high 72°F, precipitation chance 10%
- Rain, low 49°F, high 64°F

Each source is a little different in the way it presents the weather. Some only have a 7-day forecast, so I'll try to keep adding them as the date enters their range. I'll just hope that one is more correct than AccuWeather.

Wednesday, May 23, 2007


Lately one of my favorite things to do at night is to pick a subject and brainstorm about it, writing down whatever I feel is relevant. It's quite interesting, because after I finish I read back what I've written and enjoy how naive and contradictory all my ideas are.

Let me give you a hypothetical example. Let's say that today's subject is knowledge acquisition. So I start by writing:

- Knowledge is defined by the relationship between elements
- There are no elements, just the relationships
- The absolute is defined by a relative sense to what is culturally or personally defined as the absolute point

And there it goes... It's a fusion of not-very-actionable pieces of ideas. Not very exciting yet, but it continues, and gets worse:

- Branch traversal is interrupted when relationships are not found or when they become too low in interest to continue
- Interest is defined by the types of relationships between things
- Types are also relationships to moods or goals

Conclusion: I'm back to saying that there have to be some absolute elements to knowledge, here the "moods and goals". How can you think of knowledge without being able to point to something and say: the book... the table... the book is on the table (sorry, native English speakers).

Anyway, it's fun. And what makes it more interesting is that I don't expect to get anything out of it. I'm through with creating a new project every other day like I was doing a couple of weeks ago. Time to relax and just keep my mind active.

Talking about keeping my mind active, I was reading a paper earlier today: "Mining Nonambiguous Temporal Patterns for Interval-Based Events" by Shin-Yi Wu and Yen-Liang Chen. It's an interesting paper where the authors propose methods to find patterns in the relations between interval-based events, classifying pairwise relations using a very simple set of 7 possible relations. It's pretty and all, but when you get to the real-world case, stock analysis, they make a whole set of simplifying transformations that make the problem, let's say, silly. They use three "event types": (1) the stock price increases for at least 3 days, (2) the stock price decreases for at least 3 days, and (3) the stock price increases and decreases at least 3 times. They also discuss 3 period lengths: week, month, and season. Talk about arbitrary definitions: all stock prices go up and down at least 3 times in a week. They usually do that in a 5-minute period.
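The 7-relation idea itself is easy to make concrete. Here is a minimal sketch of classifying a pair of intervals into one of seven relation types; the relation names and the tie-breaking are my own assumptions, in the spirit of Allen's interval algebra, not necessarily what the paper uses:

```python
def relation(a, b):
    """Classify the pairwise relation between two intervals, each a
    (start, end) tuple. Seven relation types with inverses folded
    together; a simplified illustration only.
    """
    # Normalize so that a is the interval that starts first
    # (on a start tie, the shorter one).
    if (a[0], a[1]) > (b[0], b[1]):
        a, b = b, a
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"      # a ends before b starts
    if a2 == b1:
        return "meets"       # a ends exactly where b starts
    if (a1, a2) == (b1, b2):
        return "equal"
    if a1 == b1:
        return "starts"      # same start, different ends
    if a2 == b2:
        return "finishes"    # same end, different starts
    if b2 < a2:
        return "during"      # b strictly inside a
    return "overlaps"        # the intervals partially overlap

# Stock event A spans days 1-5, event B spans days 3-8:
print(relation((1, 5), (3, 8)))  # overlaps
```

Once every pair of events is labeled this way, the mining step can look for frequent patterns over the relation labels instead of over the raw events.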

Anyway, there are some interesting ideas in the paper, like the process of trying to predict stock movement using the correlation patterns they found. Interestingly, some of their graphs show almost-random predictive accuracy for the interesting things and very good accuracy for behaviors like "season trends". Not very meaningful, I guess. What I liked most about the paper, and what sparked the brainstorming I mentioned before, is that what they mine is not the events themselves, but the relationships between the events.

Tuesday, May 15, 2007

Learning using generative approaches

On my way back home today (much earlier than usual), I started thinking about learning methods. Learning is one of the most interesting things you can think of on the computer science side of the world, but also one of the most traveled paths. Everybody wants to teach their computer to be a little smarter and not just repeat what you say.

So, with all this already done, why did I decide to think about it? Do I have an answer to the machine learning problem? Yea, right! I never have answers, but I do have questions and the will to read papers and pursue things that make my evenings more meaningful. And today what I'm looking at are generative models.

Like with all research, you have to start by defining what you mean by the names you use. So, the generative models I'm talking about are the ones where the system generates inputs for itself. The idea behind it is that you learn by doing. Not necessarily actually doing it, but rehearsing it inside your world model, your brain. Actually, we are very good at that! We can even understand intangibles, like other people's emotions, by mapping their experiences and facial expressions to our own, determining what we would be feeling in their place, and thus what the person should be feeling.

Another interesting example: why are people usually scared during scary movies, or sick during bloody scenes? It's because we are constantly trying to understand what we see by applying it to ourselves. We really do feel scared; we really do feel the sickness of a pain that isn't there.

So, back to computers: I believe (like many other researchers that have tackled this problem) that one of the key methods for robust learning (and I'm not talking here of any learning - there are many ways for computers to learn, some very good), is to allow our learners to replay and internalize what happens.

This is much easier said than done, actually. It's very easy to think of learning in the normal, synchronous way: you present a case, and potentially the answer or a hint about the answer, and you let the learner take one step towards learning the model. Then you present the next one, and so on. The problem with generative models is that the "will to learn" has to be an action of the learner. The learner should determine what it wants to learn and maybe generate what it thinks it should learn.

This post is already getting much longer than anybody should handle, so I'll try to make it easier and think of an example. Let's say that you want to teach a computer to play Sudoku.

Supervised method:
The "teacher" shows a Sudoku puzzle and then a solution (which can be a step towards the solution, or a piece of the puzzle with a step towards the solution). Then it shows another puzzle and a solution. It keeps showing different puzzles (well, sometimes you can repeat a puzzle to make sure the learner takes another step towards that puzzle's solution) and solutions until you decide to stop, show some new puzzles, and ask for the solutions to see if it has learned.

Reinforcement learning:
This is actually a type of supervised learning. Its focus is on delayed gratification: you let the computer try a couple of things and then zap it if it's not doing very well, or give it candy if it is. Another possibility is not providing the next correct step, but just saying whether it's right or wrong. It feels much more like the way nature teaches animals, but it is limited to what saying right or wrong can make you learn. My Ph.D. research started with looking at reinforcement learning techniques, and they are slow to learn and usually not very robust (well, if you can claim robustness for something that converges in way too many iterations).

Unsupervised learning:
In this type, you allow the computer to see the different games and let it find patterns in them by itself. Then it can use these patterns to solve other games. It's usually also based on showing the learner a set of examples but not saying anything about them. It's interesting, but it's usually very limited in what it can be applied to. I'm not sure it would create a good Sudoku player.

Generative learning:
In this case you can start with any of these methods. But then you allow the learner to either pass back to the teacher a whole new puzzle and ask for a solution, or request a recall of a specific puzzle, or even stop asking for puzzles and try to predict what the next puzzle would be. Actually, prediction is a very interesting consequence of these approaches. You are no longer trying to answer a question like A + B = ?; you are now looking at things like A + ? = C. You know what C should be because of your learning, but now you are trying to find other Bs that satisfy the same model. Then you try other As. And then you vary C and look again. You build the model by constructing the question, not the answer.
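To make the A + ? = C part concrete, here is a toy sketch with arithmetic standing in for Sudoku. Every name here is hypothetical, just to show a learner that constructs its own questions and checks them against the teacher:

```python
import random

def teacher(a, b):
    """The ground truth the learner is trying to model."""
    return a + b

class GenerativeLearner:
    """Toy illustration of 'build the question, not the answer'.

    Instead of only answering A + B = ?, the learner also constructs
    queries of the form A + ? = C and proposes the missing piece from
    its own model. Entirely hypothetical, not an existing system.
    """

    def __init__(self):
        self.facts = {}  # (a, b) -> c, as observed from the teacher

    def observe(self, a, b, c):
        self.facts[(a, b)] = c

    def generate_query(self):
        # Take a known fact, hold C fixed, vary A, and ask what B
        # would have to be: "new_a + ? = c".
        (a, b), c = random.choice(list(self.facts.items()))
        return a + 1, c

    def fill_hole(self, a, c):
        # The learner's internal model of the missing piece.
        return c - a

learner = GenerativeLearner()
learner.observe(2, 3, teacher(2, 3))  # a supervised seed example
a, c = learner.generate_query()       # self-generated: "3 + ? = 5"
b = learner.fill_hole(a, c)           # the learner's proposal
assert teacher(a, b) == c             # verify it against the teacher
```

The interesting part is the last three lines: the example comes from the learner itself, and the teacher is only consulted to verify it.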

Again, as you must have realized, I quickly left the realm of Sudoku, so you can't really try to implement what I've just written here. Yes, I'm aware that nobody besides me has even thought of doing it - and I haven't actually implemented anything myself, just written a lot of notes in OmniOutliner about the questions I'm trying to answer. And, of course, with no answers themselves. Things like:
  • How to make a learner use a 4x4 Sudoku as a learning ground for a 9x9?
  • Should the learner actually learn position and movement too? E.g., should it interact with the outside world with requests like "show me the element to the right of the element I've just seen"?
  • Should learning involve separate learning modules for bad and good examples?
  • How much can you predict before seeing an example? (how much should you learn from the instruction manual - sort of like the ontology duality of intent/extent)

Oh, well... At least I have fun and keep my mind occupied! :-)

Tuesday, May 08, 2007

The public web

I was reading the news this morning and I couldn't let this one get away without me posting it on my blog:

Woman denied degree because of MySpace profile

A classic!

There are lots of interesting things that happen in a world where the things you do are more publicly accessible. It's similar to the keynote speech by Jon Kleinberg that I heard at the last SIGIR: the whole endless six-degrees-of-separation discussion has to be revalidated. Now that social networking sites make your social network publicly visible, all the numbers and goals change. It's much easier for a person living in a cave to have hundreds of friends. Along the same lines, it's much easier for a person trying to learn more about someone else to find people or direct evidence out there. In the past you had to hire a private investigator or something like that.

I could now start reciting a number of science fiction authors who predicted this shift in the concept of privacy, but I'll just end this post and start my day.

Monday, May 07, 2007

So many ideas, so little time...

Lately I have been suffering from the old idea burst. I'm trying to write down all the ideas and all the things I have to do for each of them, but I feel bad that I never get to actually execute any of them. My ideas don't come from a vacuum - it's worse: they compound on projects that haven't been finished yet.

For example, I'm working on a metadata vision document. As I start to work on it, I decide that I need some examples of what I'm saying, so I start a project to build a sample ontology with the concepts I'm trying to outline in the document. Then this weekend I look at what I'm doing and decide that this won't be enough: I need an application that makes use of all this structured (or not-so-structured - and that's part of the document) information and does something fun, like organizing your purchases or helping with product research. And there I went...

Another thing that is going on: a paper that I sent to a journal came back only now (about 1.5 years after submission) with some requests for changes. I went through the paper and found that some references are clearly outdated and that I need to work on it again. So there I went, sketching out the changes I need...

I've also worked a little on "low-level" work stuff, like cleaning up things I had needed to clean up for a long time. The good news is that at least this one I got done this weekend!

Anyway, that's all I have to say right now. I have something like two posts in draft that I can't seem to finish. One is a little long-ish, but the other is probably too centered on a couple of experiences I had in the last few weeks, and I'm always a little worried about the wording I use when talking about other people.