Wednesday, October 22, 2014

In continuation of our posts on the applications of machine learning from the book Programming Collective Intelligence, we will now see an interesting application of optimization to real flight searches. Kayak is a popular vertical search engine for travel, and it offers an API. After registering for developer access to the Kayak package, which gives us a developer key, we get a new Kayak session from the Kayak API server. With this session, we can do different flight searches. Note that the need for a session, the use of XML-based responses, and the extra-long URLs with query strings comprising several different search parameters are all reminiscent of the APIs of those times.
With just the query parameters for session id, destination and departure date, the search call returns only a search id, with no results; that id is then used for polling. With the search id and the session id, we can write the polling function as one that requests the results until there are no more. There is a tag called morepending whose value is true until the search is done. The prices, departures and arrivals lists retrieved from the query are zipped into a list of tuples. The flights are returned primarily in order of price and secondarily in order of time.
When we try to create the schedule for the arrivals and departures of the Glass family, where the family members visit from different places and leave after the holiday, this flight search is simply performed over and over again for the outbound and return flights of each individual. Then we can optimize the flights using the actual flight data in the cost function. Because the Kayak searches can take a while, we consider this search only for the first two passengers. The domain is now a range repeated once for each outbound and return flight of these two passengers, and we run the same optimization as discussed earlier over this domain. The actual flight data gives us more interesting ways to expand our optimization.
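
The polling step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Kayak client itself: `fetch_xml` is a hypothetical callable standing in for the HTTP request built from the session id and search id, and the `price`, `depart` and `arrive` tag names, as well as the `$1,250.00` price format, are assumed. Only the `morepending` tag comes from the description above.

```python
import time
import xml.etree.ElementTree as ET


def poll_flight_results(fetch_xml, delay_seconds=0.0, max_polls=20):
    """Poll until <morepending> stops being 'true', then return flight tuples.

    fetch_xml is a hypothetical zero-argument callable that returns one XML
    response as a string; a real client would wrap the HTTP call here.
    """
    root = None
    for _ in range(max_polls):
        root = ET.fromstring(fetch_xml())
        pending = root.findtext(".//morepending", default="false")
        if pending.strip().lower() != "true":
            break  # the search has finished; root holds the final results
        time.sleep(delay_seconds)
    else:
        raise TimeoutError("search still pending after max_polls requests")

    # Assumed tag names and price format, for illustration only.
    prices = [float(e.text.lstrip("$").replace(",", "")) for e in root.iter("price")]
    departures = [e.text for e in root.iter("depart")]
    arrivals = [e.text for e in root.iter("arrive")]

    # Zip the three parallel lists into one (price, depart, arrive) tuple per flight.
    return list(zip(prices, departures, arrivals))
```

Passing the fetch step in as a callable keeps the loop testable with canned XML, which is useful given how slow the live searches are.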
Thus far we have discussed optimization based on costs. However, there are also preferences, which may or may not be directly tied to costs. Preferences are attributed to users, and the goal is a ranking that makes people as happy, or as little annoyed, as possible. The information is taken from the users and combined to produce the optimal result. For example, everyone may want a window seat on the same flight, but that may not be possible. If we make the cost function return very high values for invalid outcomes, the optimizer is steered away from them, and the conflict is resolved.
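
To make the penalty idea concrete, here is a minimal sketch of a preference-based cost function for the window seat example. Everything in it is an assumption made for illustration: the names `seating_cost` and `INVALID_PENALTY`, the scoring of 0 for a first choice, 1 for a second choice and 3 otherwise, and the brute-force search that stands in for the optimizers discussed earlier.

```python
from itertools import product

INVALID_PENALTY = 10_000  # assumed value; it only needs to dwarf any valid cost


def seating_cost(assignment, preferences, seats):
    """Score a seating plan; assignment[i] is the seat index given to traveler i."""
    cost = 0

    # Two travelers in the same seat is an invalid outcome, so penalize it heavily.
    if len(set(assignment)) < len(assignment):
        cost += INVALID_PENALTY

    for person, seat_index in enumerate(assignment):
        seat_type = seats[seat_index]
        first_choice, second_choice = preferences[person]
        if seat_type == first_choice:
            cost += 0
        elif seat_type == second_choice:
            cost += 1
        else:
            cost += 3
    return cost


seats = ["window", "aisle", "middle"]
preferences = [("window", "aisle"), ("window", "aisle"), ("window", "middle")]

# Exhaustive search is fine for three travelers; larger problems need an optimizer.
best = min(
    product(range(len(seats)), repeat=len(preferences)),
    key=lambda plan: seating_cost(plan, preferences, seats),
)
```

Because every plan that double-books a seat costs at least `INVALID_PENALTY`, the minimum always falls on a valid plan whenever one exists, even though all three travelers asked for the window.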
