Thursday, January 31, 2013

Threads and Processes


"An operating system thread is a unit of control without additional private OS context and without a private address space. Each OS thread has full access to the memory of the other threads executing within the same multithreaded OS process. Thread execution is scheduled by the operating system kernel scheduler and these threads are often called "kernel threads" or k-threads.
A Lightweight Thread package is an application-level construct that supports multiple threads within a single OS process. Unlike OS threads scheduled by the OS, lightweight threads are scheduled by an application-level thread scheduler. The difference between a lightweight thread and a kernel thread is that a lightweight thread is scheduled in user-space without kernel scheduler involvement or knowledge. The combination of the user-space scheduler and all of its lightweight threads run withing a single OS process and appears to the OS scheduler as a single thread of execution.  Lightweight threads have the advantage of faster thread switches when compared to OS threads since there is no need to do an OS kernel mode switch to schedule the next thread. Lightweight threads ahve the disadvantage, however, that any blocking operation such as a synchronous I/O by any thread will block all threads in the process. This prevents any of the other threads frommaking progress while one thread is blocked waiting for an OS resource. Lightweight thread packages avoid this by (1) issuing only asychronous (non-blocking) I/O requests and (2) not invoking any OS operations that could block. Generally, lightweight threads offer a more difficult programming model than writing software based on either OS processes or OS threads. Some DBMSs implement their own lightweight thread (LWT) packages. These are a special case of general LWT packages. We refer to these threads as DBMS threads and simply threads when the distinction between DBMS, general LWT and OS threads are unimportant to the discussion. A DBMS client is the software component that implements the API used by the application programs to communicate with a DBMS. Some example database access APIs are JDBC, ODBC, and OLE/DB. In addition, there are a wide variety of proprietary database access API sets. Some programs are written using embedded SQL, a technique of mixing programming language statements with database access statements. This was first delivered in IBM COBOL and PL/I and, much later, in SQL/J which implements embedded SQL for Java. Embedded SQL is processed by preprocessors that translate the embedded SQL statements into direct calls to data access APIs. Calls made to these APIs are marshaled by the DBMS client component and sent to the DBMS over some communications protocol. The protocols are usually proprietary and often undocumented. In the past, there have been several efforts to standardize client-to-database communication protocols, with Open Group DRDA being perhas the best known, but none have achived broad adoption. A DBMS worker is the thread of execution in the DBMS that does work on behalf of a DBMS client. A 1:1 mapping exists between a DBMS worker and a DBMS client: the DBMS worker handles all SQL requests from a single DBMS client. The DBMS client sends SQL requests to the DBMS server. The worker executes each request and returns the result to the client."
Reference: Architecture of a Database System - Hellerstein, Stonebraker, Hamilton
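As a toy illustration of the user-space scheduling idea described above (not how any DBMS implements it), a round-robin scheduler can be built over Python generators. Each "thread" yields control voluntarily, and a synchronous blocking call inside any worker would stall the whole process, which is exactly why LWT packages insist on non-blocking I/O:

# A toy user-space (lightweight thread) scheduler built on Python generators.
# The OS scheduler sees only a single thread of execution here.
from collections import deque

def worker(name, steps):
    for i in range(steps):
        print(f"{name}: step {i}")
        yield                      # cooperative yield back to the user-space scheduler

def run(threads):
    ready = deque(threads)
    while ready:
        t = ready.popleft()
        try:
            next(t)                # resume the lightweight thread
            ready.append(t)        # still runnable, requeue it
        except StopIteration:
            pass                   # this lightweight thread has finished

# Two lightweight threads interleaved entirely in user space.
run([worker("A", 3), worker("B", 2)])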

Wednesday, January 30, 2013

Python NLTK processing

NLTK, short for Natural Language Toolkit, is a library available in the Python programming language that defines the following constructs:
concordance : this provides all occurrences of a word in the text along with their surrounding context.
similar : this provides the words that appear in the same range of contexts as the specified word.
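For example, using one of the sample texts that ships with NLTK (assuming the nltk package is installed and the book corpora have been downloaded):

# pip install nltk, then run nltk.download('book') once to fetch the sample texts
from nltk.book import text1           # text1 is Moby Dick in the NLTK book corpus

text1.concordance("monstrous")        # every occurrence of the word, with surrounding context
text1.similar("monstrous")            # words that appear in a similar range of contexts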

Tuesday, January 29, 2013

Lock free

Lock primitives such as mutual exclusion locks are popular, but when they number more than a handful their benefits wane. Programmers try to maintain correctness with conflicting operations, but in serializing some of these they also serialize some non-conflicting operations. Similarly, programmers try to maintain liveness, but these locks are held longer than would otherwise be necessary. Also, without scheduler support, programmers need to be aware of priority inversion problems. Lastly, for high performance, programmers must balance the granularity at which locking operates against the time that the application will spend acquiring and releasing locks.
Hence, three different APIs are proposed by Fraser and Harris, each of which attempts to address the shortcomings mentioned above. Further, they are non-blocking and generally allow disjoint-access parallelism. They all follow a common optimistic style.
The first API provides multi-word compare-and-swap (MCAS), which generalizes the single-word CAS operation found on many processors. It atomically updates one or more memory locations from a set of expected values to a set of new values. This API comes with two operations, MCASRead and MCAS: the former reads a location's value so it can be used among the expected values of a subsequent MCAS, while the latter performs the atomic multi-word update.
The second API provides a word-based software transactional memory (WSTM), which avoids some of these problems by allowing a series of reads and writes to be grouped as a software transaction and applied to the heap atomically. This API comes with two operations, WSTMRead and WSTMWrite (one for reads and another for writes).
The third API provides an object-based software transactional memory (OSTM), which allows a thread to 'open' a set of objects for transactional access and, once more, to commit updates to them atomically. Each object is accessed through an OSTMHandle which must be subject to an OSTMOpenForReading or OSTMOpenForWriting call in order to obtain access to the underlying data.
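As a rough sketch of what the MCAS contract means (hypothetical names; the published algorithm is lock-free and far more involved, whereas this sketch only emulates the all-or-nothing semantics with a single internal lock):

import threading

_mcas_lock = threading.Lock()    # stand-in only; the real algorithm avoids locks entirely

def mcas(locations, expected, new):
    """Atomically set locations[i] = new[i] iff every locations[i] == expected[i]."""
    with _mcas_lock:
        if all(loc[0] == exp for loc, exp in zip(locations, expected)):
            for loc, val in zip(locations, new):
                loc[0] = val
            return True
        return False

# Usage: each shared word is modeled as a one-element list acting as a memory cell.
a, b = [10], [20]
ok = mcas([a, b], expected=[10, 20], new=[11, 21])   # succeeds only if both still match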
As a comparison between the three APIs, here are some of the criteria to use:
First, disjoint-access parallelism is provided by MCAS when accessing disjoint sets of words, and probabilistically by WSTM; OSTM provides it when accessing disjoint sets of objects.
Second, MCAS offers no read parallelism, whereas both WSTM and OSTM allow it.
Third, as a space cost, MCAS reserves two bits in each word, WSTM uses a fixed-size table of 65,536 double-word entries, and OSTM uses one word in each object handle.
Fourth, MCAS has no composability, whereas both WSTM and OSTM allow composability.

Saturday, January 26, 2013

Data Mining (Continued)


Decision Tree

A useful data structure for mining the logical data model is the decision tree. Its structure is as follows: the interior nodes are drawn from a set (A1, ..., An) of categorical attributes, each leaf is a class label from domain(C), and each edge carries a value from domain(Ai), where Ai is the attribute associated with the parent node. The tree is a search tree: it maps the tuples in R to the leaves, i.e., to class labels. The advantages of using a decision tree are that it can work with heterogeneous data and that the decision boundaries are parallel to the axes.

Decision Tree Mining
There are efficient algorithms to mine decision trees. Given a table R(A1, A2, ..., An, C), find a decision tree that minimizes the classification error. The constraints are to prefer a smaller tree and to avoid splitting on unique columns.
One algorithm is naive exhaustive enumeration. It involves the following steps: 1. Generate candidate trees. 2. Scan R to compute their classification error. 3. Select a tree with small classification error and small size. The downside is that its complexity is exponential in the number of columns of R.
A better algorithm is ID3. ID3 is a greedy algorithm where each node reduces entropy. It involves the following steps: 1. Select an attribute for the root node: choose the attribute with the highest information gain and partition the tuples over the domain of the chosen attribute. 2. Recurse on each child. 3. The stopping criterion is to not recurse on nodes with few samples. The advantage of ID3 is that it is faster than exhaustive enumeration, but the disadvantage is that it is a heuristic, i.e., there is no guarantee of solution quality.
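A minimal sketch of the ID3 attribute-selection step, computing entropy and information gain over a table held as a list of dicts (illustrative data and names only):

from math import log2
from collections import Counter

def entropy(rows, target):
    counts = Counter(r[target] for r in rows)
    total = len(rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(rows, attr, target):
    # entropy before the split minus the weighted entropy of the partitions
    total = len(rows)
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(rows, target) - remainder

# ID3 picks the attribute with the highest gain for the root, then recurses per value.
rows = [{"outlook": "sunny",    "windy": False, "play": "no"},
        {"outlook": "rain",     "windy": True,  "play": "no"},
        {"outlook": "overcast", "windy": False, "play": "yes"}]
best = max(["outlook", "windy"], key=lambda a: information_gain(rows, a, "play"))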

Clustering

Clustering is a technique for categorization and segmentation of tuples. Given a relation R(A1, A2, ..., An) and a similarity function between rows of R, find a set of groups of rows in R such that the groups are cohesive and not coupled: the tuples within a group are similar to each other, and the tuples across groups are dissimilar. The constraints are that the number of clusters may be given and that the clusters should be significant.

The choices for similarity measures include distance functions such as Euclidean (the L2 metric), Manhattan, string edit distance, graph distance, etc.

The choices for group representations include a center, such as the mean, the medoid, or a (mean, standard deviation) pair; a boolean expression on attributes; or named subsets of rows.

Cluster mining

There are efficient algorithms to mine cluster centers. Given a table R(A1, A2, ..., An), find a set of center-points that minimizes the average distance, with a constraint on the number of clusters. The average distance is computed from the rows of R to their nearest center-points, and the center-points belong to R.

A naive algorithm is exhaustive enumeration. If the domains of the attributes are small, use the following steps: 1) enumerate cross-products of the domains of the attributes, 2) enumerate size-K subsets of tuples, and 3) choose the size-K subset with the smallest average distance to R.

An alternative idea is the K-means algorithm, which uses a greedy strategy: each iteration reduces the average distance to R. 1. Select K points at random. 2. Assign the tuples of R to the nearest point. 3. Compute the center of each cluster. 4. Repeat steps 2 and 3 to reach a local minimum. The advantage of K-means is that it is faster than exhaustive enumeration, but it is a heuristic, i.e., there is no guarantee of solution quality.
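A small sketch of the K-means loop over numeric rows (illustrative only; real implementations handle empty clusters, convergence tolerances, and repeated restarts):

import random

def dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5   # Euclidean (L2) distance

def kmeans(rows, k, iterations=10):
    centers = random.sample(rows, k)                        # 1. select K points at random
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for row in rows:                                    # 2. assign each tuple to its nearest center
            i = min(range(k), key=lambda c: dist(row, centers[c]))
            clusters[i].append(row)
        for i, cluster in enumerate(clusters):              # 3. recompute each cluster's center
            if cluster:
                centers[i] = tuple(sum(col) / len(cluster) for col in zip(*cluster))
    return centers                                          # 4. repeat until a local minimum

centers = kmeans([(1, 1), (1, 2), (8, 8), (9, 8)], k=2)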

Outliers

Outliers are the rows that are most dissimilar. Given a relation R(A1, A2, ..., An) and a similarity function between rows of R, find the rows in R which are dissimilar to most points in R. The objective is to maximize the dissimilarity function, with a constraint on the number of outliers, or to report the significant outliers if a threshold is given.
The choices for similarity measures between rows include distance functions such as Euclidean, Manhattan, string edit distance, graph distance, etc. The choices for aggregate dissimilarity measures include the distance to the K nearest neighbours, the density of the neighbourhood falling outside the expected range, and the attribute differences with nearby neighbours.

Outlier mining:

A naive algorithm could compute the dissimilarity measure for each row in R with respect to the rest of R, then choose the top-K rows, or those with scores outside the expected range.

An alternative idea is clustering-based, where the clusters are used as a summary of the data to reduce computational cost. It involves the following steps: 1. Cluster R, e.g., via K-means. 2. Compute the distance of each tuple in R to its nearest cluster center. 3. Choose the top-K rows, or those with scores outside the expected range.

The advantage of using cluster-based outliers is that it is faster, but the disadvantages are the same as with clustering, i.e., it is a heuristic (there is no guarantee of solution quality) and it may miss outliers.
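A sketch of the cluster-based outlier approach just described, using cluster centers as the data summary (hypothetical helper names; the centers could come from the K-means sketch above, but are hard-coded here so the snippet stands alone):

def outliers_by_clustering(rows, centers, top_k):
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    # score each row by its distance to the nearest cluster center, farthest first
    scored = sorted(((min(dist(r, c) for c in centers), r) for r in rows), reverse=True)
    return [r for _, r in scored[:top_k]]

centers = [(1.0, 1.5), (8.5, 8.0)]                       # illustrative cluster centers
print(outliers_by_clustering([(1, 1), (1, 2), (8, 8), (30, 30)], centers, top_k=1))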


 

Friday, January 25, 2013

Data Mining


Data Mining

Applications use data mining to answer questions for market analysis such as: Which items sell together? (diapers and beer) Who buys iPhones? (customer segmentation) Should this credit card application be approved? etc.

The idea in data mining is to apply a data-driven, inductive and backward technique to identify a model: get a model from the learning data, check it with test data, and refine the model if there is a mismatch between prediction and reality. This is different from forward, deductive methods, which build the model first, then deduce conclusions and then match them with data; if there is a mismatch between the model prediction and reality, the model is then refined.

Data mining has its history in statistics and machine learning. It is often main-memory-based software and uses patterns such as classification and clustering. Database researchers contributed to this field by adding scalability to large databases, i.e., to data larger than main memory. They also introduced a new pattern family called association rules and integration with SQL, data warehouses, etc.

The conceptual data model involves tables for items (instance-id, item-type-id, attribute1, attribute2, … ) and transactions ( xid, subset of item-types, customer-id, timestamp ) and item-types (item-type-id, type-name, … ).

With this data model, a family of patterns called associations is established. They involve subsets of similar item-types, such as a customer buying item-type IT1 also buying item-type IT2. Another example is sequential associations, where customers buying item-type IT1 will later buy item-type IT2.

Another pattern family is categorization/segmentation, which partitions item-types into groups. Unsupervised clustering is the case where the number of groups is not given, as in market segmentation. Supervised classification is the case where an additional table category(instance-id, class-label) is given. Granting or declining a credit card application is an example of supervised classification.

Here are a few more examples and their classification:

People buying book b1 at Amazon.com also bought book b2 => Association pattern

Smoking causes lung cancer => Sequential association pattern

Apparels have t-shirts and pants => segmentation

Customers who have a credit rating above 3.0 and a balance of over a hundred dollars => supervised classification

Associations have association rules. Let I be a set of items and T a set of transactions. Then an association is defined as a subset of I occurring together in T. Support(S1) is the fraction of transactions in T containing S1. Let S1 and S2 be subsets of I; then the association rule S1 -> S2 has support(S1 -> S2) defined as Support(S1 union S2), and confidence(S1 -> S2) = Support(S1 union S2) / Support(S1).
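A tiny worked example of these definitions over a hypothetical market-basket dataset:

transactions = [{"diaper", "beer", "milk"},
                {"diaper", "beer"},
                {"milk", "bread"},
                {"diaper", "bread", "beer"}]

def support(itemset):
    # fraction of transactions containing every item in the itemset
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

s1, s2 = {"diaper"}, {"beer"}
rule_support = support(s1 | s2)               # Support(S1 union S2) = 3/4 = 0.75
confidence = support(s1 | s2) / support(s1)   # 0.75 / 0.75 = 1.0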

There are variations to association rules, which include the following: hierarchical sets of items, negative associations, and continuous attributes. Association rules can also be compared with statistical concepts such as conditional probability and correlation.

Association rules apply to the logical data model as well as the physical data model. The logical data model extends the relational model and SQL: it specifies patterns over a relational schema, and one can specify a data mining request to extract the patterns that meet certain criteria. An example is the data-cube model for data warehouses. Due to its popularity, there are common data models for many families of patterns and syntax for specifying patterns, and statistical data mining is supported in SQL.

The physical data model comes in when efficient algorithms to mine association rules are required. As an example, given an item-set I, a transaction set T and thresholds, find association rules S1 -> S2 with support above a given threshold. The naive algorithm would be an exhaustive enumeration: generate candidate subsets of I, scan T to compute their support, and select the subsets with adequate support; the downside is that the complexity is exponential in size(I). A better algorithm is the Apriori algorithm. This method generates candidates in order of size, combining subsets which differ only in the last item, which results in an efficient enumeration. The idea is to prune the candidate families as soon as possible: support can only shrink as candidates grow, so pruning can be applied in each iteration. Sorted candidate sets are efficient for the Apriori method, and a hash tree can be used for candidate sets of different sizes.
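A condensed sketch of the Apriori candidate-generation loop (frequent itemsets only; rule generation from the frequent sets would follow, and the data shown is hypothetical):

from itertools import combinations

def apriori(transactions, min_support):
    n = len(transactions)
    items = {frozenset([i]) for t in transactions for i in t}
    frequent = {c for c in items
                if sum(1 for t in transactions if c <= t) / n >= min_support}
    result, k = set(frequent), 2
    while frequent:
        # combine size-(k-1) sets into size-k candidates, then prune any candidate
        # with an infrequent subset (support only shrinks as itemsets grow)
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        frequent = {c for c in candidates
                    if sum(1 for t in transactions if c <= t) / n >= min_support}
        result |= frequent
        k += 1
    return result

frequent_sets = apriori([{"diaper", "beer"}, {"diaper", "beer", "milk"}, {"milk"}], 0.5)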

Mining algorithms, such as association rule mining, clustering and decision tree mining, always operate on the physical data model, while the patterns themselves (association rules, clusters and decision trees) pertain to the logical data model.

 

Tuesday, January 22, 2013

Query Processor optimizations




The query optimizer is a software module that transforms the internal representation of a query into an efficient query plan for executing it. This query plan is a flow chart for table data to pipe through, and the chart contains operators such as sort, scan or join. These operations can be very tricky to perform when the table data is very large. Consider a join of two tables where the number of records in both the left and right tables is in the order of millions. A join is a match between the records of two tables based on a common satisfying condition, for example, a record in both tables that has the same value in one of their columns. Now a query optimizer must choose an appropriate plan for finding all such matches. Either it could use the records of the left table as a basis for performing the match with those on the right, so the resulting set will have all the records of the left and a few from the right, or it could switch the tables so that the resulting set will have all the records of the right with a few from the left. Alternatively, the query optimizer could also compute a Cartesian product of all the records from both tables. A query optimizer could choose a "left-deep" query plan and "postpone Cartesian products", ensuring that the Cartesian products appear only after all the joins in a dataflow. This was being done several years ago. Today's systems also use "bushy" trees, i.e., joins with nested right-hand inputs, and early use of Cartesian products, which are useful in some cases. Both may be used by a query optimizer, and the plan space is an important consideration for the query optimizer.

Another factor for consideration by the query optimizer is selectivity estimation. The cardinality of a table or index in a database can be considered a simple selectivity estimation parameter. Most systems, however, go beyond that: they summarize and analyze such things as the distribution of values in attributes via histograms and other summary statistics. Since this involves visiting every value in each column, it is quite expensive and is typically done by sampling instead of a complete scan of all the data. This comes in useful for the selectivity estimation of a join, where the tables themselves are not compared but their histograms are. There are more sophisticated schemes as well, but their adoption was hampered because a query optimizer, along with the rest of the software modules in a database server, has to meet "performance benchmarks", an accepted industry standard for high performance, and the tests in these benchmarks used mostly statistically independent generated data instead of "real" data. Another challenge for a query optimizer is that errors in estimation made early in a plan tree cascade up the plan tree, with a significant hit to performance or mistakes in subsequent estimations.

While talking about factors affecting query optimization, it is also worthwhile to mention search algorithms. Algorithms could choose one or the other of strategies such as dynamic programming or "top-down" search. Each of these has a positive and negative effect on a query optimizer's use of memory or processing cycles. Top-down search, for example, can lower the number of plans considered by an optimizer but increases memory consumption. Another factor for consideration is parallelism, where the workload can be shared across one or more CPUs to reduce the time taken for the computations. This is measured with a parameter called the degree of parallelism (DOP).
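As a toy illustration of histogram-based selectivity estimation for an equi-join (equi-width buckets, a uniformity assumption within each bucket, and entirely hypothetical numbers):

# Each histogram maps a bucket (value range) to a row count for one join column.
hist_r = {(0, 10): 1000, (10, 20): 4000}     # table R, column a
hist_s = {(0, 10): 500,  (10, 20): 500}      # table S, column a

def estimate_equijoin(hist1, hist2, distinct_per_bucket=10):
    # within a shared bucket, assume rows are spread uniformly over the distinct values
    est = 0.0
    for bucket in set(hist1) & set(hist2):
        est += hist1[bucket] * hist2[bucket] / distinct_per_bucket
    return est

# The optimizer compares such estimates across candidate plans instead of scanning the data.
print(estimate_equijoin(hist_r, hist_s))     # rough join cardinality estimate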
MVCC plays a role for in-memory applications in boosting performance and concurrency, as opposed to OCC. In MVCC, each transaction generates a unique start-timestamp from a monotonically increasing counter and, when it ends, it gets a unique end-timestamp. The transaction commits only if there is no overlap with another start/end-transaction pair that updated the same items. This way locking is avoided and a version is generated for each change.

The role of MVCC is different from OCC, even though both may be an add-on to two-phase locking (2PL) to ensure isolation. In OCC, multiple transactions are allowed to read and update an item without blocking. Instead, transactions maintain histories of their reads and writes, which are checked for isolation conflicts before the transaction commits. If a conflict is detected, one of the conflicting transactions is rolled back. The cost of rollbacks and retries increases with conflicts in the OCC scheme.

2PL refers to the method of locking where a shared lock is obtained on all items read and an exclusive lock is acquired on all items written to. The transaction blocks on a wait queue while waiting to acquire a lock. In order to reduce locking, hierarchical locks may be provided.

Talking about isolation levels, here are the ones ANSI SQL defines:
1. READ UNCOMMITTED: Any version of the data can be read without taking locks. This is achieved in a locking implementation by read requests proceeding without acquiring any locks.
2. READ COMMITTED: A transaction may read any committed version of the data. Repeated reads of an object may result in different versions being read.
3. REPEATABLE READ: A transaction will read only one version of committed data. Once the transaction reads an object, it will always read the same version of that object until the end of the transaction.
4. SERIALIZABLE: All reads and writes are serialized.

The phantom problem is described as follows: repeated accesses within a transaction see tuples that were not seen on the first access. This happens because locks are taken on tuples and do not prevent the insertion of new tuples. Two-phase locking of tables prevents phantoms, but table-level locking can be restrictive in cases where transactions access only a few tuples via an index.

Other than the ANSI SQL levels, these additional levels are also popular.
CURSOR STABILITY: This is used to fix the "lost update" anomaly of READ COMMITTED. A lost update is when the changes made by one transaction are lost because another transaction makes an update. This is solved by holding a lock on the most recently read item on a cursor; the lock is dropped when the cursor is moved or the transaction terminates.
SNAPSHOT ISOLATION: This uses MVCC to version every change made by a transaction and prevents two overlapping start/end-transaction pairs from committing writes to the same items (first committer wins).
READ CONSISTENCY: This is another MVCC scheme where versions are established at the statement level, so each statement sees the most recently committed values as of its start and uses that logical version. A single transaction could use many different logical versions. Only the most recent version is stored, and older versions are reconstructed using undo log records. Modifications are protected via long-term write locks, and the first writer wins.
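A toy sketch of the snapshot-isolation commit check described above (first committer wins), using a monotonically increasing counter for timestamps; the names and structures are hypothetical, and real engines track write sets per version and garbage-collect old versions:

import itertools

_clock = itertools.count(1)
_committed = []    # list of (start_ts, end_ts, write_set) for committed transactions

class Transaction:
    def __init__(self):
        self.start_ts = next(_clock)       # unique start-timestamp
        self.writes = set()

    def write(self, key):
        self.writes.add(key)               # a new version of key would be created here

    def commit(self):
        end_ts = next(_clock)              # unique end-timestamp
        for (s, e, ws) in _committed:
            # another writer committed during our lifetime and touched the same keys
            if e > self.start_ts and ws & self.writes:
                raise Exception("write-write conflict: abort and retry")
        _committed.append((self.start_ts, end_ts, self.writes))

t1, t2 = Transaction(), Transaction()
t1.write("x"); t1.commit()
t2.write("x")                              # overlaps with t1 and writes the same key
# t2.commit() would raise, illustrating first-committer-wins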
 

In-memory applications

Now back to the post on in-memory applications. Let's look at some of the data structures that lend themselves to better concurrent programming than traditional data-store data structures. Skip lists, balanced search trees, etc. are suitable because of the locks per node and the order in which they are applied. Let's take a look at skip lists and see how these are faster than, say, B+-trees. First, skip lists have locks on each node for the purposes of localized updates. Second, the locks are needed only for two or three nodes at the same time for an insert or delete operation. Third, the locks are acquired in the order of the traversal: readers and writers traversing the skip list will not pass each other, and locks taken in a specific order will not cause deadlocks. In the case of a balanced search tree, say a red-black tree, each of the nodes has a lock, but an insert or delete does not acquire all the locks up to the root, as this would throttle concurrency. The rotations and recoloring required for rebalancing a red-black tree are performed with only three or four nodes locked at the same time. This is a bit more involved than with skip lists.
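A compact sketch of the skip-list structure and its search/insert paths (locking is shown only as a per-node field and not exercised, to keep the sketch short; a concurrent insert would lock just the two or three predecessor nodes it relinks, in traversal order):

import random, threading

class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height           # forward pointers, one per level
        self.lock = threading.Lock()          # per-node lock for localized updates

class SkipList:
    MAX_H = 8
    def __init__(self):
        self.head = Node(None, self.MAX_H)    # sentinel head node spanning all levels

    def search(self, key):
        node = self.head
        for level in range(self.MAX_H - 1, -1, -1):     # start high, drop down a level
            while node.next[level] and node.next[level].key < key:
                node = node.next[level]
        node = node.next[0]
        return node if node and node.key == key else None

    def insert(self, key):
        preds = [self.head] * self.MAX_H
        node = self.head
        for level in range(self.MAX_H - 1, -1, -1):
            while node.next[level] and node.next[level].key < key:
                node = node.next[level]
            preds[level] = node                          # remember the predecessor per level
        height = 1
        while height < self.MAX_H and random.random() < 0.5:
            height += 1
        new = Node(key, height)
        for level in range(height):                      # relink; a concurrent version would
            new.next[level] = preds[level].next[level]   #   lock preds[level] around this step
            preds[level].next[level] = new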
As a practical example, take search over text where all the words have been populated in tuples of the word and the list of their occurrences; the skip list enables searching over a range of words by finding adjacent nodes in the skip list and allows for fast search, insert and delete operations. Data parallelism can also be used to make performance improvements. Concurrency is about hardware as much as it is about code. If the architecture involves memory synchronization between threads, so that changes made by one are visible to another, there is a cost involved. In a shared-nothing model, this cost is removed. Also, in a shared-nothing model, nodes can be added to or removed from a cluster without affecting other operations. Moreover, in a model where the concurrency is distributed over shared-nothing servers, the system can scale out to several thousands of nodes. Concurrency, however, is not the only thing to improve performance with.

Sunday, January 20, 2013

Search over text is enabled by creating tuples of (word, documentID, position) and building a B+-tree index over the word column. Of course, words may need to be canonicalized, and additional per-tuple attributes can aid in rank-ordering search results. For improved performance, one search optimization is for each word to appear once in the tuples, together with a list of its occurrences, as in (word, list<documentID, position>). This is helpful especially given the skewed distribution of words in documents. However, the text search implementation described above may run slower by an order of magnitude than custom text indexing engines. This is true both for full-text documents and for short textual attributes in tuples. In most cases, the full-text index is updated asynchronously ("crawled") rather than being maintained transactionally. The semantics of transactional, relational queries also need to be bridged with those of ranked document search results.
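A minimal sketch of the optimized tuple shape, (word, list<documentID, position>), built as an in-memory inverted index (illustrative only; a DBMS would back this with a B+-tree on the word column):

from collections import defaultdict

def build_index(documents):
    # documents: {documentID: text}; index: word -> list of (documentID, position)
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for position, word in enumerate(text.lower().split()):
            index[word].append((doc_id, position))
    return index

index = build_index({1: "the quick brown fox", 2: "the lazy dog"})
postings = index["the"]     # [(1, 0), (2, 0)] -- one entry per word, with its occurrence list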

Log Manager

This may not be relevant to the previous blog post, but it is one of the things I came across today, so I'm blogging about it here. The log manager in a DBMS records the changes committed to the database, helps with the rollback of aborted changes, and supports recovery from system failure. The standard way to do this is called Write-Ahead Logging (WAL). The WAL protocol states three rules:
1) Each change to the database should generate a log record.
2) Database log records should be flushed in order.
3) Log record must be flushed before the commit request for the change (or transaction) completes.
These rules of Write-Ahead Logging are not restricted to databases; they are used by file systems as well. Logging faces challenges when performance is desired. To enable a so-called fast path to boost performance, database systems use a "direct, steal/not-force" mode. This policy states that
1) data objects are updated in place,
2) unpinned buffer pool frames can be "stolen", and
3) buffer pool pages need not be forced (flushed) to the database before a commit.
The log manager then has to handle both undoing the changes of stolen pages written by aborted transactions and redoing the changes of committed transactions whose pages were not forced.
Another challenge is to keep the log records small. The log manager could log logical operations instead of physical operations as an optimization, but that makes redo and undo onerous. In practice, physical operations are logged to support redo, while logical operations are used to support undo. This is how the crash state is reconstructed during recovery and the incomplete transactions are rolled back. Log records are written with incrementing log sequence numbers (LSNs).
To optimize crash recovery, not all of the history is replayed. The start point for recovery is chosen as the older of these two log records: 1) the one describing the change to the oldest dirty page in the buffer pool, and 2) the log record describing the start of the oldest transaction in the system. This start point is called the recovery LSN and is computed at what are called checkpoints. Since replaying a long history could take time, frequent checkpoints are suggested. Another way to make this efficient is to write out the buffer pool pages asynchronously instead of synchronously. Logging and recovery work not only on data pages but also on a variety of other information related to internal data structures.
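A toy sketch of the WAL rules and the recovery-LSN computation above (hypothetical structures; real log managers deal with physical page images, compensation records, and multi-pass ARIES-style recovery):

import itertools

_next_lsn = itertools.count(1)
log = []                                   # append-only; flushed strictly in LSN order (rule 2)

def log_update(txn, page, redo, undo):
    lsn = next(_next_lsn)                  # rule 1: every change generates a log record
    log.append({"lsn": lsn, "txn": txn, "page": page, "redo": redo, "undo": undo})
    return lsn                             # rule 3: flush through this LSN before commit returns

def recovery_lsn(dirty_page_rec_lsns, active_txn_first_lsns):
    # start of replay: the older of the oldest dirty page's recLSN and the oldest
    # active transaction's first log record, as captured at a checkpoint
    candidates = list(dirty_page_rec_lsns.values()) + list(active_txn_first_lsns.values())
    return min(candidates)

log_update("T1", "P5", redo="x=2", undo="x=1")
start = recovery_lsn({"P5": 17, "P9": 23}, {"T2": 12})   # replay begins at LSN 12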

Saturday, January 19, 2013

drive towards in-memory applications

I frankly don't get the drive towards in-memory applications. Please don't mistake me. I admire the thinking and the advantages. Faster running times, ability to use code generation, variety of data stores to use, code only application, support from solid state devices, etc etc are all great and I don't have any buts. In fact, my confusion is merely to say should there be an ecosystem to support such applications. We have an ecosystem of applications using external storage and processing. And yes we have .Net. Unless there is value only for very specific dedicated technology such as for niche segments, should there be something else to support these ?

Friday, January 18, 2013

Asp.net membership providers

Asp.Net uses a provider model design pattern: a different back-end provider can be "plugged in" to change the mechanism used to save and retrieve data. .Net abstracts the actual storage mechanism from the classes that manipulate the data. The provider class is the one that stores the data on behalf of the other classes that manipulate it. For example, the Membership class uses a secondary class called a membership provider that actually knows the details of a particular data store and implements all the supporting logic to read and write data to and from it. Two built-in providers are available for the membership system: the SqlMembershipProvider and the ActiveDirectoryMembershipProvider. The latter uses the LDAP protocol to communicate with the domain controller (PDC). This protocol is used for accessing and maintaining distributed directory information services over the Internet Protocol. Directory services provide a hierarchical organization of members. LDAPv3 allows the use of Transport Layer Security for a secure connection. An alternative mechanism for securing an LDAP connection is an SSL tunnel, but that approach dates back to LDAPv2 and is now retired. LDAP supports operations such as Bind, Search, Compare, Add, Delete, Modify, Abandon, Extend and Unbind. LDAP itself is a binary protocol, and entries are specified with the LDAP Data Interchange Format.
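The provider model itself is not .NET-specific; a minimal, language-agnostic sketch of the pattern in Python (class names chosen only to mirror the description above) looks like this:

from abc import ABC, abstractmethod

class MembershipProvider(ABC):
    """The abstract contract that the membership class codes against."""
    @abstractmethod
    def validate_user(self, username, password): ...
    @abstractmethod
    def create_user(self, username, password): ...

class SqlMembershipProvider(MembershipProvider):
    def validate_user(self, username, password):
        return True                         # would query a SQL store here
    def create_user(self, username, password):
        pass                                # would insert into a SQL store here

class Membership:
    def __init__(self, provider: MembershipProvider):   # provider "plugged in" via configuration
        self.provider = provider
    def login(self, username, password):
        return self.provider.validate_user(username, password)

# Swapping the back end is just a different provider instance; callers are unchanged.
members = Membership(SqlMembershipProvider())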

Thursday, January 17, 2013

Hadoop

Hadoop consists of a data storage service and a data processing service. The former is named HDFS, short for Hadoop Distributed File System. The latter is named MapReduce and uses a high-performance parallel data processing technique. On this framework are built a database named HBase, a data warehouse named Hive, and a query language named Pig. Hadoop scales horizontally in that additional commodity hardware can be added without interruption, and node failures can be compensated for with redundant data. It does not guarantee ACID properties and supports forward-only parsing. Hadoop is used to store unstructured data, where data is stored in delimited flat files and the column names, column count and column data types don't matter. Data is retrieved with code in two steps: a Map function and a Reduce function. The Map function selects keys from each line and the values to hold, resulting in a big hashtable; a Reduce function then aggregates the results. Together these operations give a blob of mapped and reduced data. Writing this code for MapReduce is easier with Pig queries. The key-value set is stored in the HBase store, and this is the NoSQL (read as 'not only SQL') side. HBase stores key-values as columns in a column family, and each row can have more than one column family. Each row need not have the same number of columns. Hive is a data warehouse system; it uses the Hive query language on joins of HBase tables. SQL Server has a Sqoop-based connector to Hadoop which makes data transfer easy between HDFS and the RDBMS. SCOM, AD and BI tools are also being integrated with Hadoop. Hadoop on Windows uses a user account named Isotope on all the Windows nodes for running jobs.
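The canonical illustration of the two-step Map and Reduce flow is a word count; in Hadoop Streaming style, the Map and Reduce functions are just small scripts reading lines on stdin (a rough sketch, with the two files shown together):

# mapper.py -- Map: emit (word, 1) for every word on every input line
import sys
for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")

# reducer.py -- Reduce: Hadoop delivers mapper output sorted by key, so all counts
# for the same word arrive together and can be summed in a single forward pass
import sys
current, count = None, 0
for line in sys.stdin:
    word, value = line.rsplit("\t", 1)
    if word != current:
        if current is not None:
            print(f"{current}\t{count}")
        current, count = word, 0
    count += int(value)
if current is not None:
    print(f"{current}\t{count}")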

Thursday, January 10, 2013

Signed requests for Amazon Web Services API



Amazon AWS APIs require requests to be signed. By signing, the requests now carry a non-readable signature, or hash. The hash is computed from the operation and a timestamp so as to make each API call differentiated. The hash is computed with a secret by an algorithm that is specified upfront. Typically this is a Hash-based Message Authentication Code (HMAC); as an example, the SHA256 hash function that produces a 256-bit hash can be used. Both the AccessKeyId and the secret are issued separately to each user at the time of his or her account registration for use of these APIs. The signature, the timestamp and the AccessKeyId are all specified in the SOAP header or the REST URI. They are included in the SOAP header by a message inspector. This message inspector is registered with the endpoint behavior for the client. The endpoint behavior is in turn returned by a behavior extension element. All of these are System.ServiceModel namespace types and can be specified in the configuration file itself along with the address, binding and contract. The contract can be created by the WCF service utility by pointing it to the WSDL of the service that should be available online as the API. This configuration helps in instantiating a proxy and making direct calls to the API.
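A rough sketch of computing such a signature in Python (illustrative only: the keys are placeholders and the exact string-to-sign format differs across AWS signature versions):

import base64, hashlib, hmac
from datetime import datetime, timezone

access_key_id = "AKIDEXAMPLE"                 # issued at registration (placeholder)
secret_key = b"wJalrXUtnFEMI/K7MDENG"         # placeholder secret

operation = "DescribeInstances"
timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
string_to_sign = operation + timestamp        # operation plus timestamp differentiate each call

signature = base64.b64encode(
    hmac.new(secret_key, string_to_sign.encode(), hashlib.sha256).digest()
).decode()

# The signature, timestamp and AccessKeyId then travel in the SOAP header or REST query string.
params = {"Action": operation, "Timestamp": timestamp,
          "AWSAccessKeyId": access_key_id, "Signature": signature}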
Here’s an example of REST AWS API

Wednesday, January 9, 2013

Tor project

If you want to be anonymous on the web, use a different IP address, encrypt your internet traffic, and use dynamic routing and proxies: download the browser from https://www.torproject.org and start using it.

Saturday, January 5, 2013

difference between ip spoofing and web proxy

IP spoofing lets you conceal the IP address of the sender or impersonate another host in a system. If you want to spoof your IP address to your ISP, you could try IP spoofing, but more than likely your ISP will reject these packets.
A web proxy, on the other hand, sits between the ISP and the website you are connecting to. You are more likely to be anonymous to the website you connect to if you use a web proxy.

Thursday, January 3, 2013

XmlReader

The Read method of the XmlReader reads the next node from the stream.
The MoveToElement method of the XmlReader moves to the element that contains the current attribute node.