Cluster computing

Sunday, December 15, 2019

Pravega serves as a stream store. Its control path is available at 9090 port in standalone mode with REST API. The data path is over Flink connector to segment store port 6000
The netty wire commands to segment store are best suited for FlinkPravegaReader and FlinkPravegaWriter as demonstrated in https://1drv.ms/w/s!Ashlm-Nw-wnWvEMNlrUUJvmgd5UY?e=bnWCdU
The REST API data path can make it simpler to send data to Pravega over HTTP. It just needs to translate a POST request data to a netty WireCommand or it could bridge http to netty over even higher level as shown in https://github.com/ravibeta/JavaSamples/tree/master/tcp-connector-pravega-workshop Generally lower levels are preferred internally for performance and efficiency.
At the minimum, there needs to be an HTTP Get and Post method corresponding to the read and write operation on the data path involving a stream. The create,update and delete of the stream fall in the control path and are already provided as REST APIs by the metadata controller.
The Post method implementation for example may look like this:
@Override
public CompleteableFuture<Void> createEvent(String scopeName, String streamName, String message) {
final ClientFactoryimpl clientFactory = new ClientFactoryImpl(scopeName, this);
final Serializer<String> serializer = new JavaSerializer<>();
final Random random = new Random();
final Supplier<String> keyGenerator = () -> String.valueOf(random.nextInt());
EventStreamWriter<String> writer = clientFactory.createEventWriter(streamName, serializer, EventWriterConfig.builder().build());
return writer.writeEvent(keyGenerator.get().message);
}
The Get method may similarly utilize a suitable stream event reader.
We will talk about lower-level bridging of HTTP requests with Pravega wire protocol next assuming that the Get and Post method implementations may be wired up to the service and exposed via the “/v1/events” @Path annotation. Yes, and now returning to the wire protocol, Pravega has its own and the post request data may directly translate to the self-contained messages in the wire protocol or it could choose to make use of request headers for the wire protocol fields and the use of entire request data as the payload. There will be two headers corresponding to the message type and length in the wire protocol
Therefore, we not only see a need for using request headers and parameters to suitably capture all the fields of the metdata and the data enclosed within the message but also see that the message itself varies a lot across requests and responses.
Let us take a look at some of these messages from the wire protocol:
The request type messages include the following:
1) Read Segment request type which is equivalent to our typical Get method usage. This has the following fields:
• Segment
• Offset
• SuggestedLength
• DelegationToken
• RequestId
2) Setup Append request type to establish a connection to the host which includes the following fields:
• RequestID
• WriterID
• Segment
• DelegationToken
3) AppendBlock request type which includes the following fields:
• WriterID
• Data
• RequestID
4) Partial Event request type to be applied the end of an Append block which did not fully fit within the Append block
5) the Event request for the full data that completely fits within a block
6) the Segment attribute request that takes a
• RequestID
• SegmentName
• AttributeID
• DelegationToken
7) the ReadTable request where there is a tableKey involved in the list below:
• RequestID
• Segment
• DelegationToken
All these request types are also paired with their corresponding response types.
Looking at the approach above where the messages are all existing and currently being processed by the wire protocol, the http layer may simply encapsulate in the request body with a PUT method while retaining only the message type and message length as headers. This can then send the response body directly to the lower layer as a WireCommand which can parse itself from the bytes. The other approach could involve listing all the distinct fields from all the messages and choosing an appropriate header name for each. This allows the request to include the data as the entire payload but will require its own processor for fulfilling the requests with responses. It would therefore appear that having a segment postion based read and an append only based write of typed events would be sufficient to allow data transmission over HTTP and significant expansion of audience for the store.

Cluster computing

Sunday, December 15, 2019

No comments:

Post a Comment