R - The type of the stream
public interface CacheStream<R> extends Stream<R>
A Stream that has additional operations to monitor or control behavior when used from a Cache. Note that you may only use these additional methods on the CacheStream before any intermediate operations are performed, as a Stream is returned from those methods.
Whenever the iterator or spliterator methods are used, the user must close the Stream that the method was invoked on after completion of its operation. Failure to do so may leak a thread if the iterator or spliterator is not fully consumed.
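The closing requirement above can be sketched with a plain JDK stream. This is a minimal illustration, not Infinispan code: the class `IteratorCloseSketch`, the method names, and the `onClose` hook standing in for the resources a cache-backed stream would release are all assumptions made for the example.

```java
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

class IteratorCloseSketch {
    // Stands in for a cache-backed stream (assumption); the onClose hook
    // represents whatever resources such a stream releases when closed.
    static Stream<String> openStream(AtomicBoolean closed) {
        return Stream.of("a", "b", "c").onClose(() -> closed.set(true));
    }

    // Reads only the first element. The try-with-resources block still
    // closes the stream, even though the iterator was not fully consumed.
    static String firstElement(AtomicBoolean closed) {
        try (Stream<String> stream = openStream(closed)) {
            Iterator<String> it = stream.iterator();
            return it.hasNext() ? it.next() : null;
        }
    }

    public static void main(String[] args) {
        AtomicBoolean closed = new AtomicBoolean(false);
        String first = firstElement(closed);
        System.out.println(first + " closed=" + closed.get()); // prints "a closed=true"
    }
}
```

The same try-with-resources shape applies when the spliterator is used instead of the iterator.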
When using a stream that is backed by a distributed cache, these operations are performed using remote distribution controlled by the segments that each key maps to. All intermediate operations are lazy, even the special cases described in later paragraphs, and are not evaluated until a terminal operation is invoked on the stream. Essentially, each set of intermediate operations is shipped to each remote node, where they are applied to a local stream, and finally the terminal operation is completed. If this stream is parallel, the processing on remote nodes is also done using a parallel stream.
Parallel distribution is enabled by default for all operations except for iterator() and spliterator(). Please see sequentialDistribution() and parallelDistribution(). With this disabled, only a single node will process the operation at a time (this includes the local node).
Rehash awareness is enabled by default for all operations. Any intermediate or terminal operation may be invoked multiple times during a rehash, so you should ensure they are idempotent. This can be problematic for forEach(Consumer), as it may be difficult to implement with such requirements; please see it for more information. If you wish to disable rehash aware operations, call disableRehashAware(), which should provide better performance for some operations. The performance is most affected for the key aware operations iterator(), spliterator(), and forEach(Consumer). Disabling rehash awareness can cause incorrect results if the terminal operation is invoked and a rehash occurs before the operation completes. If incorrect results do occur, it is guaranteed that entries were only missed and that no entries are duplicated.
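To illustrate why idempotence matters when an operation may be invoked more than once, the following sketch replays some elements to two different actions. It is a simulation using only the JDK; the class `IdempotentActionSketch` and the `deliverWithReplay` helper are hypothetical names, and the "replay" merely imitates a retry after a rehash.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

class IdempotentActionSketch {
    // Simulates a retry: every element is delivered once, then the
    // elements in 'replayed' are delivered a second time.
    static <T> void deliverWithReplay(List<T> all, List<T> replayed, Consumer<? super T> action) {
        all.forEach(action);
        replayed.forEach(action);
    }

    // Not idempotent: a replayed element is counted twice.
    static int countingAction(List<String> all, List<String> replayed) {
        AtomicInteger counter = new AtomicInteger();
        deliverWithReplay(all, replayed, e -> counter.incrementAndGet());
        return counter.get();
    }

    // Idempotent: adding the same element to a set twice changes nothing.
    static int setBasedAction(List<String> all, List<String> replayed) {
        Set<String> seen = ConcurrentHashMap.newKeySet();
        deliverWithReplay(all, replayed, seen::add);
        return seen.size();
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("k1", "k2", "k3");
        List<String> replayed = Arrays.asList("k2");
        System.out.println(countingAction(all, replayed)); // prints 4
        System.out.println(setBasedAction(all, replayed)); // prints 3
    }
}
```

The set-based action gives the same answer whether or not a replay happens, which is the property the paragraph above asks for.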
Any stateful intermediate operation requires pulling all information up to that point local to operate properly. Each of these methods may have slightly different behavior, so make sure you check the method you are using.
An example of such an operation is the distinct intermediate operation. Upon calling the terminal operation, a remote retrieval operation is run using all of the intermediate operations up to the distinct operation remotely. This retrieval is then used to fuel a local stream where all of the remaining intermediate operations are performed, and then finally the terminal operation is applied as normal. Note that in this case the intermediate iterator still obeys the distributedBatchSize(int) setting irrespective of the terminal operator.
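The two stages described for distinct can be imitated with ordinary lists standing in for remote nodes. This is only a sketch under that assumption: `DistinctStagingSketch` is a hypothetical name, no real distribution or batching takes place, and the pipeline (`map`, `distinct`, `map`) is chosen purely for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DistinctStagingSketch {
    // Each inner list stands in for the entries held by one remote node.
    static List<Integer> run(List<List<String>> nodes) {
        // Stage 1 ("remote"): every node applies the intermediate operations
        // up to and including distinct, and the results are pulled local.
        List<Integer> retrieved = nodes.stream()
                .flatMap(node -> node.stream().map(String::length).distinct())
                .collect(Collectors.toList());

        // Stage 2 (local): distinct is applied again across all nodes, then
        // the remaining intermediate operations and the terminal operation.
        return retrieved.stream()
                .distinct()
                .map(n -> n * 10)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> nodes = Arrays.asList(
                Arrays.asList("a", "bb", "cc"),
                Arrays.asList("dd", "eee"));
        System.out.println(run(nodes)); // prints [10, 20, 30]
    }
}
```

Stage 1 returns the lengths 1, 2 from the first list and 2, 3 from the second, so the duplicate 2 is only removed once the data is local.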
Modifier and Type | Interface and Description |
---|---|
static interface | CacheStream.SegmentCompletionListener - Functional interface that is used as a callback when segments are completed. |
Nested classes/interfaces inherited from interface java.util.stream.Stream: Stream.Builder<T>
Modifier and Type | Method and Description |
---|---|
<R1,A> R1 | collect(Collector<? super R,A,R1> collector) |
CacheStream<R> | disableRehashAware() - Disables tracking of rehash events that could occur to the underlying cache. |
Stream<R> | distinct() |
CacheStream<R> | distributedBatchSize(int batchSize) - Controls how many keys are returned from a remote node when using a stream terminal operation with a distributed cache to back this stream. |
CacheStream<R> | filterKeys(Set<?> keys) - Filters which entries are returned by only returning ones that map to the given key. |
CacheStream<R> | filterKeySegments(Set<Integer> segments) - Filters which entries are returned by what segment they are present in. |
void | forEach(Consumer<? super R> action) |
Iterator<R> | iterator() |
Stream<R> | limit(long maxSize) |
CacheStream<R> | parallelDistribution() - This would enable sending requests to all other remote nodes when a terminal operator is performed. |
CacheStream<R> | segmentCompletionListener(CacheStream.SegmentCompletionListener listener) - Allows registration of a segment completion listener that is notified when a segment has completed processing. |
CacheStream<R> | sequentialDistribution() - This would disable sending requests to all other remote nodes compared to one at a time. |
Stream<R> | skip(long n) |
Stream<R> | sorted() |
Stream<R> | sorted(Comparator<? super R> comparator) |
Spliterator<R> | spliterator() |
CacheStream<R> | timeout(long timeout, TimeUnit unit) - Sets a given time to wait for a remote operation to respond by. |
Methods inherited from interface java.util.stream.Stream: allMatch, anyMatch, builder, collect, concat, count, empty, filter, findAny, findFirst, flatMap, flatMapToDouble, flatMapToInt, flatMapToLong, forEachOrdered, generate, iterate, map, mapToDouble, mapToInt, mapToLong, max, min, noneMatch, of, of, peek, reduce, reduce, reduce, toArray, toArray
Methods inherited from interface java.util.stream.BaseStream: close, isParallel, onClose, parallel, sequential, unordered
CacheStream<R> sequentialDistribution()
Parallel distribution is enabled by default except for iterator() and spliterator().
CacheStream<R> parallelDistribution()
Parallel distribution is enabled by default except for iterator() and spliterator().
CacheStream<R> filterKeySegments(Set<Integer> segments)
Filters which entries are returned by what segment they are present in. This can be more efficient than a regular Stream.filter(Predicate) method as this can control what nodes are asked for data and what entries are read from the underlying CacheStore if present.
segments - The segments to use for this stream operation. Any segments not in this set will be ignored.
CacheStream<R> filterKeys(Set<?> keys)
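Filters which entries are returned by only returning ones that map to the given key. This method will be faster than a regular Stream.filter(Predicate) if any keys must be retrieved remotely or if a cache store is in use.
keys - The keys that this stream will only operate on.
CacheStream<R> distributedBatchSize(int batchSize)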
Controls how many keys are returned from a remote node when using a stream terminal operation with a distributed cache to back this stream. This value is ignored when terminal operators that don't track keys are used. Key tracking terminal operators are iterator(), spliterator(), and forEach(Consumer). Please see those methods for additional information on how this value may affect them.
This value may be used in the case of a terminal operator that doesn't track keys if an intermediate operation is performed that requires bringing keys locally to do computations. Examples of such intermediate operations are sorted(), sorted(Comparator), distinct(), limit(long), and skip(long).
This value is always ignored when this stream is backed by a cache that is not distributed, as all values are already local.
batchSize - The size of each batch. This defaults to the state transfer chunk size.
CacheStream<R> segmentCompletionListener(CacheStream.SegmentCompletionListener listener)
This method is designed for the sole purpose of use with iterator(), to allow a user to track completion of segments as they are returned from the iterator. Behavior of other methods is not specified. Please see iterator() for more information.
Multiple listeners may be registered upon multiple invocations of this method. The ordering of notified listeners is not specified.
listener - The listener that will be called back as segments are completed.
CacheStream<R> disableRehashAware()
Most terminal operations will run faster with rehash awareness disabled, even without a rehash occurring. However, if a rehash occurs with this disabled, be prepared to possibly receive only a subset of values.
CacheStream<R> timeout(long timeout, TimeUnit unit)
If a timeout does occur, then a TimeoutException is thrown from the terminal operation invoking thread or on the next call to the Iterator or Spliterator.
Note that if a rehash occurs, this timeout value is reset for the subsequent retry if rehash aware is enabled.
timeout - the maximum time to wait
unit - the time unit of the timeout argument
void forEach(Consumer<? super R> action)
This operation is performed remotely on the node that is the primary owner for the key tied to the entry(s) in this stream.
NOTE: This method, while being rehash aware, has the lowest consistency of all of the operators. This operation will be performed on every entry at least once in the cluster, as long as the originator doesn't go down while it is being performed. This is due to how the distributed action is performed. Essentially, the distributedBatchSize(int) value controls how many elements are processed per node at a time when rehash is enabled. After those are complete, the keys are sent to the originator to confirm that those were processed. If that node goes down during or before the response, those keys will be processed a second time.
This method is run distributed by default with a distributed backing cache. However, if you wish for this operation to run locally, you can use stream().iterator().forEachRemaining(action) for a single threaded variant. If you wish to have a parallel variant, you can use StreamSupport.stream(Spliterator, boolean), passing in the spliterator from the stream. In either case, remember you must close the stream after you are done processing the iterator or spliterator.
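The two local variants mentioned above can be sketched as follows. A plain JDK `Stream<String>` stands in for the cache stream here (an assumption for the example), and `LocalForEachSketch` with its two helper methods is a hypothetical name; only `Iterator.forEachRemaining` and `StreamSupport.stream(Spliterator, boolean)` come from the text.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class LocalForEachSketch {
    // Single threaded variant: drain the iterator on the calling thread,
    // closing the source stream when done.
    static void forEachSingleThreaded(Stream<String> source, Consumer<? super String> action) {
        try (Stream<String> stream = source) {
            stream.iterator().forEachRemaining(action);
        }
    }

    // Parallel variant: wrap the spliterator in a new parallel stream.
    // The action must be safe to call from several threads.
    static void forEachParallel(Stream<String> source, Consumer<? super String> action) {
        try (Stream<String> stream = source) {
            StreamSupport.stream(stream.spliterator(), true).forEach(action);
        }
    }

    public static void main(String[] args) {
        Queue<String> seen = new ConcurrentLinkedQueue<>();
        forEachSingleThreaded(Stream.of("a", "b", "c"), seen::add);
        forEachParallel(Stream.of("d", "e", "f"), seen::add);
        System.out.println(seen.size()); // prints 6
    }
}
```

In both helpers the try-with-resources block closes the original stream, which is the stream the iterator or spliterator was obtained from.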
Iterator<R> iterator()
Usage of this operator requires closing this stream after you are done with the iterator. The preferred usage is to use a try-with-resources block on the stream.
This method has special usage with the CacheStream.SegmentCompletionListener in that as entries are retrieved from the next method it will complete segments.
This method obeys the distributedBatchSize(int). Note that when using methods such as Stream.flatMap(Function), more than one element may be mapped to a given key, so this doesn't guarantee that exactly that many entries are returned per batch.
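A small simulation shows why a flatMap can make a batch carry more elements than the batch size. It assumes, for illustration only, that a batch is simply a run of `batchSize` consecutive keys and that each key expands to one element per character; `BatchFlatMapSketch` is a hypothetical name and no real remote batching is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BatchFlatMapSketch {
    // Returns, for each batch of keys, how many elements the batch
    // produces once every key is flat-mapped to its characters.
    static List<Integer> elementsPerBatch(List<String> keys, int batchSize) {
        List<Integer> counts = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += batchSize) {
            List<String> batch = keys.subList(i, Math.min(i + batchSize, keys.size()));
            long elements = batch.stream()
                    .flatMap(key -> key.chars().mapToObj(c -> (char) c))
                    .count();
            counts.add((int) elements);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two keys per batch, yet the first batch yields five elements.
        System.out.println(elementsPerBatch(Arrays.asList("ab", "cde", "f"), 2)); // prints [5, 1]
    }
}
```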
Note that the Iterator.remove() method is only supported if no intermediate operations have been applied to the stream and this is not a stream created from a Cache.values() collection.
Specified by: iterator in interface BaseStream<R,Stream<R>>
Spliterator<R> spliterator()
Usage of this operator requires closing this stream after you are done with the spliterator. The preferred usage is to use a try-with-resources block on the stream.
Specified by: spliterator in interface BaseStream<R,Stream<R>>
Stream<R> sorted()
This operation is performed entirely on the local node irrespective of the backing cache. This operation will act as an intermediate iterator operation requiring data be brought locally for proper behavior. Beware that this means all entries of this cache must be held in memory at one time. This is described in more detail in the CacheStream documentation.
Any subsequent intermediate operations and the terminal operation are also performed locally.
Stream<R> sorted(Comparator<? super R> comparator)
This operation is performed entirely on the local node irrespective of the backing cache. This operation will act as an intermediate iterator operation requiring data be brought locally for proper behavior. Beware that this means all entries of this cache must be held in memory at one time. This is described in more detail in the CacheStream documentation.
Any subsequent intermediate operations and the terminal operation are then performed locally.
Stream<R> limit(long maxSize)
This intermediate operation will be performed both remotely and locally to reduce how many elements are sent back from each node. More specifically, this operation is applied remotely on each node to only return up to the maxSize value, and then the aggregated results are limited once again on the local node.
This operation will act as an intermediate iterator operation requiring data be brought locally for proper behavior. This is described in more detail in the CacheStream documentation.
Any subsequent intermediate operations and the terminal operation are then performed locally.
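The remote-then-local limiting described above can be imitated with lists standing in for nodes. This is a sketch under that assumption, using only the JDK; `TwoPhaseLimitSketch` and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class TwoPhaseLimitSketch {
    // How many elements the "nodes" send back after the per-node limit.
    static long transferred(List<List<Integer>> nodes, long maxSize) {
        return nodes.stream().mapToLong(node -> Math.min(node.size(), maxSize)).sum();
    }

    // Applies the limit on each node, then once more on the aggregate.
    static List<Integer> limited(List<List<Integer>> nodes, long maxSize) {
        return nodes.stream()
                .flatMap(node -> node.stream().limit(maxSize)) // "remote" limit per node
                .limit(maxSize)                                // local limit on the aggregate
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<Integer>> nodes = Arrays.asList(
                Arrays.asList(1, 2, 3, 4),
                Arrays.asList(5, 6, 7),
                Arrays.asList(8));
        System.out.println(transferred(nodes, 2)); // prints 5
        System.out.println(limited(nodes, 2));     // prints [1, 2]
    }
}
```

Without the per-node limit all eight elements would be sent back; with it only five are, and the local limit then trims the result to two.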
Stream<R> skip(long n)
This operation is performed entirely on the local node irrespective of the backing cache. This operation will act as an intermediate iterator operation requiring data be brought locally for proper behavior. This is described in more detail in the CacheStream documentation.
Depending on the terminal operator, this may or may not require all entries, or a subset after skip is applied, to be in memory all at once.
Any subsequent intermediate operations and the terminal operation are then performed locally.
Stream<R> distinct()
This operation will be invoked both remotely and locally when used with a distributed cache backing this stream.
This operation will act as an intermediate iterator operation requiring data be brought locally for proper behavior. This is described in more detail in the CacheStream documentation.
This intermediate iterator operation will be performed locally and remotely, possibly requiring a subset of all elements to be in memory.
Any subsequent intermediate operations and the terminal operation are then performed locally.
<R1,A> R1 collect(Collector<? super R,A,R1> collector)
Note that when using a distributed backing cache for this stream, the collector must be marshallable. This prevents direct use of the Collectors class. However, you can use the CacheCollectors static factory methods to create a serializable wrapper, which then creates the actual collector lazily after being deserialized. This makes it possible to use any method from the Collectors class as you normally would.
Specified by: collect in interface Stream<R>
R1 - collected type
A - intermediate collected type if applicable
collector - see CacheCollectors
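The idea of a serializable wrapper that builds the real collector lazily can be sketched with the JDK alone. This is not the CacheCollectors implementation: `LazyCollectorSketch`, `SerializableSupplier`, and `LazyCollector` are hypothetical names introduced for illustration, and the example does not perform an actual serialization round trip.

```java
import java.io.Serializable;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyCollectorSketch {
    // Hypothetical: a Supplier that can be serialized along with the wrapper.
    interface SerializableSupplier<T> extends Supplier<T>, Serializable {}

    // Hypothetical wrapper: only the factory is a serialized field; the real
    // collector is created on first use and never serialized itself.
    static final class LazyCollector<T, R> implements Collector<T, Object, R>, Serializable {
        private static final long serialVersionUID = 1L;
        private final SerializableSupplier<Collector<T, ?, R>> factory;
        private transient Collector<T, Object, R> delegate;

        LazyCollector(SerializableSupplier<Collector<T, ?, R>> factory) {
            this.factory = factory;
        }

        @SuppressWarnings("unchecked")
        private Collector<T, Object, R> delegate() {
            if (delegate == null) {
                delegate = (Collector<T, Object, R>) factory.get();
            }
            return delegate;
        }

        @Override public Supplier<Object> supplier() { return delegate().supplier(); }
        @Override public BiConsumer<Object, T> accumulator() { return delegate().accumulator(); }
        @Override public BinaryOperator<Object> combiner() { return delegate().combiner(); }
        @Override public Function<Object, R> finisher() { return delegate().finisher(); }
        @Override public Set<Collector.Characteristics> characteristics() {
            return delegate().characteristics();
        }
    }

    // Wraps Collectors.toList() so that it is only created when first needed.
    static List<String> collectToList(Stream<String> stream) {
        Collector<String, ?, List<String>> collector =
                new LazyCollector<String, List<String>>(() -> Collectors.toList());
        return stream.collect(collector);
    }

    public static void main(String[] args) {
        System.out.println(collectToList(Stream.of("a", "b"))); // prints [a, b]
    }
}
```

Because the delegate field is transient, a deserialized copy of the wrapper would rebuild the collector from the factory on its first use, which mirrors the lazy creation the text describes.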
Copyright © 2015 JBoss, a division of Red Hat. All rights reserved.